Gojko on Gherkin
Gojko Adzic gave a super introduction to the Cucumber testing tool the other night at Skills Matter. Below are some notes from the session. We’re considering taking up Cucumber in a couple of ways, including using it as part of our evaluation of new folks who want to join us as testing experts. (We want to know how good a candidate is likely to be at writing automated tests, and even if you have never done test automation, you should be able to write a Cucumber test as it’s just plain English. Or Welsh.)
The Gherkin building in London - thanks to Rudolf Schuba
When Cucumber first appeared, its main advantage seemed to be that you could use tables in addition to RSpec’s given/when/then, which didn’t seem that cool. But as Gojko has worked with it, he’s found that it’s great for staying out of your way as you build your tests:
- Generates code for you
- Supports lists and tabular data
- Tagging for types of tests
- JUnit XML output, so easy to integrate with CI tools and friendly to non-Ruby languages
In fact it integrates with almost anything! Gojko showed a slide with over twenty different tools including Watir, Selenium, and many more.
The basic structure of a Cucumber test is plain text, like this:
Feature: Hello World
  In order to ensure that my installation works
  As a developer
  I want to run a quick Cucumber test

  Scenario: Hello World
    Given The Action is Hello
    When The Subject is World
    Then The Greeting is Hello, World.
Cucumber forces you to use the good Given/When/Then structure that keeps a clear distinction among components:
- Your scenario is readable by humans (see above) and is scripted by…
- …the step definitions, which are in Ruby (or your favourite alternative language) and can be reused to test many…
- …domain classes, which do the actual work.
Your step definitions can be refactored along with the domain classes to help insulate the tests against code changes. Another way to say this is that you get to separate your high-level activities (the scenario) from your implementation of those activities (the step definitions).
You can write your scenario first with no step definitions and run Cucumber with no errors! Instead it will tell you that your tests are undefined and suggest how to start creating the appropriate step definitions. This makes real behaviour-driven development possible - write the scenario, then the step definitions, then finally get to your domain objects.
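To make the scenario / step definition / domain class split concrete, here is a toy sketch in JavaScript using the Hello World scenario above. It is not Cucumber and does not use Cucumber's API; `Greeter`, `stepDefinitions`, and `runScenario` are hypothetical names invented for illustration. It only mimics the idea that a step with no matching definition is reported as undefined instead of raising an error.

```javascript
// Toy illustration only: NOT Cucumber's API. All names here are hypothetical.

// Domain class: does the actual work.
class Greeter {
  constructor() {
    this.action = "";
    this.subject = "";
  }
  greeting() {
    return `${this.action}, ${this.subject}.`;
  }
}

// Step definitions: patterns that map plain-text steps onto the domain class.
const stepDefinitions = [
  { pattern: /^Given The Action is (.+)$/, run: (g, value) => { g.action = value; } },
  { pattern: /^When The Subject is (.+)$/, run: (g, value) => { g.subject = value; } },
  {
    pattern: /^Then The Greeting is (.+)$/,
    run: (g, expected) => {
      if (g.greeting() !== expected) {
        throw new Error(`expected "${expected}", got "${g.greeting()}"`);
      }
    },
  },
];

// Runs each step in order. A step with no matching definition is reported
// as "undefined" rather than treated as an error.
function runScenario(steps, definitions) {
  const greeter = new Greeter();
  return steps.map((step) => {
    const def = definitions.find((d) => d.pattern.test(step));
    if (!def) return { step, status: "undefined" };
    try {
      def.run(greeter, step.match(def.pattern)[1]);
      return { step, status: "passed" };
    } catch (err) {
      return { step, status: "failed" };
    }
  });
}

// The scenario: readable text, written before any definitions exist.
const scenario = [
  "Given The Action is Hello",
  "When The Subject is World",
  "Then The Greeting is Hello, World.",
];
```

Calling `runScenario(scenario, [])` marks every step as undefined, which corresponds to the "write the scenario first" stage; passing `stepDefinitions` makes all three steps pass.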
Cucumber will give you various output formats so you can see the red/green loveliness in PDF, XML, HTML, and more. It lets you use your favourite language including Java, Ruby, C#, and even LOLCODE!
You can avoid wordy repetition of the same scenario again and again in some clever ways, including scenario outlines (which let you use a compact set of rows to show a group of scenarios using a common format), lists of objects (which specify multiple values per Given/When/Then step), and common setup (using a “Background” keyword).
Some limitations:
- Console only. But there are plenty of ways to run it in a GUI.
- No IDE support (yet). Apparently there is a Visual Studio plugin. Can Eclipse be far behind?
- No way to add images or other documentation to your test, as you can in Fit and FitNesse.
You can get long-term maintenance problems if you don’t keep the semantics straight - your scenarios tell you *what* is being tested, not *how* it is done. Antony Marcano points out that you have to watch out for step definitions that drift from their titles - don’t let your “send message” step get defined to log or write to the database.
Math Is Hard, Let's Write Code!
I’ve come across two useful numerical methods for thinking about software development. You can stop reading if numbers frighten you (but then what are you doing reading a coding blog anyway?).
Dunbar’s Number
It’s 148. Let your organisation size grow larger than this number, says primate researcher Dunbar, and you’re going to have serious communication problems. Not a current or near-term problem for youDevise, though I’ve recently chatted to someone who runs an organisation of 180 who has exactly the communication headaches the theory predicts.
What’s bugging me is that I’m sure there’s a similar theory for much smaller teams, which I learnt as “split as soon as you grow larger than a cricket team (11 members)”. I keep hoping someone will tell me where this idea comes from and what the theory is behind it (pretty sure it won’t involve gorillas, anyway).
Little’s Law
Little’s Law says
L = λW
where L = average number of items in process, λ = average arrival rate, and W = average time an item waits in the system. It dates back to an original publication by Little in 1961, but a more recent paper by Little and Graves explains it very nicely, including the mathematical justification.
Those of us who just want to use the law find it useful when we know two of the figures and not the third. For instance, you may know the rough arrival rate of bug reports, and it’s usually pretty easy to check the number that are open (and you probably know if that number is stable or not). So to estimate the length of time it takes to fix a bug, divide the number open by the arrival rate.
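As a worked example of that division, here is a minimal sketch in JavaScript. The function name `averageWait` and the figures (30 open bugs, 5 new reports a week) are made up for illustration, and the estimate only makes sense when the number of open bugs is roughly stable.

```javascript
// Little's Law: L = λW, so W = L / λ.
// Hypothetical helper; the name and the sample figures are illustrative only.
function averageWait(itemsInProcess, arrivalRate) {
  if (arrivalRate <= 0) {
    throw new RangeError("arrival rate must be positive");
  }
  return itemsInProcess / arrivalRate;
}

// 30 open bugs, arriving at about 5 per week (made-up numbers).
const weeksToFix = averageWait(30, 5);
console.log(weeksToFix); // 6 (weeks, because the rate was per week)
```

The result comes out in whatever time unit the arrival rate uses, so a per-week rate gives an answer in weeks.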
I also like the law because it helps explain the lean software development theory of limiting work in progress. If your goal is to reduce L, the number of outstanding items (say bugs in the backlog), then you quickly see from the formula that you have to reduce the arrival rate (by improving quality), bring down the wait time (by improving efficiency), or both. So among other things, limiting work in progress encourages two good behaviours on your team.
(Thanks to Laurie of New Bamboo for telling me about Dunbar’s Number, and to Donald Reinertsen for tweeting about Little’s Law.)
Why Lean?
Had a great lunchtime chat with Benjamin Mitchell and asked him to argue for lean methods in software development, as if I’d never heard of them. Here’s what I heard:
- Apply the Theory of Constraints. If a lane shuts down on a stretch of motorway, that bottleneck is the limit on flow for all cars on the road (assuming there are no other obstacles). If you want to keep the maximum throughput, keep cars arriving at the start of the closure at a constant rate - batched delivery of cars to the bottleneck mouth will reduce throughput. This is why you see lots of signs miles before roadworks - you don’t want cars slamming to a halt and trying to switch lanes suddenly just before the lane closure.
- Use pull techniques. It’s better to focus on what you can finish rather than what you can start (the latter gives you lots of half-finished items you can’t release). Allow idleness if it increases the number of items you can finish. Even better if you can pull based on real money committed by a customer, as then you know exactly how to measure success (and you’ve reduced the risk of failure).
- Work in Progress limits also help you focus on what you can finish and avoid startitis, where you have too much work in your system and everyone churns switching among the overloaded tasks.
- On idleness: visible, easily-computed metrics like “lines of code typed daily” may look like measures of productivity, but the real measures are generally invisible (though good teams work to make them visible). For example, you really want to know how many of the features you are typing in will actually be used by customers, but you have to change your processes in order to measure this. Lean software development should help you find these measures and make them obvious.
- Ideas you generate but never develop are another example of real productivity measures (in this case, you want to minimise the metric). Benjamin refers to these “stored” ideas as “bananas” and reminds us that you can store bananas in a warehouse, but only for so long. Similarly, you should dump old ideas that you haven’t gotten around to building (rather than letting them rot in a wiki or ticketing system).
- You can measure the relative cost of delay in building the features in your backlog, and build those with the highest delay cost (even if they didn’t arrive first, or aren’t the favourite feature of an influential stakeholder). Paired with pulling by committed cash, this should substantially increase revenue.
- Lean management argues that few constraints are imposed by workers; nearly all are brought about by problems in the process used by those workers. See the Red Bead Experiment.
- Reacting to obstacles quickly lets you clear them and restore good throughput. Lean techniques like a kanban board help you to do this. (Benjamin is particularly skilled at extracting all the statistical value he can from the available information - ask to see his amazing charts!)
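The cost-of-delay point in the list above can be sketched as a simple sort. This is only an illustration under stated assumptions: the feature names, the `costOfDelayPerWeek` field, and the figures are all invented, and how you actually estimate delay cost is a separate question the sketch does not address.

```javascript
// Hypothetical backlog; feature names and weekly delay costs are invented.
const backlog = [
  { name: "stakeholder favourite", arrivedOrder: 1, costOfDelayPerWeek: 500 },
  { name: "checkout fix", arrivedOrder: 2, costOfDelayPerWeek: 4000 },
  { name: "new report", arrivedOrder: 3, costOfDelayPerWeek: 1500 },
];

// Returns a new array ordered by highest cost of delay first,
// ignoring arrival order. The input array is left unchanged.
function byCostOfDelay(items) {
  return [...items].sort((a, b) => b.costOfDelayPerWeek - a.costOfDelayPerWeek);
}

const buildOrder = byCostOfDelay(backlog).map((item) => item.name);
console.log(buildOrder); // [ 'checkout fix', 'new report', 'stakeholder favourite' ]
```

Note that the item that arrived first ends up last, because only the delay cost decides the order.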
Other useful product-management tips from Benjamin:
- Take a daily poll on what wasted developers’ time yesterday. Track the results. Top items are worth addressing promptly.
- Many organisations (often big ones, but not always) reward non-risk-takers. Some organisations (startups, sports teams) reward risk-takers. Few (any?) reward based on end-to-end value, which is really the only important measure.
Creative Info Sharing
Finally got a chance to write up the Creative Info Sharing session from XP Day. Surprising how productive stuffed animals can be!
Code Dojo VIII - It's full of stars!
Rather than having a Code Dojo focused on a good practice like many of our past dojos, I decided to take us down the path of exploring Google Web Toolkit (GWT).
Many of us at youDevise have been flexing our web-development skills and writing very interactive, AJAX-filled web pages. Lately, we have been fed up with the standard path of server-side Java and servlets spitting out HTML, with a whole lot more code written in JavaScript and CSS/HTML hackery. These web app setups are a pain to test and certainly hard to get right in all browsers, all the time. So, I decided to plunge us into the monolith and see what it was like when you wrote (nearly) no JS/HTML code and left that to GWT.
After a very brief intro to GWT and the tools, I introduced the problem: I asked everyone to pair up and enhance a sample web app. Every pair of developers was given a different enhancement to add. (There were 6 groups enhancing the system concurrently, for the curious.) In coding up the example app, I had done my best to give the initial code base some decent unit tests (both mocking and GWT-style), so that people could explore the testability of GWT. I had also tried to carve out classes small enough, each with a single coherent purpose, to let people explore the maintainability of the code as well.
After the dust had settled, the last check-ins had been merged, and the pizza had arrived, the verdict was in: people really liked working with GWT. Given maybe 2 hours to code up a non-trivial feature, nearly all of the groups had succeeded. Some had enough time to go on to contemplate a better design for the system and best practices for testing this code base.
Now, every rose has its thorns. GWT was slow to compile and its error messages were cryptic, but given the speed of development, the ease of use of the platform, and all that you get for free, everyone was bullish on further use of GWT. All were looking forward to a chance to work with GWT again.
Trust and Project Culture with Rachel Davies at XPDay
Earlier this week, some other youDevisers and I attended the XPDay conference in London. There were many good sessions to choose from, so many in fact that we all had to coordinate to make sure we didn’t miss anything.
One of the sessions I attended was Trust And Project Culture. It was about how essential trust is to productivity and the things that can either increase or decrease trust. The session included a workshop session where we broke into small groups and tried to come up with examples of good (trust-increasing) and bad (trust-reducing) behavior in agile organizations.
Examples of bad behaviors:
- Office politics
- Mind games
- Competition
- Lack of Enthusiasm
- Hiding Information
- Cliques (in groups)
- Lack of respect or recognition
- Punitive environment
Examples of good behaviors:
- Colocation
- Building rapport
- Transparency
- Communication
- Setting clear goals
- Good environment
- Listening
- Diversity
- Encouraging asking for help
- Honesty
- Freedom to self-manage
There was also a discussion of what kinds of tools might be used to build trust in teams. Most people agreed vehemently that ‘team building exercises’ on their own were contrived and not very helpful. The tools are most effective when used consistently as part of an overall environment, as a complement, not as an occasional patch to a failing environment.
Tools to help build trust:
- Create a charter for team behavior (and discuss it!)
- Frequent retrospectives
- Informative workspaces (radiators)
- Lunch and learn sessions
- Show and tell at the end of iterations
- Structure meetings to allow time for asking questions and listening
- Continuous self-improvement (provide work time for self-improvement)
- Reduce competition, reward and encourage cooperation
- Set clear standards, measure by them as well as use them
- Mood Calendar (radiator for team sentiment on a day to day basis)
- Team mother / Scrum master / (Virgil)
- Pair as part of the hiring process
- Clean floor layout, don’t hide your developers
- Utilize visual feedback (radiators)
- Look for and implement industry best practices
Testing YUI Menu Button with Selenium
Recently we encountered a snag using YUI’s menu button widget. The button doesn’t respond to Selenium click commands. Given its untestability we seriously questioned whether it was worth using the widget.
However, there is a solution! In Selenium you can select a button’s menu item by clicking the underlying <a> element. For example, given the following HTML for a menu button:
<span id="yui-gen3" class="yui-button yui-menu-button">
<span class="first-child">
<button id="yui-gen3-button" type="button">Two</button>
</span>
</span>
<div id="yui-gen4" style="z-index: 1; position: absolute; visibility: hidden;">
<div class="bd">
<ul class="first-of-type">
<li id="yui-gen5" class="yuimenuitem first-of-type" groupindex="0" index="0">
<a class="yuimenuitemlabel" href="#">One</a>
</li>
<li id="yui-gen6" class="yuimenuitem yui-button-selectedmenuitem" groupindex="0" index="1">
<a class="yuimenuitemlabel" href="#">Two</a>
</li>
....
Select “One” from the menu using Selenium:
Command: click
Target: link=One
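Once the <a> has been clicked, the selected menu item is accessible in JavaScript via the menu button instance, e.g.:
// Create the menu button (configuration elided here).
var menuButton = new YAHOO.widget.Button({...});
...
// Read back the menu item selected by the click on the <a> element.
menuButton.get("selectedMenuItem");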
Don’t use menuButton.getMenu().activeItem as that doesn’t get set when the <a> is clicked via Selenium.
Continuous Quality at uswitch
We were glad to get a visit from our friends at uswitch recently, after Hemal’s very successful Skills Matter presentation. It looks like they’re a little way ahead of us in most areas, like functional testing, continuous-integration stability, and continuous deployment. Here’s a summary of what they told us about their cool processes and tools.
Evolution of functional tests
Originally uswitch had completely separate QA and development teams that exchanged little except code and bugs. For instance, QA used a testing tool called QTP that developers had never heard of - they certainly couldn’t run the QA scripts.
Their first step to improve co-operation involved developers and QA staff working together to create Selenium tests, one for each type of user and that user’s typical path through the application (what they call a “moneymaking journey”). These were good for starting the co-operation, but they found them to be so flaky that they could only run them manually. It was also impossible to read the Selenium test code, even for developers.
They have now switched to Cucumber and Watir. They find that these two work well together and the biggest advantage is that business people can read the tests (they’ve even had product owners print the tests and edit them on paper to show what they want instead). They still find that the tests flicker: Internet Explorer locks up, the number of processes overwhelms the machine, and more. They have made recent progress in stabilising the tests.
They run a small set of these tests, one suite per business area, automatically during continuous integration; it takes fifteen minutes to run all the functional tests for a relatively large business area. They make failures the team’s top priority; they find that they achieve stability for a while (for instance with an IE-killer application to eliminate hangs) but then the flickers start again, at which point they have to apply more work to increase reliability.
Some of the methods they use to increase stability include using special access methods that avoid cookies, writing certain pages that accept special query strings, and introducing custom test pages to isolate certain features.
Life of a feature
A uswitch feature starts with a business person (there is one per dev team) defining what the system should do and working with developers to write this up in Cucumber’s expressive given-when-then format. Business people check these tests and are free to add more as they think of important cases.
Each defined feature passes through these stages:
- Backlog
- Ready
- In Progress
- Inventory
- Done [i.e. released to production]
Every feature passes through these steps independently - so they can release a new feature the moment it’s finished. They can release many times a day if needed.
There are at most ten items in the Backlog state; anything more than this is not worth tracking. uswitch don’t use a bug-tracking system; if a feature is still valuable later, they won’t forget it. Instead of identifying test cases in a separate document, they just write the tests in Cucumber and can immediately execute them. (It helps that Cucumber will run a scenario before its step definitions exist, reporting the steps as undefined.)
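The workflow described here, with features moving independently through the stages and a cap of ten on the Backlog, can be sketched roughly as follows. The stage names and the limit of ten come from the text; the `Board` class and its methods are hypothetical and are not a description of any tool uswitch actually uses.

```javascript
// Stage names come from the list above; everything else is a hypothetical sketch.
const STAGES = ["Backlog", "Ready", "In Progress", "Inventory", "Done"];
const BACKLOG_LIMIT = 10;

class Board {
  constructor() {
    this.features = new Map(); // feature name -> index into STAGES
  }

  // Counts the features currently sitting in the given stage.
  countIn(stage) {
    const index = STAGES.indexOf(stage);
    return [...this.features.values()].filter((i) => i === index).length;
  }

  // Adds a feature to the Backlog; returns false when the Backlog is full.
  add(name) {
    if (this.countIn("Backlog") >= BACKLOG_LIMIT) return false;
    this.features.set(name, 0);
    return true;
  }

  // Moves one feature to its next stage, independently of all the others.
  // A feature that is already Done stays Done.
  advance(name) {
    const index = this.features.get(name);
    if (index === undefined) throw new Error(`unknown feature: ${name}`);
    if (index < STAGES.length - 1) this.features.set(name, index + 1);
    return STAGES[this.features.get(name)];
  }
}
```

With this sketch, an eleventh `add` is refused until some feature has been advanced out of the Backlog, which is one way to read "anything more than this is not worth tracking".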
Each feature in the backlog is releasable and valuable on its own - one rule of thumb is that they need to be able to send a press release for each one (it has been hard, but worth it, to train themselves to keep to this). The value may be technical; for instance, they built a new look for some pages, turned it on in production only for internal users until it was ready, then turned it on atomically for everyone. Occasionally they really do need to build a collection of related features that don’t make sense independently, and so they switch temporarily to a standard agile iteration workflow for just that set of features.
Roles and Staffing
uswitch have around 15 developers and 4 QAs. These are divided into four teams, each of which has a business person attached. QAs aren’t really distinguished from devs - they are just devs with a testing focus.
When the business person is away, the team work on tasks that are in the Ready state when they are confident this won’t lead to blockage; if there are none of these, they work on infrastructure tasks or other blockage-reducing activities. They don’t try to develop features without business advice - they find this leads to blockage and rework.
Testers do not act as intermediaries between developers and business people. They find a side benefit of the close working is that business people improve their domain expertise.
Dialog of Doom
Our friend the Build Doctor graciously let us write a guest post on our current challenges with an evil dialog in Internet Explorer running under Selenium. Check it out!