
Hiccupps - James Thomas
James Thomas

On Being a Test Charter

Sat, 03/21/2015 - 08:53
Managing a complex set of variables, of variables that interact, of interactions that are interdependent, of interdependencies that are opaque, of opacities that are ... well, you get the idea. That can be hard. And that's just the job some days.

Investigating an apparent performance issue recently, I had variables including platform, product version, build type (debug vs release), compiler options, hardware, machine configuration, data sources and more. I was working with both our Dev and Ops teams to determine which of these seemed most likely in what combination to be able to explain the observations I'd made.
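To get a feel for the size of that space, here is a minimal Python sketch. The variable names follow the list above, but every value is a hypothetical placeholder rather than anything from the actual investigation.

```python
from itertools import product

# Hypothetical values for each variable; the real investigation's
# values are not given in the post.
VARIABLES = {
    "platform": ["linux", "windows"],
    "product_version": ["old", "new"],
    "build_type": ["debug", "release"],
    "compiler_options": ["default", "optimised"],
    "hardware": ["machine_a", "machine_b", "machine_c"],
    "data_source": ["small", "large"],
}

# Every combination of one value per variable.
combinations_to_try = list(product(*VARIABLES.values()))

print(len(combinations_to_try))  # 96
```

Even with only two or three made-up values per variable there are 96 configurations, and each extra variable multiplies that number again.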

Up to my neck in a potential combinatorial explosion, it occurred to me that in order to proceed I was adopting an approach similar to the ideas behind chart parsing in linguistics. Essentially:
  • keep track of all findings to date, but don't necessarily commit to them (yet)
  • maintain multiple potentially contradictory analyses (hopefully efficiently)
  • pack all of the variables that are consistent to some level in some scenario together while looking at other factors
Some background: parsing is the process of analysing a sequence of symbols for conformance to a set of grammatical rules. You've probably come across this in the context of computer programs - when the compiler or interpreter rejects your carefully crafted code by pointing at a stupid schoolboy syntax error, it's a parser that's laughing at you.

Programming languages will generally be engineered to reduce ambiguity in their syntax in order to reduce the scope for ambiguity in the meaning of any statement. It's advantageous to a programmer if they can be reasonably certain that the compiler or interpreter will understand the same thing that they do for any given program. (And in that respect Perl haters should get a chuckle from this.)

But natural languages such as English are a whole different story. These kinds of languages are by definition not designed and much effort has been expended by linguists to create grammars that describe them. The task is difficult for several reasons, amongst which is the sheer number of possible syntactic analyses in general. And this is a decent analogy for open-ended investigations.

Here's an incomplete syntactic analysis of the simple sentence Sam saw a baby with a telescope - note that the PP node is not attached to the rest.

The parse combines words in the sentence together into structures according to grammatical rules like these, which are conceptually very similar to the kind of grammar you'll see for programming languages such as Python or in, say, the XML specs:
 NP -> DET N
 VP -> V NP
 PP -> P NP
 NP -> DET N PP
 VP -> V NP PP
 S -> NP VP
The bottom level of these structures is the grammatical category of each word in the sentence, e.g. nouns (N), verbs (V), determiners such as "a" or "the" (DET) and prepositions like "in" or "with" (P).

Above this level, a noun phrase (NP) can be a determiner followed by a noun (e.g. the product) and a verb phrase (VP) can be a verb followed by a noun phrase (tested the product) and a sentence can be a noun phrase followed by a verb phrase (A tester tested the product).
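One way to hold these rules in code is as plain data. This is only a sketch: the `GRAMMAR` list and the `parents_of` helper are names invented for illustration, not part of any parsing library.

```python
# The toy grammar from the post, as (parent, children) pairs.
GRAMMAR = [
    ("NP", ("DET", "N")),
    ("VP", ("V", "NP")),
    ("PP", ("P", "NP")),
    ("NP", ("DET", "N", "PP")),
    ("VP", ("V", "NP", "PP")),
    ("S", ("NP", "VP")),
]


def parents_of(children):
    """Return every category that a rule builds from exactly this sequence."""
    return [parent for parent, rhs in GRAMMAR if rhs == tuple(children)]


print(parents_of(["DET", "N"]))  # ['NP']  e.g. "the product"
print(parents_of(["V", "NP"]))   # ['VP']  e.g. "tested the product"
print(parents_of(["NP", "VP"]))  # ['S']   e.g. "A tester tested the product"
```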

The sentence we're considering is taken from a paper by Doug Arnold:
 Sam saw the baby with the telescope
In a first pass, looking only at the words, we can see that saw is ambiguous between a noun and a verb. Perhaps you'd think that because you understand the sentence it'd be easy to reject the noun interpretation, but there are similar examples with the same structure which are probably acceptable to you, such as:
 Bill Boot the gangster with the gun
So, on the basis of simple syntax alone, we probably don't want to reject anything yet - although we might assign a higher or lower weight to the possibilities. In the case of chart parsing, both are preserved in a single chart data structure which will aggregate information through the parse:
In the analogy with an exploratory investigation, this corresponds to an experimental result with multiple potential causes. We need to keep both in mind but we can prefer one over the other to some extent at any stage, and change our minds as new information is discovered.
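The idea of preferring one reading without discarding the other can be sketched with weighted hypotheses. The weights and the `reweight` helper below are made up for illustration; nothing here is taken from a real parser.

```python
# Two readings of "saw", starting with no preference (illustrative weights).
hypotheses = {"V": 0.5, "N": 0.5}


def reweight(current, evidence):
    """Scale each hypothesis by how well it fits the evidence, then normalise.

    Hypotheses are never deleted, so a later observation can still
    promote a reading that currently looks unlikely.
    """
    scaled = {h: w * evidence.get(h, 1.0) for h, w in current.items()}
    total = sum(scaled.values())
    return {h: w / total for h, w in scaled.items()}


# Assumed evidence: the surrounding words fit a verb three times better.
hypotheses = reweight(hypotheses, {"V": 3.0, "N": 1.0})

print(hypotheses)  # {'V': 0.75, 'N': 0.25}
```

Both readings are still in the dictionary afterwards; only their relative weight has changed. The sketch assumes strictly positive weights, since a zero total would make the normalisation fail.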

As a parser attempts to fit some subset of its rules to a sentence there's a chance that it'll discover the same potential analyses multiple times. For efficiency reasons we'd prefer not to spend time working out that a baby is a noun phrase from first principles over and over.

The chart data structure achieves this by holding information discovered in previous passes for reuse in subsequent ones, but crucially doesn't preclude some other analysis also being found by some other rule. So, although a baby fits one rule well, another rule might say that baby with is a potential, if contradictory, analysis. Both will be available in the chart.
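As a rough sketch of that reuse, the following bottom-up recogniser stores every category found for every span of the sentence in a dictionary. It is not any particular published chart parsing algorithm, and it makes two assumptions that are not in the post: the proper name Sam is entered in the lexicon directly as an NP (the toy grammar has no rule for names), and the lexicon itself is invented to cover just this sentence.

```python
from itertools import combinations

GRAMMAR = [
    ("NP", ("DET", "N")),
    ("VP", ("V", "NP")),
    ("PP", ("P", "NP")),
    ("NP", ("DET", "N", "PP")),
    ("VP", ("V", "NP", "PP")),
    ("S", ("NP", "VP")),
]

# Assumed lexicon; "saw" keeps both of its readings.
LEXICON = {
    "Sam": {"NP"},
    "saw": {"V", "N"},
    "the": {"DET"},
    "baby": {"N"},
    "with": {"P"},
    "telescope": {"N"},
}


def build_chart(words):
    """Map each span (start, end) to the set of categories found for it."""
    n = len(words)
    chart = {(i, i + 1): set(LEXICON[w]) for i, w in enumerate(words)}
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            found = set()
            for parent, rhs in GRAMMAR:
                if len(rhs) > width:
                    continue
                # Try every way of cutting the span into len(rhs) pieces,
                # looking up each piece in the chart instead of re-deriving it.
                for cuts in combinations(range(start + 1, end), len(rhs) - 1):
                    bounds = (start, *cuts, end)
                    if all(
                        rhs[j] in chart[(bounds[j], bounds[j + 1])]
                        for j in range(len(rhs))
                    ):
                        found.add(parent)
                        break
            chart[(start, end)] = found
    return chart


words = "Sam saw the baby with the telescope".split()
chart = build_chart(words)

print(sorted(chart[(1, 2)]))  # ['N', 'V']  both readings of "saw" kept
print(sorted(chart[(2, 4)]))  # ['NP']      "the baby", found once, reused
print(sorted(chart[(0, 7)]))  # ['S']       the whole sentence
```

The NP for "the baby" is recorded once at span (2, 4) and then looked up by every larger span that needs it, while the noun reading of "saw" stays in the chart even though nothing ends up using it.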

Mapping this to testing, we might say that multiple experiments can generate data which supports a particular analysis. We should give ourselves the opportunity to recognise when data does this, but not be side-tracked into thinking that there are no other, cross-cutting interpretations.

In some cases of ambiguity in parsing we'll find that high-level analyses can be satisfied by multiple different lower-level analyses. Recall that the example syntactic analysis given above did not have the PP with the telescope incorporated into it. How might it fit? Well, two possible interpretations involve seeing a baby through a telescope or seeing a baby who has a telescope.

This kind of ambiguity comes from the problem of prepositional phrase attachment: which other element in the parse does the PP with the telescope modify: the seeing (so it attaches to the VP) or the baby (so NP)?

Interestingly, at the syntactic level, both of these result in a verb phrase covering the words saw the baby with the telescope and so in any candidate parse we can consider the rest of the sentence without reference to any of the internal structure below the VP. Here's a chart showing just the two VP interpretations:

You can think of this as a kind of "temporary black box" approach that can usefully reduce complexity when coping with permutations of variables in experiments.
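The black box can be made concrete by counting analyses rather than building them. In this sketch, which reuses the toy grammar and an assumed lexicon (Sam is again treated directly as an NP, which is not in the post), the hypothetical `count` function reports how many distinct trees a category has over a span. The S rule only ever asks about the VP as a whole.

```python
from functools import lru_cache
from itertools import combinations

GRAMMAR = (
    ("NP", ("DET", "N")),
    ("VP", ("V", "NP")),
    ("PP", ("P", "NP")),
    ("NP", ("DET", "N", "PP")),
    ("VP", ("V", "NP", "PP")),
    ("S", ("NP", "VP")),
)

# Assumed lexicon for this one sentence.
LEXICON = {
    "Sam": {"NP"},
    "saw": {"V", "N"},
    "the": {"DET"},
    "baby": {"N"},
    "with": {"P"},
    "telescope": {"N"},
}

WORDS = ("Sam", "saw", "the", "baby", "with", "the", "telescope")


@lru_cache(maxsize=None)
def count(cat, start, end):
    """Number of distinct trees of category `cat` over WORDS[start:end]."""
    total = 0
    if end - start == 1 and cat in LEXICON[WORDS[start]]:
        total += 1
    for parent, rhs in GRAMMAR:
        if parent != cat or len(rhs) > end - start:
            continue
        for cuts in combinations(range(start + 1, end), len(rhs) - 1):
            bounds = (start, *cuts, end)
            ways = 1
            for j, child in enumerate(rhs):
                ways *= count(child, bounds[j], bounds[j + 1])
                if ways == 0:
                    break
            total += ways
    return total


print(count("VP", 1, 7))  # 2  PP attached to the VP, or to the NP
print(count("S", 0, 7))   # 2  one per VP analysis
print(count("S", 0, 4))   # 1  "Sam saw the baby" is unambiguous here
```

The sentence-level result depends on the VP only through a single cached lookup, so the internal PP attachment question can be left unopened until it matters.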

The example sentence and grammar used here are trivial: real natural language grammars might have hundreds of rules and real-life sentences can have hundreds of potential parses. In the course of generating, running and interpreting experiments, however, we don't necessarily yet know the words in the sentence, or know that we have the complete set of rules, so there's another dimension of complexity to consider.

I've tried to restrict to simple syntax in this discussion, but other factors will come into play when determining whether or not a potential parse is plausible - knowledge of the frequencies with which particular sets of words occur in combination would be one. The same will be true in the experimental context, for example you won't always need to complete an experiment to know that the results are going to be useless because you have knowledge from elsewhere.

Also, in making this analogy I'm not suggesting that any particular chart parsing algorithm provides a useful way through the experimental complexity, although there's probably some mapping between such algorithms and ways of exploring the test space.  I am suggesting that being aware of data structures that are designed to cope with complexity can be useful when confronted with it.
Images: Doug Arnold
Categories: Blogs

Why Not a Testing Standard?

Tue, 03/10/2015 - 07:01

The Cambridge Tester Meetup last night was a discussion on testing standards. Not on the specific question of ISO 29119 (for which see Huib Schoots' excellent resource) but more generally on the possibility of there being a standard at all. It was structured along the lines of Lean Coffee with thoughts and questions being thrown down on post-its and then grouped together for brief discussion.

I've recorded the content of the stickies here with just a little post-hoc editing to remove some duplication or occasionally disambiguate. The titles were just handles to attach stickies to once we had a few in an area and I haven't tried to rationalise them or rearrange the content.

Enhance/Inhibit Testing
  • Testing is a creative process so can't be standardised.
  • Testing doesn't fit into a standard format, so how can there be a standard for it? (Do we mean "good testing" whatever that is?)
  • New tools, technology might not fit into a standard.
  • Standardisation destroys great design ideas by encouraging/forcing overly broad application.
  • Can a general standard really fit specific project constraints?
  • Each tester is different.
  • A standard limits new thinking.
  • Could a standard be simply "Do the best you can in the time you have"?
Who Benefits?
  • Who do certifications serve anyway? What do they want from them?
  • As litigation becomes more prevalent who is protected by a standard? Customers, producers, users?
  • With a standard, companies can be "trusted" (QA-approved sticker).
  • People outside of test are usually very opinionated. Do standards help or hinder?
  • End users care because of the possible added costs.
  • A testing standard would provide false reassurance for companies.
  • How does an agile team fit in the standard?
  • Too much documentation? Standards may cause the need for more documentation to show compliance.
  • Standard language for communicating test ideas.
  • Divide the testing community - good or bad?
  • Respond to feedback and criticism.
How Much? or Alternatives
  • Do we need an alternative at all?
  • Where are the standards for science, consultancy, product management, development?
  • Use as much or as little of a standard as needed?
  • Could a standard be subjective?
  • Standards for products, or the process of creating products?
  • What else do we need or want instead?
  • Could a standard cover the minimum at least?
  • A standard should be flexible to adapt to project constraints.
Useful Subsets
  • Can a single standard fit different products? (Angry Birds vs nuclear reactor).
  • Uniformisation of some testing (bring up the baseline).
  • There are already some government standards.
  • Infinite space of testing. Can a standard capture that?
  • Can some aspects of testing be covered by standards? If so, which?
Can't we Just Explore?
  • Scientists do. Why can't we? (But what about mandated science?)
  • Approaches in methodologies used set out in a common understood format could help consistency.
Fear of Being Assessed?
  • Are testers just scared of being evaluated or taking responsibility?
  • I'm too shy.
  • Could it open up lawsuits, blame and other consequences?
  • Should you insure yourself or your company against any not conforming to the standard?
  • Anything unstructured used as an addition to, rather than part of, the primary approach. Stops people hiding?
Show Me the Money?
  • What is the motivation of those seeking to create certification? (Rent-seekers?) 
  • It's just to make money for ISO companies.
  • Adds organisation to a "messy" activity.
Certify Testers Not Testing
  • Can you differentiate certifications for testers from certifications for pieces of work? (cf. Kaner)
  • Can you say "product tested by a tester certified XYZ"?
  • How would recruitment distinguish between testers and checkers?
  • An independent body to audit the testing/tester on real project work? (Who audits the auditor?)
  • Qualification vs certification vs standardisation.
Standards in Other Industries
  • Learn more about standards in other industries and how they dealt with their first standard.
  • Standards in e.g. car safety are on the result of the work not the methodology? 
  • Universities and schools start teaching testing. Should they teach about the standards?
  • Standards to help produce evidence of testing not just test plans, which are usually fiction.
  • "Informed" standards (courses, talks etc), "in-house" standards?
  • Are objections to certification objections to theoretical risks but in practice it's possible to have something good enough?
  • Would companies without testers need a testing standard?
  • Development standards to be closely linked to testing standards.
  • Easy to find jobs abroad (if there were standardisation).
  • A standard would be good as a product.
  • Would a standard really impact our day-to-day job?
  • Is the standard simply a reason to justify testing?
  • Is the idea of a standard predicated on an outdated idea of testing?
As you can see there was no shortage of ground to cover but, with only a couple of hours, plenty was necessarily shallow or not dug into at all.

To pull out a handful of points that I found particularly interesting: we were not shy about asking questions and we were prepared to aim them at ourselves; we bumped into the distinction between certifying product, tester and testing multiple times; we didn't really explore what we meant by standards, certification and qualification and what the differences between them might be; while the discussion was entered into with an open mind (which was the remit) there were sometimes implicit assumptions about what a standard must entail (inflexibility, lots of documentation etc.) which were mostly negative, and where positives were proposed they tended to be viewed more as possibilities.

P.S. There's a few photos.
Categories: Blogs

The Rule of Three and Me

Sat, 02/21/2015 - 10:20
You can find Weinberg's famous Rule of Three in a variety of formulations. Here's a couple that showed up when I went looking just now (1, 2):
 If you can't think of at least three things that might be wrong with your understanding of the problem, you don't understand the problem.
 If I can’t think of at least three different interpretations of what I received, I haven’t thought enough about what it might mean.
At work I challenge myself to come up with ideas in threes and, to keep that facility oiled, when I'm not testing I challenge myself to turn the crap joke I've just thought of into a triad.
By providing constraints on the problem I find the intellectual joking challenge usefully similar to the intellectual testing challenge. Here's an example from last week where, after I happened onto the first one, I constrained the structure of the others to be the same and restricted the subject matter to animals:
  • If I had three pet lions I would call them Direct, Contour and What's My.
  • If I had three pet ducks I would call them Via, Aqua and Paroled On The Basis Of Good Con.
  • If I had three pet mice I would call them Enor, Hippopoto and That Would Be Like Turkeys Voting For Chris.
As an additional motivational aid I've taken to punting the gags onto Twitter. You can see some of them here ... but remember: I never said they were good.
Image: De La Soul
Categories: Blogs

Why that Way?

Thu, 02/19/2015 - 09:01
Most working days I go for a walk round the block after I've eaten my dinner. As I leave our office I've got two primary choices: lift or stairs. I say primary because there's obviously many things I could do at that point (although I have promised my wife that naked star jumps will not feature in my daily exercise regimen ... again). In any case I go for the stairs.

At the bottom of the stairs I have two more choices: left or right. Each takes me to a different exit from the building but both open onto the circuit that I stroll round, and if I leave by one of them I will naturally arrive back at the other so there's (again, to the level of granularity that I care about) no significant difference between them.

I can't go straight on at the bottom of the stairs because the lift is there and a u-turn sends me back into work so every day I am forced to make the choice - left or right.  And every day until recently I've been making that choice without any conscious thought.

But when I realised I was doing it, I started looking for patterns. Philosophical aspects of the observer effect to one side, I discovered that, over the period I watched, I tended to choose left and right roughly equally and that (ignoring extraneous circumstances such as deliveries being in the way) I have a tendency to go in the direction closest to the side of the stairs I happen to be on.

For instance,  if I've moved left to let someone else up on my right, I'll tend to go left at the bottom. If I've swung round the corner between flights a bit faster than normal and ended up on the right hand side, I'll naturally hang a right when I get to the ground floor too.

On my walk I listen to podcasts. A couple of weeks ago, while I was stair-spotting, Invisibilia told the story of how an experimenter's attitude towards their subjects can influence the performance of the subjects. In one landmark study, when  an experimenter was told that a set of lab rats were smart, the rats performed better and when told they were stupid, the rats performed worse.

The study concluded that the experimenter behaviour was unconsciously changed by their expectation of the animals. When told the rat was clever they might hold it more gently, talk to it more and so on. This in turn made the rat more comfortable and in a better mood to run around the maze or whatever.
Unconscious action can lead to unexpected but, crucially, predictable consequences.

I don't think about which way I'd go from the bottom of the stairs but I can discern a pattern to the behaviour when I look. We're all making decisions all day every day - both in trivial matters like which way to leave a building and in more serious stuff like which way we'll test something or how we'll speak to our colleagues.

I'm never short of questions, but now I have some new ones: Why did I choose that way? Did I notice there were options? Did I know I was choosing? How did that influence my behaviour? How can I know the effects of that?
Categories: Blogs

Haiku Like You?

Sun, 02/01/2015 - 11:51
Early yesterday morning I was setting up a bunch of scripts to run a bunch of requests against a bunch of servers in a bunch of permutations over the course of a few days.
Bitter experience has taught me that this kind of thing rarely works right the first time so, if I can, I find a way to start with a small version of the setup. Here, that meant choosing a sensible subset of the requests, letting things run for a few minutes, keeping an eye on progress (files created in the right place, servers generating the right kind of messages, execution times reasonable, CPU load and number of processes in the right ballpark and so on) and then inspecting logs at the end.
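A small pilot run of that kind might be set up along these lines. This is a sketch only: the request and server names and the `pilot_subset` helper are hypothetical, and the post chose a "sensible" subset by judgement rather than the random sample used here.

```python
import random
from itertools import product

# Hypothetical stand-ins for the real requests and servers.
requests = [f"request_{i}" for i in range(20)]
servers = ["server_a", "server_b", "server_c"]

# The full run: every request against every server.
full_run = list(product(requests, servers))


def pilot_subset(cases, size, seed=0):
    """Pick a small, repeatable sample of cases for a trial run."""
    rng = random.Random(seed)
    return rng.sample(cases, min(size, len(cases)))


pilot = pilot_subset(full_run, 5)

print(len(full_run))  # 60
print(len(pilot))     # 5
```

Fixing the seed means each iteration of configuration fixes is checked against the same handful of cases before the full multi-day run is started.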

As I worked the bugs out of the configuration I had a little time to kill on each iteration. But what to do?

I'd stumbled across a couple of haiku I'd enjoyed just the day before and I'd been pondering something Chris George had said about factory testers during Cambridge Lean Coffee last week. So, a mission: one haiku per execution, each execution, on the topic of not-testing. Here's the results:
You like to say that
Testing is breaking software
Is that all? Really?

When presented with
Something to test, first thought is
"Who will tell me how?"

Following the script
Gaily sailing past issues
"Hey, not in my scope"

Some boundary value
Submitting the empty string
Time for a cuppa?

With apologies to poets everywhere.
Categories: Blogs