Recently in Testing Category

Ink, turpentine, paper, water

For at least 1500 years, Japanese artists have practiced suminagashi, the art of marbling paper with ink floating on water. The marbler uses brushes to place alternating drops of black calligraphy ink and turpentine on the surface of a full basin, then lays a sheet of paper down to capture the resulting patterns. They look like clouds, or smoke, or the grain of twisted trees. Each pattern is unique, unlike in Western marbling, where the creator can reproduce essentially the same design many times.

Ink, turpentine, water, paper. It seems so simple.

And it is very simple, but only after you accept one thing: you are not in control of the outcome. The ink goes where it wills, and the marbler can only follow. There are tricks to give the pattern an overall direction, such as controlling the amount of ink and turpentine or gently blowing over the surface of the water. But the heart of suminagashi is trusting what you can't predict or control.

I recently read George Oates's A List Apart essay about the ways Flickr created its community, "Community: From Little Things, Big Things Grow." Two paragraphs in particular jumped out at me:

Embrace the idea that people will warp and stretch your site in ways you can't predict—they'll surprise you with their creativity and make something wonderful with what you provide.
There's no way to design all things for all people. When you're dealing with The Masses, it's best to try to facilitate behavior, rather than to predict it. Design, in this context, becomes more about showing what's *possible* than showing what's *there*.

Flickr's history has proven her right. There are any number of wildly varying communities on the site, many of them either accidentally or deliberately experimental. Flickr groups are even cited as a case study in Here Comes Everybody, Clay Shirky's recent book on online community dynamics.

And now it's our turn.

Last year, my company (MediaLab, which makes a library search software package called AquaBrowser Library) released our new social library software: My Discoveries.

The essence of My Discoveries is this: allow users to add information to the library catalog. Let them tag things, make lists of related items, fill in ratings, write reviews. Then let others see what they've done. Turn the patron's interaction with the library's catalog into a conversation with the catalog, and with each other.

I've been involved in both the design and the testing. One of the core principles we've kept in mind throughout the process is that we cannot predict what people will do with it.¹ Designing and testing in the light of that kind of uncertainty is very different from, and much more interesting than, working to a known, restricted usage profile. It affects everything we do, from which characters are allowed in list names to which statistics we want to gather. How does one design metrics to detect the unpredictable?

Tags, lists, ratings, reviews. It seems so simple.
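One concrete place where "we cannot predict what people will do" turns into a design decision is the rule for list names. Here is a minimal Python sketch of a permissive validator, assuming hypothetical rules (a 100-character cap and rejection of invisible control and format characters); it is an illustration, not the code My Discoveries actually uses. The idea is to forbid only what is clearly harmful and let everything else through.

```python
import unicodedata

MAX_LEN = 100  # hypothetical cap, not My Discoveries' actual limit

# Zero-width (non-)joiners are "format" characters, but some scripts and
# emoji sequences need them, so they are explicitly allowed.
ALLOWED_FORMAT_CHARS = {"\u200c", "\u200d"}


def validate_list_name(raw: str) -> str:
    """Return a cleaned list name, or raise ValueError.

    Permissive by design: letters, digits, punctuation, symbols and emoji
    from any script are accepted. Only empty or over-long names and
    invisible control/format characters (which can hide or reorder text)
    are rejected.
    """
    name = unicodedata.normalize("NFC", raw).strip()
    if not name:
        raise ValueError("list name is empty")
    if len(name) > MAX_LEN:
        raise ValueError(f"list name is longer than {MAX_LEN} characters")
    for ch in name:
        # Unicode general categories starting with "C" cover control,
        # format, surrogate, private-use and unassigned code points.
        if unicodedata.category(ch).startswith("C") and ch not in ALLOWED_FORMAT_CHARS:
            raise ValueError(f"disallowed character U+{ord(ch):04X}")
    return name


print(validate_list_name("  Books set in Düsseldorf "))
print(validate_list_name("漫画 📚 <3"))
```

Rejecting a short blocklist of categories, rather than allowing a short list of "safe" characters, is the design choice that leaves room for uses nobody anticipated.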


  1. Of course, we are not so naive as to think that all the new ideas that people come up with for My Discoveries will be good ones. I moderate a web community in my spare time, so I know how bad things can get. As a result, I have put a lot of attention into the administrative interface, and I expect to do more on it in the future. If we give users room to innovate, we have to give librarians the wherewithal to detect and clean up misbehavior.

The very unhappy path to Terminal 5

I see that the new terminal at London's Heathrow airport is in the midst of another weekend's disruption. Problems on the terminal's opening weekend resulted in over 200 flights cancelled and a backlog of 28,000 bags. The chaos has already cost British Airways, the sole user of the terminal, £16m, and some estimates put the eventual cost around £50m.

Initial problems reported included the failure of either passengers or staff to find the car parks, slow security clearance for staff, consequent delayed opening of check-in desks, and multiple unspecified failures of the baggage handling systems. Once the initial failures occurred, a cascade of problems followed as passengers began to clog up the people-processing mechanisms of the terminal.

This weekend's disruption has been blamed on "a new glitch" in the baggage handling system. I suspect that means that when they solved one set of problems they unmasked another. A spokeswoman assures us that they're merely planning how to put an identified solution in place. Her statement doesn't include any reference to the fact that these problems often nest, like Russian dolls, and that the new solution may uncover—or introduce—new problems.

Of course, my reaction was, "Did they test the terminal before opening it?" The errors shown include both functional errors (people can't find the car park) and non-functional ones (the baggage system failed under load). No system is implemented bug-free, but the breadth of error types got me wondering.

Fortunately, the Beeb covered some of the testing performed before the terminal opened. Apparently, operation of the terminal was tested over a six-month period, using 15,000 people. The testing started with small groups of 30 to 100 people walking through specific parts of the passenger experience. Later, larger groups simulated more complex situations. The largest test group was 2,250. BAA said these people would "try out the facilities as if they were operating live."

Do 2,250 people count as a live test? Are they numerous enough to cause the sorts of problems you're looking for in a volume test?

I plucked a few numbers off the web and passed them through a spreadsheet. T5 was designed to handle 30 million passengers per year, which comes out to an average of 82,000 per day, or 5,000-odd per hour in the 16-hour operating day (Heathrow has nighttime flight restrictions). These averages are wildly low, because airports have to handle substantial peaks and troughs. Say that on the busiest day you get 150% of the flat average, or 7,500 people per hour. Assuming 75% of passengers are either arriving from or heading toward London and spend about an hour in the terminal, while the rest are stopping over for an average of 2 hours, that's about 9,375 passengers in the terminal at any given time.

9,375 is more than 2,250. You can, however, magnify a small sample to simulate a large one (for instance, by shutting off two-thirds of the terminal to compact them into a smaller space). It's not just a numbers game, but a question of how you use your resources.
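The spreadsheet arithmetic can be reproduced in a few lines of Python. This is a sketch using the figures above; the one-hour dwell time for passengers starting or ending their journey in London is my assumption, chosen because it is what makes the 9,375 figure come out. The occupancy step is Little's law: people present equals arrival rate times average time spent.

```python
# Back-of-envelope estimate of peak occupancy in Terminal 5.
ANNUAL_PASSENGERS = 30_000_000
OPERATING_HOURS_PER_DAY = 16   # night-flight restrictions
PEAK_FACTOR = 1.5              # busiest day vs. flat average
LOCAL_SHARE = 0.75             # arriving from / heading toward London
LOCAL_DWELL_HOURS = 1.0        # assumption implied by the 9,375 figure
TRANSFER_DWELL_HOURS = 2.0     # average stopover


def terminal_occupancy(arrivals_per_hour: float) -> float:
    """Little's law (L = lambda * W): people present = arrival rate x mean time spent."""
    mean_dwell = LOCAL_SHARE * LOCAL_DWELL_HOURS + (1 - LOCAL_SHARE) * TRANSFER_DWELL_HOURS
    return arrivals_per_hour * mean_dwell


per_day = ANNUAL_PASSENGERS / 365             # flat daily average
per_hour = per_day / OPERATING_HOURS_PER_DAY  # flat hourly average
print(f"{per_day:,.0f} per day, {per_hour:,.0f} per hour")
print(f"Rounded to 5,000/hour, as in the spreadsheet: {terminal_occupancy(5_000 * PEAK_FACTOR):,.0f}")
print(f"Unrounded: {terminal_occupancy(per_hour * PEAK_FACTOR):,.0f}")
```

With the rounded 5,000 per hour this gives 9,375; without rounding it comes out nearer 9,600, so the gap to a 2,250-person test is, if anything, wider.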

Most of the testing documentation will of course be confidential. But I found an account of one of the big tests. I would expect that any such report was authorised by BAA, and would therefore be unrealistically rosy; they want passengers to look forward to using the new terminal. But still, the summary shocked me.

In fact the whole experience is probably a bit like the heyday of glamorous air travel - no queues, no borders and no hassle.

Any tester can translate that one. It means:

We didn't test the queuing mechanisms, border controls, or the way the systems deal with hassled passengers.

In software terms, there is something known as the happy path, which is what happens when all goes well. The happy path is nice to code, nice to test, nice to show to management. It is, however, not the only path through the system, and all the wretched, miserable and thorn-strewn paths must also be checked. This is particularly important in any scenario where problems are prone to snowballing. (Airport problems, of course, snowball beautifully.)

Based on the account I read, these testers were set up to walk the happy path. They were not paid for their labours, but were instead fed and rewarded with gifts. I'm sure food and goodie bags were cheaper than actual pay, but they dilute the honesty of the exchange. We're animals at heart, and we don't bite the hand that feeds us. We like people who give us presents. Getting those people—mostly British people—to act like awkward customers, simulate jet lag or disorientation, or even report problems must have been like getting water to flow uphill.

Furthermore, look at the profile of testers mentioned: an ordinary reporter and a bunch of scouts and guides. I wish I believed that the disabled, the families with cranky children, and the non-English speakers were just at another table at breakfast. But I don't. I suspect the test population was either self-selecting, or chosen to be easy to deal with. In either case, it didn't sound very realistic.

It's possible that there was another test day for people who walked the unhappy path, and that it wasn't reported. It's possible that they did clever things, like salt the crowd with paid actors to clog up the works and make trouble, and that our reporter simply missed those incidents.

But I've worked on big projects for big companies, and that's not what I'm betting. I suspect there were very good test plans, but that for reasons of cost and timing they were deemed impractical. So compromises were sought in large meetings with mediocre biscuits. Gantt charts were redrawn late at night using vague estimates that were then taken as hard facts. Tempers were lost, pecking orders maintained. People assured each other that it would be all right on the night.

It wasn't.

I wish I believed that the next time someone does something like this, they'll learn the lessons from the T5 disaster. But that's happy path thinking, and I'm a tester. I know better.

Akron and the Abi Field

When the going gets tough at work (as it is now), I often wonder why I do what I do. This is one of the little stories that remind me why I am a software tester.

Martin works for SkyScanner, a flight pricing site. He was testing out some code one evening, a couple of months ago, and ran into the sort of frozen-brain feeling you get after too long at the keyboard. So he pushed his wheely chair back from his desk, into my line of sight.

"Bun," he said, "Name me two destinations. Just any cities."

"Düsseldorf," I replied, "and Akron, Ohio."

"Thanks," he said, and wheeled back to his desk to fiddle with the new test data. taptaptap. "[insert curse word]." taptaptap. "[insert worse curse word]." taptaptap.

I looked up as he rolled back into my line of sight, looking exasperated. "How do you do that?"

Turns out that Akron, Ohio, USA, is served by two airports, Akron and Akron Canton. And some clever soul, somewhere in the ancestry of the data they were working with, had remapped Akron Canton to Guangzhou, the Chinese city once known in English as Canton. That was giving him some...funny results.

So they had to go clean up their data. And I remembered why I'm a software tester.
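A data bug like this can often be caught by a crude plausibility check: an airport should be reasonably close to the city it is listed as serving. The Python sketch below is hypothetical throughout (the record layout, the approximate coordinates and the 150 km threshold are invented for illustration, and it is not how SkyScanner actually checks its data); it uses the haversine great-circle distance.

```python
import math

EARTH_RADIUS_KM = 6371.0
MAX_KM = 150  # hypothetical threshold: an airport should be near the city it serves


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical records: (airport, city served, airport lat/lon, city lat/lon).
# Coordinates are approximate and for illustration only.
records = [
    ("Akron", "Akron, Ohio", (41.04, -81.47), (41.08, -81.52)),
    ("Akron Canton", "Akron, Ohio", (40.92, -81.44), (41.08, -81.52)),
    ("Akron Canton", "Akron, Ohio", (23.13, 113.26), (41.08, -81.52)),  # the remapped record
]


def suspicious(rows, max_km=MAX_KM):
    """Return the records whose airport is implausibly far from its city."""
    return [r for r in rows if great_circle_km(*r[2], *r[3]) > max_km]


for name, city, airport, centre in suspicious(records):
    print(f"Check {name} (listed as serving {city}): {great_circle_km(*airport, *centre):,.0f} km away")
```

A check like this will not say what went wrong, only where to look, which is usually what a tester needs.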

Result!

A couple of months ago, I took a somewhat less than fun exam on software testing.

So last week I got the results.

84%. A pass, with distinction.

So, dear people, what do you think my reaction was?

A. Yay! I passed!
B. Meh. It's just an exam.
C. Where did I lose 16 whole marks?
D. All of the above, in turn.

Answers on a postcard, please.

How To Break Things Real Good

Martin has been absent because he's been redesigning his side of the site. (Go check it out. It's cool.) I've been absent for much less interesting* reasons.

Basically, I've been studying for a test. About testing. The Information Systems Examination Board (ISEB) Practitioner Certificate in Software Testing, or, as I think of it, How To Break Things Real Good.

After eight days of classroom instruction spread over two weeks, I had less than a month to cram the syllabus in between my ears (Only click on the link if you have persistent insomnia. Not suitable for reading whilst operating heavy machinery*). I did it: I can now go on at great length about the relative strengths of boundary value analysis and state transition testing in the design of functional tests, name 18 types of automated test tool, and describe three software development lifecycle models and how they relate to testing.

I wasn't a very good classmate, I'm afraid. I got massively insecure early on in the instruction section, when I came in on the second week to find that someone extra had turned up and taken my seat and my course materials. The instructor was mortified, but I felt deeply unwelcome, and turned to the same obnoxious behaviour I used to get through high school. When I feel out of place, I become the most annoyingly, articulately intelligent pain in the posterior ever...trying to prove that separate does not equal inferior, I guess.

I did this throughout the second week of classes, and only got worse in the revision session. I even straightened the instructor out on his understanding of one area of the syllabus. Yes, I was right and he was wrong. But that doesn't make it less obnoxious**. I hope I made up for it a little with some of the tutoring I did on the side.

The exam was a pig, but I knew it would be. I think I did OK, on balance, though I won't know for a couple of months. The pass mark is 60%, and if I get over 80% I get a distinction. (Which is, in a small community, considered rather cool.) I'll be content to pass.***
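The boundary value analysis I crammed for is easy to show on the exam's own grading rule: a 60% pass mark and a distinction for anything over 80%. The idea is to test on and either side of every boundary, because that is where off-by-one mistakes live. This Python sketch assumes whole-number percentages and reads "over 80%" as strictly greater than 80; that reading is exactly the kind of ambiguity the technique is meant to flush out.

```python
def grade(percent: int) -> str:
    """Grading rule as described: pass at 60%, distinction over 80%.

    Assumption: "over 80%" means strictly greater than 80, so exactly
    80% is a plain pass.
    """
    if not 0 <= percent <= 100:
        raise ValueError("mark must be between 0 and 100")
    if percent > 80:
        return "distinction"
    if percent >= 60:
        return "pass"
    return "fail"


# Boundary value analysis: test on and either side of each boundary,
# including the edges of the valid range and just outside it.
BOUNDARY_CASES = {
    -1: ValueError, 0: "fail",
    59: "fail", 60: "pass", 61: "pass",
    79: "pass", 80: "pass", 81: "distinction",
    100: "distinction", 101: ValueError,
}

for mark, expected in BOUNDARY_CASES.items():
    try:
        result = grade(mark)
    except ValueError:
        result = ValueError
    assert result == expected, (mark, result, expected)
print("all boundary cases behave as expected")
```

Ten cases cover every boundary, where testing every mark from 0 to 100 would take 101 and find nothing extra.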

I promise, now that I'm done with that, I'll post to the blog again. I'll even go back and pick out the best photos I took over that time, tell you about the time Fionaberry did a face plant at full speed running downhill, and even update my cinnamon roll recipe. Promise.


* I don't think it's boring. But I know everyone else does.

** Peter, if you're reading this, I am sorry.

*** This is a lie. I would be marginally content to hear that I got 100%. I'll gnash my teeth over every missed point. I know I missed at least 7 marks, and it's driving me nuts.

Where's the Black Squad when we need them?

Q: What do weapons of mass destruction have to do with cot death?

A: In both areas, the "experts" evaluating the evidence and acting on their conclusions have caused enormous devastation. Then, after the fact, that evaluation has proven wrong.

Why? I have an insight that may be useful.

I've held a number of jobs in my working life. The three that I've spent the longest at, though, are paralegal (2 years), financial auditor (3 years) and software tester (7 years and counting). Though they seem quite varied, they have one common factor: they're all about the evaluation of evidence.

Lawyers and paralegals, of course, work with evidence all the time: gathering it, presenting it, writing about it. There's no pretense of neutrality. A trial lawyer's job (aided by paralegals) is to find evidence that supports one particular view, and to discount evidence that doesn't.

Financial auditors are, on the surface of it, very different from lawyers. They go into companies at the year end and check the financial accounts those companies produce. Each stage of the audit is made up of tests on certain aspects of the accounts, whether it be a stock count to ensure that the inventory numbers are correct, or a check of reconciliation procedures to allow the auditors to rely on internal financial systems. And for each stage of the audit, we used to state the specific object of the test. I still remember the format.

Object of Test
To accumulate audit evidence that stock valuations are materially accurate and correctly stated in the year end accounts.

The public used to perceive auditors as unbiased and neutral (possibly even stringent and difficult to satisfy), but of course the scandals of recent years (the Maxwell empire, Barings, Enron) changed all that. Everyone knows the subtle, unstated pressure that the auditors are under when they go into a company, particularly one which pays the firm's consultancy arm large fees. It's almost unheard of for a Big Five firm to refuse to sign off a set of accounts.

Learning about software testing allowed me to see consciously what I knew unconsciously already. The audit process is biased in favour of approval, and any such bias makes an enormous difference to the results obtained. This is a phenomenon that testers are painfully aware of - it's the reason that software has to be independently tested.

To quote one of the foundational books on software testing (The Art of Software Testing, by Glenford J. Myers¹):

"Since human beings tend to be highly goal-oriented, establishing the proper goal has an important psychological effect. If our goal is to demonstrate that a program has no errors, then we shall tend to select test data that have a low probability of finding errors. On the other hand, if our goal is to demonstrate that a program has errors, our test data will have a higher probability of finding errors."

Reread the sample "object of test" above in the light of that quote. What is the goal of the test? Is it to find "bugs" in the accounts, or to establish that they aren't there? How likely does that make it that we would find errors?
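To put Myers's point in software terms, here is a hypothetical Python example: a leap-year function with a deliberate bug (it ignores the century rules) and two sets of test data. Data picked to show that the function works passes cleanly; data picked to make it fail finds the bug at once.

```python
def is_leap_year(year: int) -> bool:
    # Deliberately buggy, hypothetical function under test:
    # it forgets that century years are leap years only if divisible by 400.
    return year % 4 == 0


def reference_is_leap_year(year: int) -> bool:
    # Gregorian rule, used as the test oracle.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Goal: "show the program has no errors" -> ordinary, comfortable years.
confirming = [2001, 2003, 2004, 2008]
# Goal: "show the program has errors" -> the awkward century boundaries.
hostile = [1900, 2000, 2100, 2400]


def failures(cases):
    """Return the years where the function disagrees with the oracle."""
    return [y for y in cases if is_leap_year(y) != reference_is_leap_year(y)]


print("confirming data finds:", failures(confirming))
print("hostile data finds:", failures(hostile))
```

Same function, same tester, same effort; only the goal behind the choice of test data differs.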

The most successful software testing teams are the ones who take a skeptical, or even hostile, attitude toward code quality. IBM's infamous "Black Team" took this to extremes, dressing in sinister clothes, cheering when they found bugs, and deliberately striking fear into the hearts of the programmers whose code they tested. The reliability of mainframe operating code is their legacy: we wouldn't have a hope of achieving "six nines" (99.9999%) availability had they not found the bugs they did.
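For a sense of what "six nines" demands, a few lines of Python convert an availability figure into permitted downtime per year. The sketch assumes a 365.25-day year.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # averages in leap years


def allowed_downtime_seconds(nines: int) -> float:
    """Maximum downtime per year for an availability of `nines` nines."""
    unavailability = 10 ** -nines  # e.g. 6 nines -> 0.000001
    return SECONDS_PER_YEAR * unavailability


for n in range(3, 7):
    print(f"{n} nines: {allowed_downtime_seconds(n):,.1f} seconds of downtime per year")
```

Six nines works out to roughly 32 seconds of downtime a year; three nines allows almost 9 hours.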

So if bias affects results, how then do we view Professor Meadow and his eponymous Law ("One cot death is a tragedy, two is suspicious, and three is murder unless proven otherwise.")? His testimony has jailed women since acquitted of the deaths of their children, and caused authorities to remove babies from their parents, sometimes permanently. Yet statisticians claim that he took a "stamp collecting" attitude toward evidence, including the cases that supported his views and overlooking the others. And given the above axiom, how did he approach the deaths of children, when asked to testify at their mothers' trials?

How a similar bias could affect the officials of two governments, when considering whether to send in the tanks, is left as an exercise for the student. But it raises the question: will any enquiry that focuses on the evidence, rather than the objectives of the people evaluating that evidence, really explain the conclusions that led to war?


  1. This is listed at $150.00 on Amazon at the moment. That's pretty expensive, even for a computer book, but this one's worth it. It's 177 pages long and has been indispensable since its publication in 1979...quite a contrast to Martin's Microsoft exam books, which can run to over a thousand pages and are obsolete before the ink dries.

The IT industry has been gripped in recent months by anxiety over the growing trend towards "offshoring". More and more companies are moving their software development to countries like India and China, where a highly educated workforce is willing to code for a fraction of the costs of North Americans and Europeans. This is a Bad Thing according to pundits, but, I suspect, an inevitable one. UK call centres and directory enquiries are already frequently staffed from the Indian subcontinent (with operators given "cultural training" so they can chat about the latest happenings on EastEnders).

I also suspect that my own specialty, software testing, is going to see a renaissance in the US, Canada, and Europe. At present, software testing seems to be moving offshore along with the development. But I reckon a given company will try an average of one offshore implementation without onshore testing before we testers become very, very popular. Even "onshore" offsite developments need acceptance testing. How much more, then, do projects developed across time zones, continents, and language barriers need it?

But some industries are supposed to be offshoring-proof. Right? Right? Wrong. Even sporran makers are under threat from offshoring.

Is nothing sacred?

On Craftsmanship

I went through a pretty bad patch at work last month. I was feeling annoyed at the people I work with, stressed out by a developing problem that I couldn't seem to solve, and frustrated with myself for getting into the situation at all. I was even having work stress dreams (coming into the office naked from the waist up, that sort of thing).

A lot of this was based on fear. I am performing a role pioneered by someone with vastly more experience and knowledge than I have. Even after a year, I am still scrambling to catch up, learning on the fly. But I feel like by now I should know everything I need to do my job. This made it hard to ask questions, and consequently made me defensive and unadventurous. I found myself backing away from challenges because I was afraid they'd turn into cans of worms, that people would ask me things I couldn't answer. Easier to say no than to find a way to say yes.

But I was rereading A Degree of Mastery, one of my bookbinding books. The author, Annie Tremmel Wilcox, writes about the time that she was an apprentice bookbinder. She spends a lot of time thinking about the idea of craftsmanship, particularly as embodied by the master bookbinder she is studying with. And, reading that, I understood my real problem. The lack of knowledge, the feeling of looming intimidation, was only a symptom.

I had stopped approaching my job as a craftsman. I was no longer taking pride in the innate quality of the work I was doing, but had got tied up in the politics of it all. It's easy to do in my role, where there is a lot of political give and take.

To a politician, the quality of your work is one of many negotiable items. You take shortcuts to do favours, until taking the time to do something right is seen as an imposition. A craftsman abhors this approach, and would rather do something less fancy but do it right than do more in some half-assed way.

As a craftsman, with the priority on the quality of my work, I find the barriers to asking for help have diminished. If the quality of my work is my primary concern, then the desire to save face by not appearing ignorant cannot be. That's the primary concern of a politician.

Going into work is a lot easier now. I even keep a bone folder on my keyboard (above the F keys). It's sort of a personal emblem of craftsmanship.

                              - o0o -

Grammar notes: Although I am a woman, I use the terms "craftsman" and "craftsmanship". My alternatives appear to be "crafter" / "craftership" and "craftswoman" / "craftswomanship". Now, "crafter" sounds like "crofter" to me, and I have nothing whatever to do with sheep. And while "craftswoman" is fine, "craftswomanship" is just too awkward. (Don't even get me started on "craftspersonship"...) Besides, I am confident enough in my femininity to be able to use a masculine term about myself.

The Prisoners Problem

Martin has got me involved in the Prisoner Problem. As a software tester by vocation as well as profession, I've been his gadfly, pointing out the flaw in his solution.

Continuing the gadfly/software tester/push the limits theme, I would like to propose a solution.

In addition to randomly flipping switches, each prisoner writes his name on the wall of the room when he visits it. Since the warden promised not to let anyone in the room except when the prisoners are there, the names won't be erased or added to without the prisoners' knowledge. When all the names are there, then they've all visited the room.

Illegal? Nope. The warden says nothing about doing anything else when visiting the room, as long as they flip one and only one switch.

Wrong? Of course. But what are a bunch of dumb lags to do?
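My solution is simple enough to simulate. The Python sketch below assumes a version of the puzzle with two switches and a warden who picks one prisoner at random each day; neither detail is spelled out above, so treat both as assumptions. The wall is just a set of names, and the prisoners can declare victory the moment it holds everyone's.

```python
import random


def days_until_all_names(n_prisoners: int, seed: int = 0) -> int:
    """Simulate the names-on-the-wall solution and return the day it completes.

    Assumptions: two switches, and one prisoner chosen uniformly at random
    per day. Each visitor flips exactly one switch, as the rules require,
    then writes his name on the wall.
    """
    rng = random.Random(seed)
    switches = [False, False]  # their state no longer carries any information
    wall: set[int] = set()
    day = 0
    while len(wall) < n_prisoners:
        day += 1
        visitor = rng.randrange(n_prisoners)
        which = rng.randrange(len(switches))
        switches[which] = not switches[which]  # the one mandatory flip
        wall.add(visitor)                      # the "nobody said we couldn't" step
    return day


print(days_until_all_names(100))
```

For 100 prisoners this is the coupon collector's problem: the expected wait is n·H_n, about 519 days.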

About this Archive

This page is an archive of recent entries in the Testing category.

Powered by Movable Type 5.11