Tag Archives: testing

Entomosemantics, or, how to talk about bugs

One of the skills they pay me the big bucks (well, medium-sized Euros) for at work is assessing the risks of changes going into production. To do it, I’ve become pretty good at evaluating the system that is being changed.

I could snow you with talk of checklists, metrics, and charts, but really, my most valuable analytical tools are my pattern-matching wetware and my experience. With those two things, I can usually describe the current state of the system and estimate its chances of going horribly wrong in the near future, just based on gut feel.

Below are my private terms for the various states of computer system health. I use different ones in official reporting. Usually.

  • clean: The system runs smoothly, with no visible bugs. I read the logs to calm down after stressful meetings.
  • stable: There are the occasional interface bugs, but the thing runs reliably. It feels like a melon you tap in the supermarket and decide to buy.
  • scruffy: Most users hit some kind of bug or another, but they can make it work most of the time. Regular users have workarounds the way commuters have rat-runs that avoid traffic blackspots.
  • buggy: This is when users begin to see the bugs they encounter as a pattern rather than individual occurrences. They start to wonder if the pattern of bugs indicates a deeper unreliability. They’re right to.
  • brittle: Bugs aside, it pretty much works…right up to the point where it shatters into little tiny pieces.
  • fragile: It falls over a lot. Ops can pretty much always get it back up again in a reasonable time. We spend a lot of time apologizing.
  • fucked: It’s broken. Again. Fortunately, we have backups, and we’re fairly sure they’ll work.
  • comprehensively fucked: The backups didn’t work. Shark time.

Entropy tells us that, barring intervention, systems tend to move down this sequence. But it’s not a linear progression. For instance, brittle and fragile are parallel routes to fuckedness. They’re basically two different failure modes: the Big Bad Bang and Death by a Thousand Cuts.

The applicability of these categories to other matters is left as an exercise for the reader.

Cross-posted on Making Light, where any comments will live.

How To Break Things Real Good

Martin has been absent because he’s been redesigning his side of the site. (Go check it out. It’s cool.) I’ve been absent for much less interesting* reasons.

Basically, I’ve been studying for a test. About testing. The Information Systems Examination Board (ISEB) Practitioner Certificate in Software Testing, or, as I think of it, How To Break Things Real Good.

After eight days of classroom instruction spread over two weeks, I had less than a month to cram the syllabus in between my ears (Only click on the link if you have persistent insomnia. Not suitable for reading whilst operating heavy machinery*). I did it – I can now go on at great length about the relative strengths of boundary value analysis and state transition testing in the design of functional tests, name 18 types of automated test tool, and describe three software development lifecycle models and how they relate to testing.

I wasn’t a very good classmate, I’m afraid. I got massively insecure early on in the instruction section, when I came in on the second week to find that someone extra had turned up and taken my seat and my course materials. The instructor was mortified, but I felt deeply unwelcome, and turned to the same obnoxious behaviour I used to get through high school. When I feel out of place, I become the most annoyingly, articulately intelligent pain in the posterior ever…trying to prove that separate does not equal inferior, I guess.

I did this throughout the second week of classes, and only got worse in the revision session. I even straightened the instructor out on his understanding of one area of the syllabus. Yes, I was right and he was wrong. But that doesn’t make it less obnoxious**. I hope I made up for it a little with some of the tutoring I did on the side.

The exam was a pig, but I knew it would be. I think I did OK, on balance, though I won’t know for a couple of months. The pass mark is 60%, and if I get over 80% I get a distinction. (Which is, in a small community, considered rather cool.) I’ll be content to pass.***

I promise, now that I’m done with that, I’ll post to the blog again. I’ll even go back and pick out the best photos I took over that time, tell you about the time Fionaberry did a face plant at full speed running downhill, and even update my cinnamon roll recipe. Promise.

* I don’t think it’s boring. But I know everyone else does.

** Peter, if you’re reading this, I am sorry.

*** This is a lie. I would be marginally content to hear that I got 100%. I’ll gnash my teeth over every missed point. I know I missed at least 7 marks, and it’s driving me nuts.

Where’s the Black Squad when we need them?

Q: What do weapons of mass destruction have to do with cot death?

A: In both areas, the “experts” evaluating the evidence and acting on their conclusions have caused enormous devastation. Then, after the fact, that evaluation has proven wrong.

Why? I have an insight that may be useful.

I’ve held a number of jobs in my working life. The three that I’ve spent the longest at, though, are paralegal (2 years), financial auditor (3 years) and software tester (7 years and counting). Though they seem quite varied, they have one common factor: they’re all about the evaluation of evidence.

Lawyers and paralegals, of course, work with evidence all the time: gathering it, presenting it, writing about it. There’s no pretense of neutrality. A trial lawyer’s job (aided by paralegals) is to find evidence that supports one particular view, and to discount evidence that doesn’t.

Financial auditors are, on the surface of it, very different from lawyers. They go into companies at the year end and check the financial accounts those companies produce. Each stage of the audit is made up of tests on certain aspects of the accounts, whether it be a stock count to ensure that the inventory numbers are correct, or a check of reconciliation procedures to allow the auditors to rely on internal financial systems. And for each stage of the audit, we used to state the specific object of the test. I still remember the format.

Object of Test
To accumulate audit evidence that stock valuations are materially accurate and correctly stated in the year end accounts.

The public used to perceive auditors as unbiased and neutral (possibly even stringent and difficult to satisfy), but of course the scandals of recent years (the Maxwell empire, Barings, Enron) changed all that. Everyone knows the subtle, unstated pressure that the auditors are under when they go into a company, particularly one which pays the firm’s consultancy arm large fees. It’s almost unheard of for a Big Five firm to refuse to sign off a set of accounts.

Learning about software testing allowed me to see consciously what I knew unconsciously already. The audit process is biased in favour of approval, and any such bias makes an enormous difference to the results obtained. This is a phenomenon that testers are painfully aware of – it’s the reason that software has to be independently tested.

To quote one of the foundational books on software testing (The Art of Software Testing, by Glenford J Myers [1]):

“Since human beings tend to be highly goal-oriented, establishing the proper goal has an important psychological effect. If our goal is to demonstrate that a program has no errors, then we shall tend to select test data that have a low probability of finding errors. On the other hand, if our goal is to demonstrate that a program has errors, our test data will have a higher probability of finding errors.”

Reread the sample “object of test” above in the light of that quote. What is the goal of the test? Is it to find “bugs” in the accounts, or to establish that they aren’t there? How likely does that make it that we would find errors?
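Here is a toy illustration of Myers’s point, as a minimal sketch rather than anything from his book. The `shipping_fee` function, its 10 kg rule, and both sets of test cases are invented for the example. The function has a deliberate bug on the boundary; the suite written to show that it works never goes near it, while the suite written to break it does.

```python
def shipping_fee(weight_kg):
    """Hypothetical rule: parcels up to and including 10 kg cost 5, heavier ones cost 9."""
    # Deliberate bug: a parcel of exactly 10 kg is charged the higher fee.
    if weight_kg < 10:
        return 5
    return 9


def failures(cases):
    """Return (input, expected, actual) for every case where the code breaks the rule."""
    return [(w, expected, shipping_fee(w))
            for w, expected in cases
            if shipping_fee(w) != expected]


# Goal: "show that it works". Comfortable values, well away from the edge.
confirming_cases = [(1, 5), (5, 5), (20, 9), (50, 9)]

# Goal: "show that it is broken". Values on and around the boundary.
attacking_cases = [(9.99, 5), (10, 5), (10.01, 9)]

print("confirming suite failures:", failures(confirming_cases))
print("attacking suite failures:", failures(attacking_cases))
```

Same code, same bug. The only thing that changed between the two suites is what the person choosing the data was trying to prove.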

The most successful software testing teams are the ones who take a skeptical, or even hostile, attitude toward code quality. IBM’s infamous “Black Team” took this to extremes, dressing in sinister clothes, cheering when they found bugs, and deliberately striking fear into the hearts of the programmers whose code they tested. The reliability of mainframe operating code is their legacy – we wouldn’t have a hope of achieving “six nines” (99.9999%) availability had they not found the bugs they did.
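For a sense of what “six nines” demands, here is the arithmetic as a small sketch. The function name is mine, and I am assuming a 365-day year with downtime measured in seconds. At 99.9999% the allowance comes to about 31.5 seconds a year.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000, ignoring leap years


def downtime_seconds_per_year(nines):
    """Allowed downtime per year for an availability of `nines` nines (3 means 99.9%)."""
    unavailability = 10 ** -nines  # e.g. 6 nines leaves 0.000001 of the year
    return SECONDS_PER_YEAR * unavailability


for n in (3, 4, 5, 6):
    print(f"{n} nines: {downtime_seconds_per_year(n):,.1f} seconds per year")
```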

So if bias affects results, how then do we view Professor Meadow and his eponymous Law (“One cot death is a tragedy, two is a coincidence, and three is murder unless proven otherwise.”)? His testimony helped jail women who have since been acquitted of the deaths of their children, and caused authorities to remove babies from their parents, sometimes permanently. Yet statisticians claim that he took a “stamp collecting” attitude toward evidence, including the cases that supported his views and overlooking the others. And given the above axiom, how did he approach the deaths of children, when asked to testify at their mothers’ trials?

How a similar bias could affect the officials of two governments, when considering whether to send in the tanks, is left as an exercise for the student. But it raises the question: will any enquiry that focuses on the evidence, rather than the objectives of the people evaluating that evidence, really explain the conclusions that led to war?

  1. This is listed at $150.00 on Amazon at the moment. That’s pretty expensive, even for a computer book, but this one’s worth it. It’s 177 pages long and has been indispensable since its publication in 1979…quite a contrast to Martin’s Microsoft exam books, which can run to over a thousand pages and are obsolete before the ink dries.

Offshoring Redux, or, what does a sporran have to do with software?

For the last few months, the IT industry has been gripped by anxiety over the growing trend towards “offshoring”. More and more companies are moving their software development to countries like India and China, where a highly educated workforce is willing to code for a fraction of the cost of North Americans and Europeans. This is a Bad Thing according to pundits, but, I suspect, an inevitable one. UK call centres and directory enquiries are already frequently staffed from the Indian subcontinent (with operators given “cultural training” so they can chat about the latest happenings on EastEnders).

I also suspect that my own specialty, software testing, is going to see a renaissance in the US, Canada, and Europe. At present, software testing seems to be moving offshore along with the development. But I reckon a given company will try an average of one offshore implementation without onshore testing before we testers become very, very popular. Even “onshore” offsite developments need acceptance testing. How much more will projects developed across time zones, continents, and language barriers?

But some industries are supposed to be offshoring-proof. Right? Right? Wrong. Sporran makers are under threat from offshoring.

Is nothing sacred?

The Prisoners Problem

Martin has got me involved in the Prisoner Problem. As a software tester by vocation as well as profession, I’ve been his gadfly, pointing out the flaw in his solution.

Continuing the gadfly/software tester/push the limits theme, I would like to propose a solution.

In addition to randomly flipping switches, each prisoner writes his name on the wall of the room when he visits it. Since the warden promised not to let anyone in the room except when the prisoners are there, the names won’t be erased or added to without the prisoners’ knowledge. When all the names are there, then they’ve all visited the room.
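For the curious, the names-on-the-wall scheme can be simulated in a few lines. This is a rough sketch under my own assumptions: the warden picks one prisoner uniformly at random for each visit, and there are 23 prisoners (a number I have picked for illustration). The function name is made up too.

```python
import random


def visits_until_wall_is_full(prisoner_count, rng):
    """Simulate room visits until every prisoner's name is on the wall.

    Assumes the warden picks one prisoner uniformly at random per visit.
    Returns the number of visits made when the last missing name goes up.
    """
    wall = set()  # names written so far
    visits = 0
    while len(wall) < prisoner_count:
        visitor = rng.randrange(prisoner_count)
        visits += 1
        wall.add(visitor)  # writing your name a second time changes nothing
        # The visitor would also flip exactly one switch here, as the rules require.
    return visits


rng = random.Random(2004)  # fixed seed so the run is repeatable
runs = [visits_until_wall_is_full(23, rng) for _ in range(1000)]
print("fewest visits:", min(runs))
print("average visits:", sum(runs) / len(runs))
```

Under that random-selection assumption this is the coupon collector’s problem, so for 23 prisoners the average should land somewhere around 86 visits.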

Illegal? Nope. The warden says nothing about doing anything else when visiting the room, as long as they flip one and only one switch.

Wrong? Of course. But what are a bunch of dumb lags to do?