Infotopia, information-gathering, and software QA

A couple of weeks ago I finished reading Cass Sunstein’s Infotopia. While certainly not a perfect book by any stretch, it gives a stimulating overview of a central problem for any society: how it collects and filters information so that it can make decisions. Being a good U of C guy, Sunstein starts with Hayek’s notion that the price system is an elaborate mechanism for ‘sharing and synchronizing local and personal knowledge’ (to quote Wikipedia), and then goes on to discuss other mechanisms for getting information out of the heads which contain it: wikis, open source, democracy, polling, deliberation, prediction markets, etc. It is an interesting read to frame a lot of discussions around.

One of those discussions came up today. Quite simply, the big problem in QA is getting information about the state of the software out of the software and into the hands of developers as efficiently as possible.

This has three aspects: creating the information, getting it into the hands of the QA team, and then filtering it into a form that developers can act on. Traditional QA has a very hard time getting the information: there are a lot of lines of code to be exercised, and (relatively speaking) very few people exercising them. It is like squeezing water out of a stone, so traditional QA teams have to do a lot of things (like extensive automated testing) to get that information. The output is a relatively small amount of very regularized data, which is easy to present (though hard to weight efficiently and accurately).

In contrast, open source QA has a whole ocean of information from the legions of volunteers willing to run pre-release code; the trick is to tap into that water without drowning in it. The information isn’t regularized, but given a large enough body of users over time, you can be fairly certain that the bug reports will represent an accurate cross-section of your problems. And the interaction with real users (instead of with automated test tools, or third-hand via the sales/customer relationship) can give you a fairly good idea of which bugs actually matter to real people.

If you’ve got one person to work on QA, I’d say you always want them swimming in the ocean rather than doing any amount of automated squeezing of information from the stone. This is not to say automated testing doesn’t have its place. In particular, good unit testing captures information at a very high-efficiency junction (when the original author is writing the code) and then gives it back in a very compressed, efficient form that the developer should immediately know how to prioritize and deal with. Similarly, automated tests that attempt to capture regressions once a bug is fixed are also fairly efficient: they capture information which real humans in the field have identified as an important problem, and they again report simple, clear, efficient information, namely that this bug number, fixed in this commit, is now broken again. But generic ‘well, we’re going to write tests now because that is how we did it when we had no users willing to help us test’ testing is a very inefficient use of manpower. It is like trying to dig a deep well to get information when you live next to a deep, clear, safe mountain lake.
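
To make the regression-test point concrete, here is a minimal sketch in Python using the standard unittest module. Everything specific in it is made up for illustration: the parse_duration function, bug #1234, and commit abc123 are hypothetical stand-ins, not anything from a real project. The idea is only that the test is named after a problem a real user reported, and that a failure tells the developer exactly which fix has come undone.

```python
import unittest


def parse_duration(text):
    """Hypothetical function under test: convert 'MM:SS' into seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


class RegressionTests(unittest.TestCase):
    def test_bug_1234_leading_zero_seconds(self):
        """Hypothetical bug #1234: a user reported that '01:05' was misparsed."""
        self.assertEqual(
            parse_duration("01:05"),
            65,
            "Regression: bug #1234 (fixed in hypothetical commit abc123) "
            "is broken again",
        )


def run_regressions():
    """Run the suite and return the id of every test that failed or errored."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RegressionTests)
    result = unittest.TestResult()
    suite.run(result)
    return [test.id() for test, _ in result.failures + result.errors]


if __name__ == "__main__":
    broken = run_regressions()
    print("all fixed bugs still fixed" if not broken else broken)
```

An empty list from run_regressions() means every bug users cared enough to report is still fixed; anything else is a short list of test names that point straight at the bug that has come back.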

So there you have it: proprietary QA is trying to squeeze information-water from a stone; open source QA is trying to learn how to swim in a sea of information. I know which problem I’d rather have.