I am the CADT; and advice on NEEDINFOing old bugs en masse

[Attention conservation notice: probably not of interest to lawyers; this is about my previous life in software development.]

<a href="https://commons.wikimedia.org/wiki/File:MW_Bug_Squad_Barnstar.svg">Bugsquad barnstar, under MPL 1.1</a>
Bugsquad barnstar, under MPL 1.1

Someone recently mentioned JWZ’s old post on the CADT (Cascade of Attention Deficit Teecnagers) development model, and that finally has pushed me to say:

I am the CADT.

I did the bug closure that triggered Jamie’s rant, and I wrote the text he quotes in his blog post.1

Jamie got some things right, and some things wrong. The main thing he got right is that it is entirely possible to get into a cycle where instead of seriously trying to fix bugs, you just do a rewrite and cross your fingers that it fixes old bugs. And yes, this can particularly happen when you’re young and writing code for fun, where the joy of a from-scratch rewrite can overwhelm some of your other good senses. Jamie also got right that I communicated the issue pretty poorly. Consider this post a belated explanation (as well as a reference for the next time I see someone refer to CADT).

But that wasn’t what GNOME was doing when Jamie complained about it, and I doubt it is actually something that happens very often in any project large enough to have a large bug tracking system (BTS). So what were we doing?

First, as Brendan Eich has pointed out, sometimes a rewrite really is a good idea. GNOME 2 was such a rewrite – not only was a lot of the old code a hairy mess, we decided (correctly) to radically revise the old UI. So in that sense, the rewrite was not a “CADT” decision – the core bugs being fixed were the kinds of bugs that could only be fixed with massive, non-incremental change, rather than “hey, we got bored with the old code”. (Immediately afterwards, GNOME switched to time-based releases, and stuck to that schedule for the better part of a decade, which should be further proof we weren’t cascading.)

This meant there were several thousand old bugs that had been filed against UIs that no longer existed, and often against code that no longer existed or had been radically rewritten. So you’ve got new code and old bugs. What do you do with the old bugs?

It is important to know that open bugs in a BTS are not free. Old bugs impose a cost on developers, because when they are trying to search relevant bugs, old bugs can make it harder to find the things they really should be working on. In the best case, this slows them down; in the worst case, it drives them to use other tools to track the work they want to do – making the BTS next to useless. This violates rule #1 of a BTS: it must be useful for developers, or else it all falls apart.

So why did we choose to reduce these costs by closing bugs filed against the old codebase as NEEDINFO (and asking people to reopen if they were still relevant) instead of re-testing and re-triaging them one-by-one, as Jamie would have suggested? A few reasons:

  • number of triagers v. number of bugs: there were, at the time, around a half-dozen active bug volunteers, and thousands of pre-GNOME 2 bugs. It was simply unlikely that we’d ever be able to review all the old bugs even if we did nothing else.
  • focus on new bugs: new bugs are where triagers and developers are much more likely to be relevant – those bugs are against fresh code; the original filer is much more likely to respond to clarifying questions; etc. So all else being equal, time spent on new bugs was going to be much better for the software than time spent on old bugs.
  • steady flow of new bugs: if you’ve got a small number of new bugs coming in, perhaps you split your time – but we had no shortage of new bugs, nor of motivated bug reporters. So we may have paid some cost (by demotivating some reporters) but our scarce resource (developers) greatly appreciated it.
  • relative burden: with thousands of open bugs from thousands of reporters, it made sense to ask old them to test their bug against the new code. Reviewing their old bugs was a small burden for each of them, once we distributed it.

So when isn’t it a good idea to close ask for more information about old bugs?

  • Great at keeping old bugs triaged/relevant: If you have a very small number of old bugs that haven’t been touched in a long time, then they aren’t putting much burden on developers.
  • Slow code turnover: If your development process is such that it is highly likely that old bugs are still relevant (e.g., core has remained mostly untouched for many years, or effective use of TDD has kept the number of accidental new bugs low) this might not be a good idea.
  • No triggering event: In GNOME, there was a big event, plus a new influx of triagers, that made it make sense to do radical change. I wouldn’t recommend this “just because” – it should go hand-in-hand with other large changes, like a major release or important policy changes that will make future triaging more effective.

Relatedly, the team practices mailing list has been discussing good practices for migrating bug tracking systems in the past few days, which has been interesting to follow. I don’t take a strong position on where Wikimedia’s bugzilla falls on this point – Mediawiki has a fairly stable core, and the volume of incoming bugs may make triage of old bugs more plausible. But everyone running a very large bugzilla for an active project should remember that this is a part of their toolkit.

  1. Both had help from others, but it was eventually my decision. []