I am the CADT; and advice on NEEDINFOing old bugs en masse

[Attention conservation notice: probably not of interest to lawyers; this is about my previous life in software development.]

Bugsquad barnstar, under MPL 1.1

Someone recently mentioned JWZ’s old post on the CADT (Cascade of Attention Deficit Teenagers) development model, and that has finally pushed me to say:

I am the CADT.

I did the bug closure that triggered Jamie’s rant, and I wrote the text he quotes in his blog post.[1]

Jamie got some things right, and some things wrong. The main thing he got right is that it is entirely possible to get into a cycle where instead of seriously trying to fix bugs, you just do a rewrite and cross your fingers that it fixes old bugs. And yes, this can particularly happen when you’re young and writing code for fun, where the joy of a from-scratch rewrite can overwhelm some of your other good senses. Jamie also got right that I communicated the issue pretty poorly. Consider this post a belated explanation (as well as a reference for the next time I see someone refer to CADT).

But that wasn’t what GNOME was doing when Jamie complained about it, and I doubt it is actually something that happens very often in any project large enough to have a large bug tracking system (BTS). So what were we doing?

First, as Brendan Eich has pointed out, sometimes a rewrite really is a good idea. GNOME 2 was such a rewrite – not only was a lot of the old code a hairy mess, we decided (correctly) to radically revise the old UI. So in that sense, the rewrite was not a “CADT” decision – the core bugs being fixed were the kinds of bugs that could only be fixed with massive, non-incremental change, rather than “hey, we got bored with the old code”. (Immediately afterwards, GNOME switched to time-based releases, and stuck to that schedule for the better part of a decade, which should be further proof we weren’t cascading.)

This meant there were several thousand old bugs that had been filed against UIs that no longer existed, and often against code that no longer existed or had been radically rewritten. So you’ve got new code and old bugs. What do you do with the old bugs?

It is important to know that open bugs in a BTS are not free. Old bugs impose a cost on developers, because when they are trying to search relevant bugs, old bugs can make it harder to find the things they really should be working on. In the best case, this slows them down; in the worst case, it drives them to use other tools to track the work they want to do – making the BTS next to useless. This violates rule #1 of a BTS: it must be useful for developers, or else it all falls apart.

So why did we choose to reduce these costs by closing bugs filed against the old codebase as NEEDINFO (and asking people to reopen if they were still relevant) instead of re-testing and re-triaging them one-by-one, as Jamie would have suggested? A few reasons:

  • number of triagers v. number of bugs: there were, at the time, around a half-dozen active bug volunteers, and thousands of pre-GNOME 2 bugs. It was simply unlikely that we’d ever be able to review all the old bugs even if we did nothing else.
  • focus on new bugs: new bugs are where triagers and developers are much more likely to be effective – those bugs are against fresh code; the original filer is much more likely to respond to clarifying questions; etc. So all else being equal, time spent on new bugs was going to be much better for the software than time spent on old bugs.
  • steady flow of new bugs: if you’ve got a small number of new bugs coming in, perhaps you split your time – but we had no shortage of new bugs, nor of motivated bug reporters. So we may have paid some cost (by demotivating some reporters) but our scarce resource (developers) greatly appreciated it.
  • relative burden: with thousands of open bugs from thousands of reporters, it made sense to ask the reporters to test their old bugs against the new code. Reviewing their own old bugs was a small burden for each of them, once we distributed it.
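
To make the mechanics concrete, here is a minimal sketch in Python of the bulk step described above. It is an illustration only, not the script or query GNOME actually used: the Bug record, the status strings, and the comment text are hypothetical stand-ins rather than Bugzilla's real data model or API.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical comment text; the wording actually sent is not reproduced here.
NEEDINFO_COMMENT = (
    "This bug was filed against code that has since been rewritten. "
    "Please retest against the current release and reopen if it still applies."
)


@dataclass
class Bug:
    """Hypothetical stand-in for a bug tracker record."""
    bug_id: int
    filed: date
    status: str = "OPEN"
    comments: list = field(default_factory=list)


def needinfo_old_bugs(bugs, cutoff):
    """Mark every open bug filed before `cutoff` as NEEDINFO.

    Returns the bugs that were changed, so the caller can notify
    their reporters. Closed bugs and newer bugs are left untouched.
    """
    changed = []
    for bug in bugs:
        if bug.status == "OPEN" and bug.filed < cutoff:
            bug.status = "NEEDINFO"
            bug.comments.append(NEEDINFO_COMMENT)
            changed.append(bug)
    return changed
```

The cutoff would be something like the release date of the rewrite. Returning the changed bugs, rather than just mutating them, is what lets the review burden be handed back to each reporter: the same list can drive the "please retest and reopen" mail.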

So when isn’t it a good idea to close old bugs and ask for more information about them?

  • Great at keeping old bugs triaged/relevant: If you have a very small number of old bugs that haven’t been touched in a long time, then they aren’t putting much burden on developers.
  • Slow code turnover: If your development process is such that it is highly likely that old bugs are still relevant (e.g., core has remained mostly untouched for many years, or effective use of TDD has kept the number of accidental new bugs low) this might not be a good idea.
  • No triggering event: In GNOME, there was a big event, plus a new influx of triagers, that made it make sense to do radical change. I wouldn’t recommend this “just because” – it should go hand-in-hand with other large changes, like a major release or important policy changes that will make future triaging more effective.

Relatedly, the team practices mailing list has been discussing good practices for migrating bug tracking systems in the past few days, which has been interesting to follow. I don’t take a strong position on where Wikimedia’s Bugzilla falls on this point – MediaWiki has a fairly stable core, and the volume of incoming bugs may make triage of old bugs more plausible. But everyone running a very large Bugzilla for an active project should remember that this is a part of their toolkit.

  1. Both had help from others, but it was eventually my decision.

16 thoughts on “I am the CADT; and advice on NEEDINFOing old bugs en masse”

  1. It seems to me that you have addressed the “to rewrite or not to rewrite, and what to do with old bugs when you do” question. But one could perhaps see the criticism from another angle, namely “the CADT are off considering the next rewrite while new bugs grow old”. That is, I don’t think the frustration stems from receiving a “please check that the bug still exists”-mail, but rather that nothing has happened to the bug for years and that nothing probably will even if one reopens it.

  2. Not to mention that many of the times, at least for myself personally, people don’t sit around waiting years for a bug to get acked/fixed. As users they learn not to use that software or feature because it doesn’t work. So then to come around many months later with a “is this still a problem for you?” The answer is usually, “Why no, because I’ve worked around the issue by not using it.”

    When this happens enough times, it gets to the point where the user is conditioned to not create bug reports at all. We see a bug with a feature/software and we simply work around it by not using it, and disregard futile bug reporting. We hear “it’s not a bug if it’s not reported”, however the converse of that isn’t necessarily true either.

  3. Yeah, I guess I should have made more clear that this is a drastic step – ideally you stay on top of old stuff from the moment that they are filed. But many large projects go through periods where bug filing far outpaces the resources available to do triage, leaving stretches of filed bugs that aren’t well-maintained. The question is, once that happens, what do you do about it? It isn’t helpful to say “you should have dealt with them when they came in”; you have to have some strategy to deal with them in the here-and-now. And this is one of the options for that.

  4. Do you really mean to say “bug filing far outpaces the resources available to do triage”? That is, the project does not have resources to go through every bug report and prioritize, categorize, de-duplicate, close-as-invalid, ask for clarification (and auto-close after 2 weeks if not answered), put on “good first bug”-lists? Here is a post about a team going through about 500 old bugs in a day: https://blog.mozilla.org/joe/2011/10/14/results-of-the-inaugural-bugkill-day/ (it is old, but it was just a random one at the top of my search)

    But since all the bug reports are about a bug in some of the code that has been written (or half of them, if the other half is enhancement requests) and you don’t have time to even go through and look at the bug reports, how can you believe that a rewrite of all of the code is an option?

  5. I think the problem (at least with MediaWiki & Co., can’t speak of GNOME) is more this triage-think: People file bugs and form a heap of them; then someone categorizes this heap and pushes it somewhere else; and maybe it is “assigned” to a developer to fix that (or not). Once you reach this stage (in any form of project management), you end up with developers either not being aware of the open issues and fixing them not consciously but at most accidentally, or someone mass-WONTFIXing when the date of evaluation by their managers rolls in.

    In a productive environment, the developers themselves “triage” the bugs *when* *they* *are* *filed*, and no one dares to direct them to develop new stuff (= add new bugs) while the existing bugs have not been addressed (= fixed immediately or annotated with a plan how to fix it).

    And that’s also important for how users perceive the way they are treated: If they don’t get an immediate response (as little as “I can/can’t reproduce that”), but someone mass-kills their reports on some “bugzapping day” it gives them a feeling that they produced some sort of trash that needed to be taken care of.
