
Come work with me – developer edition!

It has been a long time since I was able to say to developer friends “come work with me” in anything but the most abstract “come work under the same roof” kind of sense. But today I can say to developers “come work with me” and really mean it. Which is fun :)

By Supercarwaar, CC BY-SA 3.0
Details: Wikimedia’s new community tech team is hiring for a community tech developer and a team lead. This will be extremely community-intensive work, so if you enjoy and get energy from working with a community and helping them achieve their goals, this could be a great role for you. This team will work intensely with my department to ensure that we’re correctly identifying and prioritizing the needs of our most active editors. If that sounds like fun, get in touch :)

[And I realize that I’ve been bad and not posted here, so here’s my new job announcement: “my department” is the Foundation’s new Community Engagement department, where we work to support healthy contributor communities and to improve WMF-community collaboration. It is a detour from law, but I’ve always said law was just a way to help people do their thing — so in that sense it is the same thing I’ve always been doing. It has been an intense roller coaster of a first two months, and I look forward to much more of the same.]

Free-riding and copyleft in cultural commons like Flickr

Flickr recently started selling prints of Creative Commons Attribution-Share Alike photos without sharing any of the revenue with the original photographers. When people were surprised, Flickr said “if you don’t want commercial use, switch the photo to CC non-commercial”.

This seems to have mostly caused two reactions:

  1. “This is horrible! Creative Commons is horrible!”
  2. “Commercial reuse is explicitly part of the license; I don’t understand the anger.”

I think it makes sense to examine some of the assumptions those users (and many license authors) may have had, and what that tells us about license choice and design going forward.

Free ride!!, by Dhinakaran Gajavarathan, under CC BY 2.0

Free riding is why we share-alike…

As I’ve explained before here, a major reason why people choose copyleft/share-alike licenses is to prevent free rider problems: they are OK with you using their thing, but they want the license to nudge (or push) you in the direction of sharing back/collaborating with them in the future. To quote Elinor Ostrom, who won a Nobel for her research on how commons are managed in the wild, “[i]n all recorded, long surviving, self-organized resource governance regimes, participants invest resources in monitoring the actions of each other so as to reduce the probability of free riding.” (emphasis added)

… but share-alike is not always enough

Copyleft is one of our mechanisms for this in our commons, but it isn’t enough. I think experience in free/open/libre software shows that free rider problems are best prevented when three conditions are present:

  • The work being created is genuinely collaborative — i.e., many authors who contribute similarly to the work. This reduces the cost of free riding to any one author. It also makes it more understandable/tolerable when a re-user fails to compensate specific authors, since there is so much practical difficulty for even a good-faith reuser to evaluate who should get paid and contact them.
  • There is a long-term cost to not contributing back to the parent project. In the case of Linux and many large software projects, this long-term cost is about maintenance and security: if you’re not working with upstream, you’re not going to get the benefit of new fixes, and will pay a cost in backporting security fixes.
  • The license triggers share-alike obligations for common use cases. The copyleft doesn’t need to perfectly capture all use cases. But if at least some high-profile use cases require sharing back, that helps discipline other users by making them think more carefully about their obligations (both legal and social/organizational).

Alternatively, you may be able to avoid damage from free rider problems by taking the Apache/BSD approach: genuinely, deeply educating contributors, before they contribute, that they should only contribute if they are OK with a high level of free riding. It is hard to see how this can work in a situation like Flickr’s, because contributors don’t have extensive community contact.1

The most important takeaway from this list is that if you want to prevent free riding in a community-production project, the license can’t do all the work itself — other frictions that somewhat slow reuse should be present. (In fact, my first draft of this list didn’t mention the license at all — just the first two points.)

Flickr is practically designed for free riding

Flickr fails on all the points I’ve listed above — it has no frictions that might discourage free riding.

  • The community doesn’t collaborate on the works. This makes the selling a deeply personal, “expensive” thing for any author who sees their photo for sale. It is very easy for each of them to find their specific materials being reused, and see a specific price being charged by Yahoo that they’d like to see a slice of.
  • There is no cost to re-users who don’t contribute back to the author—the photo will never develop security problems, or get less useful with time.
  • The share-alike doesn’t kick in for virtually any reuses, encouraging Yahoo to look at the relationship as a purely legal one, and encouraging them to forget about the other relationships they have with Flickr users.
  • There is no community education about the expectations for commercial use, so many people don’t fully understand the licenses they’re using.

So what does this mean?

This has already gone on too long, but a quick thought: what this suggests is that if you have a community dedicated to creating a cultural commons, it needs some features that discourage free riding — and critically, mere copyleft licensing might not be good enough, because of the nature of most production of commons of cultural works. In Flickr’s case, maybe this should simply have included not doing this, or making some sort of financial arrangement despite what was legally permissible; for other communities and other circumstances other solutions to the free-rider problem may make sense too.

And I think this argues for consideration of non-commercial licenses in some circumstances as well. This doesn’t make non-commercial licenses any more palatable, but since commercial free riding is typically people’s biggest concern, and other tools may not be available, it is entirely possible they should be taken more seriously than free and open source software dogma might have you believe.

  1. It is open to discussion, I think, whether this works in Wikimedia Commons, and how it can be scaled as Commons grows.

Understanding Wikimedia, or, the Heavy Metal Umlaut, one decade on

It has been nearly a full decade since Jon Udell’s classic screencast about Wikipedia’s article on the Heavy Metal Umlaut (current text; Jan. 2005 version). In this post, written for Paul Jones’ “living and working online” class, I’d like to use the last decade’s changes to the article to illustrate some points about the modern Wikipedia.1

Measuring change

At the end of 2004, the article had been edited 294 times. As we approach the end of 2014, it has now been edited 1,908 times by 1,174 editors.2

This graph shows the number of edits by year – the blue bar is the overall number of edits in each year; the dotted line is the overall length of the article (which has remained roughly constant since a large pruning of band examples in 2007).

Edits-by-year

The dropoff in edits is not unusual — it reflects both a mature article (there isn’t that much more you can write about metal umlauts!) and an overall slowing in edits in English Wikipedia (from a peak of about 300,000 edits/day in 2007 to about 150,000 edits/day now).3
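A side note for the data-curious: numbers like these are easy to pull yourself. Below is a minimal sketch that fetches the article’s revision timestamps from the standard MediaWiki web API and buckets them by year; the endpoint and parameters are the real API, but treat the script itself as a starting point rather than a polished tool.

```python
import requests
from collections import Counter

API = "https://en.wikipedia.org/w/api.php"

def revision_timestamps(title):
    """Yield the timestamp of every revision of `title`."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "timestamp",
        "rvlimit": "max",        # up to 500 revisions per request
        "format": "json",
        "formatversion": "2",
    }
    while True:
        data = requests.get(API, params=params).json()
        for rev in data["query"]["pages"][0].get("revisions", []):
            yield rev["timestamp"]       # e.g. "2008-12-09T02:33:06Z"
        if "continue" not in data:       # no more batches left
            break
        params.update(data["continue"])  # cursor for the next batch

edits_by_year = Counter(ts[:4] for ts in revision_timestamps("Metal umlaut"))
for year, edits in sorted(edits_by_year.items()):
    print(year, edits)
```

(Asking for rvprop=user as well, and counting distinct names, is one way to reproduce the editor count.)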

The overall edit count — 2000 edits, 1000 editors — can be hard to get your head around, especially if you write for a living. Implications include:

  • Style is hard. Getting this many authors on the same page, stylistically, is extremely difficult, and it shows in inconsistencies small and large. If not for the deeply acculturated Encyclopedic Style we all have in our heads, I suspect it would be borderline impossible.
  • Most people are good, most of the time. Something like 3% of edits are “reverted”; i.e., about 97% of edits are positive steps forward in some way, shape, or form, even if imperfect. This is, I think, perhaps the single most amazing fact to come out of the Wikimedia experiment. (We reflect and protect this behavior in one of our guidelines, where we recommend that all editors Assume Good Faith.) A rough way to sanity-check a number like this for a single article is sketched below.
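The sketch: count “identity reverts”, i.e. edits whose resulting text is byte-identical to some earlier revision, which you can detect because the API exposes a SHA-1 hash for each revision. To be clear, this is a simplistic heuristic for illustration, not how the 3% figure was computed, and it will miss partial reverts:

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

def revision_sha1s(title):
    """Yield the SHA-1 of every revision of `title`, oldest first."""
    params = {
        "action": "query", "prop": "revisions", "titles": title,
        "rvprop": "sha1", "rvdir": "newer",   # oldest-to-newest order
        "rvlimit": "max", "format": "json", "formatversion": "2",
    }
    while True:
        data = requests.get(API, params=params).json()
        for rev in data["query"]["pages"][0].get("revisions", []):
            if "sha1" in rev:    # revisions with deleted text have no sha1
                yield rev["sha1"]
        if "continue" not in data:
            break
        params.update(data["continue"])

seen, reverts, total = set(), 0, 0
for sha in revision_sha1s("Metal umlaut"):
    total += 1
    if sha in seen:   # exact text already seen => an identity revert
        reverts += 1
    seen.add(sha)
print(f"{reverts} identity reverts out of {total} edits "
      f"({100 * reverts / total:.1f}%)")
```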

The name change, tools, and norms

In December 2008, the article lost the “heavy” from its name and became, simply, “metal umlaut” (explanation, aka “edit summary”, highlighted in yellow):

Name change

A few takeaways:

  • Talk pages: The screencast explained one key tool for understanding a Wikipedia article – the page history. This edit summary makes reference to another key tool – the talk page. Every Wikipedia article has a talk page, where people can discuss the article, propose changes, etc. In this case, this user discussed the change (in November) and then made the change in December. If you’re reporting on an article for some reason, make sure to dig into the talk page to fully understand what is going on.
  • Sources: The user justifies the name change by reference to sources. You’ll find little reference to them in 2005, but by 2008, finding an old source using a different term was sufficient rationale to rename the entire page. Relatedly…
  • Footnotes: In 2008, there was talk of sources, but still no footnotes. (Compare the story about Mötley Crüe in Germany in 2005 and now.) The emphasis on footnotes (and the ubiquitous “citation needed”) was still a growing thing. In fact, when Jon did his screencast in January 2005, the standardized/much-parodied way of saying “citation needed” did not yet exist, and would not until June of that year! (It is now used in a quarter of a million English Wikipedia pages.) Of course, the requirement to add footnotes (and our baroque way of doing so) may also explain some of the decline in editing in the graphs above.

Images, risk aversion, and boldness

Another highly visible change is to the Motörhead art, which was removed in November 2011 and replaced with a Mötley Crüe image in September 2013. The addition and removal present quite a contrast. The removal is explained like this:

remove File:Motorhead.jpg; no fair use rationale provided on the image description page as described at WP:NFCC content criteria 10c

This is clear as mud, combining legal issues (“no fair use rationale”) with Wikipedian jargon (“WP:NFCC content criteria 10c”). To translate it: the editor felt that the “non-free content” rules (abbreviated WP:NFCC) prohibited copyrighted content unless there was a strong explanation of why the content might be permitted under fair use.

This is both great, and sad: as a lawyer, I’m very happy that the community is pre-emptively trying to Do The Right Thing and take down content that could cause problems in the future. At the same time, it is sad that the editors involved did not try to provide the missing fair use rationale themselves. Worse, a rationale was added to the image shortly thereafter, but the image was never added back to the article.

So where did the new image come from? Simply:

boldly adding image to lead

“boldly” here links to another core guideline: “be bold”. Because we can always undo mistakes, as the original screencast showed about spam, it is best, on balance, to move forward quickly. This is in stark contrast to traditional publishing, which has to live with printed mistakes for a long time and so places heavy emphasis on Getting It Right The First Time.

In brief

There are a few other changes worth pointing out, even in a necessarily brief summary like this one.

  • Wikipedia as a reference: At one point, in discussing whether or not to use the phrase “heavy metal umlaut” instead of “metal umlaut”, an editor makes the point that Google has many search results for “heavy metal umlaut”, and another editor points out that all of those search results refer to Wikipedia. In other words, unlike in 2005, Wikipedia is now so popular, and so widely referenced, that editors must be careful not to (indirectly) be citing Wikipedia itself as the source of a fact. This is a good problem to have—but a challenge for careful authors nevertheless.
  • Bots: Careful readers of the revision history will note edits by “ClueBot NG”. Vandalism of the sort noted by Jon Udell has not gone away, but it now is often removed even faster with the aid of software tools developed by volunteers. This is part of a general trend towards software-assisted editing of the encyclopedia.
  • Translations: The left-hand side of the article shows that it is in something like 14 languages, including a few that use umlauts unironically. This is not useful for this article, but for more important topics, it is always interesting to compare the perspective of authors in different languages.

Other thoughts?

I look forward to discussing all of these with the class, and to any suggestions from more experienced Wikipedians for other lessons from this article that could be showcased, either in the class or (if I ever get to it) in a one-decade anniversary screencast. :)

  1. I still haven’t found a decent screencasting tool that I like, so I won’t do proper homage to the original—sorry Jon!
  2. Numbers courtesy of X!’s edit counter.
  3. It is important, when looking at Wikipedia statistics, to distinguish between stats about Wikipedia in English, and Wikipedia globally — numbers and trends will differ vastly between the two.

My Wikimania 2014 talks

Primarily what I did during Wikimania was chew on pens.

Discussing Fluid Lobbying at Wikimania 2014, by Sebastiaan ter Burg, under CC BY 2.0

However, I also gave some talks.

The first one was on Creative Commons 4.0, with Kat Walsh. While targeted at Wikimedians, this may be of interest to others who want to learn about CC 4.0 as well.

The second one was on Open Source Hygiene, with Stephen LaPorte. This one is again Wikimedia-specific (and, I’m afraid, less useful without the speaker notes) but may be of interest to open source developers more generally.

The final one was on sharing; video is below (and I’ll share the slides once I figure out how best to embed the notes, which are pretty key to understanding the slides):

Wikimania 2014 Notes – very miscellaneous

A collection of semi-random notes from Wikimania London, published very late:

Gruppenfoto Wikimania 2014 London, by Ralf Roletschek, under CC BY-SA 3.0 Austria

The conference generally

  • Tone: The overall tone of the conference was very positive. It is possibly just a small sample size—any one person can only talk to a small number of the few thousand people at the conference—but it seemed more upbeat/positive than last year.
  • Tone, 2: The one recurring negative theme was concern about community tone, from many angles, including Jimmy. I’m very curious to see how that plays out. I agree, of course, and will do my part, both at WMF and when I’m editing. But that sort of social/cultural change is very hard.
  • Speaker diversity: Heard a few complaints about gender balance and other diversity issues in the speaker lineup, and saw a lot of the same (wonderful!) faces as last year. I’m wondering whether procedural changes (like blind submissions, or other things from this list) might bring some new blood and improve diversity.
  • “Outsiders”: The conference seemed to have better representation than last year from “outside” our core community. In particular, it was great for me to see huge swathes of the open content/open access movements represented, as well as other free software projects like Mozilla. We should be a movement that works well with others, and Wikimania can/should be a key part of that, so this was a big plus for me.
  • Types of talks: It would be interesting to see what the balance was of talks (and submissions) between “us learning about the world” (e.g., me talking about CC), “us learning about ourselves” (e.g., the self-research tracks), and “the world learning about us” (e.g., aimed at outsiders). Not sure there is any particular balance we should have between the three of them, but it might be revealing to see what the current balance is.
  • Less speaking, more conversing: Next year I will probably propose mostly (only?) panels and workshops, and I wonder if I can convince others to do the same. I can do a talk+slides and stream it at any time; what I can only do in person is have deeper, higher-bandwidth conversations.
  • Physical space and production values: The hackathon space was amazingly fun for me, though I got the sense not everyone agreed. The production values (and the rest of the space) for the conference were very good. I’m torn on whether or not the high production values are a plus for us, honestly. They raise the bar for participation (bad); make the whole event feel somewhat… un-community-ish(?); but they also make us much more accessible to people who aren’t yet ready for the full-on, super-intense Wikimedian Experience.

The conference for projects I work on

  • LCA: Legal/Community Affairs was pretty awesome on many fronts—our talks, our work behind the scenes, our dealing with both the expected and unexpected, etc. Deeply proud to be part of this dedicated, creative team. Also very appreciative for everyone who thanked us—it means a lot when we hear from people we’ve helped.
  • Maps: Great seeing so much interest in Open Street Map. Had a really enjoyable time at their 10th birthday meetup; was too bad I had to leave early. Now have a better understanding of some of the technical issues after a chat with Kolossos and Katie. Also had just plain fun geeking out about “hard choices” like map boundaries—I find how communities make decisions about problems like that fascinating.
  • Software licensing: My licensing talk with Stephen went well, but probably should have been structured as part of the hackathon rather than for more general audiences. Ultimately this will only work out if engineering (WMF and volunteer) is on board, and will work best if engineering leads. (The question asked by Mako afterwards has already led to patches, which is cool.)
  • Creative Commons: My CC talk with Kat went well, and got some good questions. Ultimately the rubber will meet the road when the translations are out and we start the discussion with the full community. Also great meeting User:Multichill; looking forward to working on license templates with him and May from design.
  • Metadata: The multimedia metadata+licensing work is going to be really challenging, but very interesting and ultimately very empowering for everyone who wants to work with the material on Commons. Look forward to working with a large/growing number of people on this project.
  • Advocacy: Advocacy panel was challenging, in a good way. A variety of good, useful suggestions; but more than anything else, I took away that we should probably talk about how we talk when subjects are hard, and consensus may be difficult to reach. Examples would include when there is a short timeline for a letter, or when topics are deeply controversial for good, honest reasons.

The conference for me

  • Lesson (1): Learned a lesson: never schedule a meeting for the day after Wikimania. Odds of being productive are basically zero, though we did get at least some things done.
  • Lesson (2): I badly overbooked myself; it hurt my ability to enjoy the conference and meet everyone I wanted to meet. Next year I’ll try to be more focused in my commitments so I can benefit more from spontaneity, and get to see some slightly less day-job-related (but enjoyable or inspirational) talks/presentations.
  • Research: Love that there is so much good/interesting research going on, and do deeply think that it is important to understand it so that I can apply it to my work. Did not get to see very much of it, though :/
  • Arguing with love: As tweeted about by Phoebe, one of the highlights was a vigorous discussion (violent agreement :) with Mako over dinner about the four freedoms and how they relate to just/empowering software more broadly. Also started a good, vigorous discussion with SJ about communication and product quality, but we sadly never got to finish that.
  • Recharging: Just like GUADEC in my previous life, I find these exhausting but also ultimately exhilarating and recharging. Can’t wait to get to Mexico City!

Misc.

  • London: I really enjoy London—the mix of history and modernity is amazing. Bonus: I think the beer scene has really improved since the last time I was there.
  • Movies: I hardly ever watch movies anymore, even though I love them. Knocked out 10 movies in the 22 hours in flight. On the way to London:
    • The Grand Budapest Hotel (the same movie as every other Wes Anderson movie, which is enjoyable)
    • Jodorowsky’s Dune (awesome if you’re into scifi)
    • Anchorman (finally)
    • Stranger than Fiction (enjoyed it, but Adaptation was better)
    • Captain America, Winter Soldier (not bad?)
  • On the way back:
    • All About Eve (finally – completely compelling)
    • Appleseed: Alpha (weird; the awful dialogue and wooden “faces” of computer-animated actors clashed particularly badly with the classically great dialogue and acting of All About Eve)
    • Mary Poppins (having just seen London; may explain my love of magico-realism?)
    • The Philadelphia Story (great cast, didn’t engage me otherwise)
    • Her (very good)

Slide embedding from Commons

A friend of a friend asked this morning where she should post her slides online.

I suggested Wikimedia Commons, but it turns out she wanted something like Slideshare’s embedding. So here’s a test of how that works (timely, since soon Wikimanians will be uploading dozens of slide decks!)

This is what happens when you use the default Commons “Use this file on the web -> HTML/BBCode” option on a slide deck pdf:

Wikimedia Legal overview 2014-03-19

Not the worst outcome – clicking gets you to a clickable deck. There are no controls inline in the embed, though, and, importantly, nothing to show that it is clickable :/

Compare with the same deck, uploaded to Slideshare:

Some work to be done if we want to encourage people to upload to Commons and share later.

Update: a commenter points me at viewer.js, which conveniently includes a WordPress plugin! The plugin is slightly busted (I had to move some files around to get it to work in my install) but here’s a demo:

Update2: bugs are fixed upstream and in an upcoming 0.5.2 release of the plugin. Hooray!

Designers and Creative Commons: Learning Through Wikipedia Redesigns

tl;dr: Wikipedia redesigns mostly ignore attribution of Wikipedia authors, and none approach the problem creatively. This probably says at least as much about Creative Commons as it does about the designers.

disclaimer-y thing: so far, this is for fun, not work; haven’t discussed it at the office and have no particular plans to. Yes, I have a weird idea of fun.

A mild refresh from interfacesketch.com.

It is no longer surprising when a new day brings a new redesign of Wikipedia. After seeing one this weekend with no licensing information, I started going back through seventeen of them (most of the ones listed on-wiki) to see how (if at all) they dealt with licensing, attribution, and history. Here’s a summary of what I found.

Completely missing

Perhaps not surprisingly, many designers completely remove attribution (i.e., history) and licensing information in their designs. Seven of the seventeen redesigns I surveyed were in this camp. Some of them were in response to a particular, non-licensing-related challenge, so it may not be fair to lump them into this camp, but good designers still deal with real design constraints, and licensing is one of them.

History survives – sometimes

The history link is important, because it is how we honor the people who wrote the article, and comply with our attribution obligations. Five of the seventeen redesigns lacked any licensing information, but at least kept a history link.

Several of this group included some legal information, such as links to the privacy policy, or in one case, to the Wikimedia Foundation trademark page. This suggests that our current licensing information may be presented in a worse way than some of our other legal information, since it seems to be getting cut out even by designers who are tolerant of some of our other legalese.

Same old, same old

Four of the seventeen designs keep the same old legalese, though one fails to comply by making it impossible to get to the attribution (history) page. Nothing wrong with keeping the existing language, but it could reflect a sad conclusion that licensing information isn’t worth the attention of designers; or (more generously) that they don’t understand the meaning/utility of the language, so it just gets cargo-culted around. (Credit to Hamza Erdoglu, who was the only mockup designer who specifically went out of his way to show the page footer in one of his mockups.)

A winner, sort of!

Of the seventeen sites I looked at, exactly one did something different: Wikiwand. It is pretty minimal, but it is something. The one thing: as part of the redesign, it adds a big header/splash image to the page, and then adds a new credit specifically for the author of the header/splash image down at the bottom of the page with the standard licensing information. Arguably it isn’t that creative, just complying with their obligations from adding a new image, but it’s at least a sign that not everyone is asleep at the wheel.

Observations

This is surely not a large or representative sample, so all my observations from this exercise should be taken with a grain of salt. (They’re also speculative since I haven’t talked to the designers.) That said, some thoughts besides the ones above:

  • Virtually all of the designers who wrote about why they did the redesign mentioned our public-edit-nature as one of their motivators. Given that, I expected history to be more frequently/consistently addressed. It is not clear whether this should be chalked up to designers not caring about attribution, or to the attribution role of history being very unclear to anyone who isn’t an expert. I suspect the latter.
  • It was evident that some of these designers had spent a great deal of time thinking about the site, and yet were unaware of licensing/attribution. This suggests that people who spend less time with the site (i.e., 99.9% of readers) are going to be even more ignorant.
  • None of the designers felt attribution and licensing was even important enough to experiment on or mention in their writeups. As I said above, this is understandable but sort of sad, and I wonder how to change it.

Postscript, added next morning:

I think it’s important to stress that I didn’t link to the individual sites here, because I don’t want to call out particular designers or focus on their failures/oversights. The important (and as I said, sad) thing to me is that designers are, historically, a culture concerned with licensing and attribution. If we can’t interest them in applying their design talents to our problem, in the context of the world’s most famously collaborative project, we (lawyers and other Commoners) need to look hard at what we’re doing, and how we can educate and engage designers to be on our side.

I should also add that the WMF design team has been a real pleasure to work with on this problem, and I look forward to doing more of it. Some stuff still hasn’t made it off the drawing board, but they’re engaged and interested in this challenge. Here is one example.

Democracy and Software Freedom

As part of a broader discussion of democracy as the basis for a just socio-economic system, Séverine Deneulin summarizes Robert Dahl’s theory of democracy, which requires five qualities:

First, democracy requires effective participation. Before a policy is adopted, all members must have equal and effective opportunities for making their views known to others as to what the policy should be.

Second, it is based on voting equality. When the moment arrives for the final policy decision to be made, every member should have an equal and effective opportunity to vote, and all votes should be counted as equal.

Third, it rests on ‘enlightened understanding’. Within reasonable limits, each member should have equal and effective opportunities for learning about alternative policies and their likely consequences.

Fourth, each member should have control of the agenda, that is, members should have the exclusive opportunity to decide upon the agenda and change it.

Fifth, democratic decision-making should include all adults. All (or at least most) adult permanent residents should have the full rights of citizens that are implied by the first four criteria.

From “An Introduction to the Human Development and Capability Approach”, Ch. 8 – “Democracy and Political Participation”.

“Poll worker explains voting process in southern Sudan referendum” by USAID Africa Bureau, via Wikimedia Commons.

It is striking that, despite talking a lot about freedom, and often being interested in the question of who controls power, these five criteria might as well be (Athenian) Greek to most free software communities and participants – the question of liberty begins and ends with source code, and has nothing to say about organizational structure and decision-making – critical questions that serious philosophers always address.

Our licensing, of course, means that in theory points #4 and #5 are satisfied, but saying “you can submit a patch” is, for most people, roughly as satisfying as saying “you could buy a TV ad” to an American voter concerned about the impact of wealth on our elections. Yes, we all have the theoretical option to buy a TV ad/edit our code, but for most voters/users of software that option will always remain theoretical. We’re probably even further from satisfying #1, #2, and #3 in most projects, though one could see the Ada Initiative and GNOME OPW as attempts to deal with some aspects of #1, #3, and #4.

This is not to say that voting is the right way to make decisions about software development, but simply to ask: if we don’t have these checks in place, what are we doing instead? And are those alternatives good enough for us to have certainty that we’re actually enhancing freedom?

I am the CADT; and advice on NEEDINFOing old bugs en masse

[Attention conservation notice: probably not of interest to lawyers; this is about my previous life in software development.]

Bugsquad barnstar, under MPL 1.1

Someone recently mentioned JWZ’s old post on the CADT (Cascade of Attention-Deficit Teenagers) development model, and that finally has pushed me to say:

I am the CADT.

I did the bug closure that triggered Jamie’s rant, and I wrote the text he quotes in his blog post.1

Jamie got some things right, and some things wrong. The main thing he got right is that it is entirely possible to get into a cycle where instead of seriously trying to fix bugs, you just do a rewrite and cross your fingers that it fixes old bugs. And yes, this can particularly happen when you’re young and writing code for fun, where the joy of a from-scratch rewrite can overwhelm some of your other good senses. Jamie also got right that I communicated the issue pretty poorly. Consider this post a belated explanation (as well as a reference for the next time I see someone refer to CADT).

But that wasn’t what GNOME was doing when Jamie complained about it, and I doubt it is actually something that happens very often in any project large enough to have a large bug tracking system (BTS). So what were we doing?

First, as Brendan Eich has pointed out, sometimes a rewrite really is a good idea. GNOME 2 was such a rewrite – not only was a lot of the old code a hairy mess, we decided (correctly) to radically revise the old UI. So in that sense, the rewrite was not a “CADT” decision – the core bugs being fixed were the kinds of bugs that could only be fixed with massive, non-incremental change, rather than “hey, we got bored with the old code”. (Immediately afterwards, GNOME switched to time-based releases, and stuck to that schedule for the better part of a decade, which should be further proof we weren’t cascading.)

This meant there were several thousand old bugs that had been filed against UIs that no longer existed, and often against code that no longer existed or had been radically rewritten. So you’ve got new code and old bugs. What do you do with the old bugs?

It is important to know that open bugs in a BTS are not free. Old bugs impose a cost on developers, because when they are trying to search relevant bugs, old bugs can make it harder to find the things they really should be working on. In the best case, this slows them down; in the worst case, it drives them to use other tools to track the work they want to do – making the BTS next to useless. This violates rule #1 of a BTS: it must be useful for developers, or else it all falls apart.

So why did we choose to reduce these costs by closing bugs filed against the old codebase as NEEDINFO (and asking people to reopen if they were still relevant) instead of re-testing and re-triaging them one-by-one, as Jamie would have suggested? A few reasons (a sketch of what the mechanics might look like today follows the list):

  • number of triagers v. number of bugs: there were, at the time, around a half-dozen active bug volunteers, and thousands of pre-GNOME 2 bugs. It was simply unlikely that we’d ever be able to review all the old bugs even if we did nothing else.
  • focus on new bugs: new bugs are where triagers and developers are much more likely to be relevant – those bugs are against fresh code; the original filer is much more likely to respond to clarifying questions; etc. So all else being equal, time spent on new bugs was going to be much better for the software than time spent on old bugs.
  • steady flow of new bugs: if you’ve got a small number of new bugs coming in, perhaps you split your time – but we had no shortage of new bugs, nor of motivated bug reporters. So we may have paid some cost (by demotivating some reporters) but our scarce resource (developers) greatly appreciated it.
  • relative burden: with thousands of open bugs from thousands of reporters, it made sense to ask each reporter to re-test their own bugs against the new code. Reviewing their old bugs was a small burden for each of them, once we distributed the work.
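For the curious, here is a minimal sketch of what the mechanics of such a mass-NEEDINFO might look like today, against a Bugzilla 5.x REST API. The instance URL, product name, cutoff, and API key are all placeholders, and it assumes a GNOME-style NEEDINFO status; stock Bugzilla models “needs info” with a flag and different field values, so check your own instance before running anything like this.

```python
import requests
from datetime import datetime, timedelta

BUGZILLA = "https://bugzilla.example.org/rest"  # placeholder instance
API_KEY = "..."                                 # placeholder API key
CUTOFF = (datetime.utcnow() - timedelta(days=3 * 365)).strftime("%Y-%m-%d")

# Find open bugs in the old product (product name is a placeholder).
resp = requests.get(f"{BUGZILLA}/bug", params={
    "product": "gnome-core",
    "status": ["NEW", "ASSIGNED", "REOPENED"],
    "include_fields": "id,last_change_time",
    "api_key": API_KEY,
})
bugs = resp.json()["bugs"]

# Only touch bugs that nobody has updated since the cutoff date.
stale = [b["id"] for b in bugs if b["last_change_time"][:10] < CUTOFF]

for bug_id in stale:
    requests.put(
        f"{BUGZILLA}/bug/{bug_id}",
        params={"api_key": API_KEY},
        json={
            "status": "NEEDINFO",  # assumes a GNOME-style NEEDINFO status
            "comment": {"body": (
                "This bug was filed against a codebase/UI that has since "
                "been rewritten. If you can still reproduce it in the "
                "current release, please reopen with updated details."
            )},
        },
    )
```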

So when isn’t it a good idea to close old bugs and ask for more information?

  • Great at keeping old bugs triaged/relevant: If you have a very small number of old bugs that haven’t been touched in a long time, then they aren’t putting much burden on developers.
  • Slow code turnover: If your development process is such that it is highly likely that old bugs are still relevant (e.g., core has remained mostly untouched for many years, or effective use of TDD has kept the number of accidental new bugs low) this might not be a good idea.
  • No triggering event: In GNOME, there was a big event, plus a new influx of triagers, that together made radical change sensible. I wouldn’t recommend this “just because” – it should go hand-in-hand with other large changes, like a major release or important policy changes that will make future triaging more effective.

Relatedly, the team practices mailing list has been discussing good practices for migrating bug tracking systems in the past few days, which has been interesting to follow. I don’t take a strong position on where Wikimedia’s Bugzilla falls on this point – MediaWiki has a fairly stable core, and the volume of incoming bugs may make triage of old bugs more plausible. But everyone running a very large Bugzilla for an active project should remember that this is a part of their toolkit.

  1. Both had help from others, but it was ultimately my decision.

Summarizing “hacker legal education” crisply and cleanly

James Grimmelmann is a better writer than I am. I already knew this, but in this commentary on Biella Coleman’s (excellent) Coding Freedom, he captures something I have struggled to express for years in two crisp, clean sentences:

Hacker legal education, with its roots in programming, is strong on formal precision and textual exegesis. But it is notably light on legal realism: coping with the open texture of the law and sorting persuasive from ineffective arguments.

This distinction is worth keeping in mind, for both sides of the professional/amateur legal discussion, to understand the relative strengths and weaknesses of their training and experience.

(Note that James says this, and I quote it, with all due love and respect, since we were both programmers before we were lawyers.)