Copyleft, attribution, and data: other considerations

Public licenses for databases don’t work well. Before going into solutions to that problem, though, I wanted to talk briefly about some things that are important to consider when thinking about solutions: real-world examples of the problems; a common, but bad, solution; and a discussion of the motivations behind public licenses.

“Bullfrog map unavailable”, by Peter Desmet, under CC BY 3.0 unported

Real-world concerns, not just theoretical

When looking at solutions, it is important to understand that the concerns I blogged about aren’t just theoretical: they cause real problems for real projects. For example, Peter Desmet has done a great job showing how overreaching licenses make bullfrog maps (and other data combinations) illegal. Alex Barth of OpenStreetMap has also discussed how ODbL creates problems for OSM users (though he got some Wikipedia-related facts wrong). And I’ve spoken to very well-intentioned organizations (including thoughtful, impactful non-profits) scared off from OSM for similar reasons.

On the flip side, because these rules are based on such flimsy legal grounds, sophisticated corporate legal departments often feel comfortable circumventing the requirements by exploiting loopholes. (Needless to say, they don’t blog about the problems with the licenses – they just go ahead and use the loopholes.) So overreaching attempts to create new rights are, in many ways, the worst of both worlds: they hurt well-intentioned cooperation, and don’t dissuade parties with a significant interest in exploiting the commons.

What not to do: create new “rights”

When thinking about solutions, it is unfortunately also important to say what isn’t a good idea: creating new rights, or overriding limitations on old ones. The Free Software Foundation, to their great credit, has consistently said that if weakening copyright also weakens the GPL, they’ll take that tradeoff; and that, vice versa, the GPL should not ask for rights that go beyond copyright law. The most recent copyleft licenses from Creative Commons, Mozilla, and the FSF all make this explicit: limitations on copyright, like fair use, are not trumped by our licenses.

Unfortunately, many people have a good-faith desire to see copyleft-like results in other domains. As a result, they’ve gone the wrong way on this point. ODbL is probably the most blatant example of this: even at the time, Science Commons correctly pointed out that ODbL’s attempt to create database rights by contract outside of the EU was a bad idea. Unfortunately, well-intentioned people (including me!) pushed it through anyway. Similarly, open hardware proponents have tried to stretch copyright to cover functional works, with predictably messy results.

This is not just practically wrong, for the reasons I’ve explained in earlier posts. It is also ethically wrong for those of us who want to see more data sharing, because any “rights” we create by fiat are going to end up being used primarily to stop sharing, not encourage it.

Remembering why we do share-alike and attribution

Consider this section a brief sketch for a future post – if I forgot something big, please let me know, but please don’t roast me in comments for being brief or reductive about your favorite motivation.

It is important when writing about public licenses to remember why the idea of placing restrictions on re-use is so intuitively appealing outside of software. If we don’t understand why people want to do less-than-public domain, it’s hard to come up with solutions that actually work. Motivations tend to be some combination (varying from person to person and community to community) of:

  • Recognition: Many people want to at least be recognized for their work, even when they ask for nothing else. (When Creative Commons assessed usage after their 1.0 licenses, [97-98% of people chose attribution](https://creativecommons.org/2004/05/25/announcingandexplainingournew20licenses/).) This sentiment underlies many otherwise “permissive” licenses, as well as academic norms around plagiarism and attribution.
  • Reducing free riding: Lots of people are afraid that commons can be destroyed by people who use the resource without giving back. Historically, this “tragedy of the commons” was about [rivalrous](https://en.wikipedia.org/wiki/Rivalry_(economics)) goods (like fisheries), but the same concern is often raised in the context of collaborative communities, whose labor can be rivalrous even when their goods are non-rivalrous. Some people like share-alike requirements because, pragmatically, they feel such requirements are one way to prevent (or at least reduce) this risk by encouraging people to either participate fully or not participate at all. (If you’re interested in this point, I’ve [written about it before](http://lu.is/blog/2014/12/02/free-riding-and-copyleft-in-cultural-commons-like-flickr/).)
  • “Fairness”: Many people like share-alike out of a deep moral sense that if you take, you should also give back. This often looks the same as the previous point, but with the key distinction that at least some people focused on fairness care more about process and less about outcomes: a smaller, less productive community with more sharing may, for them, be better than a larger, more productive community where not everyone shares perfectly.
  • Access to allow self-help: Another variation on the previous two points is a use of copyleft that focuses less on “is the author helping me by cooperating” and more on “did the author give me materials I can then use to help myself”. In this view, increased access to raw material (like source code, or data) can be good even if the authors are non-cooperative. (To those familiar with the Linux kernel discussions, this is essentially “I got a lousy driver, and the authors hate me, but at least I got *a* driver”.)
  • Ethical: Many people simply think data/source should never be proprietary, and so will use any means possible, like copyleft, to increase the amount of non-proprietary code in the world.

All of these motivations can be more or less valid at different points in time, in ways that (again) deserve a different post. (For example, automatic attribution may not have the same impact as “human” attribution, which may not be a surprise given the evidence on crowding out of intrinsic motivations.)

Finally, next (and final?) post: what solutions we’ve got.