Copyleft and data: database law as (poor) platform

tl;dr: Databases are a very poor fit for any licensing scheme, like copyleft, that (1) is intended to encourage use by the entire world but also (2) wants to place requirements on that use. This is because of broken legal systems and the way data is used. Projects considering copyleft, or even mere attribution, for data should consider other approaches instead.

The original database: Hollerith Census Machine Dials, by Marcin Wichary, under CC BY 2.0.

I’ve been a user of copyleft/share-alike licenses for a long time, and even helped draft several of them, but I’ve come around to the point of view that copyleft is a poor fit for data. Unfortunately, I’ve been explaining this a lot lately, so I want to explain why in writing. This first post will focus on how the legal system around databases is broken. Later posts will focus on how databases are hard to license, and what we might do about it.

FOSS licensing, and particularly copyleft, relies on legal features database rights lack

Defenders of copyleft often have to point out that copyleft isn’t necessarily anti-copyright, because copyleft depends on copyright. This is true, of course, but the more I think about databases and open licensing, the more I think “copyleft depends on copyright” almost understates the case – global copyleft depends not just on “copyright”, but on very specific features of the international copyright system which database law lacks.

To put it in software terms, the underlying legal platform lacks the features necessary to reliably implement copyleft.

Consider some differences between the copyright system and database law:

  • Maturity: Copyright has had 100 or so years as an international system to work out kinks like “what is a work” or “how do joint authors share rights?” Even software copyright law has existed for about 40 years. In contrast, database law in practice has existed for less than 20 years, pretty much all of that in Europe, and I can count all the high court rulings on it on my fingers and toes. So key terms, like “substantial”, are pretty hard to pin down: courts and legislatures simply haven’t defined, or refined, the key concepts. This makes it very hard to write a general-purpose public license whose outcomes are predictable.

  • Stability: Related to the previous point, copyright tends to change incrementally, as long-standing concepts are slowly adapted to new circumstances. (The gradual broadening of fair use in the Google era is a good example of this.) In contrast, since there are so few decisions, basically every decision about database law leads to upheaval. Open Source licenses tend to have a shelf-life of about ten years; good luck writing a database license that means the same thing in ten years as it does today!

  • Global nature: Want to share copyrighted works with the entire world? Copyright (through the Berne Convention) has you covered. Want to share a database? Well, you can easily give it away to the whole world (probably!), but want to reliably put any conditions on that sharing? Good luck! You’ve now got to write a single contract that is enforceable in every jurisdiction, plus a license that works in the EU, Japan, South Korea, and Mexico. As an example again, “substantial” – used in both ODbL and CC 4.0 – is a term from the EU’s Database Directive, so good luck figuring out what it means in a contract in the US or within the context of Japan’s database law.

  • Default rights: Eben Moglen has often pointed out that anyone who attacks the GPL is at a disadvantage, because if they somehow show that the license is legally invalid, then they get copyright’s “default”: which is to say, they don’t get anything. So they are forced to fight about the specific terms, rather than the validity of the license as a whole. In contrast, in much of the world (and certainly in the US), if you show that a database license is legally invalid, then you get database law’s default: which is to say, you get everything. So someone who doesn’t want to follow the copyleft has very, very strong incentives to demolish your license altogether. (Unless, of course, the entire system shifts from underneath you to create a stronger default, like it may have in the EU with the Ryanair case.)

With all these differences, what starts off as hard (“write a general-purpose, public-facing license that requires sharing”) becomes insanely difficult in the database context. The key goals of a general-purpose public license (that it be global, predictable, and reliable) are very hard to achieve.

In upcoming posts, I’ll try to explain why, even if it were possible to write such a license from a legal perspective, it might not be a good idea because of how databases are used.