Public licenses and data: So what to do instead?

I just explained why open and copyleft licensing, which work fairly well in the software context, might not be legally workable, or practically a good idea, around data. So what to do instead? tl;dr: say no to licenses, say yes to norms.

"Day 43-Sharing" by A. David Holloway, under CC BY 2.0.

Partial solutions

In this complex landscape, it should be no surprise that there are no perfect solutions. I’ll start with two behaviors that can help.

Education and lawyering: just say no

If you’re reading this post, odds are that, within your organization or community, you’re known as a data geek and might get pulled in when someone asks for a new data (or hardware, or culture) license. The best thing you can do is help explain why restrictive “public” licensing for data is a bad idea. To the extent there is a community of lawyers around open licensing, we also need to be comfortable saying “this is a bad idea”.

These blog posts, to some extent, are my mea culpa for not saying “no” during the drafting of ODbL. At that time, I thought that if only we worked hard enough, and were creative enough, we could make a data license that avoided the pitfalls others had identified. It was only years later that I finally realized there were systemic reasons why we were doomed, despite lots of hard work and thoughtful lawyering. These posts lay out why, so that in the future I can say no more efficiently. Feel free to borrow them when you also need to say no :)

Project structure: collaboration builds on itself

When thinking about what people actually want from open licenses, it is important to remember that how people collaborate is deeply shaped by how a project is structured. (To put it another way, architecture is also law.) For example, many kernel contributors feel that the best reason to contribute your code to the Linux kernel is not the license, but the high velocity of development, which means your costs are much lower if you get your features upstream quickly. Similarly, if you can build a big community like Wikimedia’s around your data, the velocity of improvements is likely to reduce the desire to fork. Where possible, consider also offering services and collaboration spaces that encourage people to work in public, rather than providing the bare minimum necessary for your own use. Or, more simply, spend money on community people rather than lawyers! These kinds of tweaks can often have much more of an impact on free-riding and contribution than any license choice. Unfortunately, the details are often project-specific, which makes them hard to talk about in a blog post! Especially one that is already too long.

Solving with norms

So if lawyers should advise against the use of data law, and structuring your project for collaboration might not apply to you, what then? Following Peter Desmet, Science Commons, and others, I think the right tool for building resilient, global communities of sharing (in data and elsewhere) is written norms, combined with a formal release of rights.

Norms are essentially optimistic statements of what should be done, rather than formal requirements of what must be done (with the enforcement power of the state behind them). There is an extensive literature, pioneered by Nobelist Elinor Ostrom, showing that norms are actually how a huge amount of humankind’s work gets done, despite the skepticism of economists and lawyers. Critically, they often work even without the enforcement power of the legal system. For example, academia’s anti-plagiarism norms (when buttressed by appropriate non-legal institutional supports) are fairly successful. While there are still plagiarism problems, they’re fairly comparable to the Linux kernel’s GPL-violation problems, even though, unlike the GPL, anti-plagiarism norms have no legal enforcement mechanism!

Norms and licenses have similar benefits

In many key ways, norms are not actually significantly different from licenses. Norms and licenses can both help (or hurt) a community reach its goals by:

  • Educating newcomers about community expectations: Collaboration requires shared understanding of the behavior that will guide that collaboration. Written norms can create that shared expectation just as well as licenses, and often better, since they can be flexible and human-readable in ways legally binding international documents can’t.
  • Serving as the basis for social pressure: For the vast majority of collaborative projects, praise, shame, and other social nudges, not legal threats, are the actual basis for collaboration. (If you need proof of this, consider the decades-long success of open source before any legal enforcement was attempted.) Again, norms can serve this role just as well, if not better, since it is often the desire to cooperate and the fear of being shamed that actually drive collaboration.
  • Similar levels of enforcement: While you can’t use the legal system to enforce a norm, most people and organizations also don’t have the option to use the legal system to enforce licenses: it is too expensive, or too time-consuming, or the violator is in another country, or one of many other reasons why the legal system might not be an option (especially in data!). So instead, most projects resort to tools like personal appeals or threats of publicity, tools that are still available with norms.
  • Working in practice (usually): As I mentioned above, basing collaboration on social norms, rather than legal tools, works all the time in real life. The idea that collaboration can’t occur without the threat of legal sanction is really a somewhat recent invention. (I could actually have listed this under differences, since, as Ostrom teaches us, legal mechanisms often fail where norms succeed, and I think that is the case in data too.)

Why are norms better?

Of course, if norms were merely “as good as” licenses in the ways I just listed, I probably wouldn’t recommend them. Here are some ways that they can be better, in ways that address some of the concerns I raised in my earlier posts in this series:

  • Global: While building global norms is not easy, social norms based on appeals to the very human desires for collaboration and partnership can be a lot more global than the current schemes for protecting database or hardware rights, which aren’t international. (You can try to fake internationalization through a license, but as I pointed out in earlier posts, that is likely to fail legally, and be ignored by exactly the largest partners who you most want to get on board.)
  • Flexible: Many of the practical problems with licenses in the data space boil down to their inflexibility: if a license presumes something to be true, and it isn’t, you might not be able to do anything about it. Norms can be much more generous: well-intentioned re-users can creatively reinterpret the rules as necessary to get to a good outcome, without having to ask every contributor to change the license. (Copyright law in the US provides some flexibility through fair use, which has been critical in the development of the internet. The EU does not extend such flexibility to data, though member states can add some fair dealing provisions if they choose. In neither case are those exceptions global, so they can’t be relied on by collaborative projects that aim to be global in scope.)
  • Work against, not with, the permission culture: Lessig warned us early on about “permission culture”: the notion that we would always need to ask permission to do anything. Creative Commons was an attempt to fight it, but by being a legal obligation, rather than a normative statement, it made a key concession to the permission culture: that the legal system was the right terrain for discussions about sharing. The digital world has pretty wholeheartedly rejected this conclusion, sharing freely and constantly. As a result, I suspect a system that appeals to ethics has a better chance of long-term sustainability, because it works with the “new” default behavior online rather than bringing in the heavy, and inflexible, hand of the law.

Why you still need a (permissive) license

Norms aren’t enough if the underlying legal system might allow an early contributor to later wield the law as a threat. That’s why the best practice in the data space is to use something like the Creative Commons public domain grant (CC-Zero) to set a clear, reliable, permissive baseline, and then use norms to add flexible requirements on top of that. This uses law to provide reliability and predictability, and then uses norms to address concerns about fairness, free-riding, and effectiveness. CC-Zero still isn’t perfect; most notably it has to try to be both a grant and a license to deal with different international rules around grants.

What next?

In this context, when I say “norms”, I mean not just the general term, but specifically written norms that can act as a reference point for community members. In the data space, some good examples are DPLA’s “CC0-BY” and the Canadensys biodiversity initiative. A more subtle form can be found buried in the terms for NIH’s Clinical Trials database. So, some potential next steps, depending on where your collaborative project is:

  • If your community has informal norms (“attribution good! sharing good!”) consider writing them down like the examples above. If you’re being pressed to adopt a license (hi, Wikidata!), consider writing down norms instead, and thinking creatively about how to name and shame those who violate those norms.
  • If you’re an organization that publishes licenses, consider using your drafting prowess to write some standard norms that encapsulate the same behaviors without the clunkiness of database (or hardware) law. (Open Data Commons made some moves in this direction circa 2010, and other groups could consider doing the same.)
  • If you’re an organization that keeps getting told that people won’t participate in your project because of your license, consider moving towards a more permissive license + a norm, or interpreting your license permissively and reinforcing it with norms.

Good luck! May your data be widely re-used and contributors be excited to join your project.