I just explained why open and copyleft licensing, which work fairly well in the software context, might not be legally workable, or practically a good idea, around data. So what to do instead? tl;dr: say no to licenses, say yes to norms.
Public licenses for databases don’t work well. Before going into solutions to that problem, though, I wanted to talk briefly about some things that are important to consider when thinking about solutions: real-world examples of the problems; a common, but bad, solution; and a discussion of the motivations behind public licenses.
tl;dr: Open licensing works when you strike a healthy balance between obligations and reuse. Data, and how it is used, is different from software in ways that change that balance, making reasonable compromises in software (like attribution) suddenly become insanely difficult barriers.
Continue reading “Copyleft and data: databases as poor subject”
There is a small genre of posts around re-inventing the interfaces of popular open source software; I thought I’d collect some of them for future reference:
- Firefox: 0.1 release notes
- Visual Editor: first blog post I can find (though I’m sure there are many emails), Economist
The first two (Drupal, WordPress) are particularly strong examples of the genre because they directly grapple with the difficulty of change for open source projects. I’m sure that early Firefox and VE discussions also did that, but I can’t find them easily – pointers welcome.
Other suggestions welcome in comments.
- Dancing: After five Wikimedia events (not counting WMF all-hands) I was finally dragged onto the dance floor on the last night. I’ll never be Garfield, but I had fun anyway. The amazing setting did not hurt.
- Our hosts: The conference was excellently organized and run. I’ve never had Mexico City high on my list of “places I must see” but it moved up many spots after this trip.
- First timers: I always enjoy talking to people who have never been to Wikimania before. They almost always seem to have enjoyed it, but of course the ones I talk to are typically the ones who are more outgoing and better equipped to enjoy things. I do hope we’re also being welcome to people who don’t already know folks, or who aren’t as outgoing.
- Luis von Ahn: Good to chat briefly with my long-ago classmate. I thought the Q&A section of his talk was one of the best I’ve seen in a long time. There were both good questions and interesting answers, which is more rare than it should be.
— Moiz Syed (@MoizSyed) July 18, 2015
- Keynotes: I’d love to have one keynote slot each year for a contributor to talk about their work within the movement. Finding the right person would be a challenge, of course, as could language barriers, but it seems like it should be doable.
- US English: I was corrected on my Americanisms and the occasional complexity of my sentence structure. It was a good reminder that even for fairly sophisticated speakers of English as a second language, California-English is not terribly clear. This is especially true when spoken. Verbose slides can help, which is a shame, since I usually prefer minimal slides. I will try to work on that in the future, and see how we can help other WMFers do the same.
- Mobile: Really hope someday we can figure out how to make the schedule legible on a mobile device :) Good reminder we’ve got a long way to go there.
- Community engagement: I enjoyed my departments “engage with” session, but I think next year we need to make it more interactive—probably with something like an introduction/overview followed by a World Cafe-style discussion. One thing we did right was to take questions on written cards. This helped indicate what the most important topics were (when questions were repeated), avoided the problem of lecture-by-question, and opened the floor to people who might otherwise be intimidated because of language barriers or personality. Our booth was also excellent and I’m excited to see some of the stories that came out of it.
- Technology and culture: After talking about how we’d used cards to change the atmosphere of a talk, someone deliberately provoked me: shouldn’t we address on-wiki cultural issues the same way, by changing the “technology” used for discussion? I agree that technology can help improve things, and we should think about it more than we do (e.g.) but ultimately it can only be part of the solution – our most difficult problems will definitely require work on culture as well as interfaces. (Surprisingly, my 2009 post on this topic holds up pretty well.)
- Who is this for? I’ve always felt there was some tension around whether the conference is for “us” or for the public, but never had language for it. An older gentleman who I spoke with for a while finally gave me the right term: is it an annual meeting or is it a public conference? Nothing I saw here changed my position, which is that it is more annual meeting than public conference, at least until we get much better at turning new users into long-term users.
- Esino Lario looks like it will be a lot of fun. I strongly support the organizing committee’s decision to focus less on brief talks and more on longer, more interactive conversations. That is clearly the best use of our limited time together. I’m also excited that they’re looking into blind submissions (which I suggested in my Wikimania post from last year).
- Being an exec: I saw exactly one regular talk that was not by my department, though I did have lots and lots of conversations. I’m still not sure how I feel about this tradeoff, but I know it will become even harder if we truly do transition to a model with more workshops/conversations and fewer lectures, since those will be both more valuable and more time-consuming/less flexible.
- Some day: I wrote most of this post in the Mexico City airport, and saw that there are flights from there to La Habana. I hope someday we can do a Wikimania there.
Quick brain dump after a bike ride home: free software took a huge leap in the late 90s and early 00s in large part because of non-ideological advantages that the rest of the world is now competing with or surpassing:
- Collaboration tools: Because we got to the ‘net first, our tools for collaborating with each other were simply better than what proprietary developers were doing: cvs, mailman, wiki, etc., were all better than the silo’d old-school tools. Modern best-of-breed collaboration tools have all learned from what we did and added proprietary sauce on top: github, slack, Google Docs, etc. So our tools that are now (at best) as productive as our proprietary counterparts, and sometimes less productive but ideologically agreeable.
- Release processes: “Release early/release often” made us better partners for our users. We’re now actively behind here: compare how often a mobile app or web user gets updates, exactly as the author intended, relative to a user of a modern Linux distro.
- Zero cost: We did things for no (direct) cost by subsidizing our work through college, startups, or consulting gigs; now everyone has a subsidize-by-selling-something-else model (usually advertising, though sometimes freemium). Again, advantage (mostly?) lost.
- Knowing our users: We knew a lot about our users, because we were our biggest users, and we talked to other users a lot; this was more effective than what passed for software design in the late 90s. This has been eclipsed by extensive a/b testing throughout the industry, and (to a lesser extent) by more extensive usage of direct user testing and design-thinking.
None of these are terribly original observations – all of these have been remarked on before. But after playing some with Google Photos this weekend, I’m ready to add another one to the list:
- “big data”: It didn’t matter that neither libre nor proprietary developers had their user’s data, because no one could do much that was useful with it. Now, Gmail’s spam filtering is better than SpamAssassin precisely because of all the data they have. Google Photos is nearly magical, again because they’ve processed literally billions of photos. That’s going to be difficult (impossible?) for pure peer production models to match. (I didn’t list machine learning because the techniques there can literally be done by summer interns – the blocker is the training data, not the processes.)
Worth asking what your project is doing that could be radically changed if your competitors get access to new technology. For example, for Wikipedia:
- Collaborating: Wiki was best-of-breed (or close); it isn’t anymore. Visual Editor helps get editing back to par, but the social aspect of collaboration is still lacking relative to the expectations of many users.
- Knowledge creation: big groups of humans, working together wiki-style, is the state of the art for creating useful, non-BS knowledge at scale. With the aforementioned machine learning, I suspect this will no longer the case in a (growing) number of domains.
I’m sure there are others…
It has been a long time since I was able to say to developer friends “come work with me” in anything but the most abstract “come work under the same roof” kind of sense. But today I can say to developers “come work with me” and really mean it. Which is fun :)
Details: Wikimedia’s new community tech team is hiring for a community tech developer and a team lead. This will be extremely community-intensive work, so if you enjoy and get energy from working with a community and helping them achieve their goals, this could be a great role for you. This team will work intensely with my department to ensure that we’re correctly identifying and prioritizing the needs of our most active editors. If that sounds like fun, get in touch :)
[And I realize that I’ve been bad and not posted here, so here’s my new job announce: “my department” is the Foundation’s new Community Engagement department, where we work to support healthy contributor communities and help WMF-community collaboration. It is a detour from law, but I’ve always said law was just a way to help people do their thing — so in that sense is the same thing I’ve always been doing. It has been an intense roller coaster of a first two months, and I look forward to much more of the same.]
Flickr recently started selling prints of Creative Commons Attribution-Share Alike photos without sharing any of the revenue with the original photographers. When people were surprised, Flickr said “if you don’t want commercial use, switch the photo to CC non-commercial”.
This seems to have mostly caused two reactions:
- “This is horrible! Creative Commons is horrible!”
- “Commercial reuse is explicitly part of the license; I don’t understand the anger.”
I think it makes sense to examine some of the assumptions those users (and many license authors) may have had, and what that tells us about license choice and design going forward.
Free riding is why we share-alike…
As I’ve explained before here, a major reason why people choose copyleft/share-alike licenses is to prevent free rider problems: they are OK with you using their thing, but they want the license to nudge (or push) you in the direction of sharing back/collaborating with them in the future. To quote Elinor Ostrom, who won a Nobel for her research on how commons are managed in the wild, “[i]n all recorded, long surviving, self-organized resource governance regimes, participants invest resources in monitoring the actions of each other so as to reduce the probability of free riding.” (emphasis added)
… but share-alike is not always enough
Copyleft is one of our mechanisms for this in our commons, but it isn’t enough. I think experience in free/open/libre software shows that free rider problems are best prevented when three conditions are present:
- The work being created is genuinely collaborative — i.e., many authors who contribute similarly to the work. This reduces the cost of free riding to any one author. It also makes it more understandable/tolerable when a re-user fails to compensate specific authors, since there is so much practical difficulty for even a good-faith reuser to evaluate who should get paid and contact them.
- There is a long-term cost to not contributing back to the parent project. In the case of Linux and many large software projects, this long-term cost is about maintenance and security: if you’re not working with upstream, you’re not going to get the benefit of new fixes, and will pay a cost in backporting security fixes.
- The license triggers share-alike obligations for common use cases. The copyleft doesn’t need to perfectly capture all use cases. But if at least some high-profile use cases require sharing back, that helps discipline other users by making them think more carefully about their obligations (both legal and social/organizational).
Alternately, you may be able to avoid damage from free rider problems by taking the Apache/BSD approach: genuinely, deeply educating contributors, before they contribute, that they should only contribute if they are OK with a high level of free riding. It is hard to see how this can work in a situation like Flickr’s, because contributors don’t have extensive community contact.1
The most important takeaway from this list is that if you want to prevent free riding in a community-production project, the license can’t do all the work itself — other frictions that somewhat slow reuse should be present. (In fact, my first draft of this list didn’t mention the license at all — just the first two points.)
Flickr is practically designed for free riding
Flickr fails on all the points I’ve listed above — it has no frictions that might discourage free riding.
- The community doesn’t collaborate on the works. This makes the selling a deeply personal, “expensive” thing for any author who sees their photo for sale. It is very easy for each of them to find their specific materials being reused, and see a specific price being charged by Yahoo that they’d like to see a slice of.
- There is no cost to re-users who don’t contribute back to the author—the photo will never develop security problems, or get less useful with time.
- The share-alike doesn’t kick in for virtually any reuses, encouraging Yahoo to look at the relationship as a purely legal one, and encouraging them to forget about the other relationships they have with Flickr users.
- There is no community education about the expectations for commercial use, so many people don’t fully understand the licenses they’re using.
So what does this mean?
This has already gone on too long, but a quick thought: what this suggests is that if you have a community dedicated to creating a cultural commons, it needs some features that discourage free riding — and critically, mere copyleft licensing might not be good enough, because of the nature of most production of commons of cultural works. In Flickr’s case, maybe this should simply have included not doing this, or making some sort of financial arrangement despite what was legally permissible; for other communities and other circumstances other solutions to the free-rider problem may make sense too.
And I think this argues for consideration of non-commercial licenses in some circumstances as well. This doesn’t make non-commercial licenses more palatable, but since commercial free riding is typically people’s biggest concern, and other tools may not be available, it is entirely possible it should be considered more seriously than free and open source software dogma might have you believe.
- It is open to discussion, I think, whether this works in Wikimedia Commons, and how it can be scaled as Commons grows. [↩]
It has been nearly a full decade since Jon Udell’s classic screencast about Wikipedia’s article on the Heavy Metal Umlaut (current text; Jan. 2005). In this post, written for Paul Jones’ “living and working online” class, I’d like to use the last decade’s changes to the article to illustrate some points about the modern Wikipedia.1
At the end of 2004, the article had been edited 294 times. As we approach the end of 2014, it has now been edited 1,908 times by 1,174 editors.2
This graph shows the number of edits by year – the blue bar is the overall number of edits in each year; the dotted line is the overall length of the article (which has remained roughly constant since a large pruning of band examples in 2007).
The dropoff in edits is not unusual — it reflects both a mature article (there isn’t that much more you can write about metal umlauts!) and an overall slowing in edits in English Wikipedia (from a peak of about 300,000 edits/day in 2007 to about 150,000 edits/day now).3
The overall edit count — 2000 edits, 1000 editors — can be hard to get your head around, especially if you write for a living. Implications include:
- Style is hard. Getting this many authors on the same page, stylistically, is extremely difficult, and it shows in inconsistencies small and large. If not for the deeply acculturated Encyclopedic Style we all have in our heads, I suspect it would be borderline impossible.
- Most people are good, most of the time. Something like 3% of edits are “reverted”; i.e., about 97% of edits are positive steps forward in some way, shape, or form, even if imperfect. This is, I think, perhaps the single most amazing fact to come out of the Wikimedia experiment. (We reflect and protect this behavior in one of our guidelines, where we recommend that all editors Assume Good Faith.)
The name change, tools, and norms
A few take aways:
- Talk pages: The screencast explained one key tool for understanding a Wikipedia article – the page history. This edit summary makes reference to another key tool – the talk page. Every Wikipedia article has a talk page, where people can discuss the article, propose changes, etc.. In this case, this user discussed the change (in November) and then made the change in December. If you’re reporting on an article for some reason, make sure to dig into the talk page to fully understand what is going on.
- Sources: The user justifies the name change by reference to sources. You’ll find little reference to them in 2005, but by 2008, finding an old source using a different term is now sufficient rationale to rename the entire page. Relatedly…
- Footnotes: In 2008, there was talk of sources, but still no footnotes. (Compare the story about Motley Crue in Germany in 2005 and now.) The emphasis on foonotes (and the ubiquitous “citation needed”) was still a growing thing. In fact, when Jon did his screencast in January 2005, the standardized/much-parodied way of saying “citation needed” did not yet exist, and would not until June of that year! (It is now used in a quarter of a million English Wikipedia pages.) Of course, the requirement to add footnotes (and our baroque way of doing so) may also explain some of the decline in editing in the graphs above.
Images, risk aversion, and boldness
Another highly visible change is to the Motörhead art, which was removed in November 2011 and replaced with a Mötley Crüe image in September 2013. The addition and removal present quite a contrast. The removal is explained like this:
This is clear as mud, combining legal issues (“no fair use rationale”) with Wikipedian jargon (“WP:NFCC content criteria 10c”). To translate it: the editor felt that the “non-free content” rules (abbreviated WP:NFCC) prohibited copyright content unless there was a strong explanation of why the content might be permitted under fair use.
This is both great, and sad: as a lawyer, I’m very happy that the community is pre-emptively trying to Do The Right Thing and take down content that could cause problems in the future. At the same time, it is sad that the editors involved did not try to provide the missing fair use rationale themselves. Worse, a rationale was added to the image shortly thereafter, but the image was never added back to the article.
So where did the new image come from? Simply:
boldly adding image to lead
“boldly” here links to another core guideline: “be bold”. Because we can always undo mistakes, as the original screencast showed about spam, it is best, on balance, to move forward quickly. This is in stark contrast to traditional publishing, which has to live with printed mistakes for a long time and so places heavy emphasis on Getting It Right The First Time.
There are a few other changes worth pointing out, even in a necessarily brief summary like this one.
- Wikipedia as a reference: At one point, in discussing whether or not to use the phrase “heavy metal umlaut” instead of “metal umlaut”, an editor makes the point that Google has many search results for “heavy metal umlaut”, and another editor points out that all of those search results refer to Wikipedia. In other words, unlike in 2005, Wikipedia is now so popular, and so widely referenced, that editors must be careful not to (indirectly) be citing Wikipedia itself as the source of a fact. This is a good problem to have—but a challenge for careful authors nevertheless.
- Bots: Careful readers of the revision history will note edits by “ClueBot NG“. Vandalism of the sort noted by Jon Udell has not gone away, but it now is often removed even faster with the aid of software tools developed by volunteers. This is part of a general trend towards software-assisted editing of the encyclopedia.
- Translations: The left hand side of the article shows that it is in something like 14 languages, including a few that use umlauts unironically. This is not useful for this article, but for more important topics, it is always interesting to compare the perspective of authors in different languages.
I look forward to discussing all of these with the class, and to any suggestions from more experienced Wikipedians for other lessons from this article that could be showcased, either in the class or (if I ever get to it) in a one-decade anniversary screencast. :)
- I still haven’t found a decent screencasting tool that I like, so I won’t do proper homage to the original—sorry Jon! [↩]
- Numbers courtesy X’s edit counter. [↩]
- It is important, when looking at Wikipedia statistics, to distinguish between stats about Wikipedia in English, and Wikipedia globally — numbers and trends will differ vastly between the two. [↩]
Primarily what I did during Wikimania was chew on pens.
However, I also gave some talks.
The first one was on Creative Commons 4.0, with Kat Walsh. While targeted at Wikimedians, this may be of interest to others who want to learn about CC 4.0 as well.
Second one was on Open Source Hygiene, with Stephen LaPorte. This one is again Wikimedia-specific (and I’m afraid less useful without the speaker notes) but may be of interest to open source developers more generally.
The final one was on sharing; video is below (and I’ll share the slides once I figure out how best to embed the notes, which are pretty key to understanding the slides):