Sunday, March 25, 2012

Free is not open, and open is not free

When talking to computer users and buyers over more than a decade, I was struck by how much a desire for things “open” was less about freedom and more about free. Sorry, Richard, but the reality is that people want free beer more than free speech. I saw this at SJSU, when IT buyers wanted “open” standards without knowing what open meant. I’ve remarked on this repeatedly in my research, including a 2007 chapter on open standards and a 2008 chapter on open source ideology.

Next month, I’ll be talking about the principles of open source software with a new type of audience, KGI graduate students trained in molecular biology who are preparing to work at leading biotech companies. So I thought I’d find examples of both “open source biology” (which is hard to do) and open source software being used for biology.

The best-known public domain package for molecular biology is BLAST, which, as Wikipedia helpfully notes, dates to a 1990 article in the Journal of Molecular Biology. The “Basic Local Alignment Search Tool” can be used to match a DNA sequence to a library of known sequences to find similar patterns. This is a very computationally intensive process; I recall that Apple made a special effort to support BLAST in its servers after Art Levinson (then Genentech CEO) joined the Apple board of directors. (Levinson succeeded Steve Jobs as Apple board chair after the latter’s untimely death.)

There seem to be two variants of BLAST out there: NCBI BLAST and mpiBLAST. The latter proclaims itself as “a freely available, open-source, parallel implementation of NCBI BLAST.” It seems to be supported by contributors from Virginia Tech, several national labs, IBM and a research group in Taiwan. However, the developer and user mailing lists have no postings since August 2011.

The gold standard is the original BLAST from the National Center for Biotechnology Information. NCBI BLAST is delivered as a hosted service (what we now call SaaS), although the code and database of BLAST+ can be downloaded from an NIH website. However, the flow of technology appears to be one-way, from the government out to users, which reminds me a lot of the VistA healthcare IT system that prompted my original interest in open source communities.

In other words, BLAST is free, but it isn’t really open. The government is fine with releasing technology in the public domain — which often is required under basic principles — but not in sharing control or authority.

Sure enough, I found an August 2011 posting by Peter Cock, a UK blogger (and software developer) blasting at BLAST and its lack of transparency:

Now as a USA government funded project, NCBI BLAST is released to the public domain … That's great, it's free and open source, and means BLAST can be modified and re-distributed. This is perfect for inclusion in Linux distributions like Debian which take licence freedom issues very seriously (see packages blast2 for NCBI "legacy" BLAST, and ncbi-blast+ for NCBI BLAST+, the re-write in C++).

However, in other terms the NCBI BLAST project is far from open. Looking at the BLAST Developer Information there is nothing about participating in BLAST development, and no sign of a developers mailing list.

NCBI BLAST doesn't have a public source code repository. … Update (21 October 2011) Not sure how long its been there, but there is (now) a read only public SVN for BLAST+ etc,

NCBI BLAST doesn't have a public bug tracker. Instead individuals must contact the NCBI by email, at blast-help (at) ncbi.nlm.nih.gov, which gets you in touch with the front line support team that then pass proper bug reports on to the actual developer team. The only way to track an issue is by follow up email, referencing the original report by date and email subject -- if there is an internal bug tracking number I've never been told it, and I have asked about this specifically.

Actually, Peter has set his sights somewhat low. As Siobhan O’Mahony and I found in our research, transparency is the de minimis level of openness that an open source community can provide to outsiders. The truly open projects offer permeability: the ability of outsiders to influence the direction of a project, making it suitable for a broader range of needs than the original authors conceived.

This permeability consists not just of the code, but also the direction and governance of the effort. If you can add changes but not influence the priorities or direction, then a project isn’t really open.

Alas, only a handful of projects meet this standard, with Apache and Eclipse as the exemplars. More often, fishbowl-type single-firm communities use open source as a distribution mechanism rather than as a way of harnessing distributed innovation; MySQL was once the leading example, but nowadays it’s Android. But then for any corporate sponsor, letting go is hard to do.

In the truly open communities, the ability to participate attracts participation and builds a real sense of community and shared governance. When I started researching open source 12 years ago, we assumed we would see more examples of such communities forming, but they still seem few and far between.

Of course, for many of these tightly-controlled open source projects, the “free” beer isn’t really free but instead is more like a free puppy (cheap up front, not in the long run). In this case, you have neither open nor free.
