Free online access to journal articles. What more do you need to know?
Quite a lot, as it happens—even if you agree with that six-word definition of open access. There's a reason this book begins with a longer definition, “Open access literature is available online to be read for free by anyone, anytime, anywhere—as long as they have Internet access.” That definition expands OA to include more than just journal articles but also narrows the scope by saying “to be read.” As you'll see in this chapter, that defines one flavor of OA, but it's not the flavor most desired by some researchers and advocates.
This chapter provides expanded definitions and key terms, considers the colors and flavors of open access, notes some of the terms that were used before open access emerged as a term, distinguishes some other “opens” that may or may not be related to OA, and considers the state of the field in mid-2010.
Consider a narrower and more stringent definition of open access:
Open access requires that refereed journal articles be fully and freely available on the open Internet, on or before the date of formal publication, to be read, downloaded, distributed, printed, and used for any legal purpose (including text manipulation, datamining and other derivative purposes), without permission or other barriers.
See the differences? This definition restricts the universe to refereed journal articles, omitting many other kinds of content that appear in some journals. It explicitly provides for immediate access, not delayed access. Finally, it calls for much more than free reading—it calls for a range of other uses.
At this point, I believe most (but not all) open access advocates would regard all three definitions—the first sentence of this chapter, the definition from chapter 1, and the sentence set off above—as correct, with the set-off version the most desirable. That hasn't always been the case and may not be the case in the future.
Three international meetings in 2002 and 2003 yielded statements that established open access as the common term for initiatives to make scholarly literature more widely and freely available. These statements were not the start of the movement for better access to the scholarly literature, but they're key defining points for the movement and the name open access.
According to the website www.soros.org/openaccess, the Budapest Open Access Initiative (BOAI) “arises from a small but lively meeting convened in Budapest by the Open Society Institute (OSI) on December 1-2, 2001.” The resulting statement appeared on February 14, 2002. Initially signed by 16 individuals, it has to date been endorsed by 5,278 individuals and 539 organizations from many nations.
Given the primacy of BOAI, it's worth quoting almost half of the 1,100-word document, omitting some argumentation at the beginning and in the middle (and a key moral argument already given in chapter 1):
An old tradition and a new technology have converged to make possible an unprecedented public good. The old tradition is the willingness of scientists and scholars to publish the fruits of their research in scholarly journals without payment, for the sake of inquiry and knowledge. The new technology is the Internet. The public good they make possible is the world-wide electronic distribution of the peer-reviewed journal literature and completely free and unrestricted access to it by all scientists, scholars, teachers, students, and other curious minds….
For various reasons, this kind of free and unrestricted online availability, which we will call open access, has so far been limited to small portions of the journal literature…. [W]e call on all interested institutions and individuals to help open up access to the rest of this literature and remove the barriers, especially the price barriers, that stand in the way….
The literature that should be freely accessible online is that which scholars give to the world without expectation of payment. Primarily, this category encompasses their peer-reviewed journal articles, but it also includes any unreviewed preprints that they might wish to put online for comment or to alert colleagues to important research findings…. By “open access” to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the Internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.
… To achieve open access to scholarly journal literature, we recommend two complementary strategies.
- Self-archiving: First, scholars need the tools and assistance to deposit their refereed journal articles in open electronic archives, a practice commonly called self-archiving. When these archives conform to standards created by the Open Archives Initiative, then search engines and other tools can treat the separate archives as one….
- Open-access Journals: Second, scholars need the means to launch a new generation of journals committed to open access, and to help existing journals that elect to make the transition to open access. Because journal articles should be disseminated as widely as possible, these new journals will no longer invoke copyright to restrict access to and use of the material they publish. Instead they will use copyright and other tools to ensure permanent open access to all the articles they publish …
Open access to peer-reviewed journal literature is the goal. Self-archiving (1.) and a new generation of open-access journals (2.) are the ways to attain this goal. They are not only direct and effective means to this end, they are within the reach of scholars themselves, immediately, and need not wait on changes brought about by markets or legislation….
Consider that boldfaced definition. It calls for virtually unlimited usage. Note also that this statement does not mark either OA strategy as the preferred or initial strategy.
George Soros' Open Society Institute provided funding for a range of OA-related initiatives following this statement, including the Directory of Open Access Journals or DOAJ (discussed further in chapter 6).
The Bethesda Statement on Open Access Publishing, released June 20, 2003, originated in a one-day meeting with two dozen participants held April 11, 2003, at the Howard Hughes Medical Institute. You'll find the full 1,800-word statement (including a meeting summary) at www.earlham.edu/~peters/fos/bethesda.htm. The document includes some interesting statements such as an affirmation of the principle “that only the intrinsic merit of the work, and not the title of the journal in which a candidate's work is published, will be considered in appointments, promotions, merit awards or grants.” Here's how the Bethesda group defined “open access publication”:
Definition of Open Access Publication
An Open Access Publication [1] is one that meets the following two conditions:
- The author(s) and copyright holder(s) grant(s) to all users a free, irrevocable, worldwide, perpetual right of access to, and a license to copy, use, distribute, transmit and display the work publicly and to make and distribute derivative works, in any digital medium for any responsible purpose, subject to proper attribution of authorship [2], as well as the right to make small numbers of printed copies for their personal use.
- A complete version of the work and all supplemental materials, including a copy of the permission as stated above, in a suitable standard electronic format is deposited immediately upon initial publication in at least one online repository that is supported by an academic institution, scholarly society, government agency, or other well-established organization that seeks to enable open access, unrestricted distribution, interoperability, and long-term archiving (for the biomedical sciences, PubMed Central is such a repository).
Notes:
- Open access is a property of individual works, not necessarily journals or publishers.
- Community standards, rather than copyright law, will continue to provide the mechanism for enforcement of proper attribution and responsible use of the published work, as they do now.
Note three things here: the definition calls for essentially unlimited derivative use; it is in no way limited to journal articles or science, technology, and medicine; and deposit in a repository is a requirement for a publication to be considered OA.
This document, the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities (“Berlin Declaration” for short), grew out of an October 20–22, 2003, conference in Berlin, held under the auspices of the Max Planck Society. You'll find the declaration and related notes, including a list of more than 200 organizations that have signed the declaration over the years, at http://oa.mpg.de/openaccess-berlin/berlindeclaration.html. The declaration explicitly notes the Budapest and Bethesda statements. Here are the goals and the first paragraph of the definition. The definition itself is nearly identical to the Bethesda definition, except that for the repository requirement, the phrase “in at least one online repository using suitable technical standards (such as the Open Archive definitions)” has been added after “one online repository,” that deposit is considered publication, and the phrase “immediately upon publication” does not appear.
Our mission of disseminating knowledge is only half complete if the information is not made widely and readily available to society. New possibilities of knowledge dissemination not only through the classical form but also and increasingly through the open access paradigm via the Internet have to be supported. We define open access as a comprehensive source of human knowledge and cultural heritage that has been approved by the scientific community.
In order to realize the vision of a global and accessible representation of knowledge, the future Web has to be sustainable, interactive, and transparent. Content and software tools must be openly accessible and compatible.
Definition of an Open Access Contribution
Establishing open access as a worthwhile procedure ideally requires the active commitment of each and every individual producer of scientific knowledge and holder of cultural heritage. Open access contributions include original scientific research results, raw data and metadata, source materials, digital representations of pictorial and graphical materials, and scholarly multimedia material. (Actual definition follows, nearly identical to Bethesda.)
As of October 2003, then, there are three definitions for OA that all require full “responsible” derivative use and distribution, not only reading—and two of three that rely entirely on repositories, with no mention in the definitions of open access publishing itself (except for the assertion in the Berlin document that deposit in a repository constitutes publication).
Things have changed since then, largely to allow a broader range of actions to be called open access, even though they don't fully comply with these three statements (which Peter Suber calls the BBB statements).
You'll see two colors commonly mentioned in discussions of OA and two other terms I'm calling “flavors”: green, gold, gratis, and libre. Green and gold OA deal directly with two different approaches to providing free access to peer-reviewed scholarly journal articles, with little or no relevance to other material that might be made freely available. Gratis and libre, two flavors of open access, are relevant to all freely available material, including but not limited to peer-reviewed journal articles. In brief:
Let's look at each of those terms in more detail.
Green open access—sometimes called “the green road to OA”—means depositing peer-reviewed articles at least as soon as they're published in freely accessible digital repositories (subject, institutional or otherwise). Most such repositories use the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), which helps assure effective harvesting—but OAI-PMH is not a requirement for green OA. Articles must be complete full text (it's not enough to deposit abstracts). Articles may be in the form submitted to or approved by peer reviewers, sometimes called preprint, rather than the final post-review form, sometimes called postprint.
Green OA works best with OA archives or repositories. Here's what Peter Suber says about OA repositories in “A Very Brief Introduction to Open Access” (www.earlham.edu/~peters/fos/brief.htm):
OA archives or repositories do not perform peer review, but simply make their contents freely available to the world. They may contain un-refereed preprints, refereed postprints, or both. Archives may belong to institutions, such as universities and laboratories, or disciplines, such as physics and economics. Authors may archive their preprints without anyone else's permission, and a majority of journals already permit authors to archive their postprints. When archives comply with the metadata harvesting protocol of the Open Archives Initiative, then they are interoperable and users can find their contents without knowing which archives exist, where they are located, or what they contain. There is now open-source software for building and maintaining OAI-compliant archives and worldwide momentum for using it.
An earlier version of this statement included an additional sentence: “The costs of an archive are negligible: some server space and a fraction of the time of a technician.” Peter Suber removed the sentence because most effective institutional repositories do much more than simply accept faculty preprints and postprints and can involve substantial expenses.
Sharp-eyed readers may notice another issue in the statement's penultimate sentence: “users can find their contents …”—which is true only if a search engine is harvesting the metadata and making it searchable in a free and useful manner.
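To make the harvesting step a little more concrete, here is a minimal sketch in Python of what a harvester does with an OAI-PMH repository: it builds a `ListRecords` request and pulls Dublin Core titles out of the response. The repository address and the sample response are invented for illustration, and the helper names `list_records_url` and `extract_titles` are my own; only the `ListRecords` verb, the `oai_dc` metadata prefix, and the XML namespaces come from the protocol itself. Nothing here contacts a real server.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "https://repository.example.edu/oai"

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (no request is sent)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# A trimmed, made-up response in the general shape a repository returns.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.edu:1234</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Postprint</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()


def extract_titles(xml_text):
    """Return the Dublin Core title of each harvested record."""
    root = ET.fromstring(xml_text)
    titles = []
    for record in root.iterfind(".//oai:record", NS):
        title = record.find(".//dc:title", NS)
        if title is not None and title.text:
            titles.append(title.text.strip())
    return titles


print(list_records_url(BASE_URL))
print(extract_titles(SAMPLE_RESPONSE))
```

The point of the sketch is the division of labor: the repository only exposes metadata, and somebody else has to harvest it, index it, and offer a free search interface before "users can find their contents" holds in practice.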
OA repositories need not be limited to journal articles. Institutional repositories, specifically those within academic institutions, commonly serve as homes for a broader range of scholarly material—working papers, data sets, and the like. As long as metadata properly labels peer-reviewed articles as such, including the journal in which they appeared, there should be no confusion about the inclusion of non-peer-reviewed material within the same repository. For that matter, subject repositories need not be limited to peer-reviewed articles. It may be worth noting that many “green OA” journal articles at present aren't in repositories at all—they're on authors' personal websites.
There's a curious ambiguity about preprint deposits. As defined by some OA experts, these versions are the versions submitted to journals—prior to peer review. Given that peer review can lead to substantive as well as editorial changes in articles, a preprint might better be thought of as a draft. I've always thought of preprint deposits in terms of papers that have been accepted through peer review, but haven't yet been copy-edited and, in some cases, laid out for publication.
At one point, leading green OA advocate Stevan Harnad recommended that scholars deposit preprints along with correction sheets to allow readers to create the equivalent of the final paper and asserted that such deposits were always legitimate, regardless of the publisher's policies regarding open access. The first clause of the fourth sentence in the quoted paragraph above—“Authors may archive their preprints without anyone else's permission”—makes this assumption. The idea is that an author's transfer of copyright to a publisher only affects the published version. To the best of my knowledge, this assumption has not been tested in the courts. It is at least conceivable that a publisher could consider provision of free access to a marginally different text to be infringement, and it is certainly the case that a publisher's terms of publication could require, as a contractual matter, that no “preprint” version be available on an open archive. It's hard to say whether that's a real-world concern—whether any journal publisher would be willing to go that far in order to prevent access to draft articles. It's not out of the question, however. As recounted by Kevin Smith in a September 7, 2010, post at Scholarly Communications @ Duke (http://library.duke.edu/blogs/scholcomm), Knopf, the publisher of Raymond Carver's short stories, has threatened Carver's widow with a copyright infringement suit for her plans to publish the original versions of the stories—apparently heavily edited by Carver's editor at Knopf. That involves fiction for which Carver was paid, not scholarly articles for which authors are not paid, but those distinctions might not matter.
It's been suggested that OA archives or repositories “can provide OA by default to all their contents or can let authors control the degree of accessibility to their works.”1 That's true but unless authors provide immediate access to articles, those articles are not OA—even if they're in an OA repository. The repository at that point is a hybrid: part open, part closed. A hybrid repository makes sense for drafts, institutional records, working notes and other items, but if articles themselves have controlled access, they are simply not OA.
The virtues of green OA are that it (theoretically) doesn't require consent or change in policies from publishers; that—up to a point—it doesn't change the current model; that it might be cheaper to implement than full-scale gold OA (depending on the actual cost of establishing and maintaining effective repositories); and that it might yield easier and more comprehensive searching if there are search engines doing exhaustive harvesting.
The chief drawbacks include one of the virtues: Green OA does not inherently change the current subscription model and won't provide near-term cost savings for libraries. Green OA also doesn't necessarily provide the final version of journal articles, which may make the OA version less useful and certainly not a clear substitute for the published version. There are other issues, discussed later, having to do with effective access and long-term access.
As of mid-2010, most traditional journals allow green OA in one form or another. I've seen a number as high as 90% of traditional journals, although the percentage that allows immediate deposit of postprints is considerably smaller.
Nobody knows how many articles are available through green OA. The OAIster database (founded by the University of Michigan, now part of OCLC) includes more than 25 million records from more than 1,100 contributors. Those records are freely searchable as part of Worldcat.org, but also through a separate OAIster-only engine at http://oaister.worldcat.org. The 25 million records include many things other than journal articles, and problems with repository software can result in things like having one OAIster record for each page of a scanned book. Additionally, some OAIster records point to items that turn out not to be open access—in one small random sample, only about one-quarter of the records led to true OA materials.2 Still, OAIster (and other OAI harvests) provide clear demonstrations that green OA can be effective OA. ScientificCommons, www.scientificcommons.org, shows more than 38 million “publications” from 1,269 repositories as of September 23, 2010. That figure presumably includes gold as well as green OA, just as Google Scholar includes many sources of material.
Gold open access—sometimes called “the gold road to open access”—means the journal itself provides immediate full-text online access at no charge to readers. The online versions of peer-reviewed portions of gold OA journals are funded by some means other than mandatory subscriptions.
Gold OA requires OA journals: journals that provide immediate, free online access to peer-reviewed articles. Journal is a slightly trickier term today than it was in, say, 1985—but before we consider that, here's Peter Suber's terse description of OA journals, also from “A Very Brief Introduction …”:
OA journals perform peer review and then make the approved contents freely available to the world. Their expenses consist of peer review, manuscript preparation, and server space. OA journals pay their bills very much the way broadcast television and radio stations do: those with an interest in disseminating the content pay the production costs upfront so that access can be free of charge for everyone with the right equipment. Sometimes this means that journals have a subsidy from the hosting university or professional society. Sometimes it means that journals charge a processing fee on accepted articles, to be paid by the author or the author's sponsor (employer, funding agency). OA journals that charge processing fees usually waive them in cases of economic hardship. OA journals with institutional subsidies tend to charge no processing fees. OA journals can get by on lower subsidies or fees if they have income from other publications, advertising, priced add-ons, or auxiliary services. Some institutions and consortia arrange fee discounts. Some OA publishers waive the fee for all researchers affiliated with institutions that have purchased an annual membership. There's a lot of room for creativity in finding ways to pay the costs of a peer-reviewed OA journal, and we're far from having exhausted our cleverness and imagination.
While this paragraph provides an excellent summary of key aspects and possibilities for OA journals, some items deserve discussion and amplification. For example:
The virtues of gold OA are that it assures immediate access to final articles, with all copyediting and other manuscript preparation in place, and that it should lower costs for libraries to the extent that OA journals displace traditional journals or traditional journals transform to OA journals.
The main drawback of gold OA is that it directly challenges existing journal publishers and the existing publishing system—and does no good for library budgets until and unless OA journals displace traditional journals.
As of mid-2010, roughly 20% of peer-reviewed journals are OA, if we accept the assertion that there are about 25,000 peer-reviewed journals. The Directory of Open Access Journals lists more than 5,400 journals as of September 2010. Naturally, 20% of journals does not equal 20% of articles. A study published in PLoS One, an innovative gold OA journal (www.plosone.org), on June 23, 2010, finds that 20.4% of a sample of peer-reviewed articles published in 2008 are available openly in full text on the web—but only 8.5% are available at publisher sites; the rest are accessible through search engines and appear in repositories or other sites.3 At this point, and with singular exceptions such as PLoS One, OA journals tend to publish fewer articles than traditional journals, hardly surprising given the newness of most OA journals. It's worth noting that the article cites a typical one-year embargo as the basis for studying 2008 articles in early 2010: the 20% figure for articles is for delayed OA, not full OA. It's also worth noting that one-third of the green OA articles were not in repositories but rather on personal websites or other websites, which are less likely to remain available for the long term.
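The arithmetic behind those percentages is simple enough to check by hand, but a short Python sketch makes the relationships explicit. All four input figures are the ones quoted above; the variable and function names are mine, and the results are no better than the underlying estimates (the 25,000-journal total in particular is an assertion, not a count).

```python
# Figures quoted in the text above.
DOAJ_JOURNALS = 5_400            # journals listed in DOAJ, September 2010
PEER_REVIEWED_JOURNALS = 25_000  # asserted total of peer-reviewed journals
PCT_OPEN_TOTAL = 20.4            # % of sampled 2008 articles openly available
PCT_AT_PUBLISHER = 8.5           # % available at publisher sites


def pct(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


# Share of journals that are OA, given the two journal counts.
journal_share = pct(DOAJ_JOURNALS, PEER_REVIEWED_JOURNALS)

# Share of articles open somewhere other than the publisher's site.
pct_elsewhere = round(PCT_OPEN_TOTAL - PCT_AT_PUBLISHER, 1)

print(f"OA share of journals: {journal_share}%")
print(f"Open articles found away from publisher sites: {pct_elsewhere}%")
```

The journal share comes out slightly above the "roughly 20%" in the text because DOAJ lists "more than" 5,400 titles, and the article-level split shows that more of the openly available sample sits in repositories or on other sites than on publisher sites.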
According to The stm report, an October 2009 overview published by the International Association of Scientific, Technical and Medical Publishers available at www.stm-assoc.org/2009_10_13_MWC_STM_Report.pdf, only about 2% of a claimed 1.5 million articles published per year are published in “full” OA journals, with another 5% in journals offering delayed access and 1% published in hybrid journals (subscription journals that offer OA only for articles where a special author-side fee is paid). That's a snapshot, and the percentage of articles in gold OA journals will certainly increase.
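Percentages of a large total are easier to weigh as article counts. The following Python sketch converts the report's shares into approximate numbers of articles per year; the inputs are the figures quoted above, the names are mine, and the 1.5 million total is the report's own claim, so treat the outputs as rough orders of magnitude.

```python
# Figures quoted from the October 2009 report discussed above.
ARTICLES_PER_YEAR = 1_500_000
SHARE_PCT = {
    "full OA journals": 2,
    "delayed-access journals": 5,
    "hybrid journals": 1,
}

# Integer arithmetic keeps the counts exact.
article_counts = {
    kind: ARTICLES_PER_YEAR * share // 100 for kind, share in SHARE_PCT.items()
}

for kind, count in article_counts.items():
    print(f"{kind}: about {count:,} articles per year")

total_with_some_oa = sum(article_counts.values())
print(f"any of the three: about {total_with_some_oa:,} articles per year")
```

Even at 2%, full OA journals would account for tens of thousands of articles a year under the report's own numbers.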
You'll see mention of other colors in some discussions of OA, but with no broad agreement or usage. While there could be other OA vehicles—e.g., personal websites, blogs, wikis, etc.—there are no agreed standards for such vehicles and much less likelihood of broad, well-defined searchability or longevity. (For more on the longevity question, see chapter 3.)
Tom Wilson, publisher and editor of the long-established OA journal Information Research, further distinguishes between what he calls partial open access journals and true open access journals, reserving the latter label for journals that don't have author-side fees. He also calls these journals, funded by subsidies, voluntary work, grants, or advertising, platinum access journals.
Gratis OA is online digital literature that anyone can read without charge. There are no price barriers to read the literature. You'll also see “weak OA” used, particularly prior to mid-2008, when Peter Suber began using the terms gratis and libre.
The very existence of gratis OA and the perceived need to define the term indicate the reality: the Budapest, Bethesda, and Berlin statements require an ideal set of conditions—conditions that many scholars and journals find difficult to meet. This is a classic case where the best can be the enemy of the good. Requiring all the conditions defined in the key OA statements would substantially delay and reduce the availability of journal articles to be read freely, a key objective and the one of most importance to most researchers, practitioners, and users. The bulk of the problems addressed by OA, and all of the problems apparent to perhaps 99% of potential users, are covered by gratis OA—the ability to read freely gets us most of the way there.
If you're familiar with Creative Commons licenses, you can summarize gratis OA by saying that even the most restrictive CC license, BY-NC-ND, supports gratis OA: You can read it (and copy it for others to read), but not much more. In practice, if there's no CC license on an OA source, it may not even be legal to copy articles for preservation purposes, substantially weakening long-term access.
Libre OA is online digital literature that is free of charge and free of “unnecessary” copyright and licensing restrictions. You'll also see “strong OA” used prior to mid-2008.
I would say that libre OA is what Budapest, Bethesda, and Berlin call open access—but it's not that simple. Unnecessary is a tricky term. For example:
Another way to look at gratis and libre is in terms of barriers. Gratis OA removes pricing barriers for use of the journal literature. Libre OA removes at least some permission barriers from those wishing to use articles in ways beyond reading and copying them. Unless the only remaining restriction on reuse is the requirement for attribution, libre OA may be a misnomer—but that's probably too restrictive. As Peter Suber puts it, “Some OA providers permit commercial reuse and some do not. Some permit derivative works and some do not.” If you're not permitting commercial reuse or derivative works, it's hard to see what permission barriers you're removing—why this form of OA belongs in the libre camp at all. Maybe we need full libre OA as a term that means all permission barriers, possibly excepting some form of attribution, have been removed.
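One way to see where the line falls is to sort Creative Commons licenses by the permission barriers they leave in place. The Python sketch below is my own simplification of the discussion above, not an agreed taxonomy: it treats a license with both NC (no commercial reuse) and ND (no derivatives) as gratis only, a license with nothing beyond BY (attribution) as "full libre," and everything in between as partial libre. The function name and labels are assumptions made for illustration.

```python
def oa_flavor(license_code):
    """Classify a CC license code such as 'BY-NC-ND' by remaining barriers.

    The three labels are an illustrative simplification, not a standard.
    """
    parts = set(license_code.upper().split("-"))
    restrictions = parts - {"BY"}  # attribution alone is not counted as a barrier
    if not restrictions:
        return "full libre"
    if {"NC", "ND"} <= restrictions:
        return "gratis"  # readable and copyable, but little else
    return "libre (partial)"


for code in ["BY", "BY-SA", "BY-NC", "BY-ND", "BY-NC-SA", "BY-NC-ND"]:
    print(f"CC {code}: {oa_flavor(code)}")
```

Under these assumptions only one of the six standard licenses lands in the gratis bucket and only one in full libre, which is a fair picture of how wide the middle ground is.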
Currently, it appears that most OA literature—whether green OA or gold OA—is gratis or somewhere between gratis and libre. The differences between gratis and libre primarily involve secondary uses of published material, including data mining. So, for example:
Libre OA must be gratis OA—it's not possible for it to be otherwise. Gold OA should be green OA, but that's not a requirement. It's possible for a journal to make its articles freely readable at time of publication but not allow those articles to be deposited in institutional or subject repositories. (It is certainly the case that some OA journals ask that articles not be made available in repositories prior to formal publication.) There's a (weak) economic case for doing that if a gold OA journal is partially or wholly supported by web advertising, as refusal to allow green OA assures that all readers come through the journal's own website to maximize page views and related ad revenue. (That's a hypothetical. I am not personally aware of any gold OA journals that do, in fact, restrict green OA after publication.)
What do you call a journal that makes peer-reviewed articles available for free online reading—but only after six months or a year? What do you call a subject repository where articles must be deposited—but where free online access can be embargoed for up to a year?
John Willinsky calls the former delayed open access. The National Institutes of Health (NIH) calls the latter PubMed Central, and most would consider PubMed Central one of the great examples of open access at work.
In both cases, it seems reasonable to call these good steps in the right direction—but not true OA. Timeliness is important to truly effective use of existing research, and timeliness may be critical for a lay reader who needs to understand important new medical findings.
Delayed situations are compromises, just as gratis OA and the weak definition of libre OA are compromises. It's important to recognize compromises for what they are: necessary steps to improve existing situations, but steps that don't quite reach the desired conclusion.
You may see the term free online scholarship (FOS) used in 20th-century discussions of open access issues, and you'll see the abbreviation in the URL for many key documents. Peter Suber used this term for some time prior to the Budapest declaration—since steps toward open access go back long before 2002, to at least 1966.
There are many other catchphrases beginning with “open.” Dorothea Salo provides the slides from a June 2010 presentation entitled “Open Sesame! (and other open movements)” at www.slideshare.net/cavlec/open-sesame-and-other-open-movements. Her list includes open source (software for which human-readable source code is freely available), open standards, open content (e.g., the free culture movement), open courseware, open data, and open notebook science—and that's a tiny (but useful) subset of the Opens.
This report doesn't deal with other Opens. When you encounter an Open term, dig a little: Open can be used for commercial and proprietary efforts as easily as it can for commendable efforts to improve science, software and humanity through shared resources.
Portions of chapter 3 and chapter 4 discuss some of the things that open access is not. For example, it is not an assertion that scholarly publishing involves no costs; it is not a movement against copyright law; it is not a movement against peer review.
The term royalty-free literature has been used to define the primary target for open access, but it may not be the right term. After all, most magazine articles and newspaper articles are royalty free: the writer receives a single payment, either as a salary or as an article fee. (In the latter case, the writer also typically retains copyright and the rights to reuse the material for other purposes after some reasonable period.) The OA movement has not, to my knowledge, involved the suggestion that magazine articles and newspaper articles ought to be freely available online. Unfortunately, there's not a good terse term. Payment free doesn't work, because scholars are most certainly paid to write peer-reviewed articles, as such articles are the most visible outcomes of research. The key is that publishers don't pay scholars for the articles, and that's hard to sum up in a terse phrase.
OA can and does go beyond peer-reviewed articles, but such articles are the focus of most OA activity and the area where libraries can see the most potential benefit from widespread adoption of OA.