Is This Science Hacker a Heroine or a Villain?
Alexandra Elbakyan is turning the multimillion-dollar science publishing industry upside down.
In 2011, 23-year-old Alexandra Elbakyan did something radical: She created Sci-Hub, a digital repository housing copies of 64.5 million scientific journal articles.
That might seem, on its face, ho-hum. But Sci-Hub was radical because it was free to anyone with an internet connection, threatening to disrupt the research publishing industry, which holds its copyrighted material behind a paywall.
And it all started as a self-serving project.
“When I was a student in Kazakhstan University, I did not have access to any research papers,” Elbakyan wrote to the judge in the case brought against her by Elsevier, the largest publisher of medical journals. “I needed [the papers] for my research project. Payment of $32 is just insane when you need to skim or read tens or hundreds of these papers to do research.
“I obtained these papers by pirating them,” she continued. “Later, I found there are lots and lots of researchers (not even students, but university researchers) just like me, especially in developing countries.”
It turned out Elbakyan was not alone. She was soon a regular on message boards and in chatrooms with other people looking to share passwords and access journal articles they couldn’t afford. “I could obtain any paper by pirating it, so I solved many requests and people always were very grateful for my help. After that, I created sci-hub.org website that simply makes this process automatic and the website immediately became popular.”
Sci-Hub is not backing down and, in fact, is growing more popular: In 2016, the website claimed to service more than 200,000 requests per day. The site hosts more than 64.5 million scientific journal articles.
In October, biodata scientist Daniel Himmelstein at the University of Pennsylvania studied Sci-Hub’s repository and found that its collection housed 85.2 percent of all journal articles stored behind paywalls; 97.3 percent of the catalog of Elsevier, the largest publisher of biomedical journals, is available for free on Sci-Hub.
“We estimate that over a six-month period in 2015–2016, Sci-Hub provided access for 99.3% of valid incoming requests,” the (not peer-reviewed) study’s authors wrote. “Hence, the scope of this resource suggests the subscription publishing model is becoming unsustainable. For the first time, the overwhelming majority of scholarly literature is available gratis to anyone with an Internet connection.”
As a successful pirate of scientific information, Elbakyan lives in hiding in her native Kazakhstan and is forced to change the IP address of the site every time the U.S. blocks it.
Here’s why: Science research is huge business in America. We might think of research as public information, but it remains one of the most guarded, expensive, valuable products the United States produces. Taxpayers fund the National Institutes of Health, which dishes out $32 billion a year in grants. Eighty percent of that funding goes to more than 2,500 universities, medical schools and research institutions. Private colleges with massive endowments and public universities alike get the majority of their research funding from the federal government. Harvard, for example, receives 75 percent of its research funding from the federal government.
Taxpayers thus cover the costs of their public institutions. If science research is thought of as a business, the public supplies the seed capital through research grants from agencies like the National Institutes of Health, then a second round of funding through investments in public universities and institutions that cover overhead, salaries, and more.
In other words, it’s not cheap. That investment should, in theory, allow the public to access the goods they have paid for—for free.
In practice, however, taxpayers pay for this research twice. The very universities that conducted the research are forced to spend millions of dollars on subscriptions, essentially buying back the published results of the studies they funded. This research hides behind paywalls and can’t be accessed without thousands (sometimes, tens of thousands) of dollars in membership fees.
Sci-Hub changed that—and is flourishing because of its championing of free access to information. Its success, however, has forced Elbakyan into hiding in Kazakhstan, out of the purview of American law enforcement, as she continues to post articles. That’s led to widespread questioning of why and how the information was ever locked away in the first place, since most of it came from publicly funded research.
Elsevier, in particular, uses a peer-review method of editing. Scientists must pay journals to run their content and the editing is done free of charge, the opposite of traditional publishing, where the reporter is paid for the story and the editor is compensated for the work. Eventually, sometimes a year or more later, the results are published.
“They're being given the content for free, they're getting almost all the labor for free, and then they're being paid essentially monopolistic prices because everybody needs access to their content,” Michael Eisen, a renowned molecular biology professor at the University of California, Berkeley, and investigator with the Howard Hughes Medical Institute, told The Daily Beast. “Change was never going to come from the publishers because they had too much of an incentive to keep things the way they are.”
Elsevier's parent company, RELX Group, brought in more than $9.37 billion in revenue in 2016, 40 percent of which came from its science and technology publications, according to its 2016 annual report. (Elsevier's worth is about $3 billion.) That same report notes that the company operates at a 32 percent profit margin. By contrast, Time Inc., the largest American publishing house, had $3 billion in revenue and a net loss of $38 million in 2016, according to SEC filings.
These are tough days for traditional publishing houses, where the writers and editors are paid, unlike the medical publishing world, where neither is compensated and most of the expense of procuring the reporting was paid for by the American public.
Sci-Hub’s genius is that it’s actually a web scraping program. Users go to the site and enter the URL, DOI, or search text identifying the article they want. The site searches the archive of articles stored on its own servers and those stored on LibGen, a partner site. If the requested article is available, the web scraper delivers it. If it is not, the scraper goes to its vault of known usernames and passwords acquired from academics with legit access, pulls up the requested article, delivers a copy to the user and stores another copy on its servers. (The question remains whether academics are donating their logins to Sci-Hub or, as some have suggested, these credentials have been stolen via phishing and potentially other invasive means.)
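The lookup order described above amounts to what programmers call a read-through cache: check the local archive, then the partner archive, and only then go out to the original source, keeping a copy for next time. The short Python sketch below illustrates that general pattern only. It is not Sci-Hub's code, and every name in it (`lookup`, `local_archive`, `partner_archive`, `fetch_upstream`) is a hypothetical stand-in, with plain dictionaries playing the role of the archives.

```python
from typing import Callable, Dict, Optional, Tuple


def lookup(
    identifier: str,
    local_archive: Dict[str, bytes],
    partner_archive: Dict[str, bytes],
    fetch_upstream: Callable[[str], Optional[bytes]],
) -> Tuple[Optional[bytes], str]:
    """Return (document, source), trying the cheapest source first.

    All names here are illustrative assumptions, not a real API.
    """
    # 1. Already stored on our own servers?
    if identifier in local_archive:
        return local_archive[identifier], "local"

    # 2. Stored on the partner site?
    if identifier in partner_archive:
        return partner_archive[identifier], "partner"

    # 3. Otherwise ask the original source (a caller-supplied function).
    document = fetch_upstream(identifier)
    if document is None:
        return None, "missing"

    # Keep a copy so the next request is answered from the local archive.
    local_archive[identifier] = document
    return document, "upstream"
```

In this sketch, the first request for an article that neither archive holds reports "upstream" as its source, and a repeat request for the same article reports "local," which is how such a collection grows with use.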
Given the massive library of information on Sci-Hub and the lucrative business of medical publishing, it is not surprising that Elbakyan is facing legal challenges in the United States. On November 3, 2017, the U.S. District Court for the Eastern District of Virginia agreed to go beyond the typical punitive measures applied to copyright violators and ordered search engines to block Sci-Hub from American customers. This ruling upheld an earlier recommendation by Judge John Anderson, who issued a report siding with the American Chemical Society’s demand that Sci-Hub pay it $4.8 million in damages and requested that internet service providers and search engines block Sci-Hub from users.
The ruling could set a legal precedent that changes the role of internet service providers and search engines from gateways to law enforcers. If the provider or search engine is required to enforce the censorship requirements made by the government, the country could be entering a new, untested era of regulation. However, these rulings are only applicable within the U.S.; outside access to Sci-Hub is not bound by American laws.
The American Chemical Society is not alone: Elsevier has also sued Elbakyan. In June, the U.S. District Court for the Southern District of New York sided with the publisher, in what is known as the largest copyright infringement case in the world, issuing a $15 million judgment.
Elbakyan, however, has found an astonishingly simple solution: She moves the site to a new IP address every time it’s blocked. Devotees turn to Twitter and Wikipedia for up-to-date information on Sci-Hub’s current address until it moves again, at which point the cycle begins anew.
And while Elbakyan’s motivation for pirating papers came from her personal need for the research, she was not the first person to try to free scientific discovery. Many inside the U.S. tried before her and failed.
In the mid-1990s, work on the human genome had just begun, changing the course of biological history and scientific discovery. The highly detailed work required a new set of technological tools; reading previous, relevant research wasn’t possible the way it was for other subfields. Scientists like Eisen started looking for ways to comb through published results; in doing so, some began thinking of journal articles as sources of data to be analyzed, rather than as points of analysis on their own.
“It was a coincidence, in some sense, that right as that was happening, the scientific literature became available electronically,” Eisen, referring to the internet’s potential to host information and make it searchable, told The Daily Beast. “We went down to the Stanford library—they were in charge of putting a lot of the journals from scientific societies online. We just asked, kind of naively, ‘Can we get access to all these papers? We want to do some cool science with the data.’ We were flat-out told ‘no.’”
Eisen said he had the skills to hack in, but he wasn’t willing to go to jail for the cause. Instead, in 2003, he helped launch The Public Library of Science, or PLOS, an open-access journal. One of his co-founders was the former head of the National Institutes of Health, Harold Varmus, who shared the 1989 Nobel Prize in Physiology or Medicine with Michael Bishop for their discovery of the cellular origin of retroviral oncogenes, which held implications for the study of cancer cell mutations. Another co-founder of PLOS was Pat Brown, professor emeritus of biochemistry at Stanford, who had been Varmus’s post-doctoral mentee. Brown brought up the idea of open-access journals with his mentor while visiting him in Washington, D.C. Varmus wrote in his book The Art and Politics of Science that when Brown pointed out the potential of the internet to share scientific literature, he was left with some big questions: “Was the scientific community taking adequate advantage of the Internet and new computational tools to improve publication practices and use of the literature? Could electronic public libraries provide much more than the titles and abstracts stored in the PubMed catalog—for instance, full texts of published reports? If information from DNA sequencing efforts, such as the Human Genome Project, was made freely and fully available on the Internet, couldn’t we do the same for the scientific literature?”
This got Varmus “thinking about how Internet-based distribution and storage of biomedical research articles could dramatically alter the way we worked,” he wrote. In May 1999, Varmus penned a manifesto calling for all publicly funded research to be free and accessible in a database, which he called E-Biomed.
Then he posted his manifesto on NIH’s website. Varmus attempted to reassure critics that he wasn’t turning the NIH into a publishing company and that E-Biomed would be in the public’s best interest: “We are proposing this plan in an effort to accelerate much-needed public discussion of electronic publication in the United States and abroad and to provide the financial, technical, and administrative assistance to initiate such a program.”
The publishing industry didn’t see things Varmus’s way though. As Varmus wrote in The Art and Politics of Science: “The for-profit publishing houses were also unhappy and sent their lead lobbyist the former congresswoman Pat Schroeder, to Capitol Hill to talk to members of my appropriations subcommittees. Even my strongest supporter in Congress, John Porter, was sufficiently concerned by her visit to ask me to come to his office to explain what I was trying to do in a chat that was uncomfortable for both of us.”
About a year later, Congress and Varmus had reached a compromise: Any publicly funded research exceeding $100 million in grants from the U.S. government would be available for free within a year of publication on PubMed Central. While this compromise was better than nothing, many viewed it as the equivalent of staying up-to-date on current events by watching year-old episodes of the nightly news.
For Eisen, delays in releasing scientific findings have consequences. One of his arguments against the current peer-review model is that it takes too long. “Each round of reviews takes a month or more, and it is rare for papers to be accepted without demanding additional experiments, analyses and rewrites, which take months or sometimes years to accomplish. And this time matters. The scientific enterprise is all about building on the results of others—but this can’t be done if the results of others are languishing in peer review. There can be little doubt that this delay slows down scientific progress and often costs lives,” Eisen said in 2013 at a talk at the Commonwealth Club in San Francisco.
Within a year of penning his E-Biomed manifesto, Varmus was working to get PLOS off the ground with Eisen and Brown.
The challenge for PLOS was that traditional journals had already sufficiently cornered the market. To put it another way: If you didn’t publish in certain journals you might find yourself in academic purgatory. The expression “publish or perish” isn’t really the whole story, though; it’s more like “publish in a few specific journals or become irrelevant.” The impact factor rules the publishing world, limiting options for scientists and centralizing power and control among a few journals. You won’t get the good jobs, win the big prizes, or be able to raise money for future experiments if your research doesn’t “matter.” This is the mindset of many great scientists whose anxiety feeds into this self-perpetuating system.
Logically, if the top investigators agreed to publish in open-access journals or if elite universities made it a job requirement to publish in open-access journals, control over the research would dissipate. But most scientists aren’t willing to put their careers on the line for the greater good. Eisen likes to point out that all of his papers are published in PLOS and that he’s doing quite well for himself. And he’s not alone: As of 2017, PLOS has grown to include seven peer-reviewed journals and several other products, which together have published more than 165,000 articles from scientists in more than 190 countries. And yet the traditional journals still have a firm grip on the industry.
While Americans like Aaron Swartz—the open source advocate who tried to free JSTOR files from the MIT server—worked subversively, Varmus tried leveraging his insider influence as the head of NIH to change the system. Eisen, meanwhile, tried to create a new system. But it took Elbakyan—a scientist from Kazakhstan who needed the research and couldn’t afford it, knew how to illegally access it, and wasn’t threatened by American justice—to successfully open scientific discovery to everyone in 2011.
“Fortunately, she's both beyond their reach and also seems to be energized by their attacks on her,” Eisen told The Daily Beast (he did not reveal if he keeps in touch with Elbakyan, but does publicly follow her efforts closely). “Her personality seems right to be this renegade. She's done more good around science and around the world than all of the scientific communities put together over the last 20 years.”