This Geek Will Put Reporters Out of Business
Artificial intelligence systems can turn structured data into stories so sophisticated they're indistinguishable from those penned by humans. How robots are taking over the newsroom.
When the science fiction writer William Gibson first looked inside a computer, he was surprised by its antiquated mechanical guts. He described the moment like this: “I'd been expecting an exotic crystalline thing, a cyberspace deck or something, and what I got was a little piece of a Victorian engine that made noises like a scratchy old record player. That noise took away some of the mystique for me; it made computers less sexy. My ignorance had allowed me to romanticize them.”
Gibson’s experience is a good metaphor for a broader cultural phenomenon: any incomprehensible technology initially glows with an aura of mystery and romance. The process of prying open a previously inscrutable device to glimpse its inner works often follows the same trajectory as the Gibson anecdote. Seeing how things actually work means abandoning certain fanciful myths about how you imagine they work.
This applies to software as much as hardware. And it’s especially true when computer software performs feats that were once considered achievable only by human minds: playing chess, correctly answering Jeopardy questions, composing aesthetically pleasing paintings and symphonies, or writing sports narratives and analytical reports on everything from the stock market to pharmaceutical research.
Few humans make a living through chess, Jeopardy, or rarefied aesthetic pursuits. But the number of professionals who use written language to present, clarify, interpret, or analyze data represents a substantial sector of the economy. So do artificial intelligence systems capable of sophisticated writing pose a genuine threat to human jobs, or is such fear an ignorance-fueled fantasy that dissolves on closer inspection of how these systems actually work?
Natural language generation programs date back to the 1970s. Early software like Novel Writer and Talespin could tell murder mysteries and children’s stories, respectively, based on character goals and motives that were specified as input. The results were not dazzling: “John Bear is somewhat hungry. John Bear wants to get some berries. John Bear wants to get near the blueberries. John Bear walks from a cave entrance to the bush by going through a pass through a valley through a meadow. John Bear takes the blueberries. John Bear eats the blueberries.”
More refined programs developed over the following decades, but they did not threaten the livelihoods of mystery novelists, screenwriters, or children’s authors. The realm of nonfiction, however, has recently proved more susceptible to automated encroachments. In 2010, the Chicago-based company Narrative Science began marketing a system called Quill that can write stories and reports in the style of stockbrokers, sportswriters, retail managers, real estate brokers, portfolio managers, and a vast range of other professionals.
Quill emerged from research at Northwestern University’s Intelligent Information Laboratory. Professors Larry Birnbaum and Kris Hammond looked outside the computer science department and began collaborating with students and faculty from Northwestern’s Medill School of Journalism. Teams of computer scientists and journalists worked together to build a game-recap generator that could write baseball stories indistinguishable from pieces written by human reporters. They soon realized the range of applications extended beyond sports writing to any domain where structured data can be transformed into a narrative that emphasizes interesting and important points, finds causal connections, and even recommends courses of action.
Almost 60 companies now use Quill. The system writes millions of recaps of Little League baseball games each year, but it also writes financial news items for Forbes, sports stories for the Big Ten Network, and portfolio summaries for mutual fund companies like American Century. Narrative Science has even attracted investment from the CIA, which would presumably use Quill to digest and analyze streams of raw intelligence from around the world.
Stuart Frankel, the CEO of Narrative Science, dismisses concerns that Quill represents any sort of threat to humans. “There are two popular fears about AI systems: that they’re going to kill us or that they’re going to take our jobs. We joke here that Quill is not going to write anybody to death.” The risk of job elimination, however, is harder to brush aside. “If you look at the history of technology, anything that automates what was previously done by humans does have the potential to decrease jobs in the short term, but I think in the long run more jobs will be created.”
In the late 15th century, books and pamphlets produced with a printing press rather than by hand were often described as “artificial writing.” Printing presses were even associated with malicious supernatural forces. One possible origin of the term “printer’s devil,” a printer’s apprentice, is an epithet coined by scribes who feared the new technology would render the handwritten copying of manuscripts obsolete. But the invention of the printing press eventually created an entire industry of publishers, journalists, pamphleteers, and distributors that employed many more people than the medieval sector of scribes.
Kris Hammond, the Northwestern computer science professor and current CTO for Narrative Science, thinks Quill complements rather than competes with human intelligence. “Quill is not an autonomous learning system. You have to tell it what’s causally important. So you wouldn’t see Quill look at a basketball game, for example, and observe that the amazing thing about the game is that the top scorers all wore jerseys with prime numbers. That might be true, but it’s not interesting.” Humans are still the arbiters of the interesting, a criterion that shifts based on the audience and purpose of a particular story.
Quill would not spill much ink describing a dull inning that did not affect the outcome of a baseball game. Nor would it report a Little League game the same way it would cover the Major League. If a 10-year-old pitcher did not have a good game, Quill might stress his effort and heart while noting that the team still came up short. If a 25-year-old pitcher with a multimillion-dollar salary did not have a good game, Quill would point out that it was an atrocious performance.
After analyzing the relevant data and choosing a particular narrative structure, Quill then assesses different “angles” for a story, such as a “come-from-behind victory” or an “early-victory-that-was-never-in-doubt.” It also follows the same process with enterprise data; after analysis of the performance of a company’s stock, the system might frame the story by emphasizing how the stock performed relative to expectations for a given quarter.
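To make the idea of an “angle” concrete, here is a minimal sketch in Python of how structured game data might be mapped to a story frame and an audience-appropriate tone. It is an illustration only, not Narrative Science’s actual method: the `Game` record, the function names, the `"little_league"` label, and the fallback `"close contest"` angle are all hypothetical, and a production system would presumably weigh far more signals than the running score.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Game:
    """Hypothetical input: runs scored in each inning by each team."""
    winner_runs: List[int]
    loser_runs: List[int]


def choose_angle(game: Game) -> str:
    """Pick a story angle from the winner's margin after each inning."""
    margins = []
    winner_total = 0
    loser_total = 0
    for w, l in zip(game.winner_runs, game.loser_runs):
        winner_total += w
        loser_total += l
        margins.append(winner_total - loser_total)

    if min(margins) < 0:
        # The eventual winner trailed at some point.
        return "come-from-behind victory"
    if min(margins) > 0:
        # The eventual winner led after every inning, starting with the first.
        return "early-victory-that-was-never-in-doubt"
    # Assumed fallback for games that were tied at some point but never lost.
    return "close contest"


def frame_bad_outing(pitcher: str, league: str) -> str:
    """Vary the tone of the same fact (a poor outing) by audience."""
    if league == "little_league":
        return f"{pitcher} showed effort and heart, but the team still came up short."
    return f"{pitcher} turned in an atrocious performance."


if __name__ == "__main__":
    rally = Game(winner_runs=[0, 0, 3, 2], loser_runs=[2, 0, 0, 0])
    print(choose_angle(rally))
    print(frame_bad_outing("Sam", "little_league"))
```

The point of the sketch is that the angles and the tone rules are written by people in advance. The program only checks which of those human-defined frames the data supports.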
Hammond and Frankel both stress that the world is awash in data that few people have the time or inclination to understand. “I think it’s an abomination that people still have to look at spreadsheets,” Hammond said. They envision a future in which Quill is a ubiquitous technology that can instantly convert a wilderness of numbers into a concise and compelling story that emphasizes only what matters.
Quill can also tell stories that are so individualized the size of their intended readership is one. When turning data from a Fitbit into a coherent narrative that summarizes trends and concludes that you are not sleeping enough but are still on track to meet your six-month goals, a highly personalized story makes sense. Similarly, a portfolio summary that dwells in detail on the specific set of companies most important to your particular investments is a benign sort of customization. It might be narcissistic for the parents of every player on a Little League baseball team to receive a personalized story focused primarily on the exploits of their young slugger, but it’s also relatively harmless, a narrative manifestation of the tunnel vision that parents already have for their kids.
More unsettling possibilities are also imaginable. Some critics fear a future in which news stories are calibrated to the level of sophistication that an individual’s online reading habits suggest they will enjoy. Only those articles that an algorithm thinks you can understand would then appear as options. If it did occur, this would only exaggerate what’s already true of the Internet: information appears differentially based on past behavior, and individuals self-sort into groups that prefer the biases and level of intricacy that different sources offer.
Technologies like Quill invite speculation about all sorts of dystopian scenarios. Perhaps the program could ingest every document ever written by a particular journalist or stockbroker, analyze the idiosyncrasies of their style, and produce an automated facsimile of their work, a kind of digital doppelgänger. Hammond stresses that Quill does not have these capacities, and that it seeks only to augment and enhance the capacities of humans, not to replicate them wholesale and thus render them redundant.
He did predict in a 2012 interview with the Guardian, however, that 90 percent of journalism stories will be produced by computers by the year 2025. But he also presents Quill with the familiar trope of technology-as-friendly-and-helpful-assistant. (The word robot comes from a Czech term for servant.) He thinks the technology will free journalists and other professionals from the drudgery of boring work, allowing them to pursue more intellectually interesting endeavors. If Quill were analyzing “angles” for a story on Quill targeted to an audience of potential investors, this approach would certainly do better than a frame of technology-as-sinister-destroyer-of-worlds-and-jobs.
The media, not to mention Hollywood, tends to prefer the second angle, perhaps for reasons that connect back to William Gibson’s point. “Any transformative technology is initially frightening,” Frankel said. “But once you understand how it actually works, it’s usually pretty boring.” Narratives about technology can function as comforting fictions that we—and perhaps soon sophisticated software programs—tell to disguise its destructive progress, as hysterical projections of unfounded fears of a future that will in fact be boring, or some subtle combination of these extremes, with constantly shifting tones and angles that vary based on audiences and algorithms. The final draft is still being written.