Behind the Scenes With the Team That Built Jeopardy's Watson
Users of the Internet community Reddit.com submitted a series of questions to the IBM Watson Research team through its iAmA subreddit, where you “Post what you are, have people ask you about yourself.”
“Last week you requested someone who worked on Watson over in IAMA,” Reddit writes, “and IBM Watson Research team was game to answer your top questions about Watson, how it was developed and how IBM plans to use it in the future.”
The following was originally published on Reddit’s blog and republished here with permission.
Q1. Could you give an example of a question (or question style) that Watson always struggled with? (via Chumpesque)
IBM: Any questions that require the resolution of very opaque references, especially to everyday knowledge that no one might have written about in an explicit way. For example, “If you're standing, it's the direction you should look to check out the wainscoting.”
Or questions that require resolving and linking opaque and remote references, for example “A relative of this inventor described him as a boy staring at the tea kettle for an hour watching it boil.”
The answer is James Watt, but he might have many relatives and there may be very many ways in which one of them described him watching tea boil. So first, find every possible inventor (and there may be tens of thousands of inventors), then find each relative, then what they said about the inventor (which should express that he stared at boiling tea). Watson attempts to do exactly this kind of thing, but there are many possible places to fail to build confident evidence in just a few seconds.
"IBM had to develop a mechanical device that grips and physically pushes the button."
Q2. What was the biggest technological hurdle you had to overcome in the development of Watson? (via this_is_not_the_cia)
Accelerating the innovation process: making it easy to combine, weigh, evaluate and evolve many different independently developed algorithms that analyze language from different perspectives.
Watson is a leap in computers being able to understand natural language, which will help humans be able to find the answers they need from the vast amounts of information they deal with every day. Think of Watson as a technology that will enable people to have the exact information they need at their fingertips.
Q3. Can you walk us through the logic Watson would go through to answer a question such as, "The antagonist of Stevenson's Treasure Island." (Who is Long John Silver?) (via elmuchoprez)
Step One: Parses sentence to get some logical structure describing the answer
X is the answer.
antagonist_of(X, Stevenson's Treasure Island).
modifies_possesive(Stevenson, Treasure Island).
Step Two: Generates Semantic Assumptions
Step Three: Builds different semantic queries based on phrases, keywords and semantic assumptions.
Step Four: Generates 100s of answers based on passages, documents and facts returned from Step Three. Hopefully Long-John Silver is one of them.
Step Five: For each answer, formulates new searches to find evidence in support or refutation of the answer, then scores the evidence.
Long-John Silver the main character in Treasure Island.....
The antagonist in Treasure Island is Long-John Silver
Treasure Island, by Stevenson was a great book.
One of the great antagonists of all time was Long-John Silver
Robert Louis Stevenson's book, Treasure Island features many great characters, the greatest of which was Long-John Silver.
Step Six: Generate, get evidence and score new assumptions
Positive Examples: (negative examples would support other characters, people, books, etc associated with any Stevenson, Treasure or Island)
Stevenson = Robert Louis Stevenson
"by Stevenson" --> Stevenson's
main character --> antagonist
Step Seven: Combine all the evidence and their scores
Based on analysis of evidence for all possible answers, compute a final confidence and link back to the evidence.
Watson's correctness will depend on evidence collection, analysis and scoring algorithms and the machine learning used to weight and combine the scores.
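The steps above can be sketched in a few lines of Python. This is a toy illustration, not Watson's code: the passages (loosely echoing the Step Five examples), the two scorer functions `term_overlap` and `relation_match`, and the hand-picked weights are all assumptions made for the example. Watson learned its weights through machine learning.

```python
import math

# Hypothetical evidence passages, loosely echoing the examples in Step Five.
PASSAGES = [
    "Long John Silver the main character in Treasure Island",
    "The antagonist in Treasure Island is Long John Silver",
    "Treasure Island, by Stevenson was a great book",
    "Jim Hawkins narrates most of Treasure Island",
]

# Key terms taken from the clue.
CLUE_TERMS = {"antagonist", "stevenson", "treasure", "island"}


def term_overlap(candidate, passage):
    """Fraction of clue terms found in a passage that mentions the candidate."""
    text = passage.lower()
    if candidate.lower() not in text:
        return 0.0
    words = set(text.replace(",", " ").split())
    return len(CLUE_TERMS & words) / len(CLUE_TERMS)


def relation_match(candidate, passage):
    """1.0 if the passage ties the candidate to 'antagonist' or the looser 'main character'."""
    text = passage.lower()
    if candidate.lower() not in text:
        return 0.0
    return 1.0 if ("antagonist" in text or "main character" in text) else 0.0


# Hand-picked weights (assumption); a real system would learn these from training data.
WEIGHTS = {term_overlap: 2.0, relation_match: 3.0}
BIAS = -3.0


def confidence(candidate):
    """Combine each scorer's best passage score through a logistic function."""
    total = BIAS
    for scorer, weight in WEIGHTS.items():
        total += weight * max(scorer(candidate, p) for p in PASSAGES)
    return 1.0 / (1.0 + math.exp(-total))


def rank(candidates):
    """Return (candidate, confidence) pairs, most confident first."""
    scored = [(c, confidence(c)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank(["Long John Silver", "Jim Hawkins", "Stevenson"])` puts Long John Silver first, because only that candidate appears in a passage that both covers most of the clue terms and states the antagonist or main-character relation.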
Q4. What is Watson’s strategy for seeking out Daily Doubles, and how did it compute how much to wager on the Daily Doubles and the final clue? (AstroCreep5000)
Watson’s strategy for seeking out Daily Doubles is the same as humans—Watson hunts around the part of the grid where they typically occur. In order to compute how much to wager, Watson uses input like its general confidence, the current state of the game (how much ahead or behind), its confidence in the category and prior clues, what is at risk and known human betting behaviors. We ran Watson through many, many simulations to learn the optimal bet for increasing chances of winning.
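One way to picture simulation-based wagering is a small Monte Carlo search over candidate bets. The sketch below is a toy under stated assumptions: the function names, the single-opponent model and the random "rest of the game" drift are invented for illustration and are not IBM's game model.

```python
import random


def win_probability(my_score, rival_score, wager, confidence, trials=5000, seed=0):
    """Estimate the chance of finishing ahead of one rival after a Daily Double wager."""
    rng = random.Random(seed)  # fixed seed so every wager is judged on the same random draws
    wins = 0
    for _ in range(trials):
        # Resolve the Daily Double: gain the wager if right, lose it if wrong.
        if rng.random() < confidence:
            score = my_score + wager
        else:
            score = my_score - wager
        # Toy model of the rest of the game (assumption): both scores drift randomly.
        final_me = score + rng.gauss(0, 2000)
        final_rival = rival_score + rng.gauss(0, 2000)
        if final_me > final_rival:
            wins += 1
    return wins / trials


def best_wager(my_score, rival_score, confidence, step=500):
    """Try wagers from 0 up to the current score and keep the one with the best estimate."""
    options = range(0, my_score + 1, step)
    return max(options, key=lambda w: win_probability(my_score, rival_score, w, confidence))
```

In this toy model a player who is far behind and confident is pushed toward a large bet, while a player with a big lead and low confidence is pushed toward a small one.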
Q5. It seems like Watson had an unfair advantage with the buzzer. How did Jeopardy! and IBM try to level the playing field? (via Raldi)
Jeopardy! and IBM tried to ensure that both humans and machines had equivalent interfaces to the game. For example, they both had to press down on the same physical buzzer. IBM had to develop a mechanical device that grips and physically pushes the button. Any given player, however, has different strengths and weaknesses relative to his/her/its competitors. Ken had a fast hand relative to his competitors and dominated many games because he had the right combination of language understanding, knowledge, confidence, strategy and speed. Everyone knows you need ALL these elements to be a Jeopardy! champion.
Both machine and human got the same clues at the same time. They read differently, they think differently, they play differently, they buzz differently, but no player had an unfair advantage over the other in terms of how they interfaced with the game. If anything, the human players could hear the clue being read and could anticipate when the buzzer would enable. This gave them the ability to buzz in almost instantly, and considerably faster than Watson's fastest buzz. By timing the buzz just right like this, humans could beat Watson's fastest reaction. At the same time, one of Watson's strengths was its consistently fast buzz, which was only effective, of course, if it could understand the question in time, compute the answer and confidence, and decide to buzz in before it was too late.
The clues are in English—Brad and Ken's native language; not Watson's. Watson analyzes the clue in natural language to understand what the clue is asking for. Once it has done that, it must sift through the equivalent of one million books to calculate an accurate response in 2-3 seconds and determine if it's confident enough to buzz in, because in Jeopardy! you lose money if you buzz in and respond incorrectly. This is a huge challenge, especially because humans tend to know what they know and know what they don't know. Watson has to do thousands of calculations before it knows what it knows and what it doesn't. The calculating of confidence based on evidence is a new technological capability that is going to be very significant in helping people in business and their personal lives, as it means a computer will be able to not only provide humans with suggested answers, but also provide an explanation of where the answers came from and why they seem correct.
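The buzz-in decision described above can be framed as a simple expected-value check. This is a minimal sketch under assumptions: the function names and the optional flat threshold are illustrative only, and nothing here is taken from Watson's actual decision logic.

```python
def expected_gain(confidence, clue_value):
    """Expected change in score from buzzing in with the given confidence."""
    # A correct response wins clue_value; an incorrect one loses the same amount.
    return confidence * clue_value - (1.0 - confidence) * clue_value


def should_buzz(confidence, clue_value, threshold=None):
    """Buzz when confidence clears an explicit threshold, or else when the expected gain is positive."""
    if threshold is not None:
        return confidence >= threshold
    return expected_gain(confidence, clue_value) > 0
```

With no threshold the rule reduces to buzzing whenever confidence is above 50%. A stricter threshold models a player who would rather stay silent than risk losing money.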
Q6. What operating system does Watson use? What language is he written in? (via RatherDashing)
Watson is powered by 10 racks of IBM Power 750 servers running Linux, uses 15 terabytes of RAM and 2,880 processor cores, and is capable of operating at 80 teraflops. Watson was written mostly in Java, but significant chunks of code are written in C++ and Prolog. All components are deployed and integrated using UIMA.
Watson contains state-of-the-art parallel processing capabilities that allow it to run multiple hypotheses—around one million calculations—at the same time.
Watson is running on 2,880 processor cores simultaneously, while your laptop likely contains four cores, of which perhaps two are used concurrently.
Processing natural language is scientifically very difficult because there are many different ways the same information can be expressed. That means that Watson has to look at the data from scores of perspectives and combine and contrast the results. The parallel processing power provided by IBM Power 750 systems allows Watson to do thousands of analytical tasks simultaneously to come up with the best answer in under three seconds.
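Running many independent analyses at once and merging their results can be sketched with Python's standard `concurrent.futures` module. The three scorer functions below are hypothetical stand-ins, and averaging is just one simple way to combine them. The sketch shows the fan-out-and-combine pattern only, not Watson's UIMA-based architecture.

```python
from concurrent.futures import ThreadPoolExecutor


def keyword_score(clue, passage):
    """Share of clue words that also appear in the passage."""
    clue_words = set(clue.lower().split())
    if not clue_words:
        return 0.0
    return len(clue_words & set(passage.lower().split())) / len(clue_words)


def length_score(clue, passage):
    """Prefer shorter passages (a made-up heuristic for this example)."""
    return 1.0 / (1.0 + len(passage.split()) / 10.0)


def exact_phrase_score(clue, passage):
    """1.0 if the whole clue appears verbatim in the passage, else 0.0."""
    return 1.0 if clue.lower() in passage.lower() else 0.0


SCORERS = [keyword_score, length_score, exact_phrase_score]


def score_in_parallel(clue, passage):
    """Submit every scorer to a thread pool and return the average of their scores."""
    with ThreadPoolExecutor(max_workers=len(SCORERS)) as pool:
        futures = [pool.submit(scorer, clue, passage) for scorer in SCORERS]
        scores = [future.result() for future in futures]
    return sum(scores) / len(scores)
```

A thread pool keeps the example short. In standard Python, CPU-heavy scorers would need separate processes or machines to actually run at the same time.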
Q7. Are you pleased with Watson's performance on Jeopardy!? Is it what you were expecting? (via eustis)
We are pleased with Watson's performance on Jeopardy! While Watson did at times provide the wrong response to the clues, such as its Toronto response, it is still a giant leap in a computer’s understanding of natural human language, in its ability to understand what the Jeopardy! clue was asking for and respond with the correct response the majority of the time.
Q8. Will Watson ever be available public [sic] on the Internet? (i4ybrid)
We envision Watson-like cloud services being offered by companies to consumers, and we are working to create a cloud version of Watson's natural language processing. However, IBM is focused on creating technologies that help businesses make sense of data in order to enable companies to provide the best service to the consumer. So, we are first focused on providing this technology to companies so that those companies can then provide improved services to consumers. The first industry we will provide the Watson technology to is the healthcare industry, to help physicians improve patient care.
Consider these numbers:
• Primary care physicians spend an average of only 10.7 - 18.7 minutes face-to-face with each patient per visit.
• Approximately 81% average 5 hours or less per month, or just over an hour a week, reading medical journals.
• An estimated 15% of diagnoses are inaccurate or incomplete.
In today’s healthcare environment, where physicians are often working with limited information and little time, the results can be fragmented care and errors that raise costs and threaten quality. What doctors need is an assistant who can quickly read and understand massive amounts of information and then provide useful suggestions.
In terms of other applications we’re exploring, here are a few examples of how Watson might some day be used:
• Watson technology offered through energy companies could teach us about our own energy consumption. People querying Watson on how they might improve their energy management would draw on extensive knowledge of detailed smart meter data, weather and historical information.
• Watson technology offered through insurance companies would allow us to get the best recommendations from insurance agents and help us understand our policies more easily. For our questions about insurance coverage, the question answering system would access the text for that person’s actual policy, the other policies that they might have purchased, and any exclusions, endorsements, and riders.
• Watson technology offered through travel agents would more easily allow us to plan our vacations based on our interests, budget, desired temperature, and more. Instead of having to do lots of searching, Watson-like technology could help us quickly get the answers we need among all of the information that is out there on the Internet about hotels, destinations, events, typical weather, etc, to plan our travel faster.
Q9. How raw is your source data? I am sure that you distilled down whatever source materials you were using into something quick to query, but I noticed that on some of the possible answers Watson had, it looked like you weren't sanitizing your sources too much; for example, some words were in all caps, or phrases included extraneous and unrelated bits. Did such inconsistencies not cause you any problems? Couldn't Watson trip up an answer as a result? (via knorby)
Some of the source data was very messy and we did several things to clean it up. It was relatively rare, less than 1% of the time, that this issue overtly surfaced in a confident answer. Evidentiary passages might have been weighed differently if they were cleaner, however. We did not measure how much messy data affected evidence assessment.
Q10. I'm interested in how Watson is able to (sometimes) use object-specific questions like "Who is --" or "Where is --". In the training/testing materials I saw, it seemed to be limited to "What is--" regardless of what is being talked about ("What is Shakespeare?"), which made me think that words were only words and Watson had no way of telling if a word was a person, place, or thing. Then in the Jeopardy challenge, there was plenty of "Who is--." Was there a last minute change to enable this, or was it there all along and I just never happened to catch it? I think that would help me understand the way that Watson stores and relates data. (via wierdaaron)
Watson does distinguish between people, things, dates, events, etc., certainly for answering questions. It does not do it perfectly, of course; there are many ambiguous cases that it struggles to resolve. When formulating a response, however, since "What is...." was acceptable regardless, early on in the project we did not make the effort to classify the answer for the response. Later in the project, we brought in more of the algorithms used in determining the answer to help formulate a more accurate response phrase. So yes, there was a change in that we applied those algorithms, or the results thereof, to formulate the "who"/"what" response.
Q11. Now that both Deep Blue and Watson have proven to be successful, what is IBM's next "great challenge"? (xeones)
We don’t assign grand challenges; grand challenges arrive based on our scientists' insights and inspiration. One of the great things about working for IBM Research is that we have so much talent that we have ambitious projects going on in a wide variety of areas today.
• We are working to make computing systems 1,000 times more powerful than they are today, moving from the petascale to the exascale.
• We are working to make nanoelectronic devices 1,000 times smaller than they are today, moving us from an era of nanodevices to nanosystems. One of those systems we are working on is a DNA transistor, which could decode a human genome for under $1000, to help enable personalized medicine to become reality.
• We are working on technologies that move from an era of wireless connectivity—which we all enjoy today—to the Internet of Things and people, where all sorts of unexpected things can be connected to the Internet.
Q12. Can we have Watson itself / himself do an AMA? If you give him traditional questions, ie not phrased in the form they are on jeopardy, how well will he perform- how tailored is he to those questions, and how easy would it be to change that? Would it be unfeasible to hook him up to a website and let people run queries?
At this point, all Watson can do is play Jeopardy and provide responses in the Jeopardy format. However, we are collaborating with Nuance, Columbia University Medical Center and the University of Maryland School of Medicine to apply Watson technology to healthcare. You can read more about that
Q13. After seeing the description of how Watson works, I found myself wondering whether what it does is really natural language processing, or something more akin to word association. That is to say, does Watson really need to understand syntax and meaning to just search its database for words and phrases associated with the words and phrases in the clue? How did Watson's approach differ from simple phrase association (with some advanced knowledge of how Jeopardy clues work, such as using the word "this" to mean "blank"), and what would the benefit/drawback have been to taking that approach?
Watson performs deep parsing on questions and on background content to extract the syntactic structure of sentences (e.g., grammatical and logical structure) and then assign semantics (e.g., people, places, time, organization, actions, relationship etc). Watson does this analysis on the Jeopardy! clue, but also on hundreds of millions of sentences from which it abstracts propositional knowledge about how different things relate to one another. This is necessary to generate plausible answers or to relate an evidentiary passage to a question even if they are expressed with different words or structures. Consider more complex clues like: “A relative of this inventor described him as a boy staring at the tea kettle for an hour watching it boil.” Sometimes, of course, Jeopardy questions are best answered based on the weight of simple word associations. For example, take "Got ___ !" If "Milk" occurs most frequently in association with this phrase in everything Watson processed, then Watson should answer "Milk". It’s a very quick and direct association based on the frequency of exposure to that context.
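The "Got ___ !" case amounts to a frequency count over contexts. Here is a minimal sketch, assuming a tiny made-up corpus and a hypothetical `fill_blank` helper; it illustrates simple word association only, not the deep parsing described above.

```python
import re
from collections import Counter

# Made-up corpus (assumption); Watson drew on hundreds of millions of sentences.
CORPUS = [
    "Got milk! was a famous advertising slogan.",
    "The billboard just said: Got milk!",
    "Got game! shouted the coach.",
    "The ad campaign Got milk! ran for years.",
]


def fill_blank(before, after, corpus):
    """Return the word seen most often between `before` and `after`, or None if never seen."""
    pattern = re.compile(
        re.escape(before) + r"\s+(\w+)\s*" + re.escape(after),
        re.IGNORECASE,
    )
    counts = Counter()
    for sentence in corpus:
        # findall returns the captured word for every match in the sentence.
        counts.update(word.lower() for word in pattern.findall(sentence))
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

Here `fill_blank("Got", "!", CORPUS)` returns `"milk"`, since that word fills the blank in three of the four sentences.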
Other questions require a much deeper analysis. Watson has to try many different techniques, some deeper than others, for almost all questions and all at the same time to learn which produces the most compelling evidence. That is how it gets its confidence scores for its best answer. So even for the ones that might have been answered based on word-association evidence, Watson also tried to answer in other ways requiring much deeper analysis. If word association produced strong evidence (high confidence scores), then that is what Watson goes with. We imagine this is similar to the way a person might quickly pursue many different paths toward an answer simultaneously, but then provide the answer they are most confident is correct.
Q14. In the time it takes a human to even know they are hearing something (about .2 seconds) Watson has already read the question and done several million computations. It's got a huge head start. Do you agree or disagree with that assessment?
The clues are in English—Brad and Ken's native language; not Watson's. Watson must calculate its response in 2-3 seconds and determine if it's confident enough to buzz in, because as you know, you lose money if you buzz in and respond incorrectly. This is a huge challenge, especially because humans tend to know what they know and know what they don't know.
Watson has to do thousands of calculations before it knows what it knows and what it doesn't. The calculating of confidence based on evidence is a new technological capability that is going to be very significant in helping people in business and their personal lives, as it means a computer will be able to not only provide humans with suggested answers, but also provide an explanation of where the answers came from and why they seem correct. This will further human ability to make decisions.