Who Cares About Your Health Data?

“Come, let us go down and there confuse their language, that they may not understand one another’s speech.”—Genesis 11:4-9

The masters of data are feverishly quantifying everything you digitally touch, from your guilty love of Lana Del Rey to the goofy snowboard you just bought online. Your personal health data, on the other hand, most likely has been stored on a sheaf of paper tucked away in your doctor’s file cabinet.

No more. From the recent rise of digitized health records to a dramatic increase in full DNA sequences to devices that measure your daily steps and blood pressure, the bits and bytes pour forth. The world has seen nothing like this before—not even from YouTube videos, currently the most prolific source of human data.

No one knows exactly how much health data is being produced right now, although a recent study out of Cold Spring Harbor Laboratory in New York reports that genetic data alone has reached the 25-petabyte mark. (That’s a 25 followed by 15 zeroes. YouTube videos equaled 100 petabytes last year.)

In the next decade, the study predicts, as we move from about 250,000 people now sequenced to millions, genomic data will break through the exabyte barrier (that’s 18 zeroes). Never mind all of the other health data that’s possible to collect right now.

The question is: What does all of this health data mean for individuals—for you and your family and friends? And what can it tell us about our health present and future?

Regrettably, not much, although this may be changing as health care begins a monumental shift from an era of collecting data to actually understanding what it means.

Take me, for example. For the past 14 years I have been collecting data on myself as a reporter in what I call The Experimental Man Project. It’s a reporter-as-guinea-pig quest to test-drive new medical technologies to see what is useful for a healthy person. The project has produced almost 10 terabytes of data (that’s a 10 with 12 zeros).

To put this in perspective, that’s more data for one guy than all humans produced from the beginning of history through the late 1990s.

Since 2001, when I started my self-quantifying quest with a story in Wired about getting my DNA sequenced, I have been poked, prodded, and scanned in thousands of tests. This includes a full genome sequence of all 6 billion of my As, Ts, Cs, and Gs; extensive testing on environmental toxins inside my body; brain and body scans; profiles of the proteins in my blood and the microbes in my gut; and more.

If scientists collected this much data on all 7 billion-plus people on Earth, the total would be roughly 70 zettabytes (a 7 followed by 22 zeroes), a bit less than a tenth of a yottabyte. A yottabyte is a 1 followed by 24 zeroes, as in 1,000,000,000,000,000,000,000,000.
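For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes decimal (SI) units, a round 10 terabytes per person, and a world population rounded down to 7 billion; the variable names are mine, chosen only for illustration.

```python
# Decimal (SI) byte units. Binary units (tebibytes and so on) would shift
# the results slightly; that choice is an assumption of this sketch.
TERABYTE = 10**12
ZETTABYTE = 10**21
YOTTABYTE = 10**24

per_person_bytes = 10 * TERABYTE   # roughly 10 TB collected on one person
world_population = 7 * 10**9       # "7 billion-plus", rounded down

# Total if the same amount of data were collected on everyone on Earth.
total_bytes = per_person_bytes * world_population

print(total_bytes // ZETTABYTE, "zettabytes")
print(total_bytes / YOTTABYTE, "of a yottabyte")
```

Under these round numbers the total lands in the tens of zettabytes, so it would take more than 100 terabytes per person to reach a full yottabyte.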

Genetically, I now have 31,307 genotypes annotated: DNA markers that cover every characteristic imaginable, from my slightly elevated risk for a brain aneurysm and glaucoma to whether I have blue eyes (yes) or can taste bitterness (no). I have a high risk for low empathy (but I don’t care what you think), and metabolize caffeine so fast that I can drink joe before I go to bed (can you?).

As you read this, an automated program called Promethease is scouring DNA databases in search of newly identified markers for heart attack risk, diabetes, cancer, and more. You can check out my results here.

Unfortunately, most DNA markers in my body are only statistical correlations within populations. This means that genetic variants that differ from person to person say mostly how you fit into a study group, not what will happen to you as an individual. Nor have scientists clinically tested most markers over time to see if people carrying, say, a high-risk marker for inflammatory bowel disease actually manifest the disease. Most markers for common diseases also tend to be minor factors in predicting or explaining whether or not a person gets diabetes or stroke.

For instance, I have 262 genetic markers associated with cardiovascular disease. These tell me that I have a low, medium, and high risk for heart attack, hypertension, and stroke, depending on the study and the marker. Obviously I can’t have all three. But even if the markers were accurate, most confer a small risk compared to, say, eating lots of greasy burgers or smoking.

Possibly more useful is a gene mutation that I carry called CYP2C9*3 that may cause me to metabolize the blood thinner warfarin very slowly. This genetic trait could be critical for me if I need this drug after a heart attack or during surgery (it’s commonly used to prevent clots from forming in the brain and elsewhere), since the normal dose could cause me to bleed excessively.

Genetics has been far more successful at identifying rare genetic diseases such as Tay-Sachs and fragile X syndrome. Scientists also are making dramatic discoveries linking specific genetic mutations to cancer. These can help target drugs to more effectively battle the disease. Mostly, though, genetics so far has been frustratingly opaque in predicting accurate and meaningful risks for common diseases.

Environmentally, toxicologists have detected hundreds of toxins in my blood and urine—trace amounts of heavy metals, PCBs, and pesticides. For instance, my blood contains 50 percent higher levels of DDE—the chemical that the pesticide DDT breaks down into—probably because I grew up in Kansas when nearby farmers were still spraying fields with this now-banned chemical.

Scientists also spotted higher than average levels of flame retardants inside me, probably because the airplanes I spend too much time flying in are drenched in flame-stopping chemicals. Yet the impact of these wee amounts of chemicals, measured in parts per billion and parts per trillion, is mostly unknown.

As for my brain, I have spent over 25 hours getting my noggin scanned using magnetic resonance imaging (MRI). One test hunted for trace amounts of protein plaque that can predict whether a person will develop Alzheimer’s disease. Thankfully, I came out clear.

Other brain tests, called functional MRIs, measured the blood flow in my brain when stimulated with ideas, emotions, and images, such as cute kittens and people having sex. These tests suggested that I am “normal” in how my brain responds to anxiety, and also “normal” for greed when neuroscientists tested me for my motivations around making money, whatever that means.

More recently I’ve been piling up measurements using ever-sleeker personal devices that keep track of my steps, blood pressure, blood oxygen levels, weight, and pulse. This data is interesting for me to track my progress against myself, and my friends. Physicians, however, don’t yet know what to make of this real-time data, since their studies and guidelines use measurements taken mostly during exams.

What’s lacking are human trials over time to see, for instance, how your daily stepping out actually impacts your weight or sleep patterns.

Clinical trials are not as fun as building cool gadgets and collecting data. They are expensive and time-consuming, but essential.

What’s really needed is a sequel to the data collection revolution—what Stanford geneticist Michael Snyder and others have called Interpretomics, or the science of interpreting medical data. This comes from the use of the suffix “-omics” to describe genomics, microbiomics, and so forth.

Researchers should conduct long-term studies that accurately assess the impact of individual data points—from genetics to health records—and also how these factors work together to impact our health.

By the way, Snyder has collected even more data on himself than I have—over 30 terabytes, making him one of a handful of people to crack the terabyte barrier. Another one is UC San Diego physicist Larry Smarr. He has tracked the microbes in his gut for several years, among a vast number of other tests.

Anyone following my testing over the years has already heard some of what I’m reporting here in talks and articles, and in my book, Experimental Man. I have been searching for my own interpretome for years.

Thankfully, progress is being made. Just this week a new company called Arivale announced that it has raised $36 million to assess thousands of people over time using a battery of several dozen tests. These include a full genome sequence and measurements of environmental toxins, standard body chemistry tests, and steps as measured on a Fitbit-style device, plus regular scans of a person’s gut microbes.

“We want to create a database of actionable possibilities that is scientifically validated,” said Arivale co-founder Lee Hood*, a physician and the co-founder and head of the Institute for Systems Biology in Seattle. Hood already has tested 107 individuals, or “pioneers,” using the company’s protocol. He said that they are looking into the correlations among some 35,000 factors, from genetics to diet, to find out what all this data means.

Arivale is hoping to track customers long-term, but will emphasize real-time interpretation and lifestyle guidance to improve health, explained CEO and co-founder Clayton Lewis. Lewis is a triathlete and a general partner at Maveron, a fund set up by Starbucks founder Howard Schultz, which led this Arivale investment round. “We are offering nutritional and behavioral coaching that integrates all of your data,” he added.

Companies have made claims before that they can use genetics and other high-tech tests to change their customers’ lives. Most, however, have come up short, given the current limitations in using genetics for predicting people’s future health.

But in a quick peek at what the company is offering, it appears that Arivale is heading in the right direction. For one thing, Hood and Lewis readily admit that some genetic tests for common diseases remain nascent and in need of further validation, although a few are actionable now, said Hood. (I have asked to be included in the next round of testing, so stand by for a more detailed report, hopefully in a few months.)

Arivale will reportedly charge $2,000 for its testing and analysis, which isn’t cheap, although Lewis believes that eventually this sort of intensive monitoring and interpretation will pay for itself in improved health outcomes. “We already are seeing many of our original 100 pioneers saying they feel better, and we can see that some are healthier,” adds Hood.

Other companies and projects are forming to address the interpretome issue, including the Personal Genome Project founded by geneticist George Church* at Harvard, and a new company formed by geneticist Craig Venter called Human Longevity, based in San Diego. (Send me companies and projects that you think should be on this list.)

In many ways the collection of data in the past few decades is our civilization’s greatest invention. It’s our Parthenon, Pyramids, Great Wall, printing press, and steam engine rolled into one. But this brainchild needs what is perhaps an even greater invention just now being tinkered with in health care—the invention of using data wisely.

* Lee Hood and George Church are unpaid advisers to Arc, co-founded by the author of this column.

Personal health data is piling up fast, but what does it mean, and who is trying to make sense of it?

David Ewing Duncan