One of the recent hit stories in mainstream media is about deadly air pollution in China. Numerous newsroom editors fell in love with a specific headline: “China Air Pollution Kills 4,000 People Per Day: Researchers,” said Bloomberg Business. This lede also appeared on The Atlantic, Christian Science Monitor, The Guardian, and so on.
We are of course concerned about the quality of air in a populous country that has become the global epicenter of industrial production. We just don’t think the deaths-per-day metric serves the story well.
Dividing a large number by 365 days is a favorite technique of journalists. Here are a bunch of other recent examples we found online:
Periscope has 10M Registered Users Watching 40 Years of Video Per Day, TechCrunch
J.J. Watt Eats Up to 9000 Calories Per Day, Is an Unrepentant Brunch-Lover, Bleacher Report
America’s law enforcement officers have shot and killed upwards of 385 people so far this year… that’s a rate of about 1 every 9 hours, or 2.5 shootings per day, Washington Post
Not all large numbers should be expressed in daily terms. Two types of problems can arise. Sometimes, the quantity being measured does not really scale with time. Sometimes, the per-day measure gives the wrong impression that the quantity is evenly distributed over time. In either case, readers are misled about the magnitude of the metric.
The following table shows deaths from cancer in selected countries, expressed in three ways: annual deaths, deaths per day, and proportion of total deaths.
Cancer is a bigger killer in the United Kingdom than in the United States, if one considers that in 2010, cancer deaths accounted for 29 percent of all deaths in the UK compared to 24 percent in the U.S. But wait, on a per-day basis, cancer killed only 442 British people, compared to 1,615 Americans. So is cancer more than three times as deadly in the U.S. as in the UK?
By the same logic, we are led to believe that cancer kills 100 times more people in the U.S. than in Singapore. There is nothing wrong with the calculation but per-day metrics are frequently not informative.
The issue is that the number of deaths scales with population. Comparing daily rates confuses two factors: a difference in population, and a difference in cancer mortality. China has a fifth of the world’s population, and notwithstanding the controversial birth-control policy, its population grew by about 15 million in 2010. So, even if air quality did not worsen, the 4,000-dead-per-day number would grow year after year.
We do not dispute that air quality in China is poor. The culprit is the division by the number of days in a year, a fixed number that says nothing about the size of the population at risk. Dividing by population instead of by 365 results in a more meaningful metric. Another option is to divide by the number of deaths from all causes.
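To make the comparison concrete, here is a minimal sketch in Python that converts the per-day cancer figures quoted above into annual deaths per 100,000 residents. The population values are rounded 2010 estimates that we supply as an assumption for illustration; they do not come from the table.

```python
DAYS_PER_YEAR = 365

# Per-day cancer deaths in 2010, as quoted in the text above.
deaths_per_day = {"UK": 442, "US": 1615}

# Rounded 2010 populations. ASSUMPTION for illustration only;
# these figures are not taken from the article's table.
population = {"UK": 63_000_000, "US": 309_000_000}


def annual_deaths_per_100k(country):
    """Convert a per-day death count into annual deaths per 100,000 residents."""
    annual_deaths = deaths_per_day[country] * DAYS_PER_YEAR
    return annual_deaths / population[country] * 100_000


for country in deaths_per_day:
    rate = annual_deaths_per_100k(country)
    print(f"{country}: {deaths_per_day[country]} per day, "
          f"{rate:.0f} per 100,000 per year")
```

Under these assumed populations, the ordering flips: the UK comes out at roughly 256 cancer deaths per 100,000 people per year against roughly 191 for the U.S., which agrees in direction with the share-of-all-deaths comparison.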
Next, consider the item about Periscope, a Twitter app for streaming live events. Forty years of video sounds impressive at first glance, but divided among 10 million users, it amounts to just about two minutes per user per day. But this per-user, per-day metric is misleading as well.
First, users are not created equal. Only a fraction of those registrants actively use the app, so we should divide the minutes viewed by the average number of viewers per day rather than by the number of registered users.
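The arithmetic behind the two-minute figure, and the effect of changing the denominator, can be sketched in a few lines of Python. The 20 percent active share is a purely hypothetical value chosen for illustration; Periscope's actual number of daily viewers is not given here.

```python
MINUTES_PER_YEAR = 365 * 24 * 60

# From the headline: 40 years of video watched per day, 10 million registered users.
video_minutes_per_day = 40 * MINUTES_PER_YEAR
registered_users = 10_000_000

# HYPOTHETICAL share of registrants who watch on a given day (not from the article).
ASSUMED_ACTIVE_SHARE = 0.20


def minutes_per_viewer(total_minutes, viewers):
    """Average minutes watched per viewer per day."""
    return total_minutes / viewers


per_registered = minutes_per_viewer(video_minutes_per_day, registered_users)
per_active = minutes_per_viewer(
    video_minutes_per_day, registered_users * ASSUMED_ACTIVE_SHARE
)

print(f"Per registered user: {per_registered:.1f} minutes per day")
print(f"Per active viewer (assumed share): {per_active:.1f} minutes per day")
```

With all registrants in the denominator the average is about 2.1 minutes; with the assumed active share it is about 10.5 minutes. The same total supports very different stories depending on who is counted.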
Second, the popularity of streams is extremely uneven. One of the first events that placed Periscope on the map was illegal streaming of the Mayweather-Pacquiao fight in May. A surge of viewing occurred on a single day. When the data contain spikes like this, one should be wary of using a per-day average.
When we are presented with a statistical average such as two minutes of viewing per day, the natural tendency is to extrapolate linearly. If a user watched two minutes per day, then in 10 days, the user should have watched 20 minutes. If the data behind that average value include the Mayweather-Pacquiao fight, then the forecast would likely be too optimistic.
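A small Python sketch shows how a single spike can inflate a linear forecast. The daily viewing minutes below are invented for illustration, with one large value standing in for a one-off event; they are not Periscope data, and the 10-minute cutoff used to flag the spike is likewise arbitrary.

```python
from statistics import mean, median

# HYPOTHETICAL minutes watched per day by one user over 10 days.
# The 18-minute day stands in for a one-off event such as a big fight.
daily_minutes = [0, 0, 1, 0, 18, 0, 1, 0, 0, 0]

SPIKE_CUTOFF = 10  # arbitrary threshold, for illustration only


def linear_forecast(per_day_average, days):
    """Naive extrapolation: assume every future day looks like the average."""
    return per_day_average * days


average_with_spike = mean(daily_minutes)
ordinary_days = [m for m in daily_minutes if m < SPIKE_CUTOFF]
average_without_spike = mean(ordinary_days)

forecast_with_spike = linear_forecast(average_with_spike, 10)
forecast_without_spike = linear_forecast(average_without_spike, 10)

print(f"Mean: {average_with_spike}, median: {median(daily_minutes)}")
print(f"10-day forecast including the spike: {forecast_with_spike:.1f} minutes")
print(f"10-day forecast excluding the spike: {forecast_without_spike:.1f} minutes")
```

In this made-up series the average is two minutes per day, yet on most days the user watched nothing. The forecast that includes the spike predicts 20 minutes over the next 10 days, while the ordinary days alone suggest a little over two.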
An MIT statistics professor, Arnold Barnett, has been tracking airline safety for decades. He watched as the developed nations pushed the death risk per flight from 1 in 1 million in the 1960s to 1 in 13 million in the 1990s. Then, in the first half of the 2000s, the trend reversed. As Dr. Barnett said, “we lost it all on a Tuesday in September.” No one died in plane crashes between 2000 and 2004, but it took just one extreme act of terrorism to turn the metric upside down. Should such extreme events be included in computing averages? If the goal is to forecast risk, we tend to exclude them, as we hope history will not repeat itself.
A per-day metric is useful when time is a true driver, such as time spent playing video games per day, or when the underlying process is more or less regular, such as the number of vehicular accidents per day. The J.J. Watt quote above is an example of appropriate use. Eating is a daily activity, and calories do accumulate. In many other situations, expressing large numbers in daily terms just spreads misinformation.
When reading articles about deaths by air pollution, we should also pay attention to how the data come about. Have you ever seen a death certificate or an obituary that attributes a death to air pollution? We haven’t. The basic sources of mortality data tell us about cancers and diseases, accidents and so on but not about indirect causes of death. To get around the lack of basic data, researchers at the Berkeley Earth project used a set of equations to attribute deaths to air pollution. So our confidence in the conclusion depends on our confidence in the methodology.
In his new book, Sex by Numbers, David Spiegelhalter, a statistics professor at the University of Cambridge, came up with a classification of numbers used by the media. He said numbers fall into five categories: 100 percent trusted, 75 percent trusted, 50 percent trusted, 25 percent trusted, and not-to-be-trusted. We think the 4,000 deaths number is in the 25 percent trusted category.