In recent remarks to the Electronic Privacy Information Center, Apple’s CEO Tim Cook declared that “people have a fundamental right to privacy.” He has begun an assault on the business model of Big Data as currently shaped by Internet giants Google and Facebook.
It was Eric Schmidt, while CEO of Google, who famously pronounced the death of online privacy in 2009, telling people “if you have something to hide, maybe you shouldn’t be doing it in the first place.”
Under Schmidt’s watch, Google launched Gmail in 2004, perfecting the business model of providing a free service in exchange for access to user data. With Mark Zuckerberg at its helm, Facebook has pushed the boundaries of using data collected from its users, for example by personalizing advertising copy with images of your friends and by automatically posting to your social network when you play a song on Spotify.
Cook may already have found an ally in the University of Pennsylvania’s Annenberg School for Communication. A survey conducted by the school and released in June found that about 90 percent of consumers do not consider it fair to trade personal data for benefits such as targeted ads and discounts. The researchers also found evidence that most consumers felt resigned to a status quo in which they were not in control of their own data.
The UPenn story was picked up by mainstream outlets such as the New York Times and Fortune. Natasha Lomas of TechCrunch, a leading online news website that tracks the tech industry, penned an invective titled “The Online Privacy Lie is Unraveling,” calling the collection of data a “heist of unprecedented scale.”
Wait a minute. Didn’t Harvard Business Review take the opposite stance only a month ago? In a long article, marketing executives are told that customers demand fair value for data collection, and are advised to creatively design non-monetary trade-offs because “in-kind” benefits work better than dollars. The authors are consultants at frog, a global product design and branding agency founded in 1969, with a sterling client list that includes SAP, GE, HP and Microsoft. Most notably, frog designed some of the early Apple computers. How did these consultants support their arguments? They conducted a proprietary survey.
So within the span of two months, two top-class media outlets published key findings on digital privacy that contradicted one another. Each report cited a professional survey from an organization of stature: UPenn in one case, frog in the other. This state of affairs is all too familiar nowadays. It frustrates me, and it should frustrate you too.
Surveys are not all created equal. In this instance, I find the frog study less credible for a number of reasons.
The frog analysis was based on 900 people across five countries. That is a depressingly small sample, marginally sufficient if analyzed in aggregate. Drawing country-level conclusions from an average of fewer than 200 responses per country falls far below standard statistical practice. By contrast, the UPenn study had 1,506 respondents, all within the U.S., which makes it roughly eight times the size of frog’s average country-level subsample.
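To see why sample size matters here, a rough sketch of the textbook 95 percent margin of error for a survey proportion helps. It assumes a simple random sample and the worst case of a 50/50 split (p = 0.5); real surveys that use weighting or clustering typically have wider intervals than this formula suggests. The sample sizes come from the text, with 180 standing in for frog’s average per country (900 divided by 5); the function name is just an illustrative choice.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of a normal-approximation 95% confidence interval for a
    proportion, assuming a simple random sample of size n (an idealization)."""
    return z * math.sqrt(p * (1 - p) / n)


# Sample sizes from the article; 180 is frog's average per country (900 / 5).
samples = {
    "frog, all five countries": 900,
    "frog, average per country": 900 // 5,
    "UPenn, U.S. only": 1506,
}

for label, n in samples.items():
    # Print the margin of error in percentage points.
    print(f"{label:28s} n={n:5d}  +/-{100 * margin_of_error(n):.1f} points")
```

Under these assumptions the per-country frog figures carry a margin of roughly plus or minus 7 points, against about 2.5 points for the UPenn sample, so a country-level difference of a few points in the frog data could easily be noise.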
UPenn conducted telephone interviews, half of them with mobile phone users. Responses collected by telephone are typically considered more reliable than online answers. frog did not disclose whether its study was conducted online, by telephone, or otherwise. Nor did frog mention any qualifying questions. The researchers at UPenn required respondents to be at least somewhat active online, effectively screening out people who are probably in the dark about online data collection practices.
The consultants at frog also made a blanket claim that their 900-strong sample is demographically representative of the global Internet population, but provided no details. The UPenn report lists key statistics on its survey respondents (Table 1), such as age, gender, race and income. Readers are therefore able to judge whether the respondents are sufficiently typical.
There are reasons to doubt whether those 900 respondents from the U.S., China, India, Great Britain and Germany can truly represent the world. The sample did not include anyone from South America, Africa, the Middle East, Scandinavia, or Australasia. Besides, the “general online population” doesn’t really exist. Who defines it? What factors describe it?
Surveys are a very powerful tool, but they can produce widely varying results depending on how respondents are selected and how the questionnaires are designed. The mainstream media do not currently have the know-how or the incentive to weed out poorly executed surveys, which is a big part of the problem.
Both surveys focused on what consumers know, or don’t know, about the data businesses collect on them and how businesses use that data. Yet both research teams failed to ask a crucial question.
Neither research team asked consumers how well they understand the “benefits” promised to them in the trade-off arrangement. There is a tendency in mainstream media to glorify algorithms developed by Facebook, Google and other businesses.
To wit, consider this statement from the UPenn survey: “It’s OK if a store where I shop uses information it has about me to create a picture of me that improves the services they provide for me.” Fifty-five percent of Americans disagree with this statement.
The source of dissent is likely unease about spying and profiling, possibly collateral damage from the scandal over government digital surveillance. Notice that the survey designer assumed that the information collected will produce an accurate profile of you, and thus that the data-science algorithms analyzing such profiles will succeed in improving the service provided to you. The trouble is that the promised benefits frequently fail to materialize.
Algorithms make mistakes all the time. Just think about auto-correction on your iPhone, the recommendations on Netflix, or the coupons automatically printed at Kmart’s checkout. Just this past week, a fraud detection algorithm incorrectly blocked my ATM card while I was traveling in Germany, even though I had called in a travel notification before the trip.
I wonder how much the survey result might shift if it were clear that, say, the shop will deliver an improved service only 30 percent of the time.
Even if algorithms were infallible, business goals would not usually align with consumer wants. Everyone agrees that no one wants dragnet advertising, but has anyone ever said “give me some targeted advertising”? I don’t think so, and neither does Tim Cook.
Consumers love to get discounts on purchases they would make anyway, but to businesses, those are unnecessary discounts on sure purchases. Marketing offers are profitable when you buy something you were not considering, or spend more than you intended. In other words, algorithms that delight you or me probably aren’t contributing much value to the bottom line.
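The arithmetic behind this point can be sketched with a deliberately simple, hypothetical model. It assumes that every customer who buys redeems the discount, that "sure" buyers would have paid full price anyway, and that the only other effect of the offer is to bring in some new buyers; the function names and all the numbers below are invented for illustration and are not drawn from either survey.

```python
def offer_profit(sure_buyers: int, new_buyers: int, margin: float, discount: float) -> float:
    """Net profit of a discount offer relative to making no offer.

    Hypothetical model: sure_buyers would have bought at full price anyway,
    new_buyers buy only because of the offer, and every buyer redeems the
    discount. margin is the profit per purchase before any discount.
    """
    # Money given away to customers who would have paid full price.
    cost_on_sure_purchases = sure_buyers * discount
    # Extra profit from purchases that happen only because of the offer.
    gain_from_new_purchases = new_buyers * (margin - discount)
    return gain_from_new_purchases - cost_on_sure_purchases


def break_even_new_buyers(sure_buyers: int, margin: float, discount: float) -> float:
    """Number of new buyers needed before the offer stops losing money."""
    return sure_buyers * discount / (margin - discount)


# Invented example: 300 customers would buy anyway, $20 margin, $5 discount.
print(offer_profit(300, 30, 20, 5))        # few new buyers: the offer loses money
print(offer_profit(300, 150, 20, 5))       # many new buyers: the offer pays off
print(break_even_new_buyers(300, 20, 5))   # new buyers needed to break even
```

With these made-up figures the offer loses $1,050 when it attracts 30 new buyers, earns $750 when it attracts 150, and breaks even at 100. A discount that merely rewards loyal customers sits on the losing side of that line, which is why a business prefers offers that change behavior.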
Don’t believe everything you read in the media. Much of it is unfiltered press release material. Because the frog survey lacks important details, it is difficult to accept its findings at face value. Neither survey examined the axiom that data science is in the service of consumers. In reality, the primary patron of data science is the business owner.
Andrew Gelman and Kaiser Fung are statisticians who deal with uncertainty every working day. In Statbusters they critically evaluate data-based claims in the news, and they usually find that the real story is more interesting than the hype.