
Who controls your data?

We requested our personal information from dozens of companies. Here’s what they gave us -- and what they didn’t.

The average American, one study tells us, touches their phone 2,600 times per day. By the end of a given year, that's nearly a million touches, rising to two million if you're a power user.

Each one of those taps, swipes and pulls is a potential proxy for our most intimate behaviors. Our phones are not only tools that help us organize our day but also sophisticated monitoring devices that we voluntarily feed with interactions we think are private. The questions we ask Google, for instance, can be more honest than the ones we ask our loved ones -- a "digital truth serum," as ex-Googler and author Seth Stephens-Davidowitz writes in Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are.

Hoover up these data points and combine them with all of our other devices -- smart TVs, fitness trackers, cookies that stalk us across the web -- and there exists an ambient, ongoing accumulation of our habits to the tune of about 2.5 quintillion (that's a million trillion) bytes of data per day.

Sometimes that data gets spliced, scattered and consolidated across a web of collaborators, researchers and advertisers. Acxiom, for instance, claims 1,500 data points for each of the 500 million people in its database, including most US adults. Just in the past few months, Facebook was reported to have asked hospitals, including Stanford University School of Medicine, to share and integrate patients' medical data with its own (the research project has since been put on hold). In April, gay dating app Grindr was revealed to have shared customers' HIV status with two app-optimization companies. And who suspected completing an online personality test would pave the way for President Donald Trump's targeted political advertising?

In short, the close relationships we have with our devices are not monogamous. But what's a privacy-valuing citizen who still wants or needs to partake in our fabulously networked 21st-century society to do?

There likely could not be a more timely moment for the public to care about the General Data Protection Regulation (GDPR), the European Union's superlatively complex, contested, sweeping data-privacy law that came into force on May 25th.

Its key rights include access to personal data, explanations of the algorithms that shape citizens' lives, portability (or moving your data from one company to another) and deletion. Years in the making, it affects any global organization's business in the European Union, leading companies worldwide to spend millions of dollars bringing their privacy standards into compliance, in some cases standardizing their practices outside the EU too.

So we decided to test the system. A team of nine Engadget reporters in London, Paris, New York and San Francisco filed more than 150 subject access requests -- in other words, requests for personal data -- to more than 30 popular tech companies, ranging from social networks to dating apps to streaming services. We reached out before May 25th -- when previous laws for data access existed in the EU -- as well as after, to see how procedures might have changed.

The EU has had a data-protection directive since 1995, yet studies have repeatedly shown that its rights weren't well-enforced. The GDPR has been law since 2016, yet it only grew teeth this May, with companies now open to fines of up to 4 percent of global annual revenue.


Indeed, the history of data privacy is really a tale of violation without meaningful justice. For example, hacked credit agency Equifax is still in business, and its customers can't even cut ties with it if they wish to. In the UK, Facebook was fined £500,000 ($640,000) for its role in the Cambridge Analytica scandal, the maximum sum under laws at the time of the incident -- but also equal to the amount of cash the company makes every 5.5 minutes.

If the same thing happened today, Facebook could be hit with fines potentially in the billions of dollars. Already, about 1,000 US-based news websites including the Los Angeles Times and Chicago Tribune are inaccessible in the EU, and in a recent Deloitte survey only about a third of organizations could say they were fully compliant.

The hope is that the GDPR will be a gold standard for how to feasibly check the power of big tech companies whose market value dwarfs the GDP of some of the countries trying to hold them accountable.

We contacted companies by email or through their websites when they specified a method in their privacy policies, or we sent a letter when they didn't. (Instagram, for instance, only added an email address for data requests on May 25th and didn't reply to our mailed request.) Our letter was a modified version of the template on the UK Information Commissioner's Office's website, quoting directly from the relevant laws. We asked for information on what data was held on us, where it came from, who it's been sent to and how we've been profiled, among other questions.

What we requested

Our access requests were based on this template from the UK Information Commissioner's Office. Before May 25th we quoted from the UK's Data Protection Act of 1998 or a similar law in the subject's home country. After May 25th, we cited Article 15 of the EU General Data Protection Regulation.

Specifically, we asked for:

  • Any and all personal data an organization was holding on us

  • The categories of personal data concerned

  • An explanation of the purpose of collecting the data

  • Who has received or will receive the data

  • How long the personal data will be stored and the criteria used to determine that period

  • If the personal data was not collected directly from the data subject, where it was collected from

  • If the data was used in any automated decision-making, including profiling, referred to in Article 22(1) and (4); meaningful information about the logic involved; and the significance and the envisaged consequences for the data subject

We included identifying information (name, address, email, phone number) and a copy of an identification document such as a driver's license. We noted that under the GDPR, access requests must be responded to within a month.

Our requests were made from personal email and home addresses, in an effort to be treated as much like regular consumers as possible. In most cases, we sent follow-up questions identifying ourselves as reporters.
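For readers who want to draft a similar request, the items listed above can be assembled into a letter with a few lines of Python. This is only a minimal sketch: the item wording is paraphrased rather than taken from the ICO template, the names `REQUEST_ITEMS`, `one_month_after` and `build_request` are hypothetical, and the "same day next month" deadline is a simplifying assumption about how the one-month response window is counted.

```python
import calendar
from datetime import date

# Items paraphrased from the list above; the wording is illustrative,
# not the text of the ICO template.
REQUEST_ITEMS = [
    "A copy of all personal data you hold about me",
    "The categories of personal data concerned",
    "The purposes for which the data is collected",
    "Who has received or will receive the data",
    "How long the data will be stored, or the criteria used to decide",
    "The source of any data not collected directly from me",
    "Whether the data is used in automated decision-making or profiling "
    "under Article 22(1) and (4), the logic involved, and the significance "
    "and envisaged consequences for me",
]


def one_month_after(day: date) -> date:
    """Same day in the following month, clamped to that month's last day."""
    year = day.year + day.month // 12
    month = day.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def build_request(name: str, email: str, company: str, sent: date) -> str:
    """Return the text of an access request citing GDPR Article 15."""
    lines = [
        f"To: {company}",
        f"Date: {sent.isoformat()}",
        "",
        "Under Article 15 of the EU General Data Protection Regulation, "
        "I request:",
    ]
    lines += [f"  {n}. {item}" for n, item in enumerate(REQUEST_ITEMS, 1)]
    lines += [
        "",
        # Assumed deadline: one calendar month from the sending date.
        f"Please respond by {one_month_after(sent).isoformat()}.",
        "",
        f"{name} <{email}>",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder identity and company, for illustration only.
    print(build_request("Jane Doe", "jane@example.com", "Example Corp",
                        date(2018, 5, 25)))
```

A real request would also need the identifying details and ID copy described above, which the sketch leaves out.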

"Data requests are a window into the soul of on organization," said Hadi Asghari, an assistant professor at Delft University of Technology in the Netherlands, whose research has shown how little EU access laws have been adhered to in recent years. And we made unexpected discoveries: the distorted, fun house mirror profile that Acxiom held on one reporter; a kink app with lax security practices; a dating service that sent us a stranger's data. But we also saw the wildly divergent extents to which companies are adjusting to the GDPR. Personal information is the commodity that fuels the big data economy, and like all commodities, there's a fight for its control.

Data retrieval
How big tech manages your personal information

There is an elephant in the room to address here: Understanding data privacy is fundamentally boring, if not unintelligible, to a regular user.

Privacy policies are the backbone of understanding data rights. They're also legalese-packed documents that are thousands of words long and describe an infrastructure of data movement that many companies can't keep track of themselves. Reading every privacy policy you encounter in a year would take 76 full workdays, according to a 10-year-old study by Carnegie Mellon researchers (and consider how many more apps we encounter daily in 2018). To not read them is a basic, wholly understandable human aversion toward ennui.

This touches on what academics call the digital-privacy paradox. When polled, people say they care deeply about privacy, but in reality, they will give up their data or even the email addresses of their friends in exchange for something as trivial as a pizza.

It's with this in mind that we waded through all sorts of corporate responses to our data requests: emails, Excel spreadsheets, data-download tools. Beyond simply what was given to us, would it be understandable, even meaningful?

Netflix, for instance, provided full glossaries for its tables of data in a single PDF.


Spotify, in contrast, provided its data through an online-download function. Inside, one UK-based reporter received 101 JSON files, and another received 90. While admirably comprehensive, these are dumps from databases normally read by computers: There's no way to reasonably make sense of the file names, let alone their plain-text contents. Spotify Customer Service did not provide full explanations of the file names, and a spokeswoman said while we could ask about specific data fields, the company did not have a glossary for all of its files.

(A third reporter who made an identical request from the US received only seven files with basic information like payment methods, playlists and followers. The spokeswoman said that "there are no differences to the information shared based on countries" and that worldwide users could request additional files by contacting customer service, but this interaction points to an obvious conundrum: How do you ask for files that you don't know exist?)
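A raw export like this can at least be surveyed with a short script. The sketch below is one possible approach, not anything Spotify provides: it walks a folder of JSON files and reports, for each file, whether it holds a list or an object and which field names appear. The folder name `my_data_export` and the function names are placeholders, and the script makes no assumptions about any company's actual file layout.

```python
import json
from pathlib import Path


def describe(value):
    """Return a short, human-readable summary of a parsed JSON value."""
    if isinstance(value, dict):
        return f"object with keys: {', '.join(sorted(value))}"
    if isinstance(value, list):
        # Collect every field name used by the dict entries in the list.
        fields = sorted({key for item in value
                         if isinstance(item, dict) for key in item})
        suffix = f", fields: {', '.join(fields)}" if fields else ""
        return f"list of {len(value)} entries{suffix}"
    return f"single {type(value).__name__} value"


def summarize_export(folder):
    """Map each .json file name in `folder` to a one-line summary."""
    summary = {}
    for path in sorted(Path(folder).glob("*.json")):
        try:
            with path.open(encoding="utf-8") as handle:
                summary[path.name] = describe(json.load(handle))
        except (json.JSONDecodeError, UnicodeDecodeError) as error:
            # Keep going if one file is malformed or not UTF-8.
            summary[path.name] = f"unreadable ({type(error).__name__})"
    return summary


if __name__ == "__main__":
    # "my_data_export" is a placeholder for wherever the download was unzipped.
    for name, line in summarize_export("my_data_export").items():
        print(f"{name}: {line}")
```

A listing of field names is no substitute for a glossary, but it shows what to ask a company about.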

Instagram, too, offered its data -- aside from copies of photos and videos -- in reams of plain-text JSON files, which a spokeswoman justified as a more portable format. The right to portability, however, is separate from the right to access one's data.

At least it provided some information. Dating app Bumble sent a UK reporter nothing more than basic personal info (name, age, language), the photos he'd uploaded and the last year of IP addresses and login times. A request from a US-based reporter went unanswered for more than a month; the company eventually provided data 12 weeks later.

Another common theme was for companies to serve up sections of their privacy policy in reply to data requests, essentially telling us their general policy for data use but not what happened to our data specifically.

Snapchat, for instance, referred us to its privacy policy, which states that it "may" procure information about us from affiliates and third parties or share our data with them, though it does provide a list of who those parties may be. Tinder said it "may" receive information about us from partner organizations, without specifying whom. Upon asking Tinder's public relations team for clarification, we received no response.

The Article 29 Working Party, formerly a nonbinding guidance group made of EU representatives and data experts, has said that "language qualifiers such as 'may', 'might', 'some', 'often' and 'possible' should... be avoided" in replying to data requests. Legally, the GDPR demands that any communication be "in a concise, transparent, intelligible and easily accessible form, using clear and plain language." Yet we found companies' adherence to this standard to be highly inconsistent.

Researchers reached a similar conclusion in July. An artificial intelligence analysis of 14 privacy policies (including those from Amazon, Microsoft and Uber), developed by the European Consumer Organisation (BEUC) and the European University Institute in Florence, found that a third of clauses were "potentially problematic" or provided insufficient information.

The key to understanding how our data is used, however, is to see how it's analyzed. The categories we are put into and how our behavior is modeled have a profound effect on how we use technology. Unsurprisingly, most companies kept this information buttoned up.

The GDPR provides for full explanations as well as the ability to opt out of automated decision-making and profiling when it involves "legal" or "similarly significant" effects. While it's unclear precisely how the latter is defined, it likely relates to decisions like credit scores, university admissions and issues involving health or employment.

Thus, in reply to our access requests, Netflix said nothing more about why we were recommended certain films other than that those recommendations were "driven by members' viewing activity and service interactions." The posts we see on Instagram are ranked, according to the company, on "timeliness of the post, your connection to the person posting, and the likelihood you'll be interested in the content." Tinder claimed to have moved away from the desirability rating, known as an Elo score, that it assigned to every user but said that "we cannot provide any information that reveals or otherwise compromises all or any part of our proprietary trade secrets or know how, which risks potential infringement of such intellectual property."

One insight into how our data is processed is through Facebook's "ad interests," a list of 357 categories I had been placed into for advertisers to match me with. There were obvious ones ("writer," "Liverpool FC," "soul music"), questionable cases ("First Epistle to the Thessalonians") and some whose definitions I didn't entirely understand ("excited state," "reality," "outlaw"). There's no specific explanation for why I was assigned these interests, and a Facebook spokesman declined to explain, though users can remove themselves from categories. The spokesman said the categories were based only on how you engage with Facebook pages and did not consider other information the company may have on you like location data or browsing history.


I could also see advertisers on Facebook who already had my contact details from elsewhere. Those included foreign versions of services I already use (EA Sports Norway) and instances where I knew I'd provided my email but did not know they'd track me down there: the venue for a preview of a friend's musical, the e-retailer where I bought a backpack, a half-marathon I ran two years ago.

My Facebook data dump also grew noticeably between the requests I made before and after May 25th: from 10 advertisers with my contact info to 180. My ad interests swelled from 198 to 357. One representative from Facebook told me the former was due to a bug that caused fewer advertisers to be shown between October 2017 and April 2018; another rep told me the latter was due to fluctuations in the ad categories Facebook maintains. Both denied the change was related to the GDPR.

There are three types of data, said Frederike Kaltheuner, a data privacy and security expert at London-based nonprofit Privacy International. The first is data that you consciously give companies: your name, email, date of birth. The second is automatically monitored: where you log in from, what time you do it, where else you visit on the web. The third and most difficult to obtain is data that's modeled or predicted from other data, such as your quantified attractiveness or trustworthiness.

"A lot of organizations don't consider modeled data to be personal data," said Kaltheuner. "The main misunderstanding is people only ever think about the data that they actively share. And also companies love to talk about the data that we share."

This is why, essentially, the Cambridge Analytica scandal was an outrage. We already know that companies amass data on us. What we don't know is how they interpret that data and then use it to influence our lives.

When we ask companies for all the data and profiling they do on us and they provide us back only the data we've given to them, they may not only be omitting stray details but also furthering the idea that this is all that our personal information is used for. We might think we're trading our data for a service we can see -- inputting our location to Google Maps, getting directions to the cinema -- but how companies process that data is more opaque -- tracking our location to tell others how busy the theater is or which roads to avoid. (Google, it was recently revealed, will even track where you are when you turn location sharing off.) Facebook has also come under fire for researching both how to manipulate users' emotions and (separately) how to identify when they're in psychologically vulnerable states. Just recently, we learned it's giving users reputation rankings too -- naturally without explaining how they're calculated. In short, data is our currency, but we often don't know how much we're truly paying.

How to request your data from a tech company

A key way to understand how the personal information ecosystem works is to request your own. Since the GDPR came into force, most companies have provided a mechanism of sorts for requesting data.

The most straightforward are data-download functions, although they're often a dump of everything you've uploaded or posted, without a lot of context on what's happened to it. For example, here are instructions on a few major sites:

For more details, it helps to reach out to the companies directly with specific, legally robust questions. There are tools to help:

  • My Data Request is a comprehensive database of how to reach 111 major companies, with template letters according to where you live. However, for some companies like Facebook, the site doesn't provide contact details and only refers users to data-download functions.

  • Access My Info is an easy-to-use data-access tool for Canadian residents, developed by nonprofit Open Effect and the University of Toronto's Citizen Lab.

  • My Data Done Right is an upcoming access-request tool for EU residents, due to launch in September from Dutch digital rights organization Bits of Freedom.

Of course, you can always rummage through each company's privacy policy for details yourself -- many are linked at the bottom of a website or are easily searchable. To help with navigating them, PriBot has developed a chatbot as well as an AI-driven visualization tool that works with many of the internet's privacy policies.

Making data requests is not only useful for you, the customer, but also in aggregate ought to encourage companies to clarify their policies, potentially for the first time. "It forces companies to have practices in place to deal with these requests," said Privacy International's Kaltheuner. "The more people do them, the more procedures there will be in place to actually respond to them adequately."

We also faced mixed answers to an apparently simple question: Who else has my data?

Evernote, for instance, has a full online list of its 17 third-party vendors (PayPal, Zendesk) as well as an explanation of what each one is used for. Tinder, in contrast, doesn't list the names of its third-party vendors, only mentioning that "third parties" and "advertising partners" may receive user data.

In a recent Deloitte survey, 56 percent of organizations said they have yet to figure out what data they have passed to third parties or how the new law will affect it. An additional 10 percent hadn't addressed whether their policies on third parties fit with the GDPR at all.

Before the GDPR, it was not the norm for organizations to have a clear data map of everywhere the information they hold goes, said Bret Cohen, a partner at law firm Hogan Lovells who specializes in privacy and cybersecurity. "The GDPR, more than any other law or event, spurred organizations to really look under the cover and figure out what exactly they were doing with data," he said. "It was a real motivator."

Asghari, from Delft University of Technology, noted that company sharing of third-party names was also commonly lacking in his research. "They are not even sure themselves who the data is shared with," he said.

There was also no standardized process for organizations to verify our identities for data requests.

Bumble requested two emailed identification documents, suggesting a driver's license, passport, birth certificate or bank statement. Spotify asked for the last four digits of a credit card on file, emphasizing not to send the entire card number. Services like Twitter, Google and LinkedIn had us make requests via web forms when we were already logged into our accounts and asked for nothing more. This raises a question of how easily your personal data could be hacked: If someone spoofed your email address or gained access to one of your accounts, would companies hand over your data to them?

"The main takeaway is to not gather more information than what you need to. If you're not in the business of collecting copies of driver's licenses or passport photos, don't collect them," said Jeewon Kim Serrato, US head of data protection, privacy and cybersecurity at law firm Norton Rose Fulbright. "Data minimization is really the foundational privacy principle. ... You should only keep the data that is providing value. And if it is no longer providing a service to the customer, it is now basically causing costs [for] the company."

This is a complete reworking of the data-grabbing mindset. The ethos has long been to collect as much of the shimmering resource as possible, without necessarily knowing how it will be used in the future. "Storage of data is getting cheaper and cheaper, and so actually deleting it in a proper way requires sometimes more financial investments than just keeping it forever," said Serrato. The GDPR might change that calculus: The costs of handling valuable data are now too high if you don't know exactly what you're going to do with it.

Still, this is just a snapshot of a sprawling law that will be molded and refined through national legislation and test cases for years to come.

"There are certainly some companies playing the waiting game for a number of reasons, but I would say primarily based on the cost of compliance," said Cohen, of Hogan Lovells. "The companies who see themselves as the primary enforcement target are doing everything that they can to comply."

Experts expect enforcers to target the big tech companies first, particularly those in consumer-facing businesses: social media, mobile apps, health, advertising and self-driving cars as well as companies marketing to children. Data regulators in the UK, Ireland and France all say complaints and reports of data breaches have shot up since the GDPR came into force. California recently passed a digital privacy law that will go into effect in January 2020, viewed as a major step forward for data protection in the US, where there is no comparably comprehensive legislation.

Meanwhile, officials are talking tough about slapping fines on violators. "Clearly, abuse has become the norm. The aim of the EU data protection agency that I lead is to stop it," Giovanni Buttarelli, the European Union's data protection supervisor, writes. "The public will see the first results before the end of the year." All eyes will be on the first major case brought against a big tech company under the GDPR -- and, crucially, the amount the regulators choose to penalize them with. It will be the strongest indicator yet of whether this ambitious law can truly level the playing field between consumers and today's data barons.

Data retrieval credits
Features editor: Aaron Souppouris
Lead reporter: Chris Ip
Additional reporting: Matt Brian, Dan Cooper, Steve Dent, Jamie Rigg, Mat Smith, Nick Summers
Copy editor: Megan Giller
Illustration: Koren Shadmi