Big data and the promise of bureaucratic efficiency

One of the fundamental questions of my PhD thesis has been to conceptualize privacy and surveillance in a way which not only describes the society we live in, but also explains why the current information society with its fetishization of data looks the way it does. I have looked to various theories on surveillance and socio-legal conceptualizations of information privacy to address this question, but I was never really satisfied with the answer.

Michel Foucault’s reading of the panopticon deals with the psychological effects of being under visible surveillance, yet does not adequately explain life in the era of databases and electronic surveillance. Philosopher Manuel DeLanda’s excellent War in the Age of Intelligent Machines (1991) addresses the intelligence community’s perverse data-collection logic, but does not really expand on the political economy of surveillance. Oscar Gandy does a better job at that, but descriptions and theories based on the US context are not directly applicable in Europe.

Socio-legal theories and some communication research address how people perceive privacy, but it is increasingly difficult to connect ideal notions of privacy to what is actually happening in the world, and the gap between norms of privacy, data practices, and laws of privacy is growing ever wider.

During the past two years I’ve delved into the legislative process of the new data protection law in the EU, the General Data Protection Regulation, which will apply from May 2018. One of my earliest observations was the inaccessibility of the language and the complexity of a document that addresses a very basic human need: to be able to choose when one is out of sight. Instead, the end result is an intricate web of rules and exceptions to the collection of personal information, with only very vague references to actual perceptions of privacy.

After reading David Graeber’s Utopia of Rules I came to an insight that had previously existed only as a side note in my conceptualization of surveillance societies: the role of bureaucracies. Rather than thinking of data collection as an element of discipline in the Foucauldian sense, I started to think of data collection as part of the bureaucratic system’s inherent logic that is independent from the actual power of surveillance.

The utopian promise of big data is not that of control but of efficiency. The present logic of data maximization defies traditional ideals of data minimization according to which data can only be processed for a specific purpose. The collection of data points is such an essential part of modern bureaucracies, private and public alike, that its role in society is treated as a given. This is why attitudes to data collection and privacy are not divided along the public/private or even the left/right spectra but rather along the lines of strange bedfellows such as anarchism and libertarianism versus socialism and fascism. The goals are of course very different, but the means are similar.

By seeing questions of privacy and surveillance through this lens the GDPR’s legislative process started to make more sense to me. The discourses employed by corporate and public lobbyists were not really about control over information flows, nor were they about disciplinary power. They were about the promise of bureaucratic efficiency.

The mystification of algorithms

Whenever I read stories on big data, it strikes me that journalists hardly ever know or care to explain what algorithms are or what they do. Take this example from the Economist’s recent special report on big data and politics:

Campaigners are hoovering up more and more digital information about every voting-age citizen and stashing it away in enormous databases. With the aid of complex algorithms, these data allow campaigners to decide, say, who needs to be reminded to make the trip to the polling station and who may be persuaded to vote for a particular candidate.

The Economist, March 26th, Special Report p.4

First, few seem bothered to make a distinction between inferred intelligence and collected data. The quote above is an example of inferring information from existing databases – trying to figure out what kind of behaviour correlates with voting for a specific party. Since most databases are commercial in nature, I am guessing that campaigners are trying to figure out whether certain consumer behaviour, like buying organic milk, correlates with voting Democrat.
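The kind of inference described here is not exotic. A minimal sketch of it: given a table of consumer records, compare how often a purchase behaviour co-occurs with a voting preference against the baseline rate. All records and field names below are invented for illustration; real campaign databases hold thousands of fields per person.

```python
# Toy illustration of inferring voting preference from consumer data.
# All records are invented; this only shows the shape of the inference.
records = [
    {"buys_organic_milk": True,  "votes": "democrat"},
    {"buys_organic_milk": True,  "votes": "democrat"},
    {"buys_organic_milk": True,  "votes": "republican"},
    {"buys_organic_milk": False, "votes": "republican"},
    {"buys_organic_milk": False, "votes": "republican"},
    {"buys_organic_milk": False, "votes": "democrat"},
]

def p_votes_given(records, field, value, party):
    """Estimate P(votes == party | field == value) from the records."""
    subset = [r for r in records if r[field] == value]
    if not subset:
        return 0.0
    return sum(r["votes"] == party for r in subset) / len(subset)

baseline = sum(r["votes"] == "democrat" for r in records) / len(records)
# "Lift": how much more likely an organic-milk buyer is to vote Democrat
# than a randomly chosen person in the database.
lift = p_votes_given(records, "buys_organic_milk", True, "democrat") / baseline
print(round(lift, 2))
```

Everything that follows – targeted reminders, persuasion messaging – is built on conditional probabilities of roughly this kind, just at scale.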

In the case of protest movements, the waves of collective action leave a big digital footprint. Using ever more sophisticated algorithms, governments can mine these data.

The Economist, March 26th, Special Report p.4

The second example is about mining social media for data on dissidents and revolutionary action. There the data itself can be a source of “actionable intelligence” as Oscar Gandy would put it. There is nothing inherently sophisticated in looking for evidence of people participating in protest events on Facebook or finding protest movement chatter on Twitter.
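To make the point concrete: the unsophisticated core of much “social media mining” is little more than keyword matching over a stream of public posts. A toy sketch, with invented posts and an invented watchlist:

```python
# Naive keyword matching over public posts -- the unglamorous core of
# much "social media mining". Posts and keywords are invented.
posts = [
    "Join us at the square tomorrow at noon #protest",
    "Great pasta recipe, highly recommended",
    "Bring banners and water, the march starts at the station",
]
keywords = {"protest", "march", "demonstration", "banners"}

def flag_posts(posts, keywords):
    """Return posts containing any watched keyword (case-insensitive)."""
    flagged = []
    for post in posts:
        # Strip simple punctuation and hashtags before comparing.
        words = {w.strip("#.,!?").lower() for w in post.split()}
        if words & keywords:
            flagged.append(post)
    return flagged

print(len(flag_posts(posts, keywords)))
```

Real systems add ranking, network analysis and language models on top, but the basic operation – sift everything, flag matches – is this simple.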

Second, while the algorithms might be complex, they are usually employed in programmes that have relatively clear user interfaces. The Citizen Lab at the University of Toronto demonstrated that “net nannying” tools that are employed in schools, homes or businesses are also frequently used by authoritarian states for monitoring a whole nation’s communications.

While these reports give some insight into how data science is used to gain an advantage in politics or law enforcement, they tend to mystify the technologies and techniques involved. We are left confounded by this data magic that somehow changes the playing field. But the guiding principles are not that hard to understand, and using the programmes does not require a degree in computer science. We might not know exactly how the algorithms work, but we know what sources of information they use and what their purposes are.

Slide illustrating how to search PRISM’s counterterrorism database


The Privacy Shield – dead on arrival

Five months after the ECJ ruled that the Safe Harbor agreement was invalid, the EU Commission has presented a new “Privacy Shield” to replace the old agreement.

The Privacy Shield does contain some improvements regarding the rights of European citizens. It does not, however, fundamentally change the national security exception which brought down the agreement in the first place.

Recital 55 of the draft agreement reads as follows:

The Commission’s analysis shows that U.S. law contains clear limitations on the access and use of personal data transferred under the EU-U.S. Privacy Shield for national security purposes as well as oversight and redress mechanisms that provide sufficient safeguards for those data to be effectively protected against unlawful interference and the risk of abuse.

The Commission refers to Presidential Policy Directive 28 (“PPD-28”) on limitations on signals intelligence, issued by President Obama on January 17, 2014. PPD-28 extends to non-US citizens the same level of protection as US citizens enjoy:

Sec.4 Safeguarding Personal Information Collected Through
Signals Intelligence

All persons should be treated with dignity and respect, regardless of their nationality or wherever they might reside, and all persons have legitimate privacy interests in the handling of their personal information.

That being said, any presidential policy directive may be overturned by a future president: it is a policy document, not an amendment to existing law (which permits the surveillance of non-US nationals; see the FISA Amendments Act of 2008, Section 702). If you have the time, see the late Caspar Bowden’s excellent presentation on why agreements such as the Privacy Shield are doomed to fail:

Even if PPD-28 were allowed to stay in force AND were actually respected, it still endorses the mass collection of data:

Sec. 2 Limitations on the Use of Signals Intelligence Collected in Bulk
Locating new or emerging threats and other vital national security information is difficult, as such information is often hidden within the large and complex system of modern global communications. The United States must consequently collect signals intelligence in bulk.

Granted, the PPD-28 states that bulk collection must be used only for the detection and countering of (1) espionage, (2) terrorism, (3) weapons of mass destruction, (4) cybersecurity, (5) threats to US armed forces and their allies, and (6) transnational criminal threats. However, by its very definition, bulk collection means that all data is retained and accessible by the intelligence community, and there is no effective oversight on how that data is used. Let me cite Edward Snowden on what this boils down to in practice:

“In the course of their daily work they stumble across something that is completely unrelated to their work, for example an intimate nude photo of someone in a sexually compromising situation but they’re extremely attractive,” he said. “So what do they do? They turn around in their chair and they show a co-worker. And their co-worker says: ‘Oh, hey, that’s great. Send that to Bill down the way.’” (New York Times, July 20, 2014)

Data power conference (June, 22-23), part 1: Disconnect

I recently attended and presented at “Data Power”, what turned out to be an excellent conference organized by the University of Sheffield. The conference had called upon academics to submit papers approaching the question of big data from a societal (& critical) perspective. That being said, the conference papers were more often than not empirically grounded, and the presenters refrained from adopting the conspiratorial mindset that sometimes colours discussions of big data.

Here are some of the key points that I picked up from attending the different panels:

Disconnect & Resignation / tradeoff fallacy

Stefan Larsson (Lund University) and Mark Andrejevic (Pomona College) both stressed that there is a disconnect between commercial claims that people happily trade their privacy for discounts and services and how people actually feel. In reality, people feel that they are “forced or bribed” to give up their data in order to access a service. Joseph Turow, Michael Hennessy and Nora Draper have recently published a survey on what they call the “tradeoff fallacy” which supports the disconnect and resignation hypothesis put forth by Larsson and Andrejevic.

Access rights are rarely respected

Clive Norris (University of Sheffield) and Xavier L’Hoiry (University of Leeds) had investigated whether companies and the public sector (data controllers) actually respect people’s right to access their own data under current data protection legislation. It turns out they don’t:

• “20 % of data controllers cannot be identified before submitting an access request;
• 43 % of requests did not obtain access to personal data;
• 56 % of requests could not get adequate information regarding third party data sharing;
• 71 % of requests did not get adequate information regarding automated decision making processes.”

Instead, the controllers applied what Norris & L’Hoiry call “discourses of denial”: questioning the rights themselves (we do not recognize them), falsely claiming that only law enforcement would have access to the data, or even suggesting that the researchers were insane to make such a request (why would you possibly want this information?). The most common response was, however, none at all. Deafening silence is an effective way to tackle unpopular requests.

Self-management of data is not a workable solution

Jonathan Obar (University of Ontario Institute of Technology & Michigan State University) showed that data privacy cannot plausibly be protected through individuals auditing how companies and officials use their personal data, calling this approach a “romantic fallacy”.

Even if data controllers were to respect the so-called ARCO rights (access to data, rectification of data, cancellation of data & objection to data processing), it is far too difficult and time-consuming for ordinary citizens to manage their own data. Instead, Obar suggests that either data protection authorities (DPAs) or private companies could oversee how our data is used – a form of representative data management. The problem with this solution is, of course, the significant resources it would require.

There is no such thing as informed consent in a big data environment

Mark Andrejevic emphasized that data protection regulation and big data practice are based on opposing principles: big data on data maximization, data protection on data minimization. The notion of relevance does not work as a limiting factor for collecting data, since the relevance of data is only determined afterwards, by aggregating data and looking for correlations. This makes informed consent increasingly difficult: what are we consenting to if we do not know the applications of the collection?

Perverse repercussions of the Charlie Hebdo attack

Many media outlets have seen the attack on Charlie Hebdo as a grave threat to the freedom of expression. While the attack itself can cause news outlets to self-censor their publications out of fear, it is not very fruitful to evaluate the state of civil liberties in the wake of terrorist attacks. Perversely, rather than strengthening the fundamental principles on which the freedom of expression is based, terror attacks have been used to limit communication rights: the Patriot Act was enacted just months after 9/11, while the Data Retention Directive was a direct consequence of the London and Madrid attacks.

Usually these laws have been enacted in the countries that have suffered from the attacks, but now David Cameron has proposed that Britain’s intelligence agencies should be allowed to break into the encrypted communications of instant messaging apps such as iMessage:

“In extremis, it has been possible to read someone’s letter, to listen to someone’s call, to mobile communications … The question remains: are we going to allow a means of communications where it simply is not possible to do that? My answer to that question is: no, we must not. The first duty of any government is to keep our country and our people safe.”

The proposed measure is not only a textbook example of treating the symptoms rather than the disease, but essentially a threat to the very freedom which several political leaders swore to protect after the Charlie Hebdo attack. Freedom of expression is inherently connected to the freedom from surveillance. Censorship cannot exist without the surveillance of communications. This proposed ban on encrypted communications would greatly impede media outlets’ capacity to protect their informants because, as Cory Doctorow points out, weakening the security of communications means that foreign spies, malicious hackers and other wrongdoers will be able to access British communications, not only MI5.

It’s like the NSA/GCHQ leak never happened.

Geolocation: GPS, IP addresses and Wi-Fi

A 2012 study by researchers from Australia’s ICT research centre NICTA revealed that geolocation based on IP addresses alone is off by 100 km in approximately 70 per cent of all cases. With regular broadband it is possible to make more accurate predictions, but mobile data is, well, mobile: the user moves around quite a bit. If you roam in another country, for example, your IP address will still place you in the country of your operator.

In other words, for consumer monitoring or surveillance purposes, IP address location data is worthless.

So why does turning on Wi-Fi make location data more accurate? Because with Wi-Fi on, the device can scan for nearby access points and radio towers and look them up in a database of their known locations. Such Wi-Fi-based positioning systems (WPS) are maintained by several companies, most notably Google, Microsoft and Apple. On the plus side, your phone gets an accurate location fix even inside a building. The downside? You get tracked even when GPS is off and you are not connected to any Wi-Fi network, but simply have your phone’s Wi-Fi on.
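The basic principle behind a WPS lookup is simple: the phone reports the access points it can see, the service looks up the known coordinates of those access points, and the estimate is a signal-weighted average of them. A toy sketch – the BSSIDs, coordinates and weights are invented, and real systems use far more elaborate propagation models:

```python
# Toy Wi-Fi positioning: estimate a location as the signal-weighted
# average of known access-point coordinates. All values are invented.
KNOWN_APS = {
    "aa:bb:cc:01": (60.170, 24.940),  # (lat, lon) of each access point
    "aa:bb:cc:02": (60.171, 24.942),
    "aa:bb:cc:03": (60.169, 24.944),
}

def estimate_position(scan):
    """scan: {bssid: signal_weight}. Returns weighted (lat, lon), or None
    if no scanned access point is in the database."""
    lat = lon = total = 0.0
    for bssid, weight in scan.items():
        if bssid in KNOWN_APS:
            ap_lat, ap_lon = KNOWN_APS[bssid]
            lat += weight * ap_lat
            lon += weight * ap_lon
            total += weight
    if total == 0:
        return None
    return (lat / total, lon / total)

# A strong signal from one AP and a weaker one from another pulls the
# estimate toward the stronger AP.
print(estimate_position({"aa:bb:cc:01": 2.0, "aa:bb:cc:02": 1.0}))
```

The crucial point is that the phone never needs a GPS fix or a network connection for this: seeing the beacons is enough, which is why scanning alone is sufficient for tracking.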

In some cases, the phone keeps scanning for networks even when Wi-Fi is off. Google acknowledges this with the following statement:

“To improve location accuracy and for other purposes, Google and other apps may scan for nearby networks, even when wifi is off. If you don’t want this to happen, go to advanced > scanning always available.”

If you have a Google account, it could be worth checking out where you’ve been the past year through Google’s location history service.

In light of this, it becomes clear that any data retention laws governments might have pale in comparison with the data retained as part of the services provided by Google, Apple or Microsoft.