Big data and the promise of bureaucratic efficiency

One of the fundamental tasks of my PhD thesis has been to conceptualize privacy and surveillance in a way that not only describes the society we live in, but also explains why the current information society, with its fetishization of data, looks the way it does. I have looked to various theories of surveillance and socio-legal conceptualizations of information privacy to address this question, but I was never really satisfied with the answers.

Michel Foucault’s panopticon deals with the psychological effects of being under visible surveillance, yet does not adequately explain life in the era of databases and electronic surveillance. Philosopher Manuel DeLanda’s excellent War in the Age of Intelligent Machines (1991) addresses the intelligence community’s perverse data-collection logic, but does not really expand on the political economy of surveillance. Oscar Gandy does a better job at that, but descriptions and theories based on the US context are not directly applicable in Europe.

Socio-legal theories and some communication research address how people perceive privacy, but it is increasingly difficult to connect ideal notions of privacy to what is actually happening in the world, and the gap between norms of privacy, data practices, and laws of privacy is growing ever wider.

During the past two years I’ve delved into the legislative process of the new data protection law in the EU, the General Data Protection Regulation, which will apply from May 2018. One of my earliest observations was the inaccessibility of the language and the complexity of the document that addresses a very basic human need: to be able to choose when one is out of sight. Instead, the end result is an intricate web of rules and exceptions to the collection of personal information, with only very vague references to actual perceptions of privacy.

After reading David Graeber’s Utopia of Rules I came to an insight that had previously existed only as a side note in my conceptualization of surveillance societies: the role of bureaucracies. Rather than thinking of data collection as an element of discipline in the Foucauldian sense, I started to think of data collection as part of the bureaucratic system’s inherent logic that is independent from the actual power of surveillance.

The utopian promise of big data is not that of control but of efficiency. The present logic of data maximization defies traditional ideals of data minimization, according to which data may only be processed for a specific purpose. The collection of data points is such an essential part of modern bureaucracies, private and public alike, that its role in society is treated as a given. This is why attitudes to data collection and privacy are not divided along the public/private or even the left/right spectra, but rather along the lines of strange bedfellows such as anarchism and libertarianism versus socialism and fascism. The goals are of course very different, but the means are similar.

By seeing questions of privacy and surveillance through this lens the GDPR’s legislative process started to make more sense to me. The discourses employed by corporate and public lobbyists were not really about control over information flows, nor were they about disciplinary power. They were about the promise of bureaucratic efficiency.

The mystification of algorithms

Whenever I read stories on big data, it strikes me that journalists hardly ever know or care to explain what algorithms are or what they do. Take this example from the Economist’s recent special report on big data and politics:

Campaigners are hoovering up more and more digital information about every voting-age citizen and stashing it away in enormous databases. With the aid of complex algorithms, these data allow campaigners to decide, say, who needs to be reminded to make the trip to the polling station and who may be persuaded to vote for a particular candidate.

The Economist, March 26th, Special Report p.4

First, few seem bothered to make a distinction between inferred intelligence and collected data. The quote above is an example of inferring information from existing databases – trying to figure out what kind of behaviour correlates with voting for a specific party. Since most databases are of a commercial nature, I am guessing that campaigners try to figure out whether certain consumer behaviour, like buying organic milk, correlates with voting Democrat.

In the case of protest movements, the waves of collective action leave a big digital footprint. Using ever more sophisticated algorithms, governments can mine these data.

The Economist, March 26th, Special Report p.4

The second example is about mining social media for data on dissidents and revolutionary action. There the data itself can be a source of “actionable intelligence” as Oscar Gandy would put it. There is nothing inherently sophisticated in looking for evidence of people participating in protest events on Facebook or finding protest movement chatter on Twitter.

Second, while the algorithms might be complex, they are usually employed in programmes that have relatively clear user interfaces. The Citizen Lab at the University of Toronto demonstrated that “net nannying” tools that are employed in schools, homes or businesses are also frequently used by authoritarian states for monitoring a whole nation’s communications.

While these reports give some insight into how data science is used to gain an advantage in politics or law enforcement, they tend to mystify the technologies and techniques involved. We are left confounded by this data magic that somehow changes the playing field. But the guiding principles are not that hard to understand, and using the programmes does not require a degree in computer science. We might not know exactly how the algorithms work, but we know what sources of information they use and what their purposes are.
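To make that point concrete, here is a toy sketch of the first kind of inference described above: checking whether a consumer trait correlates with voting for a particular party. Everything here is invented – the records, the trait and the matching against a voter file – and real campaign databases hold millions of rows, but the underlying logic is no more magical than this.

```python
# Toy sketch of "inferred intelligence": does a consumer trait
# correlate with voting for a given party? All records are invented.

def support_rate(records, trait):
    """Compare party support among people with vs without a given trait."""
    def rate(group):
        return sum(r["voted_dem"] for r in group) / len(group)
    with_trait = [r for r in records if r[trait]]
    without_trait = [r for r in records if not r[trait]]
    return rate(with_trait), rate(without_trait)

# A hypothetical commercial database matched against a voter file.
records = [
    {"buys_organic_milk": True,  "voted_dem": True},
    {"buys_organic_milk": True,  "voted_dem": True},
    {"buys_organic_milk": True,  "voted_dem": False},
    {"buys_organic_milk": False, "voted_dem": True},
    {"buys_organic_milk": False, "voted_dem": False},
    {"buys_organic_milk": False, "voted_dem": False},
]

with_rate, without_rate = support_rate(records, "buys_organic_milk")
print(round(with_rate, 2), round(without_rate, 2))  # 0.67 0.33
```

With enough matched records, the same comparison tells a campaigner which doorbells are worth ringing – no degree in computer science required.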

Slide illustrating how to search PRISM’s counterterrorism database


ECJ invalidates the Safe Harbour agreement: will all data transfers to the US stop?


Following the recommendation of Advocate General Yves Bot, the ECJ ruled today that the Safe Harbour agreement is invalid:

the Court declares the Safe Harbour Decision invalid. This judgment has the consequence that the Irish supervisory authority is required to examine Mr Schrems’ complaint with all due diligence and, at the conclusion of its investigation, is to decide whether, pursuant to the directive, transfer of the data of Facebook’s European subscribers to the United States should be suspended on the ground that that country does not afford an adequate level of protection of personal data.

The full judgement is available here.

This means, first of all, that national Data Protection Authorities (DPAs) are granted the power to decide whether data transfers are legitimate. The court’s decision will thus not stop all transfers to the US; it simply means that national DPAs may now block transfers if they see fit, as they are no longer bound by the Safe Harbour agreement.

The Safe Harbour agreement did not fall because it was a self-regulatory instrument with a long history of compliance issues. It fell because US public authorities would not be required to follow the agreement, and because US law would always override it.

There was even a “national security exception” in the agreement, which made the mass surveillance of Facebook data possible:

Adherence to these Principles may be limited: (a) to the extent necessary to meet national security, public interest, or law enforcement requirements; (b) by statute, government regulation, or case law that create conflicting obligations or explicit authorizations, provided that, in exercising any such authorization, an organization can demonstrate that its non-compliance with the Principles is limited to the extent necessary to meet the overriding legitimate interests furthered by such authorization;

(EC: Commission Decision 2000/520 Annex I)

What now?

Although this does not mean that data transfers between the EU and the US will stop immediately, it does mean that DPAs have the power to block them. IT companies will probably start applying for Binding Corporate Rules and using model contract clauses. But the weakness of the Safe Harbour agreement, the national security exception, is present in those cases as well. If DPAs decide to crack down on IT companies, more and more data centres may have to be established on European soil. For the IT giants this will just be a huge headache, but for SMEs it might mean that EU customers are off limits unless the data is stored in Europe, a cost which smaller startups might not be able to cover.

It is unlikely, however, that things will go that far. Enforcement of data protection rules is rarely that strict, and trade relations are at stake if this decision is interpreted rigidly. The Safe Harbour agreement was always a political solution. The Commission knew that the US would never have information privacy laws adequate by European standards, and so a self-regulatory initiative was concocted. Now they will need a new agreement, but it will be much harder to come up with one that is seen as legitimate in light of the NSA leaks. It will be interesting to see them try.

Data power conference (June, 22-23), part 1: Disconnect

I recently attended and presented at “Data Power”, which turned out to be an excellent conference organized by the University of Sheffield. The conference had called upon academics to submit papers approaching the question of big data from a societal (& critical) perspective. That being said, the papers were more often than not empirically grounded, and the presenters refrained from adopting the conspiratorial mindset that sometimes colours discussions of big data.

Here are some of the key points that I picked up from attending the different panels:

Disconnect & Resignation / tradeoff fallacy

Stefan Larsson (Lund University) and Mark Andrejevic (Pomona College) both stressed that there is a disconnect between commercial claims that people happily trade their privacy for discounts and services and how people actually feel. In reality, people feel that they are “forced or bribed” to give up their data in order to access a service. Joseph Turow, Michael Hennessy and Nora Draper have recently published a survey on what they call the “tradeoff fallacy” which supports the disconnect and resignation hypothesis put forth by Larsson and Andrejevic.

Access rights are rarely respected

Clive Norris (University of Sheffield) and Xavier L’Hoiry (University of Leeds) had investigated whether companies and public-sector bodies (data controllers) actually respect people’s right, under current data protection legislation, to access their own data. Turns out, they don’t:

• “20 % of data controllers cannot be identified before submitting an access request;
• 43 % of requests did not obtain access to personal data;
• 56 % of requests could not get adequate information regarding third party data sharing;
• 71 % of requests did not get adequate information regarding automated decision making processes.”

Instead, the controllers consulted applied what Norris & L’Hoiry call “discourses of denial”: questioning the rights themselves (we do not recognize them), falsely claiming that only law enforcement would have access to this data, or even suggesting that the researchers were insane to make such a claim (why would you possibly want this information?). The most common response was, however, none at all. Deafening silence is an effective way to tackle unpopular requests.

Self-management of data is not a workable solution

Jonathan Obar (University of Ontario Institute of Technology & Michigan State University) showed that data privacy cannot possibly be better protected through individual auditing of how companies and officials use your personal data, calling this approach a “romantic fallacy”.

Even if data controllers respected the so-called ARCO rights (access to data, rectification of data, cancellation of data & objection to data processing), it is far too difficult and time-consuming for regular citizens to manage their own data. Rather, Obar suggests that either data protection authorities (DPAs) or private companies should oversee how our data is used, a form of representative data management. The problem with this solution is of course the significant resources it would require.

There is no such thing as informed consent in a big data environment

Mark Andrejevic emphasized that data protection regulation and big data practice are based on opposing principles: big data on data maximization and data protection on data minimization. The notion of relevance does not work as a limiting factor for collecting data, since the relevance of data is only determined afterwards by aggregating data and looking for correlations. This makes informed consent increasingly difficult: what are we consenting to if we do not know the applications of the collection?

Is sensitive personal information becoming desensitized?

With all the hype surrounding big data, it should come as no surprise that people are worried about how their data is used.

“How concerned are you about the unnecessary disclosure of personal information?” Source: Eurobarometer 359, p. 59.

Some data points have always been part of the modern bureaucratic state (see Giddens, 1985, for example). These are usually referred to as objective data points, which indicate facts such as births, deaths, age and income. Many of these data points are simply necessary to run a well-functioning state apparatus.

The reason why smartphones and social media are changing our relation to personal data is that subjective data points, such as what people read, what they search for and who they secretly stalk on Facebook, are much easier to come by than before. What’s more, people readily submit information about themselves on social networks that used to be hidden deep under the surface. There is a huge gap between what data protection officials think of as sensitive information and what’s actually happening.

For example, the UK’s Information Commissioner’s Office defines sensitive personal data as information on, among other things, a) the racial or ethnic origin of a person, b) his/her political opinions or religious beliefs, and c) his/her sexual life.

Facebook sees your sexual preferences, religious beliefs and political views as “basic info”. Not even “details about you”, but “basic”; information which is, undeniably, potentially very sensitive in many parts of the world.

The gist is that while marketers and companies are hoping to gather more and more sensitive information on potential customers, they really, REALLY don’t want to have their customer databases defined as collections of “sensitive data”. Because when that happens, you are suddenly forced to follow strict rules regarding what you can do with the information. Funnily enough, the best way to avoid that is not by refraining from collecting sensitive information, but rather by claiming that the information you have gathered is not on “real, identifiable people” but just “profiles”.

Giddens, Anthony (1985): The Nation-State and Violence: Volume Two of a Contemporary Critique of Historical Materialism. Cambridge: Polity Press.

Eurobarometer survey 359 on data protection

Big Data Dystopia pt 2: Newspapers and web shops join forces

In an earlier post, I discussed the possible implications of banks and insurance companies converging. This post will focus on the convergence of newspapers and web shops.

In a nutshell, a daily newspaper’s greatest assets have usually been its reach and its credibility.

For the past 20 years or so, newspaper subscriptions have been declining in most countries. Other media outlets are just as popular as newspapers’ websites, and the reach of newspapers is no longer as dominant as it used to be.

Credibility, however, works differently. Increased competition does not affect credibility negatively. A good review in the New York Times can lift something or someone fairly unknown from the margins to the mainstream.

It’s not news that many newspapers are struggling in the online ad market, even though the market is growing. Google and Facebook dominate, and little suggests that newspapers will be able to compete with the two online ad powerhouses. However, the two have not yet been that successful in sealing the deal; that is, getting people to actually buy products online.

One of Amazon’s greatest feats is doing exactly that: its elaborate recommendation system suggests products based on previous purchases and browsing history. Amazon’s algorithm can even identify you (and help you on your way) as a potential drug dealer if you choose to buy a certain scale.
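Amazon’s production system is of course far more elaborate, but the core idea of item-to-item recommendation can be sketched in a few lines: count which products are bought together, then suggest the most frequent co-purchases. The baskets below are invented.

```python
from collections import Counter

# Minimal item-to-item sketch: recommend whatever is most often
# bought together with the items already in a customer's history.
baskets = [
    {"scale", "baggies", "coffee"},
    {"scale", "coffee"},
    {"scale", "coffee", "baggies"},
    {"milk", "bread"},
]

def recommend(history, baskets):
    co_counts = Counter()
    for basket in baskets:
        if history & basket:                # basket shares an item with history
            co_counts.update(basket - history)
    return [item for item, _ in co_counts.most_common()]

print(recommend({"scale"}, baskets))  # ['coffee', 'baggies']
```

Real systems weight these counts, normalize for item popularity and update continuously, but the “customers who bought X also bought Y” suggestion rests on exactly this kind of co-occurrence counting.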

What Amazon tries to achieve is increased credibility through crowdsourcing customer reviews. Still, an anonymous, non-professional customer review is nothing like an article in The Guardian.

In 2013, Amazon founder Jeff Bezos bought the Washington Post. Bezos’ editorial aspirations aside, the move is likely to spur innovative cross-ownership business models. Finnish newspaper Helsingin Sanomat has also launched its own web shop, Mitä Saisi Olla. Although significantly smaller in scale, the message is clear: if online ads fail, online shops might be the answer.

Now, based on innovations in behavioural targeting and automated tracking of reading patterns online, newspapers have more information on their readers than ever. Not all newspapers track their users of course, but those wishing to remain attractive to advertisers in this day and age should at least consider doing so. A third asset for newspapers has emerged: deep knowledge of reading patterns can tell as much as, or even more than, a person’s Google search history. The articles we read, how much time we spend reading them and whether we recommend them to our peers are essential for understanding not only who we are but also who we strive to be.

This could lead to at least two outcomes. First, reviews and product benchmarks might be published alongside convenient links to the web store. A great book review can be the catalyst for a spontaneous one-click-buy.

Second, data on reading patterns can be compared to consumption history, creating an even clearer picture of consumer interests. The web shop is no longer fully dependent on browsing history but can also rely on actual information on consumers’ interests. Similarly, the newspaper can not only speculate on its readers’ consumption patterns, but actually convince advertisers that it knows exactly what products its readers will buy.
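What such a comparison could look like is easy to imagine. A minimal sketch, assuming the newspaper and its web shop share user identifiers; the readers, topics and matching rule are all invented:

```python
# Hypothetical join of a newspaper's reading logs with its web shop's
# purchase history, to find readers whose purchases confirm their interests.
reading = {
    "alice": {"cycling", "politics"},
    "bob": {"cooking"},
}
purchases = {
    "alice": {"bike lights"},
    "bob": {"novel"},
}
interest_to_product = {"cycling": "bike lights", "cooking": "cookbook"}

def confirmed_segment(topic):
    """Readers of a topic who already bought the matching product."""
    product = interest_to_product[topic]
    return sorted(user for user, topics in reading.items()
                  if topic in topics and product in purchases.get(user, set()))

print(confirmed_segment("cycling"))  # ['alice']
```

A segment like this is exactly what such a newspaper could sell to advertisers: not guessed interests, but interests backed by purchases.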

The crux is that such actions might damage the newspaper’s reputation. Let’s hope that newspapers won’t be reduced to mere barkers for web shops.

Big Data Dystopia pt 1: Banks and insurance companies converge

One of the more discussed developments in the media field is that platforms are converging: TV companies are going online, telcos are providing streaming services, radio channels are becoming online media, and so on. This convergence or “vertical integration” is of course challenging for law-makers and courts that have to apply antitrust laws to completely new cases.

Vertical vs horizontal integration:

Source: Wikimedia Commons / Martin Sauter

Technological developments in the communications industry are also affecting how other markets work. Let’s take the example of banks and insurance companies.

Insurance premiums are, somewhat simplified, based on calculating risk. In order to calculate what a person’s life insurance premium should cost, factors like age, work and medical history are taken into account. These factors determine the probability of the insured falling ill or getting into an accident. The more detailed the information on the insured’s life, the better.

Banks (and several app developers) are waking up to the fact that electronic payment makes it possible to know a whole lot more about their customers than in the cash and check days. Consumers can click away and categorize every single purchase they make, be it clothing, food, home electronics or alcohol. A great service for the banks’ customers, who can plan their home economy better than before and pinpoint where their salaries are going. Or as Pridmore and Zwick (2011, 273) would have it, “[c]onsumers often happily participate in the personal information economy and the surveillance practices that underpin it.”

In some countries, insurance companies and banks are converging – either insurance companies begin to provide banking services or banks buy insurance companies. This way customers can choose to buy their insurance and their bank services from the same provider – often at a better price. This is when it gets interesting.

The consolidation of banking and insurance data means that, theoretically, insurance companies could adjust insurance premiums according to the purchases being made by the same company’s banking clients. Failed to renew your gym membership, money spent on alcohol gone up two thirds? Several visits to the doctor in the past months? Perhaps the next insurance premium won’t be as affordable.
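To be clear, this is speculation, but the mechanics would be trivial to implement. A purely hypothetical sketch, in which the transaction categories, risk weights and base premium are all invented for illustration:

```python
# Purely hypothetical: a bank-owned insurer nudging a premium
# according to categorized card transactions over one period.
RISK_WEIGHTS = {"alcohol": 0.02, "doctor_visit": 0.05, "gym": -0.03}

def adjusted_premium(base, transactions):
    """Scale a base premium by a crude risk score over the period."""
    score = sum(RISK_WEIGHTS.get(category, 0.0) for category, _ in transactions)
    return round(base * (1 + score), 2)

# More drinking, a doctor's visit, and no gym payments in sight.
transactions = [("alcohol", 40), ("alcohol", 25), ("doctor_visit", 80)]
print(adjusted_premium(100.0, transactions))  # 109.0
```

No actuarial sophistication is needed for the dystopian scenario; the hard part is not the code but getting access to the consolidated data, which is precisely what convergence provides.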

The question is, are the new banking services provided with the consumer in mind, or is this simply another way to solicit data from people in order to create business elsewhere? One thing is certain: insurance actuaries would love to have access to that data.


Pridmore, Jason and Detlev Zwick. 2011. Editorial: Marketing and the Rise of Commercial Consumer Surveillance. Surveillance & Society 8(3): 269-277.