Big data and the promise of bureaucratic efficiency

One of the fundamental questions of my PhD thesis has been to conceptualize privacy and surveillance in a way which not only describes the society we live in, but also explains why the current information society with its fetishization of data looks the way it does. I have looked to various theories on surveillance and socio-legal conceptualizations of information privacy to address this question, but I was never really satisfied with the answer.

Michel Foucault’s panopticon deals with the psychological effects of being under visible surveillance, yet does not adequately explain life in the era of databases and electronic surveillance. Philosopher Manuel DeLanda’s excellent War in the Age of Intelligent Machines (1991) addresses the intelligence community’s perverse data collection logic, but does not really expand on the political economy of surveillance. Oscar Gandy does a better job at that, but descriptions and theories based on the US context are not directly applicable in Europe.

Socio-legal theories and some communication research address how people perceive privacy, but it is increasingly difficult to connect ideal notions of privacy to what is actually happening in the world, and the gap between norms of privacy, data practices, and laws of privacy is growing ever wider.

During the past two years I’ve delved into the legislative process of the new data protection law in the EU, the General Data Protection Regulation, which will apply from May 2018. One of my earliest observations was the inaccessibility of the language and the complexity of a document that addresses a very basic human need: to be able to choose when one is out of sight. Instead, the end result is an intricate web of rules and exceptions to the collection of personal information with only vague references to actual perceptions of privacy.

After reading David Graeber’s Utopia of Rules I came to an insight that had previously existed only as a side note in my conceptualization of surveillance societies: the role of bureaucracies. Rather than thinking of data collection as an element of discipline in the Foucauldian sense, I started to think of data collection as part of the bureaucratic system’s inherent logic that is independent from the actual power of surveillance.

The utopian promise of big data is not that of control but of efficiency. The present logic of data maximization defies traditional ideals of data minimization according to which data can only be processed for a specific purpose. The collection of data points is such an essential part of modern bureaucracies, private and public alike, that its role in society is treated as a given. This is why attitudes to data collection and privacy are not divided along the public/private or even the left/right spectra but rather along the lines of strange bedfellows such as anarchism and libertarianism versus socialism and fascism. The goals are of course very different, but the means are similar.

Once I saw questions of privacy and surveillance through this lens, the GDPR’s legislative process started to make more sense to me. The discourses employed by corporate and public lobbyists were not really about control over information flows, nor were they about disciplinary power. They were about the promise of bureaucratic efficiency.

Will new business models for journalism challenge the journalists’ autonomy?

I’ve been thinking about new business models for the news media, and in my opinion we’re witnessing a trend where new entrants are proposing business models where the readers pay for individual stories instead of a monthly subscription. There are probably more solutions out there, but it seems like the two most popular models are to either start a Kickstarteresque campaign to fund individual stories (such as Finnish startup Rapport), or to solicit micropayments for every story a user reads (Dutch company Blendle).

My intention is not to piss on these ideas, but I find it interesting that no one seems to think about what this does to journalism (full disclosure: I haven’t actively looked for this type of critique, so I might just have missed the critical debate).

I will try to be specific:

Premise 1: The (modern) ideal of news reporting is that journalists are allowed to pursue stories freely without interference by the business department. By separating the creative and commercial interests from each other journalists are granted the necessary autonomy to scrutinize the powerful.

Premise 2: Cultural creators (such as journalists) are often underpaid and on temporary contracts, which makes them highly dependent on the owners. Although they provide the creative content that media organizations profit from, they get a very small slice of the pie.

So while story-funding grants journalists some independence from the owners, it subjects them directly to market pressures. Won’t this heavily affect the journalist’s autonomy?

Personalization paranoia or how I was stalked by Daniel Tiger

The thing about personalized ads and content on Facebook is that you don’t know exactly why the content you see ends up on your News Feed. While this algorithmic black box is well known to many and probably ignored by most, academic analyses of behavioural advertising rarely take a closer look at what personalized ads do to a person’s psyche.


The other day I was taking a daily scroll through my News Feed when I noticed an article from the Atlantic titled Daniel Tiger is Secretly Teaching Kids to Love Uber. For those of you without toddlers or a peculiar interest in kids’ TV shows, Daniel Tiger is a friendly 4-year-old tiger who teaches children how to cope with failure with happy-go-lucky songs.

Was the article served to me because I subscribe to the Atlantic’s Facebook page? I read several of their articles a week, so seeing an article from the Atlantic isn’t too strange. However, I don’t see all of their articles, and the ones I do see tend to be focused on topics related to the Internet economy (for obvious reasons).

Was it, in fact, the article’s reference to Uber, not Daniel Tiger, that made Facebook present this particular article to me? Or was it because Facebook had identified me as a parent and tended to suggest similar content to parents? Or did Facebook register that I googled the show at some point, and if I did, had I been signed into my Facebook account at the time or used private browsing? Or did Netflix share some of their viewing data with Facebook?

In this targeted online environment, consenting to terms and conditions and privacy notices makes little sense. It is impossible to keep track of the myriad ways companies share and collect data, and a carte blanche is usually required to even begin using the service. While the goal might be efficient targeting to make advertisers happy, it results in personalization paranoia. Calling Facebook’s targeting a black box is therefore not an entirely accurate metaphor. I would prefer to call it a one-way mirror: everything we do is monitored, we’re vaguely aware of it, but we have no idea who’s watching.


The mystification of algorithms

Whenever I read stories on big data, it strikes me that journalists hardly ever know or care to explain what algorithms are or what they do. Take this example from the Economist’s recent special report on big data and politics:

Campaigners are hoovering up more and more digital information about every voting-age citizen and stashing it away in enormous databases. With the aid of complex algorithms, these data allow campaigners to decide, say, who needs to be reminded to make the trip to the polling station and who may be persuaded to vote for a particular candidate.

The Economist, March 26th, Special Report p.4

First, few seem bothered with making a distinction between inferred intelligence and collected data. The quote above is an example of inferring information from existing databases – trying to figure out what kind of behaviour correlates with voting for a specific party. Since most databases are of a commercial nature, I am guessing that they are trying to figure out if certain consumer behaviour, like buying organic milk, correlates with voting Democrat.

In the case of protest movements, the waves of collective action leave a big digital footprint. Using ever more sophisticated algorithms, governments can mine these data.

The Economist, March 26th, Special Report p.4

The second example is about mining social media for data on dissidents and revolutionary action. There the data itself can be a source of “actionable intelligence” as Oscar Gandy would put it. There is nothing inherently sophisticated in looking for evidence of people participating in protest events on Facebook or finding protest movement chatter on Twitter.

Second, while the algorithms might be complex, they are usually employed in programmes that have relatively clear user interfaces. The Citizen Lab at the University of Toronto demonstrated that “net nannying” tools that are employed in schools, homes or businesses are also frequently used by authoritarian states for monitoring a whole nation’s communications.

While these reports give some insight into how data science is used to gain an advantage in politics or law enforcement, they tend to mystify the technologies and techniques involved. We are left confounded by this data magic that somehow changes the playing field. But the guiding principles are not that hard to understand, and using the programmes does not require a degree in computer science. We might not know exactly how the algorithms work, but we know what sources of information they use and what their purposes are.
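To make the distinction between inferred intelligence and collected data concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the records, the field names and the 50 % threshold are my own assumptions, not anything taken from the Economist’s report or from an actual campaign tool. The first half guesses a preference that nobody ever stated, based on a correlation in a sample; the second half simply looks up what people have already said in public.

```python
# Hypothetical toy data; none of these records or field names come from a real database.
known_voters = [
    {"buys_organic_milk": True, "votes_democrat": True},
    {"buys_organic_milk": True, "votes_democrat": True},
    {"buys_organic_milk": True, "votes_democrat": False},
    {"buys_organic_milk": False, "votes_democrat": False},
    {"buys_organic_milk": False, "votes_democrat": True},
    {"buys_organic_milk": False, "votes_democrat": False},
]


def share_democrat(records, buys):
    """Share of Democrat voters among people with the given purchase habit."""
    group = [r for r in records if r["buys_organic_milk"] == buys]
    return sum(r["votes_democrat"] for r in group) / len(group)


def infer_lean(person, records):
    """Inferred intelligence: guess a preference the person never stated."""
    return share_democrat(records, person["buys_organic_milk"]) > 0.5


# Hypothetical public posts; the hashtag is an assumed search term.
posts = [
    {"user": "a", "text": "See you at the square tonight #protest"},
    {"user": "b", "text": "Great recipe for pancakes"},
]


def find_protest_posts(posts, keyword="#protest"):
    """Collected data: the information is already stated in the post itself."""
    return [p["user"] for p in posts if keyword in p["text"]]
```

In this toy sample the guess about an organic milk buyer rests entirely on a correlation and may well be wrong for any given person, whereas the keyword search involves no inference at all. Neither step is sophisticated, which is rather the point.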

Slide illustrating how to search PRISM’s counterterrorism database


The Privacy Shield – dead on arrival

Five months after the ECJ ruled that the Safe Harbor agreement is invalid, the EU Commission has presented a new “Privacy Shield” that will replace the old agreement.

The Privacy Shield does contain some improvements regarding the rights of European citizens. It does not, however, fundamentally change the national security exception which brought down the agreement in the first place.

Recital 55 of the draft agreement reads as follows:

The Commission’s analysis shows that U.S. law contains clear limitations on the access and use of personal data transferred under the EU-U.S. Privacy Shield for national security purposes as well as oversight and redress mechanisms that provide sufficient safeguards for those data to be effectively protected against unlawful interference and the risk of abuse.

The Commission refers to Presidential Policy Directive 28 (“PPD-28”) regarding limitations on signals intelligence, issued by President Obama on January 17, 2014. The PPD-28 extends the same level of protection to non-US citizens as to US citizens.

Sec. 4 Safeguarding Personal Information Collected Through Signals Intelligence

All persons should be treated with dignity and respect, regardless of their nationality or wherever they might reside, and all persons have legitimate privacy interests in the handling of their personal information.

That being said, any presidential policy directive may be overturned by future presidents, as this is a policy document, not an amendment to existing law (which permits the surveillance of non-US nationals, see FISAAA 2008 section 702). If you have the time, see the late Caspar Bowden’s excellent presentation on why agreements such as the Privacy Shield are doomed to fail.

Even if the PPD-28 were allowed to stay in force AND were respected, it still endorses the mass collection of data:

Sec. 2 Limitations on the Use of Signals Intelligence Collected in Bulk
Locating new or emerging threats and other vital national security information is difficult, as such information is often hidden within the large and complex system of modern global communications. The United States must consequently collect signals intelligence in bulk.

Granted, the PPD-28 states that bulk collection must be used only for the detection and countering of (1) espionage, (2) terrorism, (3) weapons of mass destruction, (4) cybersecurity, (5) threats to US armed forces and their allies, and (6) transnational criminal threats. However, by its very definition, bulk collection means that all data is retained and accessible by the intelligence community, and there is no effective oversight on how that data is used. Let me cite Edward Snowden on what this boils down to in practice:

“In the course of their daily work they stumble across something that is completely unrelated to their work, for example an intimate nude photo of someone in a sexually compromising situation but they’re extremely attractive,” he said. “So what do they do? They turn around in their chair and they show a co-worker. And their co-worker says: ‘Oh, hey, that’s great. Send that to Bill down the way.’” (New York Times, July 20, 2014)

ECJ invalidates the Safe Harbour agreement: will all data transfers to the US stop?


Following the recommendation of Advocate General Yves Bot, the ECJ ruled today that the Safe Harbor agreement is invalid:

the Court declares the Safe Harbour Decision invalid. This judgment has the consequence that the Irish supervisory authority is required to examine Mr Schrems’ complaint with all due diligence and, at the conclusion of its investigation, is to decide whether, pursuant to the directive, transfer of the data of Facebook’s European subscribers to the United States should be suspended on the ground that that country does not afford an adequate level of protection of personal data.

The full judgement is available here.

This means, first of all, that national Data Protection Authorities (DPAs) are granted the power to decide whether or not data transfers are legitimate. The decision by the court will thus not stop all transfers to the US; it simply means that national DPAs may now block any transfers if they see fit, as they are no longer required to follow the Safe Harbor agreement.

The Safe Harbor agreement did not fall because it was a self-regulatory instrument with a long history of compliance issues. It fell because US public authorities would not be required to follow the agreement, and because US law would always override it.

There was even a “national security exception” in the agreement, which makes the mass surveillance of Facebook data possible:

Adherence to these Principles may be limited: (a) to the extent necessary to meet national security, public interest, or law enforcement requirements; (b) by statute, government regulation, or case law that create conflicting obligations or explicit authorizations, provided that, in exercising any such authorization, an organization can demonstrate that its non-compliance with the Principles is limited to the extent necessary to meet the overriding legitimate interests furthered by such authorization;

(EC: Commission Decision 2000/520 Annex I)

What now?

Although this does not mean that data transfers between the EU and the US will stop immediately, DPAs now have the power to block them. IT companies will probably start applying for Binding Corporate Rules and using model contract clauses. But the weakness of the Safe Harbour agreement, the national security exception, is present in those cases as well. If DPAs decide to crack down on IT companies this might mean that more and more data centres will have to be established on European soil. For the IT giants this will just be a huge headache, but for SMEs this might mean that EU customers are off limits unless the data is stored in Europe, a cost which smaller startups might not be able to cover.

It is unlikely, however, that things will go that far. Data protection rules will probably not be enforced that rigorously, and trade relations are at stake if this decision is interpreted strictly. The Safe Harbour agreement was always a political solution. The Commission knew that the US would never have information privacy laws adequate by European standards, and so a self-regulatory initiative was concocted. Now they will need a new agreement, but it will be much harder to come up with one that is seen as legitimate in light of the NSA leaks. It will be interesting to see them try.

Data power conference (June 22-23), part 1: Disconnect

I recently attended and presented at “Data Power”, which turned out to be an excellent conference organized by the University of Sheffield. The conference had called upon academics to submit papers that approached the question of big data from a societal (& critical) perspective. That being said, the conference papers were more often than not empirically founded and the presenters refrained from adopting a conspiratorial mindset, which might sometimes be the case when discussing big data.

Here are some of the key points that I picked up from attending the different panels:

Disconnect & Resignation / tradeoff fallacy

Stefan Larsson (Lund University) and Mark Andrejevic (Pomona College) both stressed that there is a disconnect between commercial claims that people happily trade their privacy for discounts and services and how people actually feel. In reality, people feel that they are “forced or bribed” to give up their data in order to access a service. Joseph Turow, Michael Hennessy and Nora Draper have recently published a survey on what they call the “tradeoff fallacy” which supports the disconnect and resignation hypothesis put forth by Larsson and Andrejevic.

Access rights are rarely respected

Clive Norris (University of Sheffield) and Xavier L’Hoiry (University of Leeds) had investigated if companies or the public sector (data controllers) actually respect that people have the right to access their own data according to current data protection legislation. Turns out, they don’t:

• “20 % of data controllers cannot be identified before submitting an access request;
• 43 % of requests did not obtain access to personal data;
• 56 % of requests could not get adequate information regarding third party data sharing;
• 71 % of requests did not get adequate information regarding automated decision making processes.”

Instead, the controllers consulted applied what Norris & L’Hoiry call “discourses of denial”, either questioning the rights themselves (we do not recognize them), falsely claiming that only law enforcement would have access to this data or even claiming that the researchers were insane to make such a claim (why would you possibly want this information?). The most common response was, however, none at all. Deafening silence is an effective way to tackle unpopular requests.

Self-management of data is not a workable solution

Jonathan Obar (University of Ontario Institute of Technology & Michigan State University) showed that data privacy cannot possibly be better protected through individual auditing of how companies and officials use your personal data, calling this approach a “romantic fallacy”.

Even if data controllers respected the so-called ARCO rights (access to data, rectification of data, cancellation of data & objection to data processing), it would be far too difficult and time-consuming for regular citizens to manage their own data. Rather, Obar suggests that either data protection authorities (DPAs) or private companies should oversee how our data is used, a form of representative data management. The problem with this solution is of course the significant resources it would require.

There is no such thing as informed consent in a big data environment

Mark Andrejevic emphasized that data protection regulation and big data practice are based on opposing principles: big data on data maximization and data protection on data minimization. The notion of relevance does not work as a limiting factor for collecting data, since the relevance of data is only determined afterwards by aggregating data and looking for correlations. This makes informed consent increasingly difficult: what are we consenting to if we do not know the applications of the collection?

Behavioural advertising – Always Be Creeping

There’s a new business logic which permeates most of today’s online commerce. The ABC is no longer Always Be Closing, it’s Always Be Creeping.

But even as behavioural advertising evolves and targeting becomes more sophisticated, sometimes companies may wish to be subtler when offering targeted ads to consumers. In a much-cited New York Times article from 2012, a former employee of Target said that

[W]e started mixing in all these ads for things we knew pregnant women would never buy, so the baby ads looked random. We’d put an ad for a lawn mower next to diapers. We’d put a coupon for wineglasses next to infant clothes. That way, it looked like all the products were chosen by chance. And we found out that as long as a pregnant woman thinks she hasn’t been spied on, she’ll use the coupons. She just assumes that everyone else on her block got the same mailer for diapers and cribs. As long as we don’t spook her, it works. 

Tene and Polonetsky (2013) argue that it’s not the data collection itself which is creepy, but how statistical analysis is used to come to certain conclusions about you.

This is especially the case when “offline” purchases are combined with information on online behaviour, a practice referred to as “onboarding”. We have grown accustomed to personalised ads based on web browsing or Facebook likes, but today’s marketers want a complete picture of our everyday transactions as well.

Whether or not one sees this as invasive is up to each and every one of us to decide, but one can bear in mind that one of the industry’s leading data brokers, Acxiom, has “information [on] about 700 million consumers worldwide with over 3000 data segments for nearly every U.S. consumer” (FTC report, 2014). Combined, the biggest data brokers have billions and billions of records on people and businesses.

In its defence, the Digital Advertising Alliance does offer consumers a choice to opt out of data tracking. Whether consumers know that such an option exists is another question entirely, and the registry only covers companies which have agreed to participate. In the end, such self-regulatory measures directed towards consumers are ineffective, as the most privacy-conscious are likely to use other means to conceal their actions online whereas the vast majority of people are unaware that such options exist.


Federal Trade Commission, 2014: DATA BROKERS: A Call for Transparency and Accountability. 

Tene, Omer and Polonetsky, Jules, 2013: A Theory of Creepy: Technology, Privacy and Shifting Social Norms [September 16, 2013]. Yale Journal of Law & Technology, 2013. Available at SSRN:

Perverse repercussions of the Charlie Hebdo attack

Many media outlets have seen the attack on Charlie Hebdo as a grave threat to the freedom of expression. While the attack itself can cause news outlets to self-censor their publications out of fear, it is not very fruitful to evaluate the state of civil liberties in the wake of terrorist attacks. Perversely, rather than strengthen the fundamental principles on which the freedom of expression is based, terror attacks have been used to limit communication rights: the Patriot Act was enacted just months after 9/11, while the Data Retention Directive was a direct consequence of the London and Madrid attacks.

Usually these laws have been enacted in the countries that have suffered from the attacks, but now David Cameron has proposed that Britain’s intelligence agencies should be allowed to break into the encrypted communications of instant messaging apps such as iMessage:

“In extremis, it has been possible to read someone’s letter, to listen to someone’s call, to mobile communications … The question remains: are we going to allow a means of communications where it simply is not possible to do that? My answer to that question is: no, we must not. The first duty of any government is to keep our country and our people safe.”

The proposed measure is not only a textbook example of treating the symptoms and not the disease, but essentially a threat to the very freedom which several political leaders swore to protect after the Charlie Hebdo attack. Freedom of expression is inherently connected to the freedom from surveillance. Censorship cannot exist without the surveillance of communications. This proposed ban on encrypted communications would greatly impede the media outlets’ capacity to protect their informants, because as Cory Doctorow points out, weakening the security of communications also means that foreign spies, evil hackers and other wrongdoers will be able to access British communications, not only MI5.

It’s like the NSA/GCHQ leak never happened.

Is sensitive personal information becoming desensitized?

With all the hype surrounding big data, it should come as no surprise that people are worried about how their data is used.

"How concerned are you about the unnecessary disclosure of personal information?" Source: Eurobarometer 359, p. 59.

Some data points have always been part of the modern bureaucratic state (see Giddens, 1985, for example). These are usually referred to as objective data points, which indicate facts such as births, deaths, age and income. Many of these data points are simply necessary to run a well-functioning state apparatus.

The reason why smartphones and social media are changing our relation to personal data is that subjective data points, such as what people read, what they search for, who they secretly stalk on Facebook, are much easier to come by than before. What’s more, people readily submit information on themselves on social networks that used to be hidden deep under the surface. There is a huge gap between what data protection officials think is sensitive information and what’s actually happening.

For example, the UK’s Information Commissioner’s Office defines sensitive personal data as data containing information on a) the racial or ethnic origin of a person, b) his/her political opinions or religious beliefs, c) his/her sexual life, among other things.

Facebook sees your sexual preferences, religious beliefs and political views as “basic info”. Not even “details about you”, but “basic”; information which is, undeniably, potentially very sensitive in many parts of the world.

The gist is that while marketers and companies are hoping to gather more and more sensitive information on potential customers, they really, REALLY don’t want to have their customer databases defined as collections of “sensitive data”. Because when that happens, you are suddenly forced to follow strict rules regarding what you can do with the information. Funnily enough, the best way to avoid that is not by refraining from collecting sensitive information, but rather by claiming that the information you have gathered is not on “real, identifiable people” but just “profiles”.

Giddens, Anthony (1985): The Nation-State and Violence: Volume Two of a Contemporary Critique of Historical Materialism. Cambridge: Polity Press.

Eurobarometer survey 359 on data protection