Big data and the promise of bureaucratic efficiency

One of the fundamental questions of my PhD thesis has been to conceptualize privacy and surveillance in a way which not only describes the society we live in, but also explains why the current information society with its fetishization of data looks the way it does. I have looked to various theories on surveillance and socio-legal conceptualizations of information privacy to address this question, but I was never really satisfied with the answer.

Michel Foucault’s panopticon deals with the psychological effects of being under visible surveillance, yet does not adequately explain life in the era of databases and electronic surveillance. Philosopher Manuel DeLanda’s excellent War in the Age of Intelligent Machines (1991) addresses the intelligence community’s perverse data collection logic, but does not really expand on the political economy of surveillance. Oscar Gandy does a better job at that, but descriptions and theories based on the US context are not directly applicable in Europe.

Socio-legal theories and some communication research address how people perceive privacy, but it is increasingly difficult to connect ideal notions of privacy to what is actually happening in the world, and the gap between norms of privacy, data practices, and laws of privacy is growing ever wider.

During the past two years I’ve delved into the legislative process of the new data protection law in the EU, the General Data Protection Regulation, which will apply from May 2018. One of my earliest observations was the inaccessibility of the language and the complexity of a document that addresses a very basic human need: to be able to choose when one is out of sight. Instead, the end result is an intricate web of rules and exceptions to the collection of personal information with very vague references to actual perceptions of privacy.

After reading David Graeber’s Utopia of Rules I came to an insight that had previously existed only as a side note in my conceptualization of surveillance societies: the role of bureaucracies. Rather than thinking of data collection as an element of discipline in the Foucauldian sense, I started to think of data collection as part of the bureaucratic system’s inherent logic that is independent from the actual power of surveillance.

The utopian promise of big data is not that of control but of efficiency. The present logic of data maximization defies traditional ideals of data minimization according to which data can only be processed for a specific purpose. The collection of data points is such an essential part of modern bureaucracies, private and public alike, that its role in society is treated as a given. This is why attitudes to data collection and privacy are not divided along the public/private or even the left/right spectra but rather along the lines of strange bedfellows such as anarchism and libertarianism versus socialism and fascism. The goals are of course very different, but the means are similar.

Once I saw questions of privacy and surveillance through this lens, the GDPR’s legislative process started to make more sense to me. The discourses employed by corporate and public lobbyists were not really about control over information flows, nor were they about disciplinary power. They were about the promise of bureaucratic efficiency.


The mystification of algorithms

Whenever I read stories on big data, it strikes me that journalists hardly ever know or care to explain what algorithms are or what they do. Take this example from the Economist’s recent special report on big data and politics:

Campaigners are hoovering up more and more digital information about every voting-age citizen and stashing it away in enormous databases. With the aid of complex algorithms, these data allow campaigners to decide, say, who needs to be reminded to make the trip to the polling station and who may be persuaded to vote for a particular candidate.

The Economist, March 26th, Special Report p.4

First, few seem bothered with making a distinction between inferred intelligence and collected data. The quote above is an example of inferring information from existing databases: trying to figure out what kind of behaviour correlates with voting for a specific party. Since most databases are of a commercial nature, I am guessing that they are trying to figure out if certain consumer behaviour, like buying organic milk, correlates with voting Democrat.
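To make the difference between collected and inferred data concrete, here is a minimal sketch of what such an inference could look like. Everything in it is an assumption for illustration: the records are made up, and the function names `phi_coefficient` and `share_voting` are hypothetical, not taken from any campaign tool. It simply measures how strongly one recorded behaviour (a purchase) goes together with another (a vote).

```python
from math import sqrt

# Made-up records for illustration only: (bought_organic_milk, voted_democrat)
records = [
    (1, 1), (1, 1), (1, 1), (1, 0),
    (0, 1), (0, 0), (0, 0), (0, 0),
]


def phi_coefficient(pairs):
    """Correlation between two yes/no variables (phi coefficient)."""
    n11 = sum(1 for a, b in pairs if a == 1 and b == 1)
    n10 = sum(1 for a, b in pairs if a == 1 and b == 0)
    n01 = sum(1 for a, b in pairs if a == 0 and b == 1)
    n00 = sum(1 for a, b in pairs if a == 0 and b == 0)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # one of the variables never varies, so no correlation
    return (n11 * n00 - n10 * n01) / denom


def share_voting(pairs, bought):
    """Share of voters for the party among people with the given purchase flag."""
    group = [vote for purchase, vote in pairs if purchase == bought]
    if not group:
        return 0.0
    return sum(group) / len(group)


print(phi_coefficient(records))   # strength of the association
print(share_voting(records, 1))   # among buyers
print(share_voting(records, 0))   # among non-buyers
```

The collected data here is only the purchase; the voting preference of a new customer would be a guess derived from the correlation, which is exactly the distinction the reports tend to blur.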

In the case of protest movements, the waves of collective action leave a big digital footprint. Using ever more sophisticated algorithms, governments can mine these data.

The Economist, March 26th, Special Report p.4

The second example is about mining social media for data on dissidents and revolutionary action. There the data itself can be a source of “actionable intelligence” as Oscar Gandy would put it. There is nothing inherently sophisticated in looking for evidence of people participating in protest events on Facebook or finding protest movement chatter on Twitter.

Second, while the algorithms might be complex, they are usually employed in programmes that have relatively clear user interfaces. The Citizen Lab at the University of Toronto demonstrated that “net nannying” tools that are employed in schools, homes or businesses are also frequently used by authoritarian states for monitoring a whole nation’s communications.

While these reports give some insight into how data science is used to gain an advantage in politics or law enforcement, they tend to mystify the technologies and techniques involved. We are left confounded by this data magic that somehow changes the playing field. But the guiding principles are not that hard to understand, and using the programmes does not require a degree in computer science. We might not know exactly how the algorithms work, but we know what sources of information they use and what their purposes are.

Slide illustrating how to search PRISM’s counterterrorism database

The Privacy Shield – dead on arrival

Five months after the ECJ ruled that the Safe Harbor agreement is invalid, the EU Commission has presented a new “Privacy Shield” that will replace the old agreement.

The Privacy Shield does contain some improvements regarding the rights of European citizens. It does not, however, fundamentally change the national security exception which brought down the agreement in the first place.

Recital 55 of the draft agreement reads as follows:

The Commission’s analysis shows that U.S. law contains clear limitations on the access and use of personal data transferred under the EU-U.S. Privacy Shield for national security purposes as well as oversight and redress mechanisms that provide sufficient safeguards for those data to be effectively protected against unlawful interference and the risk of abuse.

The Commission refers to Presidential Policy Directive 28 (“PPD-28”) regarding limitations on signals intelligence, issued by President Obama on January 17, 2014. PPD-28 extends the same level of protection to non-US citizens as to US citizens.

Sec. 4 Safeguarding Personal Information Collected Through Signals Intelligence

All persons should be treated with dignity and respect, regardless of their nationality or wherever they might reside, and all persons have legitimate privacy interests in the handling of their personal information.

That being said, any presidential policy directives may be overturned by future presidents, as this is a policy document, not an amendment to existing law (which permits the surveillance of non-US nationals, see FISAAA 2008 section 702). If you have the time, see the late Caspar Bowden’s excellent presentation on why agreements such as the Privacy Shield are doomed to fail:

Even if PPD-28 were allowed to stay in force and were respected, it still endorses the mass collection of data:

Sec. 2 Limitations on the Use of Signals Intelligence Collected in Bulk
Locating new or emerging threats and other vital national security information is difficult, as such information is often hidden within the large and complex system of modern global communications. The United States must consequently collect signals intelligence in bulk.

Granted, the PPD-28 states that bulk collection must be used only for the detection and countering of (1) espionage, (2) terrorism, (3) weapons of mass destruction, (4) cybersecurity, (5) threats to US armed forces and their allies, and (6) transnational criminal threats. However, by its very definition, bulk collection means that all data is retained and accessible by the intelligence community, and there is no effective oversight on how that data is used. Let me cite Edward Snowden on what this boils down to in practice:

“In the course of their daily work they stumble across something that is completely unrelated to their work, for example an intimate nude photo of someone in a sexually compromising situation but they’re extremely attractive,” he said. “So what do they do? They turn around in their chair and they show a co-worker. And their co-worker says: ‘Oh, hey, that’s great. Send that to Bill down the way.’” (New York Times, July 20, 2014)

Data Power conference (June 22-23), part 1: Disconnect

I recently attended and presented at “Data Power”, which turned out to be an excellent conference organized by the University of Sheffield. The conference had called upon academics to submit papers that approached the question of big data from a societal (& critical) perspective. That being said, the conference papers were more often than not empirically founded and the presenters refrained from adopting a conspiratorial mindset, which might sometimes be the case when discussing big data.

Here are some of the key points that I picked up from attending the different panels:

Disconnect & Resignation / tradeoff fallacy

Stefan Larsson (Lund University) and Mark Andrejevic (Pomona College) both stressed that there is a disconnect between commercial claims that people happily trade their privacy for discounts and services and how people actually feel. In reality, people feel that they are “forced or bribed” to give up their data in order to access a service. Joseph Turow, Michael Hennessy and Nora Draper have recently published a survey on what they call the “tradeoff fallacy” which supports the disconnect and resignation hypothesis put forth by Larsson and Andrejevic.

Access rights are rarely respected

Clive Norris (University of Sheffield) and Xavier L’Hoiry (University of Leeds) had investigated if companies or the public sector (data controllers) actually respect that people have the right to access their own data according to current data protection legislation. Turns out, they don’t:

• “20 % of data controllers cannot be identified before submitting an access request;
• 43 % of requests did not obtain access to personal data;
• 56 % of requests could not get adequate information regarding third party data sharing;
• 71 % of requests did not get adequate information regarding automated decision making processes.”

Instead, the controllers consulted applied what Norris & L’Hoiry call “discourses of denial”, either questioning the rights themselves (we do not recognize them), falsely claiming that only law enforcement would have access to this data or even claiming that the researchers were insane to make such a claim (why would you possibly want this information?). The most common response was, however, none at all. Deafening silence is an effective way to tackle unpopular requests.

Self-management of data is not a workable solution

Jonathan Obar (University of Ontario Institute of Technology & Michigan State University) showed that data privacy cannot possibly be better protected through individual auditing of how companies and officials use your personal data, calling this approach a “romantic fallacy”.

Even if data controllers respected the so-called ARCO rights (access to data, rectification of data, cancellation of data & objection to data processing), it is far too difficult and time-consuming for regular citizens to manage their own data. Rather, Obar suggests that either data protection authorities (DPAs) or private companies should oversee how our data is used, a form of representative data management. The problem with this solution is of course the significant resources it would require.

There is no such thing as informed consent in a big data environment

Mark Andrejevic emphasized that data protection regulation and big data practice are based on opposing principles: big data on data maximization and data protection on data minimization. The notion of relevance does not work as a limiting factor for collecting data, since the relevance of data is only determined afterwards by aggregating data and looking for correlations. This makes informed consent increasingly difficult: what are we consenting to if we do not know the applications of the collection?

Behavioural advertising – Always Be Creeping

There’s a new business logic which permeates most of today’s online commerce. The ABC is no longer Always Be Closing, it’s Always Be Creeping.

But even as behavioural advertising evolves and targeting becomes more sophisticated, sometimes companies may wish to be subtler when offering targeted ads to consumers. In a much-cited New York Times article from 2012, a former employee of Target said that

[W]e started mixing in all these ads for things we knew pregnant women would never buy, so the baby ads looked random. We’d put an ad for a lawn mower next to diapers. We’d put a coupon for wineglasses next to infant clothes. That way, it looked like all the products were chosen by chance. And we found out that as long as a pregnant woman thinks she hasn’t been spied on, she’ll use the coupons. She just assumes that everyone else on her block got the same mailer for diapers and cribs. As long as we don’t spook her, it works. 

Tene and Polonetsky (2013) argue that it’s not the data collection itself which is creepy, but how statistical analysis is used to come to certain conclusions about you.

This is especially the case when “offline” purchases are combined with information on online behaviour, a practice referred to as “onboarding”. We have grown accustomed to personalised ads based on web browsing or Facebook likes, but today’s marketers want a complete picture of our everyday transactions as well.

Whether or not one sees this as invasive is up to each and every one of us to decide, but one can bear in mind that one of the industry’s leading data brokers, Acxiom, has “information [on] about 700 million consumers worldwide with over 3000 data segments for nearly every U.S. consumer” (FTC report, 2014). Combined, the biggest data brokers have billions and billions of records on people and businesses.

In their defence, the Digital Advertising Alliance does offer consumers a choice to opt out of data tracking. Whether consumers know that such an option exists is another question entirely, and the registry only covers companies which have agreed to participate. In the end, such self-regulatory measures directed towards consumers are ineffective, as the most privacy-conscious are likely to use other means to conceal their actions online whereas the vast majority of people are unaware that such options exist.

References

Federal Trade Commission, 2014: DATA BROKERS: A Call for Transparency and Accountability.

Tene, Omer and Polonetsky, Jules, 2013: A Theory of Creepy: Technology, Privacy and Shifting Social Norms (September 16, 2013). Yale Journal of Law & Technology, 2013. Available at SSRN: http://ssrn.com/abstract=2326830.

Big Data Dystopia pt 2: Newspapers and web shops join forces

In an earlier post, I discussed the possible implications of banks and insurance companies converging. This post will focus on the convergence of newspapers and web shops.

In a nutshell, a daily newspaper’s greatest assets have usually been its reach and its credibility.

Over the past 20 years or so, newspaper subscriptions have been declining in most countries. Other media outlets are just as popular as newspapers’ websites, and the reach of newspapers is no longer as dominant as it used to be.

Credibility, however, works differently. Increased competition does not affect credibility negatively. A good review in the New York Times can lift something or someone fairly unknown from the margins to the mainstream.

It’s not news that many newspapers are struggling in the online ad market, even though the market is growing. Google and Facebook dominate, and little suggests that newspapers will be able to compete with the two online ad powerhouses. However, the two have not yet been that successful in sealing the deal; that is, getting people to actually buy products online.

One of Amazon’s greatest feats is doing exactly that. With the help of its elaborate recommendation system, Amazon recommends products based on previous purchases and browsing history. Amazon’s algorithm can even identify you (and help you on your way) as a potential drug dealer if you choose to buy a certain scale.

What Amazon tries to achieve is increased credibility through crowdsourcing customer reviews. Still, an anonymous, non-professional customer review is nothing like an article in The Guardian.

In 2013, Amazon founder Jeff Bezos bought the Washington Post. Bezos’ editorial aspirations aside, the move is likely to spur innovative cross-ownership business models. Similarly, Finnish newspaper Helsingin Sanomat has launched its own web shop, Mitä Saisi Olla. Although significantly smaller in scale, the message is clear: if online ads fail, online shops might be the answer.

Now, thanks to innovations in behavioural targeting and automated tracking of reading patterns online, newspapers have more information on their readers than ever. Not all newspapers track their users of course, but those wishing to remain attractive to advertisers in this day and age should at least consider doing so. A third asset for newspapers has emerged: deep knowledge about reading patterns can tell as much as or even more than a person’s Google search history. The articles we read, how much time we spend reading them and whether we recommend them to our peers are essential for understanding not only who we are but also who we strive to be.

This could lead to at least two outcomes. First, reviews and product benchmarks might be published alongside convenient links to the web store. A great book review can be the catalyst for a spontaneous one-click-buy.

Second, data on reading patterns can be compared to consumption history, creating an even clearer picture of consumer interests. The web shop is no longer fully dependent on browsing history but can also rely on actual information on consumers’ interests. Similarly, the newspaper can not only speculate on its readers’ consumption patterns, but actually convince advertisers that they know exactly what products their readers will buy.

The crux is that such actions might damage the newspaper’s reputation. Let’s hope that the newspapers won’t be reduced to mere barkers for web shops.

Geolocation: GPS, IP addresses and Wi-Fi

A study by researchers from Australia’s ICT research centre NICTA revealed in 2012 that geolocation based on IP addresses alone is off by 100 km in approximately 70 per cent of all cases. With regular broadband, it is possible to have more accurate predictions, but since mobile data is, well, mobile, it means that the user moves around quite a bit. If you roam in another country, for example, your IP will still register you as being in the country of your operator.

In other words, for consumer monitoring or surveillance purposes, IP address location data is worthless.

So why does turning on Wi-Fi make location data more accurate? Because turning on Wi-Fi gives the device access to a database of Wi-Fi access points and radio towers. So-called Wi-Fi-based positioning systems (WPS) are maintained by different companies, most notably Google, Microsoft and Apple. On the plus side, your phone gets an accurate location read even though you’re inside a building. The downside? You get tracked even though you have your GPS turned off and you’re not connected to a Wi-Fi network, but simply have your phone’s Wi-Fi on.
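The basic idea behind such a lookup can be sketched in a few lines. This is a simplified illustration under stated assumptions, not how Google, Microsoft or Apple actually implement their systems, which are proprietary: the table `AP_DATABASE`, its coordinates and the function `estimate_position` are all hypothetical. The phone only needs to see nearby access points, not connect to them; each one it recognises pulls the estimate towards that access point’s known position, with stronger signals counting for more.

```python
# Hypothetical lookup table: access point identifier -> (latitude, longitude).
# Real positioning databases are far larger and are not public.
AP_DATABASE = {
    "aa:aa:aa:aa:aa:01": (60.1700, 24.9400),
    "aa:aa:aa:aa:aa:02": (60.1710, 24.9420),
    "aa:aa:aa:aa:aa:03": (60.1690, 24.9410),
}


def estimate_position(scan, database=AP_DATABASE):
    """Estimate a position as the signal-weighted average of known access points.

    scan is a list of (identifier, signal strength in dBm) pairs, as seen by
    a device that is merely listening for networks, not connected to any.
    Returns (latitude, longitude), or None if no access point is recognised.
    """
    total_weight = 0.0
    lat_sum = 0.0
    lon_sum = 0.0
    for bssid, rssi_dbm in scan:
        if bssid not in database:
            continue  # unknown access point: contributes nothing
        # Convert dBm to a linear scale so stronger signals weigh more.
        weight = 10 ** (rssi_dbm / 10)
        lat, lon = database[bssid]
        lat_sum += weight * lat
        lon_sum += weight * lon
        total_weight += weight
    if total_weight == 0:
        return None
    return (lat_sum / total_weight, lon_sum / total_weight)


# Two equally strong access points: the estimate falls midway between them.
print(estimate_position([("aa:aa:aa:aa:aa:01", -50), ("aa:aa:aa:aa:aa:02", -50)]))
```

Note that nothing in the sketch requires GPS or a network connection, only a list of visible access points and a database to look them up in.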

In some cases, the phone keeps tracking networks even though Wi-Fi is off. Google acknowledges this with the following statement:

“To improve location accuracy and for other purposes, Google and other apps may scan for nearby networks, even when wifi is off. If you don’t want this to happen, go to advanced > scanning always available.”

If you have a Google account, it could be worth checking out where you’ve been the past year through Google’s location history service.

In light of this, it becomes clear that any data retention laws that governments might have pale in comparison with the data gathered as part of services provided by Google, Apple or Microsoft.

Big Data Dystopia pt 1: Banks and insurance companies converge

One of the more discussed developments in the field of media is that platforms are converging: TV companies are going online, telcos are providing streaming services, radio channels are becoming online media and so on. This convergence or “vertical integration” is of course challenging for law-makers and courts that have to apply antitrust laws to completely new cases.

Vertical vs horizontal integration:

Source: Wikimedia Commons / Martin Sauter

Technological developments in the communications industry are also affecting how other markets work. Let’s take the example of banks and insurance companies.

Insurance premiums are, somewhat simplified, based on calculating risk. In order to calculate what a person’s life insurance premium should be, factors like age, work and medical history are taken into account. These factors determine the probability of the insured falling ill or getting into an accident. The more detailed the information on the insured’s life, the better.

Banks (and several app developers) are waking up to the fact that electronic payment means that it is possible to know a whole lot more about their customers than in the cash and check days. Consumers can click away and define every single purchase they make, be it clothing, food, home electronics or alcohol. A great service for the banks’ customers, who can plan their household finances better than before and pinpoint where their salaries are going. Or as Pridmore and Zwick (2011, 273) would have it, “[c]onsumers often happily participate in the personal information economy and the surveillance practices that underpin it.”

In some countries, insurance companies and banks are converging – either insurance companies begin to provide banking services or banks buy insurance companies. This way customers can choose to buy their insurance and their bank services from the same provider – often at a better price. This is when it gets interesting.

The consolidation of banking and insurance data means that, theoretically, insurance companies could adjust insurance premiums according to the purchases being made by the same company’s banking clients. Failed to renew your gym membership, money spent on alcohol gone up two thirds? Several visits to the doctor in the past months? Perhaps the next insurance premium won’t be as affordable.
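A toy version of such an adjustment shows how little it would take once the data sits in the same company. This is purely a thought experiment: the surcharge values, the category names and the function `adjusted_premium` are invented for illustration and do not describe any insurer’s actual pricing.

```python
# Invented surcharges, for illustration only.
ALCOHOL_SURCHARGE = 0.10   # applied if alcohol spending is up by two thirds or more
NO_GYM_SURCHARGE = 0.05    # applied if the gym membership was not renewed


def adjusted_premium(base_premium, spending_before, spending_now, has_gym_membership):
    """Return a premium adjusted by signals read from banking data.

    spending_before and spending_now map purchase categories to amounts,
    as a bank could derive them from categorised card payments.
    """
    multiplier = 1.0
    alcohol_before = spending_before.get("alcohol", 0.0)
    alcohol_now = spending_now.get("alcohol", 0.0)
    # "Gone up two thirds" means the new amount is at least 5/3 of the old one.
    if alcohol_before > 0 and alcohol_now / alcohol_before >= 5 / 3:
        multiplier += ALCOHOL_SURCHARGE
    if not has_gym_membership:
        multiplier += NO_GYM_SURCHARGE
    return round(base_premium * multiplier, 2)


print(adjusted_premium(500.0, {"alcohol": 100.0}, {"alcohol": 170.0}, False))
```

The point of the sketch is that no new data collection is needed: the inputs are by-products of a service the customer signed up for voluntarily.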

The question is, are the new banking services provided with the consumer in mind, or is this simply another way to solicit data from people in order to create business elsewhere? One thing is certain: insurance actuaries would love to have access to that data.

References

Pridmore, Jason and Detlev Zwick. 2011. Editorial: Marketing and the Rise of Commercial Consumer Surveillance. Surveillance & Society 8(3): 269-277.