The mystification of algorithms

Whenever I read stories on big data, it strikes me that journalists hardly ever know or care to explain what algorithms are or what they do. Take this example from the Economist’s recent special report on big data and politics:

Campaigners are hoovering up more and more digital information about every voting-age citizen and stashing it away in enormous databases. With the aid of complex algorithms, these data allow campaigners to decide, say, who needs to be reminded to make the trip to the polling station and who may be persuaded to vote for a particular candidate.

The Economist, March 26th, Special Report p.4

First, few seem bothered to distinguish between inferred intelligence and collected data. The quote above is an example of inferring information from existing databases – trying to figure out what kind of behaviour correlates with voting for a specific party. Since most databases are of a commercial nature, I am guessing that campaigners are trying to figure out whether certain consumer behaviour, like buying organic milk, correlates with voting Democrat.
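To make the idea of "inferring" less mysterious, here is a minimal sketch in Python of what such a correlation check could look like. Everything in it is an assumption for illustration: the records are invented, the function name `phi_coefficient` is my own, and I have no knowledge of what any campaign actually runs. It simply measures how strongly two yes/no variables go together.

```python
from math import sqrt

# Invented illustrative records, not real data:
# (buys_organic_milk, votes_democrat), each coded as 1 = yes, 0 = no.
records = [
    (1, 1), (1, 1), (1, 1), (1, 0),
    (0, 1), (0, 0), (0, 0), (0, 0),
]

def phi_coefficient(pairs):
    """Correlation between two yes/no variables (Pearson's r on 0/1 data)."""
    n11 = sum(1 for x, y in pairs if x == 1 and y == 1)
    n10 = sum(1 for x, y in pairs if x == 1 and y == 0)
    n01 = sum(1 for x, y in pairs if x == 0 and y == 1)
    n00 = sum(1 for x, y in pairs if x == 0 and y == 0)
    denominator = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denominator == 0:
        return 0.0  # one of the variables never varies, so no correlation is defined
    return (n11 * n00 - n10 * n01) / denominator

print(phi_coefficient(records))
```

A value near 1 or -1 would suggest the purchase is a useful hint about the vote, and a value near 0 would suggest it is not. Real systems presumably combine many such signals, but the guiding principle is no more exotic than this.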

In the case of protest movements, the waves of collective action leave a big digital footprint. Using ever more sophisticated algorithms, governments can mine these data.

The Economist, March 26th, Special Report p.4

The second example is about mining social media for data on dissidents and revolutionary action. There the data itself can be a source of “actionable intelligence” as Oscar Gandy would put it. There is nothing inherently sophisticated in looking for evidence of people participating in protest events on Facebook or finding protest movement chatter on Twitter.
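How unsophisticated this can be is easy to show. The following Python sketch filters a handful of posts by keyword. The posts, the user names, the keyword list and the helper `mentions_keyword` are all invented for illustration, and I am not claiming that any government tool works this way. The point is only that finding "chatter" does not require anything resembling magic.

```python
# Invented example posts; no real accounts or messages.
posts = [
    {"user": "user_a", "text": "See you at the square at noon #protest"},
    {"user": "user_b", "text": "Great recipe for lentil soup"},
    {"user": "user_c", "text": "The march starts at the university gates"},
]

# Hypothetical list of terms someone might search for.
KEYWORDS = {"protest", "march", "rally"}

def mentions_keyword(text, keywords):
    """Return True if any word in the text, stripped of punctuation, is a keyword."""
    words = {word.strip("#.,!?").lower() for word in text.split()}
    return not words.isdisjoint(keywords)

flagged = [post["user"] for post in posts if mentions_keyword(post["text"], KEYWORDS)]
print(flagged)
```

Here the data does all the work: once the posts are collected, a few lines of string matching are enough to produce a list of names.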

Second, while the algorithms might be complex, they are usually employed in programmes that have relatively clear user interfaces. The Citizen Lab at the University of Toronto demonstrated that “net nannying” tools that are employed in schools, homes or businesses are also frequently used by authoritarian states for monitoring a whole nation’s communications.

While these reports give some insight into how data science is used to gain an advantage in politics or law enforcement, they tend to mystify the technologies and techniques involved. We are left confounded by this data magic that somehow changes the playing field. But the guiding principles are not that hard to understand, and using the programmes does not require a degree in computer science. We might not know exactly how the algorithms work, but we know what sources of information they use and what their purposes are.

[Image: Slide illustrating how to search PRISM’s counterterrorism database]

 


The Privacy Shield – dead on arrival

Five months after the ECJ ruled that the Safe Harbor agreement is invalid, the EU Commission has presented a new “Privacy Shield” that will replace the old agreement.

The Privacy Shield does contain some improvements regarding the rights of European citizens. It does not, however, fundamentally change the national security exception which brought down the agreement in the first place.

Recital 55 of the draft agreement reads as follows:

The Commission’s analysis shows that U.S. law contains clear limitations on the access and use of personal data transferred under the EU-U.S. Privacy Shield for national security purposes as well as oversight and redress mechanisms that provide sufficient safeguards for those data to be effectively protected against unlawful interference and the risk of abuse.

The Commission refers to Presidential Policy Directive 28 (“PPD-28”) regarding limitations on signals intelligence, issued by President Obama on January 17, 2014. PPD-28 extends the same level of protection to non-US citizens as to US citizens.

Sec. 4 Safeguarding Personal Information Collected Through Signals Intelligence

All persons should be treated with dignity and respect, regardless of their nationality or wherever they might reside, and all persons have legitimate privacy interests in the handling of their personal information.

That being said, any presidential policy directive may be overturned by a future president, as it is a policy document, not an amendment to existing law (which permits the surveillance of non-US nationals, see section 702 of the FISA Amendments Act of 2008, FISAAA). If you have the time, see the late Caspar Bowden’s excellent presentation on why agreements such as the Privacy Shield are doomed to fail.

Even if PPD-28 were allowed to stay in force and were respected, it would still endorse the mass collection of data:

Sec. 2 Limitations on the Use of Signals Intelligence Collected in Bulk
Locating new or emerging threats and other vital national security information is difficult, as such information is often hidden within the large and complex system of modern global communications. The United States must consequently collect signals intelligence in bulk.

Granted, PPD-28 states that bulk collection must be used only for the detection and countering of (1) espionage, (2) terrorism, (3) weapons of mass destruction, (4) cybersecurity threats, (5) threats to US armed forces and their allies, and (6) transnational criminal threats. However, by its very definition, bulk collection means that all data is retained and accessible to the intelligence community, and there is no effective oversight of how that data is used. Let me cite Edward Snowden on what this boils down to in practice:

“In the course of their daily work they stumble across something that is completely unrelated to their work, for example an intimate nude photo of someone in a sexually compromising situation but they’re extremely attractive,” he said. “So what do they do? They turn around in their chair and they show a co-worker. And their co-worker says: ‘Oh, hey, that’s great. Send that to Bill down the way.’” (New York Times, July 20, 2014)