Diving into Domestic Data Mining
Via Boing Boing, a very good short animation discussing data mining. It isn’t focused on the NSA program that is currently the source of discussion and dispute, but on the broader issue of how both companies and governments are able to retain, purchase, and analyze massive amounts of data.
For a deeper dive into data mining, I highly recommend Inside the Matrix, James Bamford’s March 2012 cover story for Wired. Bamford has written four highly regarded books on the history of the National Security Agency. (See this New Yorker profile of Bamford as the NSA’s “chief chronicler.”) His Wired article focuses on the NSA’s new massive data center in Utah.
But, given that so many people seemed shocked (shocked, I tell you!) to hear that the NSA is data mining information from domestic calls, one should read what Bamford wrote over a year ago about NSA activities (not to mention what he had written in his books about the NSA’s history of testing 4th Amendment limits, if not transgressing them):
For the first time, a former NSA official has gone on the record to describe the program, codenamed Stellar Wind, in detail. William Binney was a senior NSA crypto-mathematician largely responsible for automating the agency’s worldwide eavesdropping network. A tall man with strands of black hair across the front of his scalp and dark, determined eyes behind thick-rimmed glasses, the 68-year-old spent nearly four decades breaking codes and finding new ways to channel billions of private phone calls and email messages from around the world into the NSA’s bulging databases…
OK, so we’re not talking about a new hire at Booz Allen. Later, Bamford continues:
Binney left the NSA in late 2001, shortly after the agency launched its warrantless-wiretapping program. “They violated the Constitution setting it up,” he says bluntly. “But they didn’t care. They were going to do it anyway, and they were going to crucify anyone who stood in the way. When they started violating the Constitution, I couldn’t stay.” Binney says Stellar Wind was far larger than has been publicly disclosed and included not just eavesdropping on domestic phone calls but the inspection of domestic email. At the outset the program recorded 320 million calls a day, he says, which represented about 73 to 80 percent of the total volume of the agency’s worldwide intercepts. The haul only grew from there.
Emphasis added. So, the answer, my friend, has been blowin’ in the [stellar] wind for some time now.
Bamford’s article is long and it is excellent. It provides context for today’s debates. And, as for that new NSA facility? Consider this:
Given the facility’s scale and the fact that a terabyte of data can now be stored on a flash drive the size of a man’s pinky, the potential amount of information that could be housed in Bluffdale is truly staggering. But so is the exponential growth in the amount of intelligence data being produced every day by the eavesdropping sensors of the NSA and other intelligence agencies. As a result of this “expanding array of theater airborne and other sensor networks,” as a 2007 Department of Defense report puts it, the Pentagon is attempting to expand its worldwide communications network, known as the Global Information Grid, to handle yottabytes (10^24 bytes) of data. (A yottabyte is a septillion bytes—so large that no one has yet coined a term for the next higher magnitude.)
It needs that capacity because, according to a recent report by Cisco, global Internet traffic will quadruple from 2010 to 2015, reaching 966 exabytes per year. (A million exabytes equal a yottabyte.) In terms of scale, Eric Schmidt, Google’s former CEO, once estimated that the total of all human knowledge created from the dawn of man to 2003 totaled 5 exabytes. And the data flow shows no sign of slowing. In 2011 more than 2 billion of the world’s 6.9 billion people were connected to the Internet. By 2015, market research firm IDC estimates, there will be 2.7 billion users. Thus, the NSA’s need for a 1-million-square-foot data storehouse. Should the agency ever fill the Utah center with a yottabyte of information, it would be equal to about 500 quintillion (500,000,000,000,000,000,000) pages of text.
The data stored in Bluffdale will naturally go far beyond the world’s billions of public web pages. The NSA is more interested in the so-called invisible web, also known as the deep web or deepnet—data beyond the reach of the public. This includes password-protected data, US and foreign government communications, and noncommercial file-sharing between trusted peers…
Yeah, you really should read Bamford’s article.
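For anyone who wants to sanity-check the scale figures in the quoted passage, here is a rough back-of-the-envelope sketch. It assumes decimal (SI) units, so an exabyte is 10^18 bytes and a yottabyte is 10^24 bytes, and it takes the 966 exabytes per year and 500 quintillion pages straight from the quote. The function names are my own, chosen for illustration only.

```python
# Back-of-the-envelope check of the figures quoted from Bamford's article.
# Assumption: decimal (SI) units, i.e. 1 exabyte = 10**18 bytes
# and 1 yottabyte = 10**24 bytes.
EXABYTE = 10**18
YOTTABYTE = 10**24

# Figures taken from the quoted passage.
TRAFFIC_2015_EB_PER_YEAR = 966        # Cisco projection cited in the article
PAGES_PER_YOTTABYTE = 500 * 10**18    # "500 quintillion" pages of text


def exabytes_per_yottabyte() -> int:
    """How many exabytes fit in one yottabyte."""
    return YOTTABYTE // EXABYTE


def years_to_fill_yottabyte(eb_per_year: int) -> float:
    """Years of traffic at a constant eb_per_year needed to total one yottabyte."""
    return exabytes_per_yottabyte() / eb_per_year


def implied_bytes_per_page() -> int:
    """Page size implied by equating one yottabyte with 500 quintillion pages."""
    return YOTTABYTE // PAGES_PER_YOTTABYTE


if __name__ == "__main__":
    print("Exabytes per yottabyte:", exabytes_per_yottabyte())
    print("Years of 2015-level traffic per yottabyte:",
          round(years_to_fill_yottabyte(TRAFFIC_2015_EB_PER_YEAR)))
    print("Implied bytes per page:", implied_bytes_per_page())
```

Under those assumptions, the "million exabytes equal a yottabyte" line checks out, a yottabyte works out to roughly a thousand years of the projected 2015 global Internet traffic, and the page comparison implies about 2,000 bytes per page of text.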