Beware the NSA: Human Vs. Machine Intelligence
What is the real conversation we should be having about NSA data mining?
June 13, 2013
Data mining technology is only as good as the inherently human effort to determine which data are relevant. This is art as much as science.
Unless we begin to value this critical human effort, data mining will not yield results that make us safer.
Americans have always excelled at technological innovation and admired the alluring rationality of science.
The country grew rich on the readiness of Henry Ford and others to ask humans to imitate the regularity and efficiency of machines.
Today, our defense establishment seeks to find ways to eliminate human error from the all-too-human practice of war. Thus, the development of “smart” weapons able to hit targets with superhuman precision.
The era of supercomputers has arguably given rise to the greatest optimism ever about the primacy of technology. The ability of a supercomputer to “crunch” incredibly large data sets has allowed some to argue that we can bypass human analysis altogether.
The underlying belief in the “power of big data” inverts reality. It isn’t the data that are powerful. It’s the people with an insightful grasp of the context of a particular phenomenon who are powerful.
And it is their ability to build algorithms that capture expressions of those contexts in massive data sets that we should be focusing on.
Right after the NSA issue broke, former CIA director Leon Panetta’s chief of staff, Jeremy Bash, remarked that, “If you’re looking for a needle in the haystack, you need a haystack.”
Not so. Actually, what you need is an accurate narrative, or theory, about the needle and how to characterize it.
Moreover, its characteristics must leave digital indicators, if you are planning to search in a digital haystack.
There must be enough examples of needles in the world for researchers to be certain that they can distinguish a needle from a stalk of dried grass.
Without a precise sense of how to recognize a needle, all you will get are a lot of false positives.
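The false-positive problem is simple base-rate arithmetic. A toy calculation (every figure here is hypothetical, chosen only for illustration) shows how even a very accurate screen is swamped when the thing it hunts for is rare:

```python
# Hypothetical base-rate arithmetic: a screen that is right 99% of
# the time still produces mostly false positives when needles are rare.

population = 300_000_000      # records scanned (hypothetical)
true_needles = 3_000          # actual needles hidden in the haystack
sensitivity = 0.99            # chance the screen flags a real needle
false_positive_rate = 0.001   # chance it flags an innocent record

flagged_real = true_needles * sensitivity
flagged_innocent = (population - true_needles) * false_positive_rate

# Precision: of everything flagged, what share is actually a needle?
precision = flagged_real / (flagged_real + flagged_innocent)
print(f"Records flagged: {flagged_real + flagged_innocent:,.0f}")
print(f"Share that are real needles: {precision:.1%}")
```

With these made-up numbers, roughly 300,000 records are flagged, and about one in a hundred of them is a real needle. No amount of additional haystack fixes that; only a sharper characterization of the needle does.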
Any process for selecting particular data from a larger set represents a story about the world outside the data.
Don’t over-collect data
It has been almost a quarter of a century since Princeton professor Orley Ashenfelter used statistics on rainfall and temperature to predict the quality of Bordeaux wines.
The reason that Ashenfelter could compute the value of a wine using statistics is because he had developed a strong theory about how rainfall and temperature combine to produce good wine.
In other words, he imposed a pre-existing story onto data and correctly collected those particular data that served the story.
Imagine what would have happened if he had also collected statistics about the rise and fall of the population in France, the number of agricultural strikes per year and the number of cars traveling national highways each month.
These are all eminently collectable data points, but he might not have had such success; he might simply have drowned in an oversupply of information.
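A small simulation makes the point concrete. This is a toy sketch with synthetic data, not Ashenfelter’s actual equation: quality is generated from two relevant inputs (rainfall and temperature), and a model fit on those two alone predicts new vintages better than one fit on the same inputs plus a pile of unrelated series.

```python
# Synthetic illustration (all data fabricated): with few observations,
# adding irrelevant variables to a regression worsens out-of-sample error.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_junk = 25, 200, 20

def make_data(n):
    rain = rng.normal(size=n)
    temp = rng.normal(size=n)
    quality = 2.0 * rain + 3.0 * temp + rng.normal(scale=0.5, size=n)
    junk = rng.normal(size=(n, n_junk))  # population, strikes, traffic...
    return rain, temp, junk, quality

def fit_and_score(cols_train, y_train, cols_test, y_test):
    X_tr = np.column_stack(cols_train)
    X_te = np.column_stack(cols_test)
    beta, *_ = np.linalg.lstsq(X_tr, y_train, rcond=None)
    return float(np.mean((X_te @ beta - y_test) ** 2))  # out-of-sample MSE

r_tr, t_tr, j_tr, y_tr = make_data(n_train)
r_te, t_te, j_te, y_te = make_data(n_test)

mse_lean = fit_and_score([r_tr, t_tr], y_tr, [r_te, t_te], y_te)
mse_bloat = fit_and_score([r_tr, t_tr, j_tr], y_tr, [r_te, t_te, j_te], y_te)
print(f"Test error, relevant data only: {mse_lean:.2f}")
print(f"Test error, with junk series:   {mse_bloat:.2f}")
```

The bloated model fits its training vintages more snugly, precisely because the junk series let it memorize noise; the theory-driven model generalizes better.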
In the last decade, the United States has by all appearances put enormous energy and resources into developing the capability to collect and process digital and digitizable data, and to make the data available to analysts in a format that makes intuitive sense.
The technological challenges are not insignificant, and the advances that have been made are impressive.
But these technological challenges pale before the much more complex task of determining the factors and circumstances that lead people to political violence.
And they are a mere fraction of the challenge analysts face in transcending their own and their institutions’ biases and assumptions as they develop their theories about the meaning of the data.
These assumptions, as we know, are often unconscious.
Some of them lie in the human predisposition to take mental shortcuts in order to make sense of the complexities around us.
These include the tendency, described by Nate Silver in his recent book The Signal and the Noise, to elevate the importance of those data that are easy to access and dismiss data that are difficult to collect.
The content of our online searches may not be the best data for analyzing political violence.
But it is easier to collect the data than it is to develop an on-the-ground nuanced understanding of behind-the-scenes conspiracy building in, for example, Peshawar.
In other words, if you are looking for a needle, and collection technology makes it easy to build a haystack in which to look, it would be an entirely understandable human tendency for you to elevate the importance of the haystack.
The task facing those seeking to use data mining to support counterterrorism is not fundamentally different from the detective work that has always faced the investigator.
Intelligence is the job of selecting evidence and assembling it into a coherent narrative. But it also requires having a nuanced sense of which evidence to look for to fill in the developing story.
Poor stories will lead to poor data extraction and collecting more data will not solve the problem.
We will not make the necessary advances by focusing on technological capability alone; we need an equally strong focus on our human capabilities.
We need critical thinking that helps us defend against our own biases, knowledge of societies and histories in which we are engaged, and imaginative and nuanced understanding of how statistical data do (and do not) express social patterns.
In order to understand problems that are fundamentally social and political, such as international terrorism, analysts need encouragement from their leadership to relentlessly interrogate their own narratives.
- Is the story we are imposing on these data the right one?
- Are we exploring the right data?
- Are we using these data because they are the right source for insight, or simply because they are available?
This encouragement can be reflected in the allocation of resources to projects that develop the human side of cyber-security.
And it needs to be reflected in the education and training of national security professionals, in the hiring process and via a general culture of appreciation for the degree to which cyber is a human endeavor.
Above all, we Americans should recognize our own technological bias and our tendency to tell ourselves the story that technology has self-generating power.
Perhaps that means developing a greater faith in our ability to stay critically engaged in a complex world using the power of knowledge and imagination. That would be an excellent starting point to learn the right lesson from the NSA story.
Editor’s note: The views expressed in this article are solely those of the author and do not necessarily reflect the policies or positions of the National War College or the U.S. Government.