Big Data and numbers know a lot. But they can’t explain all the whys
April 16, 2013
April 15, 2013
What You’ll Do Next
By DAVID BROOKS
Over the past few centuries, there have been many efforts to come up with methods to help predict human behavior — what Leon Wieseltier of The New Republic calls mathematizing the subjective. The current one is the effort to understand the world by using big data.
Other efforts to predict behavior were based on models of human nature. The people using big data don’t presume to peer deeply into people’s souls. They don’t try to explain why people are doing things. They just want to observe what they are doing.
The theory of big data is to have no theory, at least about human nature. You just gather huge amounts of information, observe the patterns and estimate probabilities about how people will act in the future.
As Viktor Mayer-Schönberger and Kenneth Cukier write in their book, “Big Data,” this movement asks us to move from causation to correlation. People using big data are not like novelists, ministers, psychologists, memoirists or gossips, coming up with intuitive narratives to explain the causal chains of why things are happening. “Contrary to conventional wisdom, such human intuiting of causality does not deepen our understanding of the world,” they write.
Instead, they aim to stand back nonjudgmentally and observe linkages: “Correlations are powerful not only because they offer insights, but also because the insights they offer are relatively clear. These insights often get obscured when we bring causality back into the picture.”
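To make the "no theory, just patterns" approach concrete, here is a minimal sketch in Python. It assumes a small, entirely hypothetical transaction log (invented for illustration; it is not Wal-Mart data) and the made-up helper names `conditional_rate` and `lift`. The code estimates how likely a purchase is under a condition purely by counting, and it never asks why the association exists.

```python
# Hypothetical toy transactions: (storm_warning_in_effect, items_bought).
# These records are invented for illustration only; they are not real retail data.
transactions = [
    (True,  {"water", "batteries", "pop_tarts"}),
    (True,  {"water", "pop_tarts"}),
    (True,  {"flashlight", "batteries"}),
    (True,  {"pop_tarts", "bread"}),
    (False, {"bread", "milk"}),
    (False, {"pop_tarts", "milk"}),
    (False, {"eggs", "bread"}),
    (False, {"milk"}),
]


def conditional_rate(records, item, condition):
    """Estimate P(item bought | storm flag == condition) purely from observed frequencies."""
    matching = [items for flag, items in records if flag == condition]
    if not matching:
        return 0.0
    return sum(item in items for items in matching) / len(matching)


def lift(records, item):
    """Ratio of P(item | storm) to P(item) overall; a value above 1 means a positive association."""
    overall = sum(item in items for _, items in records) / len(records)
    return conditional_rate(records, item, True) / overall if overall else 0.0


if __name__ == "__main__":
    print(conditional_rate(transactions, "pop_tarts", True))   # 0.75
    print(conditional_rate(transactions, "pop_tarts", False))  # 0.25
    print(lift(transactions, "pop_tarts"))                     # 1.5
```

Nothing in the sketch explains the link between storms and Pop-Tarts; it only reports that the link shows up in the counts, which is the trade of causation for correlation described above.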
This method has yielded some impressive observations. Analysts can look at Google search terms and pick up where flu outbreaks are occurring. In doctors’ offices, statistical predictions often make better diagnoses than clinical predictions. Wal-Mart executives looked at the data and noticed that, as hurricanes approach, people buy large quantities of Strawberry Pop-Tarts. They began to put Pop-Tarts at the front of the stores with storm supplies.
In my columns, I’m trying to appreciate the big data revolution, but also probe its limits. One limit is that correlations are actually not all that clear. A zillion things can correlate with each other, depending on how you structure the data and what you compare. To discern meaningful correlations from meaningless ones, you often have to rely on some causal hypothesis about what is leading to what. You wind up back in the land of human theorizing.
Another obvious problem is that unlike physical objects and even animals, people are discontinuous. We have multiple selves. We are ambiguous and ambivalent. We get bored, and we self-deceive. We learn and mislearn from experience. Thus, the passing of time can produce gigantic and unpredictable changes in taste and behavior, changes that are poorly anticipated by looking at patterns of data on what just happened.
Another limit is that the world is error-prone and dynamic. I recently interviewed George Soros about his financial decision-making. While big data looks for patterns of preferences, Soros often looks for patterns of error. People will misinterpret reality, and those misinterpretations will sometimes create a self-reinforcing feedback loop. Housing prices, for example, skyrocket to unsustainable levels.
If you are relying just on data, you will have a tendency to trust preferences and anticipate a continuation of what is happening right now. Soros makes money by exploiting other people’s misinterpretations and anticipating when they will become unsustainable.
Then there is the distinction between commodity decisions and flourishing decisions. Some decisions are straightforward commodities, such as which route to work is likely to be fastest. Big data can help. Flourishing decisions are things like who to marry, who to befriend, what career calling to pursue and what college to choose. These decisions involve trying to find people, places and things that harmonize with your subjective self. It’s a mistake to take subjective intuition out of these decisions because subjectivity is the whole point.
One of my takeaways is that big data is really good at telling you what to pay attention to. It can tell you what sort of student is likely to fall behind. But then to actually intervene to help that student, you have to get back into the world of causality, back into the world of responsibility, back into the world of advising someone to do x because it will cause y.
Big data is like the offensive coordinator up in the booth at a football game who, with altitude, can see patterns others miss. But the head coach and players still need to be on the field of subjectivity.
Most of the advocates understand that data is a tool, not a worldview. My worries mostly concentrate on the cultural impact of the big data vogue. If you adopt a mind-set that replaces the narrative with the empirical, you have problems thinking about personal responsibility and morality, which are based on causation. You wind up with a demoralized society. But that’s a subject for another day.