While everyone can agree that Big Data needs to be approached deliberately and with caution, I found the article “Big data: are we making a big mistake?” in the March 28 issue of the Financial Times much too negative. It suffers from misleading arguments and subtly biased phrasing, and it is littered with straw men.
We can start with what are posited to be the “four exciting claims” that turn out to be “overoptimistic simplifications.” First, Big Data analysis certainly provides exciting, new, interesting and relevant results. Whether they are “uncannily accurate” is a question of degree and judgment, and even if we accept that they are not, it doesn’t follow that they aren’t useful. Second, I know of no claim that all data points are being captured, or that they need to be. Much larger volumes greatly improve validity; they don’t replace sampling, they make it better. Third, as some of the online responses noted, for many applications correlations are the point; causes are not relevant. For marketing analysis, knowing that 95 percent of the shoppers who bought beer also bought pizza is helpful; why the correlation exists doesn’t matter. As for the fourth point, about the numbers speaking for themselves, I’m not sure what is actually being claimed.
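The marketing point above is easy to make concrete. A minimal sketch, with invented transaction data and a hypothetical helper name, of the kind of co-purchase rate a retailer would compute without ever asking why the correlation exists:

```python
# Hypothetical market-basket sketch: how often do beer buyers also buy pizza?
# The basket data below is invented purely for illustration.

def co_purchase_rate(baskets, item_a, item_b):
    """Fraction of baskets containing item_a that also contain item_b."""
    with_a = [b for b in baskets if item_a in b]
    if not with_a:
        return 0.0
    return sum(1 for b in with_a if item_b in b) / len(with_a)

baskets = [
    {"beer", "pizza", "chips"},
    {"beer", "pizza"},
    {"beer", "soda"},
    {"pizza", "salad"},
    {"beer", "pizza", "napkins"},
]

rate = co_purchase_rate(baskets, "beer", "pizza")
print(f"{rate:.0%} of beer buyers also bought pizza")  # prints "75% of beer buyers also bought pizza"
```

The result is actionable (put the pizza near the beer) whether or not anyone can explain the underlying cause, which is exactly why correlations alone suffice in this setting.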
The negative phrases throughout the article are no doubt deliberately chosen to bolster the argument, but calling attention to them helps reveal the underlying bias. The “digital exhaust” of web searches can be “user history.” A “messy collage of data points for disparate purposes” can be a “revealing sample of information across multiple applications.” A “messy pile of found data” is simply “found data.” No credit is given to the iterative nature of good data mining: examine how good and useful the results are, then refine the queries and analysis accordingly, repeating as many times as needed.
In addition to the four claims, other straw men are deployed. Sampling bias is a known pitfall to be avoided; presenting it as somehow inevitable is rather like saying that Java shouldn’t be used for programming because the compiler won’t detect logic errors for you. The statement is made that “profitability should be conflated with omniscience.” Who is claiming omniscience? In commercial applications, profitability is precisely the ultimate goal.
Certainly there are cautions and potential problems with Big Data, but the same can be said of all technological innovation. I don’t think this article makes a convincing case that there is something uniquely dangerous or pernicious about Big Data, and its tremendous growth and success worldwide seem to confirm as much.