The big data phenomenon shows no signs of slowing down. Like any good journalist looking for blood in the water, I’m waiting for the backlash, the kind we’ve seen with other new technologies such as BYOD. But there’s been nothing but good news. Even in Forbes this week, Bain Insight posted twice: once with a story about how companies such as Samsung and Progressive Insurance are getting the most out of big data, and once with an infographic revealing big spending on big data. Last week, Louis Columbus reported in Forbes on how companies are using big data to get smarter about how they do business. Inspirational stuff.
That’s the part of big data (the combination of structured data sources such as databases and data warehouses with unstructured sources such as documents, e-mail, and spreadsheets) that excites me most. It’s the opportunity to move past extrapolating from data to analyzing the actual data. For someone like me who got a D in graduate statistics, the idea of no longer making assumptions about data is a real spine-tingler.
In “Can Analytics Outperform The Machine Whisperer?”, published last week, InformationWeek’s Chris Murphy noted that “rather than applying true predictive analytics based on actual performance metrics, what I see companies doing most often is relying on averages.” He cites the example of recommending that a fuel pump be replaced at 65,000 miles simply because that’s the average point of failure. “That scenario is a lot different from changing the fuel pump because the real-time data tells us that pump is running poorly and will fail and leave you stranded on the roadside at 45,000 miles.” (Earlier in October, InformationWeek also published “Big Data Success: 3 Companies Share Secrets.”)
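To make Murphy’s distinction concrete, here is a minimal Python sketch contrasting the two approaches. It is illustrative only: the 65,000-mile figure comes from his example, but the fuel-pressure sensor, the nominal pressure, the 15% tolerance, and the function names are all hypothetical assumptions, not anything a carmaker actually uses.

```python
from statistics import mean

# From Murphy's example: the average mileage at which fuel pumps fail.
AVERAGE_FAILURE_MILES = 65_000


def replace_by_average(odometer_miles: int) -> bool:
    """Average-based rule: replace every pump once the car hits the average failure mileage."""
    return odometer_miles >= AVERAGE_FAILURE_MILES


def replace_by_condition(
    pressure_readings: list[float], nominal_psi: float, tolerance: float = 0.15
) -> bool:
    """Condition-based rule (hypothetical): replace when THIS pump's recent
    pressure readings sag more than `tolerance` below its nominal pressure.

    The sensor, nominal value, and 15% tolerance are illustrative assumptions.
    """
    recent = pressure_readings[-5:]  # consider only the five latest readings
    if not recent:
        return False  # no data, no evidence of a failing pump
    return mean(recent) < nominal_psi * (1 - tolerance)


# A car at 45,000 miles whose pump pressure is steadily dropping.
readings = [58.0, 55.0, 50.0, 47.0, 45.0, 44.0]
print(replace_by_average(45_000))          # False: the average says all is well
print(replace_by_condition(readings, 58))  # True: the actual data says otherwise
```

The average-based rule treats every pump alike; the condition-based rule looks at the data coming from the specific pump, which is the shift from extrapolation to analysis of actual data described above.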
Similarly, in “Getting the Best Out of Big Data,” also published last week, analyst Simon Sherrington described a “classic big data scenario: the need to match geographic network rollout data with customer take-up information and customer profile data in order to analyze which types of customer were being reached by the operator’s marketing, geographically, and which were proving resistant to marketing efforts.”
So while everybody’s so excited, let me throw a little water on the fire. Big data comes with some small details that, in my view, everyone seems to be forgetting.
Big data doesn’t make data management easier. It makes it harder. Companies that have struggled to master structured data aren’t going to magically master unstructured data. Small stumbling blocks such as taxonomies, consistency, and hierarchies have always made getting to a single source of truth a challenge. Is it a zip code or a postal code? Is it a car, a truck, or a vehicle?
Without applying some rules, you could end up being more confused, with data that’s less reliable and less trustworthy than before. My advice: don’t start tackling big data unless you’re really confident that you’ve mastered data of any size.
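What does “applying some rules” look like in practice? Here is a minimal Python sketch, assuming the rules are nothing more than lookup tables: one mapping field-name synonyms (zip code vs. postal code) to a canonical name, and one mapping specific categories (car, truck) to a shared parent (vehicle). The table contents and function names are hypothetical; in a real organization the taxonomy would be agreed on by the business, not hard-coded.

```python
# Hypothetical rule tables; a real taxonomy comes from the business, not the code.
FIELD_SYNONYMS = {
    "zip": "postal_code",
    "zip_code": "postal_code",
    "zipcode": "postal_code",
    "postcode": "postal_code",
    "postal_code": "postal_code",
}

# One-level hierarchy rule: specific categories roll up to a shared parent.
CATEGORY_PARENT = {
    "car": "vehicle",
    "truck": "vehicle",
    "vehicle": "vehicle",
}


def normalize_key(key: str) -> str:
    """Lowercase, unify separators, then map known synonyms to one canonical name."""
    k = key.strip().lower().replace(" ", "_").replace("-", "_")
    return FIELD_SYNONYMS.get(k, k)


def normalize_record(record: dict[str, str]) -> dict[str, str]:
    """Rewrite a record so differently labeled sources share one schema."""
    out: dict[str, str] = {}
    for key, value in record.items():
        canonical = normalize_key(key)
        # Two source fields that mean the same thing must agree.
        if canonical in out and out[canonical] != value:
            raise ValueError(
                f"conflicting values for {canonical!r}: {out[canonical]!r} vs {value!r}"
            )
        out[canonical] = value
    if "category" in out:
        cat = out["category"].strip().lower()
        out["category"] = CATEGORY_PARENT.get(cat, cat)  # unknown categories pass through
    return out


print(normalize_record({"Zip Code": "02139", "Category": "Truck"}))
print(normalize_record({"postcode": "SW1A 1AA", "category": "car"}))
```

Both records come out with the same `postal_code` and `category` fields, so they can finally be compared. Without rules like these, the two sources would simply disagree, which is exactly the confusion described above.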