I railed last week about people who confuse “big data” with “lots of data,” and philosophized about discrete data versus data patterns, so I appreciated the blossoming this week of discussion offering yet another perspective: the idea of small data.
What is small data? In a perfect world, small data is the answer to the question. It’s that one little piece of information that we were looking for – or maybe that one little piece of information that we didn’t know we were looking for.
For example, just for fun, check out this TED talk from data scientist and Pratt Institute professor Ben Wellington, who analyzes municipal data. He started out by using GPS data from taxicabs to figure out what constituted “rush hour” in New York City and discovered that, because traffic was generally slow from 8:30 a.m. to 6:30 p.m., “it was more like ‘rush day.’”
More to the point, he was able to pinpoint an intriguing piece of small data: the worst place to park in Manhattan. By correlating ticket locations with hydrant locations, he discovered a space next to a fire hydrant that generated far more tickets than any other, and went out to discover that it was so poorly marked that drivers thought it was a legal parking space. He got the city to paint the curb and make it clearer that it was illegal to park there.
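For readers who like to see the mechanics, here is a minimal sketch of that kind of correlation. It is not Wellington's actual method or data: the coordinates, the hydrant IDs, the 5-meter cutoff, and the function names are all made up for illustration. Each ticket is assigned to the nearest hydrant within the cutoff, and the hydrant with the most tickets is the "small data" answer.

```python
from collections import Counter
from math import hypot

# Hypothetical hydrant positions: id -> (x, y) in meters on a local grid.
hydrants = {"H1": (0.0, 0.0), "H2": (100.0, 0.0), "H3": (0.0, 100.0)}

# Hypothetical ticket locations on the same grid.
tickets = [(1.0, 2.0), (99.0, 1.0), (2.0, -1.0), (0.5, 0.5), (50.0, 50.0)]


def nearest_hydrant(point, hydrants, max_distance=5.0):
    """Return the id of the closest hydrant within max_distance, or None."""
    best_id, best_dist = None, max_distance
    for hydrant_id, (hx, hy) in hydrants.items():
        dist = hypot(point[0] - hx, point[1] - hy)
        if dist <= best_dist:
            best_id, best_dist = hydrant_id, dist
    return best_id


def tickets_per_hydrant(tickets, hydrants, max_distance=5.0):
    """Count tickets issued close to each hydrant (assumed cutoff in meters)."""
    counts = Counter()
    for ticket in tickets:
        hydrant_id = nearest_hydrant(ticket, hydrants, max_distance)
        if hydrant_id is not None:
            counts[hydrant_id] += 1
    return counts


counts = tickets_per_hydrant(tickets, hydrants)
worst_hydrant, worst_count = counts.most_common(1)[0]
print(worst_hydrant, worst_count)
```

The big dataset is every ticket in the city; the small data is the single pair that falls out at the end.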
Forbes columnist Mike Kavis, of Cloud Technology Partners, postulated a similar viewpoint last week, insisting that small data is driving the Internet of Things. In his explanation, he buttresses my argument about discrete vs. continuous data: “Small data can trigger events based on what is happening now. Those events can be merged with behavioral or trending information derived from machine learning algorithms run against big data datasets.”
He uses the example of a wind turbine equipped with sensors that transmit small, discrete data “to determine wind direction, velocity, temperature, vibration, and other relevant attributes.” That data is then aggregated with other data “into a data lake where machine-learning algorithms begin to understand problems” such as wear and tear, which in turn can inform better maintenance schedules. (For more on machine learning, see also this terrific piece from VentureBeat last week on how it will fuel innovation; though I have to admit, when you say “machine learning,” I automatically think “Skynet.”)
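To make the turbine example concrete, here is a rough sketch of one small reading triggering an event when compared against a trend derived from historical data. Everything in it is an assumption for illustration: the `Reading` type, the vibration values, and the 1.5x tolerance are hypothetical, and a plain average stands in for whatever machine-learning model would actually run against the data lake.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Reading:
    turbine_id: str
    vibration: float  # hypothetical unit


def baseline(history):
    """Stand-in for the 'big data' side: the average historical vibration."""
    return mean(r.vibration for r in history)


def check(reading, history, tolerance=1.5):
    """Return an alert string if the reading exceeds tolerance * baseline."""
    base = baseline(history)
    if reading.vibration > tolerance * base:
        return (
            f"{reading.turbine_id}: vibration {reading.vibration} "
            f"exceeds baseline {base:.2f}"
        )
    return None


# Hypothetical aggregated history and two incoming sensor readings.
history = [Reading("T1", 2.0), Reading("T1", 2.2), Reading("T1", 1.8)]
alert = check(Reading("T1", 3.5), history)
no_alert = check(Reading("T1", 2.1), history)
print(alert)
print(no_alert)
```

The single reading is the small data that says what is happening now; the history is what gives it meaning.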
When you think about it, the idea of small data is inextricably linked to the Internet of Things. If computers collect data for analysis, then the smallest possible computer is a sensor out there somewhere, broadcasting (or more appropriately, narrowcasting) data for aggregation. It can collect data as tiny as “I’m a part that’s failing” or aggregate it into data as broad as “these parts are prone to start failing within X years of installation.”
For a look at ten astonishingly diverse and successful applications of IoT and big data, check out last week’s ZDNet story. No surprise that UPS is using sensors – that company’s been great at using technology for years – but I love Barcelona’s municipal installation of sensors for public transportation and Virgin Atlantic’s sensors for cargo on Boeing 787s. I found Disney’s Magic Band application a little intrusive, but I am also in favor of anything that makes its parks easier to navigate.
The ZDNet piece was part of a series that ran last week, one part of which – looking at the competitive advantage big data brings – also supported the “small data” perspective. As data scientist Max Shron of consulting firm Polynumeral said in a piece by Bill Detwiler, “It’s not how big the dataset is, but how detailed (or fine-grained) it is.” Detwiler added, “You can always pare down a dataset, but you can’t go back and get it.”
So start small, and go big.
This article was written by Howard Baldwin from Forbes and was legally licensed through the NewsCred publisher network.