At the Strata and Hadoop World conference this week in New York City, Big Data practitioners and vendors will gather once again to learn from and sell to each other. One aspect of this year’s conference worth noting is the rise of “fake Big Data” products, that is, products that have adopted the label Big Data in the hope that the world will take more interest.
The urge to ride the updraft of a powerful trend is a long-established practice in technology marketing. During the dot-com boom, companies that had nothing to do with the Internet added “.com” to their names. Environmental products are green-washed. Cloud products are cloud-washed. And now we have the rise of Big Data washing.
Does the harm of such exaggerations outweigh the benefit? Perhaps someone will discover a product they wouldn’t otherwise evaluate because the marketing uses the term “Big Data”. But this technique fools almost no one and frustrates those who are left to struggle with the fake Big Data technology.
Platfora, a vendor of a Big Data analytics platform, engaged Luth Research to look into the state of the market for Big Data analytics. The survey asked respondents directly whether small data solutions were being repackaged as Big Data solutions: 55% said yes. About half of the respondents said that they had to break Big Data into smaller chunks to analyze it and that linking small data solutions to Big Data was frustrating. The Luth Research analysis concluded: “Big Data Analytics tools that don’t work have a negative impact on morale. Respondents who were not satisfied with their tools were more apt to describe their Big Data Analytics experience as stressful, frustrating and time-consuming.” (See this infographic for more detail from Luth Research.)
“You keep using that word (big data). I do not think it means what you think it means.”
My view is that a company does itself a disservice by calling itself a Big Data company when Big Data is not really its core value. Earlier this year, I created some content for DataRPM, a company that makes an excellent product I have written about in these articles: “Why Automated Semantics Will Solve The BI Dashboard Crisis” and “How Semantics Can Make Data Analysis Work Like A Google Search”. Since I met them, I have been advising them that Big Data is not the right way to describe the value of their product. But in the latest press release I received, DataRPM described itself as a “Big Data company”.
DataRPM solves the Top of the Funnel BI problem (“Why Top Of The Funnel BI Will Drive The Next Wave Of Adoption”) using natural language, automatic dashboard creation, and semantic modeling of data. Using DataRPM, an untrained user can start playing around with data by asking a question in natural language. A dashboard is generated in response. Then, by adding more words to the question, the user can refine the dashboard. In this way, users can find their way to data sets they may not have known existed. The penetration of BI, which now reaches around 30 percent of the people in a company, can be greatly expanded through products like DataRPM.
But what does this have to do with Big Data? I suppose you could run queries against Big Data repositories from DataRPM. But you can do that from a spreadsheet. Is a spreadsheet a Big Data technology? Is a pad of paper? In my view, DataRPM is obscuring its true value by claiming an affinity with Big Data.
Qlik and Tableau have many partnerships with Big Data companies and are used all the time to explore Big Data, but in their marketing these companies make what they do best their value proposition: enabling exploration and discovery. Looker, a newcomer in the same space, has a unique approach to exploration and discovery that is catching on among some savvy early adopters, but it couldn’t resist putting a paragraph about “Unlocking Massive Datasets” on its product page.
For its part, Platfora offers another way to make data more accessible. Platfora is about making the diversity of data stored in Hadoop available to a wider audience. The Hadoop connection makes the link to Big Data strong. But Platfora rightly doesn’t make Big Data the point. Platfora’s secret sauce is opening up the end-to-end process of transforming and manipulating data to the data analyst. The goal: removing the bottleneck that arises when IT acts as an intermediary. The sheer volume of data going into Hadoop makes it natural for Platfora to focus there, but I suspect that at some point Platfora will be able to work its magic on data in all sorts of repositories.
“To me, an analysis becomes a Big Data analysis when new, massive data sets are included and dots are connected to know more about patterns and outcomes,” said Ben Werther, CEO and Founder of Platfora. “When you combine the usually distinct silos of customer interactions, transactions, and machine data, you are in the realm of Big Data. We think that the critical challenge is making it possible for every business analyst to ask questions that matter right away without an IT bottleneck.”
What Does Big Data Mean Now, Anyway?
The Luth Research report confirms Werther’s perspective. Respondents were asked about the following capabilities:
- Results within Hours/Days
- Add Data Source w/o IT
- Iterative Analysis
- Access to all Data Sources
- No Need to Break Data into Pieces
- Easy Sharing of Results
- Analysis w/o IT
- Data in Centralized Reservoir
- Analyze any Volume of Data
- Automated Real-Time Analysis
- Can Easily add New Datasets
- Results in Visual Form
- Analyze all Varieties of Data
More than half of current Big Data Analytics users said that they had all these capabilities. Only about a quarter of those planning to use Big Data Analytics in the future had these capabilities. Note that only a few of these capabilities are strictly related to Big Data. Most of them are just aspects of better BI technology.
So, what is fake Big Data technology? Technology that doesn’t really do much to make large data sets useful to the masses or to help a data scientist break new ground. As in the case of DataRPM, such a technology solution may be great, but it isn’t really a Big Data technology.
What is real Big Data technology? Technology that rocks and makes it easier to use Big Data in some meaningful way. In other words, Big Data has become another way to say something is good, oh, and also that it might be able to handle lots of data. To tell the difference between real and fake Big Data technology, when a vendor explains its story, ask whether the technology would help your Mom or Dad use Big Data, or whether a data scientist could use it to do something they could never do before. If it passes one or both of these tests, then call it a Big Data technology. If not, but you think the technology is great, just call it better BI.
Dan Woods is CTO and editor of CITO Research, a publication where early adopters find technology that matters. For more stories like this one, visit www.CITOResearch.com. Dan has done research for Platfora, Qlik, Tableau, DataRPM, and many other companies in the BI and Big Data analytics space.