The buzz this week about big data centered on whether it belongs in the cloud. Notwithstanding that extremism in technology (all data will live in the cloud, every app will be mobile) is as flawed a notion as extremism in politics, there were arguments on both sides.
In some cases, big data and the cloud are considered inextricable, as in the acronym SMAC, whose last two letters stand for analytics and cloud; technology consultant Kurt Marko alluded to this in his Forbes piece earlier this month on using big data and machine learning, as did Andrew Brust last week in his ZDNet article on cloud machine learning. But cloud and analytics aren’t necessarily inseparable.
As EnterpriseTech’s George Leopold noted earlier this month, “most big data production workloads are hosted on-premises,” if only for security reasons. But he cited a Wikibon survey showing that cloud services’ share of the big data market was growing, from $1.3 billion in 2014 to a projected $5.69 billion by 2020. Leopold continued, “Wikibon foresees the big data ‘center of gravity’ shifting to the cloud as more enterprises deploy cloud infrastructure or embrace hybrid cloud options as a way to leverage cloud technology while securing sensitive or proprietary [data] in-house.”
InfoWorld columnist Matt Asay also recently posited that big data is all about the cloud. Not surprisingly, he quoted Amazon’s data science chief as saying “analytics is addictive,” so your infrastructure needs to be able to keep up. Score one for the cloud, even considering the source.
However, I prefer the insight of Shaun Connolly, strategist for Hadoop developer Hortonworks, who also used the gravity analogy. Because the laws of physics prohibit the easy movement of hundreds of terabytes or petabytes of data across the network, Connolly says, customers will have Hadoop clusters on-premises and on various clouds so they can do the appropriate analytics wherever the bulk of the data has landed. His term for that is “data gravity.” When newer data sets, such as weather data, census data, and machine and sensor data, originate outside the enterprise, the cloud becomes a natural place to do the processing.
Still, the hybrid view isn’t monolithic in the industry. As Modern Infrastructure editor-in-chief Alex Barrett posited on TechTarget this month, compute- and I/O-intensive big data workloads won’t stray to the cloud yet, as security concerns and existing infrastructure keep analytics in the data center. He was, of course, careful to add “at least for now.”
To back that up, Barrett cites an Enterprise Strategy Group survey that found “when it comes to new big data infrastructure, 18% of respondents said they are planning to use dedicated (non-virtualized) servers for analytics workloads; 30% are looking to traditional virtualized infrastructure; and 21% are considering dedicated analytics appliances from the likes of Oracle and Teradata. Only 21% are considering public cloud, while another 10% are thinking about a public/private hybrid deployment.”
In Barrett’s piece, ESG analyst Nik Rouda makes a highly pertinent point: the choice may depend less on the technology than on the people managing it. “Sometimes, the thinking is, ‘We’ve always done it this way,’ and so people go to their built-in biases or best practices.”
As an opponent of extremism, I keep coming back to what Hortonworks’ Connolly said: the cloud-versus-on-premises argument doesn’t matter. There is no single place where data should live. If you want to hoard your data internally, have a good time. But there will undoubtedly come a moment when you’ll want to bring in data from somewhere else, and that other place is likely to be a cloud. Data is already coming at you from multiple directions; the cloud is just one more source.
This article was written by Howard Baldwin from Forbes and was legally licensed through the NewsCred publisher network.