Is cloud making a difference in Big Data efforts? Can it deliver the agility and scalability required to handle petabytes and exabytes’ worth of data coming out of enterprises and the Internet of Things?
The answer to both questions is “yes, but,” according to a highly engaging panel that was part of last week’s Data Summit, held in New York. I had the opportunity to moderate the panel, joined by David Mariani, CEO of AtScale; Wendy Gradek, senior manager with EMC; and Andy Schroepfer, chief strategy officer at Hosting.
Interestingly, the appeal of cloud is not about the cost at all — it’s about the change in the way enterprises view technology solutions. “People are fine with paying as you go,” said Mariani. “And they’ll probably pay more at the end of the day than they would if they paid up front.” Scalability is a huge factor in the appeal of cloud, said Gradek. “You have to think about all the data that’s being collected from you. Just by being read. Where’s all that information going to get stored? It’s got to be stored somewhere. It’s got to be scalable.”
As a result, a whole new approach to developing data insights is being developed. “It’s data as a service — not traditionally what we think of as hardware and software,” Mariani said.
That being said, not all data is going to the cloud. In fact, very little existing on-premises data will ever make its way into public clouds, panelists agreed. “Data needs to be generated in the public cloud to make sense to store in the public cloud,” said Mariani. “A lot of our clients are big financial institutions. For them to generate the data onsite, and move it to someone else’s cloud, that’s never going to happen. The public cloud makes a whole lot of sense, but not for data generated on-prem.” Security is one reason why such a migration won’t happen. Another reason is that the entire process of moving data is expensive, and simply not worth it.
There’s also another significant roadblock to the wholesale moving of data to public cloud, Mariani explained. “I always tell people, ‘we’re not going to move data to EC2 or somebody else’s public cloud overnight. Moving the data is not going to happen. We don’t have the bandwidth to do that.’ At the end of the day, it’s the network, and the network infrastructure isn’t there to move that much data around.”
Another topic of discussion was the reportedly high failure rate of Big Data analytics projects — as high as 70 to 80 percent by some estimates. Along with the question of why these efforts fail, there’s the question of how an enterprise even knows that a Big Data effort is failing. Schroepfer pointed out that all too often, enterprises and their managers don’t set specific goals for their efforts. “You need to know your goal,” he said, likening the experience to preparing a thesis paper. “You’re going to have a hypothesis of what you think the outcome will be.” That said, he went on to observe that “too many people fail to know why they’re crunching their data. They may have a generic goal of wanting to have more personalization, assuming if they go grab more of these personal data points, they should have a better picture than just grasping in the dark.” However, he continued, “having a general goal of being a ‘little bit better’ is not going to deliver a desired outcome.”
Panelists also urged the formation of “data lakes” in enterprises — central repositories of raw data that is simply collected, then structured and processed later, when needed by an application. This helps address what Gradek called the greatest challenge for many enterprises today: disparate data sources, and the inertia they create within enterprises. “I don’t know how many times I’ve been told the information we need is six months out, or it’s about a year out. That’s not going to work for the business — their goals are very much weekly driven, especially in sales, where if you don’t make your numbers, and you don’t have visibility into your data, you’re running blind.” The key to resolving this is supporting disparate data sources in a single enterprise location, she continued. “We need it to be in a central repository in its original state, so when we have those questions we can go to it and apply the logic as close to query time as possible and get what we need quickly.”
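The “store raw, apply the logic at query time” pattern Gradek describes is often called schema-on-read. Here is a minimal sketch in Python of the idea; the event fields and function names are invented for illustration, not taken from any panelist’s system:

```python
import json

# A toy "data lake": raw events are appended in their original form,
# with no schema imposed at ingest time.
raw_lake = [
    '{"user": "alice", "action": "view", "ts": "2015-05-01"}',
    '{"user": "bob", "action": "purchase", "ts": "2015-05-02", "amount": 42.5}',
    '{"user": "alice", "action": "purchase", "ts": "2015-05-03", "amount": 9.99}',
]

def query_purchases(lake):
    """Schema-on-read: structure is applied only when a question is asked."""
    results = []
    for line in lake:
        event = json.loads(line)  # parse the raw record at query time
        if event.get("action") == "purchase":
            # Tolerate records that lack a field, since no schema was enforced.
            results.append((event["user"], event.get("amount", 0.0)))
    return results

print(query_purchases(raw_lake))
```

Because nothing is transformed up front, each record keeps its original state in the repository, and new questions can be answered by writing a new query rather than waiting months for a new pipeline.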
(Disclosure: the author is a contributor to Database Trends & Applications, published by Information Today, Inc., host of the Data Summit mentioned above.)
This article was written by Joe McKendrick from Forbes and was legally licensed through the NewsCred publisher network.