The Third ‘V’ of Big Data – Velocity


John McKinney

April 26, 2015

Business leaders are asking for new insights, enabled by Big Data and Fast Data. They want new real-time predictive analytics capabilities to gain cost efficiencies, generate new revenue streams and mitigate disruptive threats from startups and parallel industries. As discussed in Capgemini’s white paper, Big & Fast Data: The Democratization of Information, the winners are most likely to be those that master the seismic cultural shift and rapid technology evolution now underway.

The key to understanding “Big Data” solutions that deliver business insights at the point of action is the three V’s: Volume, Variety and Velocity (META Group, now Gartner, 2001). Volume and Variety are the best understood: most new big data platforms are built on large, parallel commodity infrastructure, while open source data management tools economically store and retrieve both structured and unstructured data. Less mature, but crucial to realizing business value, is the third V, Velocity. We think velocity implies both the speed of change and the actual processing speed (e.g. response time) that the business requires. The latter is particularly crucial in today’s competitive landscape, because you can’t simply scale out slow batch processes. Bringing insight to the point of action often means doing so in real time.
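To make the batch-versus-real-time distinction concrete, here is a minimal Python sketch. It assumes a rolling mean over recent sensor readings stands in for a real-time analytic; the names `RollingMean` and `batch_mean` are hypothetical and illustrative only. The batch version rescans history every time it is asked, while the incremental version updates its answer in constant time as each event arrives, which is what lets it respond at the point of action.

```python
from collections import deque


def batch_mean(history: list[float], window: int) -> float:
    """Batch style (hypothetical helper): re-read the stored history on every request."""
    recent = history[-window:]
    return sum(recent) / len(recent)


class RollingMean:
    """Incremental style (hypothetical class): update the result per event in O(1)."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.values: deque[float] = deque()
        self.total = 0.0

    def update(self, x: float) -> float:
        # Add the new reading to the running total.
        self.values.append(x)
        self.total += x
        # Evict the oldest reading once the window is full.
        if len(self.values) > self.window:
            self.total -= self.values.popleft()
        return self.total / len(self.values)


if __name__ == "__main__":
    readings = [10.0, 12.0, 11.0, 30.0, 31.0]
    rm = RollingMean(window=3)
    for r in readings:
        print(rm.update(r))  # answer is available as soon as each reading arrives
    print(batch_mean(readings, 3))  # same final answer, but only after rescanning
```

Both approaches agree on the result; the difference is that the incremental one never waits for a batch job, so its latency does not grow with the volume of stored data.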

Many of the key business requirements addressed by the Business Data Lake (BDL) focus on these real-time predictive analytics solutions. The business is less interested in what happened last year, last month or even yesterday. Instead, the focus has shifted to knowing what is happening now and, using predictive analytics, influencing what happens next. For example, it is clearly better to predict failures of expensive equipment such as turbines, so that corrective action can be taken before a costly outage occurs. Customer-facing applications are also driving demand for real-time solutions. Sub-second response times are widely reported to increase shopping cart conversions dramatically, and complex analytics must be executed during the customer interaction itself in order to influence the outcome.

To deliver real-time solutions, we bring the business logic and the data closer together with in-memory solutions. This concept is not new. Optimal control of manufacturing processes requires millisecond response times. In finance, speed wins, so real-time platforms have long been deployed for competitive advantage. In these and many other cases, in-memory systems have been used for years. So what has changed? As Moore’s law continues to remind us, new techniques become possible as technology evolves. For instance, the price of memory keeps dropping, which lets us do things that were previously too expensive to do in practice. Consider how much memory is in your smartphone: in less than two years there will be twice as much, for the same price. As the saying goes, disk is the new tape and memory is the new disk. Mature, open source based in-memory data grids such as GemFire, and emerging in-memory technologies such as Spark and Tachyon, enable us to deliver real-time analytics on Big Data that would have been hard to imagine even a few years ago.

Real-time processing is not new; applying it to big data problems is. We strongly believe it is one of the emerging competitive advantages that companies must adopt or risk becoming irrelevant. The Business Data Lake was the first architecture to combine the scale and economics of big data with the speed and transactional capabilities of in-memory solutions. By leveraging these capabilities, we can help our clients gain real business advantage in this data-centric environment.

Note: This is the personal view of the author and does not reflect the views of Capgemini or its affiliates.