In a recent blog post I noted that the average enterprise data center saves all data forever. For many of the storage administrators at these data centers, that's petabytes of data growing at a compound annual growth rate (CAGR) of 30% or more. Yes, the practice is hugely expensive and therefore seems irrational. And yes, over the long term we can all agree that it's unsustainable. But as I said in my previous post, it's been this way for years, in fact decades. Save-everything-forever is the default data retention policy and will be for the foreseeable future unless the enterprise rethinks one key aspect of this dilemma: data ownership.
Since the early days of the mainframe, enterprise IT systems have created and collected data. Because the enterprise bought the systems and employed the people to run them, its executives can rightly assume that the data is the property of the enterprise. Consequently, considerable sums of money are spent on storing and securing this data, which is increasingly regarded as the lifeblood of the business. Fortunately, advances in object-based storage systems will lessen that financial burden.
But suppose for a minute that it doesn't have to be this way. Data ownership is assumed and rarely questioned. In light of unsustainable storage practices, however, it is worth asking a pivotal question: Does the enterprise have to own data in order to derive value from it? If you don't have to own it, you don't have to save it forever. That's someone else's burden. All you have to do is access someone else's data.
As the era of Big Data advances, data stores that either charge on a fee-for-use basis or offer free access to data are becoming increasingly common. Some examples:
Gnip (ping spelled backward), now owned by Twitter, offers access to social media and messaging data feeds, in real time if you want, from sources that include Facebook, Twitter, Instagram, YouTube, WordPress, and more. Converging these sources with customer data is a practice now called customer sentiment analysis, a hot topic in marketing departments. Gnip shows that the enterprise doesn't have to build and run the systems that produce data in order to derive significant value from it. Gnip and sources like it offer multiple ways to acquire the data. Save it "on prem" if you must.
And there's an already long and still growing list of sources in what are now known as data marketplaces. Big names in these data malls include Amazon Public Data Sets, IBM Many Eyes, Google Public Data Explorer, and Microsoft Azure Data Marketplace. Early startups in this space include Factual and Infochimps. Vendors here typically offer analytic services as well.
Finally, one could spend an entire day searching out all of the free sources of data now available on the web. Some, Freebase for example, offer many categories of data. There are subject-specific sources like NOAA for weather data and NASA for data generated by satellites and space probes. There are industry-specific sources as well. Some free sources allow commercial use of data on the assumption that any form of data propagation is a good thing.
Enterprise IT is moving in the direction of owning nothing. In some ways the trend is a throwback to the mainframe era, when 60% of mainframe users leased the iron in their data centers from a financial institution that actually owned it. However, deciding not to own infrastructure (think cloud service providers) is easier than deciding not to own data. Regulatory authorities assume that the entities they monitor own their data, even when it's in the cloud. And then there's the whole data-is-lifeblood thing. All I'm really suggesting here is a rethink before investing in more stuff to produce the data you think you need to own. It may already be "out there." If it is, you'll have less data baggage to carry around in perpetuity. Let someone else do that.