I have often seen the familiar tension between IT and the business in the struggle to wrangle data and get value from it. Too often I have seen IT prevent the business from accessing data that is rightly theirs, on the grounds that it is in the wrong format, that the business wouldn’t understand the data model, that the data must be protected, and so on. Many legacy investments and reference architectures perpetuate this model: the business can only reach its data through expensive, cumbersome data warehouses, operational reporting systems or limited canned reports. Business users are then reduced to pulling extracts any way they can, creating a whole “grey IT” world just to get at their own data.
IT’s goal of protecting data through highly structured, tightly managed analytic systems has, in fact, had the opposite effect. So how do we solve this? I think this is the real sweet spot for a data lake. By bringing all the data, or at the very least the metadata, into one central hub and giving business users access to data as close to its raw form as possible, we can move to a new operating model in which the business no longer depends on IT even to reach its own data and insights.
Business users can quickly prototype ideas and explore hypotheses, and only once repeatable KPIs have been defined are they built into the data warehouse, following the ‘design by using’ model, as Gartner calls it in the report “Organizing Your Teams for Modern Data and Analytics Deployment” published in March 2017. Does this mean the data warehouse is dead? No, but I think it becomes necessary only for very specific requirements and highly structured reporting.