My article in August about “What the Data Warehouse Should Become In the Cloud?” led to discussions with several players in the market that ended up improving the framework and raising other important questions. Here’s what I learned.
In that article I outlined the following goals for a Cloud Data Warehouse:
- Separate Storage and Compute
- Create Workload Specific Engines on Demand
- Reconstruct the Optimizer Based on the Power of the Cloud
- Dramatically Simplify Creating and Managing High Performance Workloads
- Handle All Important Workloads
- Handle Big Data Volume and Variety
- Execute Queries Across Multiple Repositories
- Have a Scalable Data Movement and Replication Capability
- Include the Whole Product Needed for a Successful Data Warehouse
The discussion of these dimensions led me to the next stage of the analysis. The fact is that the possibilities opened up by a cloud data warehouse can be combined into many different types of products. Just as there were many visions and implementations of the data warehouse in the on-premise world, so there will be in the cloud. In future articles I will explore how specific products implement their vision. Here I’m going to take a look at the different needs a cloud data warehouse may satisfy. Product managers of cloud data warehouses are going to have to decide which of these needs are most important as they plan development of their products. Users are going to have to decide which of these requirements are most important as they make choices between products.
Do you want a hybrid architecture or hybrid technology?
One of the most interesting questions facing CIOs and CTOs today is how they plan to live in a hybrid world, one in which computing infrastructure is spread across on-premise data centers, the public cloud, and SaaS applications.
For the Cloud Data Warehouse, a crucial question is: Do you want to implement your hybrid architecture with hybrid technology or native technology? Here’s the challenge: some vendors are focusing their efforts on being the best data warehouse they can be in the cloud. The argument here is that a hybrid architecture will be best served by components that take advantage of all the cloud has to offer. This is a bet that a cloud native data warehouse will best serve the most workloads.
Other vendors are creating hybrid technology, that is, technology that works the same way both on-premise and in the cloud. The argument here is that you should build one set of skills that you can use both on-premise and in the cloud.
Of course, you cannot answer this question or the others I will ask in isolation. You must consider other factors as well. For example, is the cost of learning new skills to adopt a new data warehouse worth the advantages you will gain? That is a question that depends on the nature of the workloads and amount of institutional experience at a company making such a decision.
(Note to the nerds in this space: I’m purposely not naming the vendors that take these positions. I want to cover the arguments for and against these approaches in full in later articles.)
What are the advantages of being born in the cloud?
Building a data warehouse from scratch to take full advantage of the cloud is an exciting endeavor. It involves taking a well-established use case and some challenging engineering problems and solving them based on new assumptions made possible by a new computing platform.
Part of the excitement is solving hard problems such as creating an SQL query optimizer in new ways. But to me the bigger benefit will be to make the cloud data warehouse the data warehouse we always wanted. Both cloud native and hybrid data warehouses must deliver on this promise.
For example, cloud data warehouses open up new models for sharing data by granting access rather than moving data. A related technology, 1010data, took off in the aftermath of 9/11 by providing a model for sharing financial data. Now cloud data warehouses offer the same opportunity. The challenge, of course, is not just offering the capability but productizing it well, so that data is easy to share with the right security, auditing, and other safeguards.
The best cloud data warehouses will expand the idea of the data warehouse into many new dimensions.
Are new ways of working important to you?
Another key question users must answer is how important the new dimensions that the cloud makes possible are to them. For some users, sharing data by providing access instead of moving data will have a huge impact. For others, the quality of the SQL optimizer, and therefore how well the warehouse responds to machine-generated SQL, will be more important. For still others, the ability to quickly create and then deprovision special-purpose data warehouses will be vital.
To make the right choice of a cloud data warehouse, you must understand what you are not getting from your current situation and also have an opinion about how much the new capabilities will matter.
Is the cloud data warehouse the center of your data supply chain?
One of the most important factors in designing a hybrid architecture is understanding the role you are asking the cloud data warehouse to play. In some cases, the cloud data warehouse will take over a certain number of workloads that are better served in the cloud. In other cases, the cloud data warehouse will become the main event. The capabilities you are looking for will depend on the role you want the data warehouse to play in the short and long term.
For example, query federation may be a crucial capability or may not be that important. If the cloud data warehouse is going to be the center of your world, then the ability to execute federated queries doesn’t really matter that much. If you are planning on a hybrid architecture, then the ability to execute queries from any point in the ecosystem that gather data from a variety of repositories may be important, depending on how your analysts do their work.
What level of product and ecosystem do you need?
A question related to the previous one is what level of productization and ecosystem support you need. If you are looking for the cloud data warehouse to handle certain types of big data workloads that are better handled in the cloud and then move the distilled data back to an on-premise data warehouse, then you need a much less complete product.
But if the cloud data warehouse is going to be the center of your data supply chain, then attendant capabilities, such as the ability to create data labs or sandboxes, data catalogs, tools for data migration, and so forth, become more important.
In addition, the more central the cloud data warehouse will be, the more important it is to have a supporting services ecosystem that provides a choice of partners and of productized integrations.
Do you want a data warehouse or an analytics application stack?
Finally, when considering the idea of a cloud data warehouse, it is important to ask the bigger question: What are we trying to accomplish? It may be that a cloud data warehouse is not the best fit for your needs if your goal is to create analytics applications. There are a variety of new combinations of technologies, many involving Spark, Hadoop, or commercial platforms such as Datameer, Platfora, MapR, TIBCO, Qlik, Sisense, and Tableau, that can create a standardized architecture that allows rapid creation of analytics applications.
If the most important task is to create analytics apps, perhaps the data warehouse is not the right choice or should be a supporting technology to an analytics application stack.
These questions are far from exhaustive, but I will use them as a starting point for assembling the right questions to ask a would-be cloud data warehouse user about what they are looking for.
Dan Woods is on a mission to help people find the technology they need to succeed. Users of technology should visit CITO Research, a publication where early adopters find technology that matters. Vendors should visit Evolved Media for advice about how to find the right buyers. See a list of Dan’s clients on this page.
This article was written by Dan Woods from Forbes and was legally licensed through the NewsCred publisher network.