It’s 2019 – Isn’t the Data Warehouse dead?
With the advent of self-service data discovery tools and big data platforms, the bold announcements of “the data warehouse is dead” started to ring out around the world.
So why, you might ask, are we building a new startup that is focussed on allowing analysts to create data warehouses via self-service?
Are we looking to provide a solution that is focussed on the long-tail of companies still investing in a “legacy” data warehouse approach?
We believe the data warehouse is still a core component of any organisation’s data management strategy, albeit a modern and agile data warehouse. And we are not alone: a number of articles in 2018 suggested the same thing.
The future of data warehousing
In its 2018 whitepaper The Future of Data Warehousing, the Eckerson Group states:
“Despite declarations by pundits, data warehousing is not dead. Recent surveys show that more than 60% of companies are operating between two and five data warehouses today. Fewer than 10% have only one data warehouse or none at all. Nearly one-third of respondents work in an organization with six or more data warehouses. Although the vision from the past generation of BI and data warehousing—one data warehouse that serves as a single version of the truth—has not been realized, it is clear that data warehousing continues to provide value to these organizations.”
Data warehouses meet the information needs of people and continue to provide value. Many people use them, depend on them, and don’t want them to be replaced with a data lake. Data lakes serve analytics and big data needs well. They offer a rich source of data for data scientists and self-service data consumers. But not all data and information workers want to become self-service consumers.
We are surprised how often we hear the new breed of data visualisation and data wrangling solution providers imply that senior executives or front-line operational staff want to spend hours wrangling data or creating their own dashboards from scratch. While we agree these capabilities need to be democratised throughout an organisation, rather than being tightly held within the IT group, there are still roles within those organisations that simply want a single, consistent answer to the same common question.
Why do we still need a data warehouse?
Self-service analytics does not replace data warehousing; it extends and complements. Published data (warehousing) and ad hoc data (self-service) work together to meet a broad spectrum of information needs.
People continue to need well-integrated, systematically cleansed, easy to access data that includes time-variant history.
Again, we couldn’t have said it better ourselves: the core reason we built the data warehouses of old still exists today.
“Many people—perhaps the majority—continue to need well-integrated, systematically cleansed, easy to access relational data that includes a large body of time-variant history. They want to meet routine information needs with data that is prepared and published with those needs in mind. These people are best served with data warehousing that provides:
- Subject-oriented data that is organized around major business subjects such as customer, product, employee, etc., and that is readily mapped to business semantics.
- Integrated data where disparity among data sources is resolved to provide a consistent, reliable, and trusted source of data for reporting and analysis.
- Time-variant history that is captured at uniform time intervals, is kept beyond its lifespan in operational source systems, and is organized to support trending and time-series analysis.
- Non-volatile data where history is retained without revision, supporting reliable and repeatable reporting and analysis of past business events.
- Cleansed data transformed to mitigate the risks inherent in data quality defects.
- Published data that is delivered on a regular schedule and that is known, repeatable, and ready for use.”
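To make two of those properties concrete, here is a minimal sketch of what "time-variant" and "non-volatile" can look like in practice. It is our own illustration, not something taken from the whitepaper, and every name in it (CustomerSnapshot, HistoryStore, publish, as_at) is hypothetical. It assumes history is kept as append-only snapshots, so a question about a past date always gets the same answer.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a published snapshot cannot be modified
class CustomerSnapshot:
    customer_id: str
    segment: str
    as_of: date


class HistoryStore:
    """Hypothetical append-only store: snapshots are added, never revised."""

    def __init__(self):
        self._rows = []

    def publish(self, snapshot):
        # Non-volatile: we only ever append, there is no update or delete.
        self._rows.append(snapshot)

    def as_at(self, customer_id, when):
        """Time-variant: return the latest snapshot on or before `when`."""
        candidates = [
            row for row in self._rows
            if row.customer_id == customer_id and row.as_of <= when
        ]
        return max(candidates, key=lambda row: row.as_of, default=None)


store = HistoryStore()
store.publish(CustomerSnapshot("C-1", "retail", date(2018, 1, 31)))
store.publish(CustomerSnapshot("C-1", "corporate", date(2018, 2, 28)))

# Asking about mid-February still returns the January view of the customer.
print(store.as_at("C-1", date(2018, 2, 15)))
```

Because nothing is ever overwritten, a report run for a past date can be repeated later and will return the same result, which is the repeatability the list above describes.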
What about the data lake?
We agree with Eckerson when they say:
The data lake is not a silver bullet. It is a valuable source of data for many analytics use cases, but it is not designed to be subject-oriented, integrated, non-volatile, and time-variant—not well positioned to deliver the value of data warehousing described earlier.
The ideal solution is a data lake and data warehousing working together in a data management ecosystem that provides the right data for the full spectrum of use cases from basic reporting to advanced analytics and data science. Achieving that ideal solution requires architectural, operational, and technological modernization of legacy data warehouses.
AgileData relies on landing the data from your systems of record in our history layer (a data lake-ish pattern). From there we enable your analysts to define the core business events and core business concepts that represent your business, and to create business rules that map the data from your lake to those events and concepts.
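As a rough illustration of the pattern described above, the sketch below maps landed records to a business event using a simple rule. It is a conceptual example only and does not show the AgileData product or its API. The record fields, the event name and the functions customer_places_order and apply_rule are all hypothetical.

```python
# Hypothetical records landed unchanged from a system of record.
landed_orders = [
    {"order_no": "A-1", "cust": "C-7", "status": "PLACED", "ts": "2019-01-05"},
    {"order_no": "A-2", "cust": "C-9", "status": "CANCELLED", "ts": "2019-01-06"},
]


def customer_places_order(record):
    """Hypothetical business rule: a placed order becomes a
    'Customer Places Order' event; anything else is ignored."""
    if record["status"] != "PLACED":
        return None
    return {
        "event": "Customer Places Order",
        "customer": record["cust"],      # core business concept: Customer
        "order": record["order_no"],     # core business concept: Order
        "occurred_on": record["ts"],
    }


def apply_rule(rule, records):
    """Run one rule over the landed records, keeping only the matches."""
    return [event for event in map(rule, records) if event is not None]


events = apply_rule(customer_places_order, landed_orders)
print(events)
```

The landed records are left untouched. The rule only reads them, so the history stays intact while the events and concepts built on top of it can be redefined at any time.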
Modernise your data warehouse with AgileData
AgileData helps you modernise your data warehouse. We don’t do it by lifting and shifting the masses of technical debt embedded in your current ETL-laden processes, or by “cloud washing” your current ETL-based tools. Instead, we let you quickly and incrementally create a new agile data warehouse, based on your organisation’s core business processes, at a speed that would amaze you.
Find out more
As well as reading the Eckerson Group’s 2018 whitepaper The Future of Data Warehousing, you can also listen to a podcast by its author, Dave Wells, called Data Lakes Are Cool, But You Still Need A Data Warehouse.