I just read the August 2011 Gartner report titled Does the 21st-Century “Big Data” Warehouse Mean the End of the Enterprise Data Warehouse? (Subscription Required).
Clearly EDWs have their shortcomings, particularly in this new Big Data world. Hadoop and NoSQL repositories have gained popularity in part because non-structured data is difficult to analyze in a third-normal-form, SQL-centric database that is also burdened by data governance inherited from the authoring applications. This makes rationalizing and extending the warehouse difficult.
The ideal enterprise data warehouse has been envisaged as a centralized repository for 25 years, but the time has come for a new type of warehouse to handle “big data.” This “logical data warehouse” demands radical realignment of practices and a hybrid architecture of repositories and services.
Gartner’s Mark Beyer and Don Feinberg do a very nice job of framing the article in the context of existing BI tools and federation and ESB technologies, and of how these can complement existing EDWs. In defining the notion of a “logical data warehouse” (LDW), they focus on the data processing or information management logic rather than the physical infrastructure. They define an LDW as an information management and access engine that takes an architectural approach, one which de-emphasizes repositories in favor of a semantic directive to orchestrate the consolidation and sharing of information assets, as opposed to one that focuses exclusively on storing integrated datasets.
I’m personally excited by the LDW concept, particularly since my current company, RainStor, is nicely aligned with it: it can act as a repository holding an unlimited amount of historical data from an EDW (on Hadoop or other platforms), or as a front end that feeds parsed and aggregated data into an EDW. Additionally, our recent partnership announcement with Composite Software illustrates that corporations are waking up to the fact that new platforms and environments such as Hadoop can be used to supplement EDWs and support the concept of an LDW.
Where to from here? I predict that as Hadoop’s popularity grows, people will start to ask whether an EDW is even necessary. If raw data can be held inexpensively in a more flexible environment like Hadoop, why bother paying what could be millions for traditional EDW infrastructure?
Replacing EDWs does not happen overnight, or even over a couple of years, given the skills and investment behind them and the backing of mega-vendors with major technical and political supporters inside large enterprises. New startups and those championing Hadoop will have to prove that their products are secure, scalable and enterprise-grade. They must also overcome skills shortages and the religious zeal with which DBAs and others have tied their careers and skills to the popular databases that form the foundation of EDWs.
It is an interesting time, though, with SQL support for Hadoop environments already available (e.g. RainStor) and other vendors developing new offerings. Stay tuned; it’s going to be fun!