Rethinking Data Movement for Real-Time Business

TDWI – What Works: Volume 17, May 2004

By Ramon Chen, Sr. Director of Product Marketing, GoldenGate, and Tim Mueting, Director of Product Management, BMC Software

Of all the changes wrought by the Internet over the past decade, one of the most significant is the acceleration in the pace of business. The phrase “real-time” has become something of a cliché in recent years. But for many organizations, getting information and services to decision makers, partners, and customers when they want it and where they want it has simply become a reality of doing business. For instance, individuals outside the organization have come to expect instantaneous online shopping, banking, and customer service. Likewise, constituencies within the organization are demanding immediate access to information on sales metrics, supply chain, operations, and financials. For IT managers, these new business drivers translate directly into a need to rethink traditional approaches to systems architecture. And in many cases, the most logical and effective way to meet real-time demands inside and outside the organization is through active, or real-time, data warehousing.


DATA WAREHOUSING YESTERDAY AND TODAY
In the 1990s, the growth of enterprise applications like ERP and CRM prompted the need for data warehouses to address issues such as degrading system performance, escalating reporting requirements, and the limitations of the installed infrastructure. By implementing data warehouses comprising separate servers and databases, organizations could run reporting applications and perform queries without negatively affecting the transaction processing system. As demand for real-time information and services grew in the late 1990s and early 2000s, many IT executives found that their data warehousing infrastructures were unable to keep up with the faster pace of business. In particular, traditional methods of capturing data from production databases and moving it to the data warehouse were not able to cope with the need for speed and high-volume operation.

Within many traditional data warehouse environments, moving data from the production database to the data warehouse is accomplished using extraction, transformation, and load (ETL) utilities. These tools are reliable and capable of performing many required data movement tasks. However, nearly all ETL tools are batch-oriented, which means there is an inherent lag between the time data changes on the production system and when it is available for query by business users. An alternative approach to populating the data warehouse is to leverage the expertise of DBAs or programmers to complement ETL tools using custom-coded scripts, but this can be a costly layer to add to an infrastructure. Specialized coding also delays implementation of new features that users require, which can reduce system flexibility and decrease access to timely data. In addition, custom scripts are hard to maintain as the infrastructure changes.
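
To make the batch model concrete, the sketch below shows a minimal batch ETL pass in Python. It is a generic illustration, not any particular vendor's tool; the orders source table, fact_orders warehouse table, column names, and the last_run watermark are all hypothetical, and both tables are assumed to already exist.

```python
import sqlite3
from datetime import datetime, timezone

# A minimal batch ETL pass (generic sketch -- not any particular vendor's
# tool). The "orders" source table, "fact_orders" warehouse table, and the
# last_run watermark are hypothetical; tables are assumed to already exist.

def batch_etl(source_db: str, warehouse_db: str, last_run: str) -> str:
    src = sqlite3.connect(source_db)
    dwh = sqlite3.connect(warehouse_db)

    # Extract: pull only the rows modified since the previous batch window.
    rows = src.execute(
        "SELECT order_id, amount, updated_at FROM orders WHERE updated_at > ?",
        (last_run,),
    ).fetchall()

    # Transform: a trivial example -- convert dollar amounts to integer cents.
    transformed = [(oid, int(round(amt * 100)), ts) for oid, amt, ts in rows]

    # Load: bulk-insert the batch into the warehouse fact table.
    dwh.executemany(
        "INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)", transformed
    )
    dwh.commit()

    # Anything committed on the source after the extract ran is invisible
    # to the warehouse until the next scheduled pass -- the batch lag.
    return datetime.now(timezone.utc).isoformat()  # next run's watermark
```

However frequently such a job is scheduled, changes committed between runs remain invisible to the warehouse until the next pass, which is exactly the lag described above.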

THE NEED FOR SPEED, VOLUME, AND DIVERSITY
For many IT executives, it became clear that data movement was the key to real-time operation in a data warehouse environment, and that any solution would have to be capable of capturing, transforming, and moving massive amounts of data across the enterprise at sub-second speeds. Ideally, this new solution would keep two or more databases perfectly synchronized at all times, regardless of make, model, location, or data volume. In essence, real-time data warehousing would require a data movement solution with three basic capabilities (a conceptual sketch follows the list):

Speed: Throughput of thousands of transactions per second while using a minimum of network bandwidth.
Volume: Multi-record transfers of massive amounts of data to and from disk as well as over the network.
Diversity: Compatibility with all major operating systems and database environments.
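
For contrast with the batch approach, the sketch below illustrates the change-data-capture pattern that underpins real-time data movement: each committed change on the source becomes a small event that is applied to the target continuously, rather than in periodic bulk loads. This is a generic Python illustration under assumed names (change_log, capture, fact_orders), not GoldenGate's or BMC's actual interface.

```python
import queue
import sqlite3
import threading
import time

# Generic sketch of change data capture (CDC). In a real product the change
# stream would be read from the database's transaction log; here a simple
# in-process queue stands in for it. All names are illustrative only.

change_log: queue.Queue = queue.Queue()  # stands in for the transaction log

def capture(op: str, row: tuple) -> None:
    """Stands in for log-based capture on the source as transactions commit."""
    change_log.put((op, row))

def apply_changes(target_db: str) -> None:
    """Continuously drains captured changes and applies them to the target."""
    dwh = sqlite3.connect(target_db)
    dwh.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, amount_cents INTEGER, updated_at TEXT)"
    )
    while True:
        op, row = change_log.get()  # blocks until the next change arrives
        if op == "INSERT":
            dwh.execute("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)", row)
        elif op == "DELETE":
            dwh.execute("DELETE FROM fact_orders WHERE order_id = ?", (row[0],))
        dwh.commit()  # the change is queryable moments after the source commit

# Toy demo: start the applier on a background thread, then feed it one change.
threading.Thread(target=apply_changes, args=(":memory:",), daemon=True).start()
capture("INSERT", (1, 1999, "2004-05-01T12:00:00"))
time.sleep(0.2)  # give the background applier a moment in this demo
```

Because the unit of work is a single change record rather than a bulk extract, throughput scales with the transaction rate and the target stays continuously in step with the source, which is what the speed and volume requirements above demand.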
