THE UNDERESTIMATED, CHANGE DATA CAPTURE ?
The functionality data platforms should deploy.
ENTERPRISE BI – DATA AND ANALYTICS
CHANGE DATA CAPTURE
The arguments for a data warehouse are well understood – ensure the users have a performant data source that is easy for them to work with.
Typically, data warehouses start out with a bulk update of data that simply grows over time. This is relatively easy to implement but it means massive utilisation of system resources and lengthy build times that if it fails overnight – creates massive headaches in the morning.
Organisations want real time data (or near real time data) and reliable refreshes of the data during the day, and reduced risk around the overnight build processes.
ISSUES SOLVED BY CDC
Using SQL Server Change Data Capture, let’s take only the data that’s changed in the source system – that is data that has been added, updated or deleted to the log file and only process this as incremental data throughout the build of the data warehouse.
Refreshes are completed within minutes to provide near real time analytics from the data warehouse. Multiple refreshes during the day are possible. CDC is scalable, doesn’t affect SQL server performance and is fast.
EBI SERVICE OFFERING
CDC APPLICATION - To get to a data warehouse refresh using minimal system resources and regular updates you can build yourselves in-house using existing resources with Enterprise BI mentoring and quality assurance or let us build and maintain the solution for you.
Data platform - Real Example
Problem: A mature data warehouse that was providing the analytical capability required was taking 14 hours overnight to refresh. Increasing data volumes meant a way of reducing this risk to the business as well as the need to run refreshes during the day was required.
Solution: SQL Server Change Data Capture functionality with graphical tools in SQL Server Data Tools to capture only inserted, deleted and updated rows in the source system and editing the existing ETL to handle this changed data only throughout the data warehouse build.
Result: Refresh times in les than 10 minutes permits regular refreshes during the day as well as removing the risk of failed overnight runs. Massive reduction in system use with associated reduction in cost (Cloud D/W). Also allows additional subject areas to be added to the data warehouse without risk.