Change Data Capture (CDC) Made Easy Using Mapping Variables
At times we may need to implement Change Data Capture for small data integration projects which includes just couple of workflows. Introducing a Change Data Capture framework for such project is not a recommended way to handle this, just because of the efforts required to build the framework may not be justified. Here in this article lets discuss about a simple, easy approach handle Change Data Capture.
We will be using Informatica Mapping Variables to building our Change Data Capture logic. Before even we talk about the implementation, lets understand the Mapping Variable
Informatica Mapping Variable
What is Mapping Variable
These are variables created in PowerCenter Designer, which you can use in any expression in a mapping, and you can also use the mapping variables in a source qualifier filter, user-defined join, or extract override, and in the Expression Editor of reusable transformations.Mapping Variable Starting Value
Mapping variable can take the starting value from
- Parameter file
- Pre-session variable assignment
- Value saved in the repository
- Initial value
- Default Value
The Integration Service looks for the start value in the order mentioned above. Value of the mapping variable can be changed with in the session using an expression and the final value of the variable will be saved into the repository. The saved value from the repository is retrieved in the next session run and used as the session start value.
Setting Mapping Variable Value
You can change the mapping variable value with in the mapping or session using the Set Function. We need to use the set function based on the Aggregation Type of the variable. Aggregation Type of the variable can be set when the variable is declared in the mapping.
- SetMaxVariable. Sets the variable to the maximum value of a group of values. To use the SetMaxVariable with a mapping variable, the aggregation type of the mapping variable must be set to Max.
- SetMinVariable. Sets the variable to the minimum value of a group of values. use the SetMinVariable with a mapping variable, the aggregation type of the mapping variable must be set to Min.
- SetCountVariable. Increments the variable value by one. In other words, it adds one to the variable value when a row is marked for insertion, and subtracts one when the row is marked for deletion. To use the SetCountVariable with a mapping variable, the aggregation type of the mapping variable must be set to Count.
- SetVariable. Sets the variable to the configured value. At the end of a session, it compares the final current value of the variable to the start value of the variable. Based on the aggregate type of the variable, it saves a final value to the repository.
Change Data Capture Implementation
Now we understand the mapping variables, lets go ahead and start building our mapping with Change Data Capture.
Here we are going to implement Change Data Capture for CUSTOMER data load. We need to load any new customer or changed customers data to a flat file. Since the column UPDATE_TS value changes for any new or updated customer record, we will be able to find the new or changed customer records using UPDATE_TS column.
As the first step lets start the mapping and create a mapping variable as shown in below image.
- $$M_DATA_END_TIME as Date/Time
Now bring in the source and source qualified to the mapping designer workspace. Open the source qualifier and give the filter condition to get the latest data from the source as shown below.
- STG_CUSTOMER_MASTER.UPDATE_TS > CONVERT(DATETIME,'$$M_DATA_END_TIME')
Note : This filter condition will make sure that, latest data is pulled from the source table each and every time. Latest value for the variable $M_DATA_END_TIME is retrieved from the repository every time the session is run.
Now map the column UPDATE_TS to an expression transformation and create a variable expression as below.
- SETMAXVARIABLE($M_DATA_END_TIME,UPDATE_TS)
Note : This expression will make sure that, latest value from the the column UPDATE_TS is stored into the repository after the successful completion of the session run.
Now you can map all the remaining columns to the down stream transformation and complete all other transformation required in the mapping.
That’s all you need to configure Change Data Capture, Now create your workflow and run the workflow.
Once you look into the session log file you can see the mapping variable value is retrieved from the repository and used in the source SQL, just like shown in the image below.
You can look at the mapping variable value stored in the repository, from workflow manager. Choose the session from the workspace, right click and select 'View Persistent Value'. You get the mapping variable in a pop up window, like shown below.