Change Data Capture (CDC) Implementation Using CHECKSUM Number
Typically we use a date column or a flag column to identify the change record for change data capture implementation. But there can be scenarios where you source do not have any columns to identify the changed records, especially when working with legacy systems. Today in this article lets see how to implement Change Data Capture or CDC for such scenarios using checksum number.
What is Checksum
A checksum is a value used to verify the integrity of a file or a data. Checksums are typically used to compare two sets of data to make sure they are the same. If the checksums don't match those of the original file or data, the data may have been altered.How to find Checksum
Informatica provides the function MD5() for Checksum generation. This function returns a unique 32-character string of hexadecimal digits 0-9 and a-f.
Syntax : MD5( value )
Return : Unique 32-character string of hexadecimal digits 0-9 and a-f.
Now create a LookUp transformation to get CUST_ID and CHK_SUM_NB from the target table. Use the LookUp Condition IN_CUST_ID = CUST_ID
Informatica Implementation
Design Scenario
Lets consider a workflow to load CUSTOMER table from a flat file, which is generated from legacy mainframe system. Any new customer information will be inserted and any changed customer information will be updated else rejected to bad file. Note that the source file do not have any indicator to identify the changed record.
Datamodel Needs
Apart from the customer attributes columns, we need to create an additional database table column to store the CHECKSUM number. It is 32 character hexadecimal value. So add the column to the CUSTOMER table and below is the target table definition.
Informatica Mapping
Lets start with the mapping. Create CHECKSUM using MD5() function in the expression as sown in below image.
- MD5(CUST_NAME || ADDRESS1 || ADDRESS2 || CITY || STATE || TO_CHAR(ZIP))
Now create a LookUp transformation to get CUST_ID and CHK_SUM_NB from the target table. Use the LookUp Condition IN_CUST_ID = CUST_ID
Now find out the records for INSERT, UPDATE using the columns from the lookUp Transformation with the expressions below.
INSERT : ISNULL(LKP_CUST_ID)
If the Customer is not existing in the target table, set the record for INSERT.
UPDATE : NOT ISNULL(LKP_CUST_ID) AND CHK_SUM_NB <> LKP_CHK_SUM_NB
If the Customer is existing in the target table and the Checksum of the source record is different from the lookup, set the record for UPDATE.
REJECT : Any other records not satisfying the above conditions will be passed on to the DEFAULT group and ignored.
Now add the Router Transformation with two groups with the expressions explained above.
INSERT : ISNULL(LKP_CUST_ID)
If the Customer is not existing in the target table, set the record for INSERT.
UPDATE : NOT ISNULL(LKP_CUST_ID) AND CHK_SUM_NB <> LKP_CHK_SUM_NB
If the Customer is existing in the target table and the Checksum of the source record is different from the lookup, set the record for UPDATE.
REJECT : Any other records not satisfying the above conditions will be passed on to the DEFAULT group and ignored.
Now add the Router Transformation with two groups with the expressions explained above.
Note : CHK_SUM_NB is Inserted and Updated into the target table, this value is used by the lookup to determine the insert and update.
All you left now is to create and run the workflow. Hope you enjoyed this tutorial. Please let me know if you have any questions or comments.