Use CHECKSUM Number to Make the LookUp Transformation Simple
Lookup is one of the frequently used transformations in Informatica PowerCenter. This transformation is considered to be relatively simple, but at times this can become complex because of the amount of data to be cached, number of columns used in the lookup condition, checks on columns fetched from lookup table etc... In this article lets see how we can take care of couple of these issues using CHECKSUM number.
What is Checksum
A checksum is a value used to verify the integrity of a file or a data. Checksums are typically used to compare two sets of data to make sure they are the same. If the checksums don't match those of the original file or data, the data may have been altered.Informatica Implementation
Lets consider one of the complex lookup scenario, which we use in SCD Type 2 mapping for our demonstration purpose. You can check out the complete SCD Type 2 Implementation from one of our previous article.
In normal SCD Type 2 mapping we use a Lookup to fetch all the columns (attributes) from the dimension tables and compare the attributes with the records from the source. If there is a changed value in any of the attributes, the record will be set for update else insert. If you are loading a dimension with hundreds of attribute, you can imagine the complexity involved in the lookup in terms of number of columns in the lookup, lookup cache size and comparison of the attributes.
Lets consider the Customer Dimension load, we will be identifying the update based on any change in any of the columns CUST_NAME, ADDRESS1, ADDRESS2, CITY, STATE or ZIP else the record will be inserted into the Dimension Table.
In normal SCD Type 2 mapping we use a Lookup to fetch all the columns (attributes) from the dimension tables and compare the attributes with the records from the source. If there is a changed value in any of the attributes, the record will be set for update else insert. If you are loading a dimension with hundreds of attribute, you can imagine the complexity involved in the lookup in terms of number of columns in the lookup, lookup cache size and comparison of the attributes.
Lets consider the Customer Dimension load, we will be identifying the update based on any change in any of the columns CUST_NAME, ADDRESS1, ADDRESS2, CITY, STATE or ZIP else the record will be inserted into the Dimension Table.
Datamodel needs
We need to create an additional database table column to store the CHECKSUM number. Checksum number created by Informatica is 32 character hexadecimal value. So add the column below to the Dimension Table.
- CHK_SUM_NB VARCHAR2(32)
This Column in the target table will be populated for every record inserted into the Dimension Table, using the function provided in the next section.
Checksum number generation
Informatica provides the function MD5() for Checksum generation. This function returns a unique 32 character hexadecimal value. Below expression generates the checksum number for the Customer Dimension.
- MD5(CUST_NAME || ADDRESS1 || ADDRESS2 || CITY || STATE || TO_CHAR(ZIP))
Note : This value needs to be populated for every record inserted into the Dimension Table by the mapping.
Lookup Transformation
Now we can create a simple lookup transformation to fetch the checksum number (CHK_SUM_NB, which is already stored in the Dimension table T_DIM_CUST) and the surrogate key (CUST_KEY) from the dimension table.
You can see the lookup is really simple, with one input and two output ports.
Now we just need a simple expression to identify the records for INSERT and UPDATE as below
IIF(ISNULL(CUST_KEY),'INSERT',
IIF(NOT ISNULL(CUST_KEY) AND CHK_SUM_NB <> NEW_CHK_SUM_NB, 'UPDATE'))
Note : NEW_CHK_SUM_NB is the Checksum for the incoming record from source table. Calculate the Checksum in an Expression Transformation using MD5() function. MD5(CUST_NAME || ADDRESS1 || ADDRESS2 || CITY || STATE || TO_CHAR(ZIP))
Design Advantages
- Lookup Transformation made simple and the mapping as well.
- Less number of columns in the lookup cache, hence improved lookup performance.
- Avoid complex expression logic to make the mapping simple.
Hope you guys enjoyed this and you will be able to implement this in your upcoming projects. Please leave us a comment in case you have any questions of difficulties implementing this.