Friday 9 December 2011

How to filter a fuzzy lookup reference table in SSIS

SQL Server Integration Services uses  the fuzzy lookup data flow transformation component to find near matches between a data source and a reference table. The data source can be a SQL Server table, in which case you can define a SELECT query with a WHERE clause that only returns those records you need. But the reference table does not provide this sort of filtering.

In my current work I looking for and matching duplicates in two separate databases on a regular basis, so I need to filter both the source and the reference table so that they only include records that have not been matched previously. One simple way to achieve this in SSIS is to create a new table in the reference database that only includes unmatched records:

SELECT *
INTO DBName.dbo.UnmatchedRecords
FROM DBName.dbo.AllRecords
WHERE AllRecords.MatchIdField IS NULL

Then when I select the reference table in the fuzzy lookup component, I can choose the UnmatchedRecords table, instead of the original base table.

Of course, you need to ensure that the Unmatched table is kept up to date when you add new Match Ids to the original table, but this work-around ensures that the result set output by SSIS does not include already matched records.

No comments:

Post a Comment