Friday 9 December 2011

How to filter a fuzzy lookup reference table in SSIS

SQL Server Integration Services uses  the fuzzy lookup data flow transformation component to find near matches between a data source and a reference table. The data source can be a SQL Server table, in which case you can define a SELECT query with a WHERE clause that only returns those records you need. But the reference table does not provide this sort of filtering.

Saturday 3 December 2011

No drivers for Canon 20D and Windows7 x64

My previous OS was Vista and I couldn't connect my Canon 20D camera without going through Picasa 3. So when I upgraded to Windows7 I was pleased to see that Canon said no driver was required. However, they do not say that your communication settings must be changed before the OS will recognise your device.

Friday 2 December 2011

Handling NULL values in SSIS fuzzy lookup

I'm using the SSIS Fuzzy Lookup data flow transformation in my current ETL package to find account name matches between source and target. So I compare the similarity of a number of fields using a conditional split transformation like:

(_Similarity > .70 || _Confidence > .70)
|| (_Similarity_CleanedCompanyName > .70 && _Similarity_CleanedABN > .70 && !ISNULL(CleanedABN))
|| (_Similarity_TradingNames__c > .70 && !ISNULL(TradingNames__c) && _Similarity_CleanedABN > .70 && !ISNULL(CleanedABN))

But SSIS gives a score of 1 to fields that are NULL in both the source and reference tables. E.g. if the phone number is null in the source table and the reference table phone number is also null then SSIS marks this as an exact match.

Thursday 1 December 2011

SQL Server intellisense disabled after windows update

Last week my corporate network administrator pushed out a couple of windows updates for SQL Server 2008 R2 and Visual Studio 2010. After installing them I discovered that my intellisense in SQL Server management studio had disappeared. No matter what I tried, I couldn't re-enable it.

The intellitrace in VS2010 SP1 causes an issue with the intellisense in SSMS 2008 R2 and Microsoft are aware of the problem and have issued a hotfix. Scott Guthrie blogged about the issue but doesn't mention the fix, and Microsoft don't make it easy to get your hands on it either.

http://support.microsoft.com/kb/2531482

Add a unique constraint to a nullable column

Enforcing uniqueness is a simple matter of adding a UNIQUE constraint with the following code:

ALTER TABLE [dbo].[TableName]
ADD CONSTRAINT UNQ__TableName__ColumnName UNIQUE ([ColumnName]);


But if your column includes null values, then multiple nulls are considered non-unique and they are not allowed.

Tuesday 29 November 2011

Change encrypted password in data loader CLI

One of the processes I run regularly is a batch file calling Apex Data Loader to export a number of objects from my Salesforce org. I use these files to populate a fresh backup of the Salesforce data, and when my password changed recently I needed to generate a new encrypted string to use in the proces-conf.xml file. However, even though the encryption process is fairly simple, the documentation provided isn't clear.

Wednesday 9 November 2011

Salesforce Date fields are really DateTime fields

Migrating data into Salesforce takes up a lot of my time, and I use Apex Data Loader to carry out most of the heavy work. Usually I test migrate into my client's sandbox and then migrate into their production instance. However, one recent client insisted that their own support team carry out the final migration into production. So I handed over the csv files and my migration notes, but they struck a number of errors with environmental differences, especially dates.

Add a LastModified column to SQL Server table

I had a job recently where I needed to find the delta from a number of Microsoft Access database tables. The db designer had made a lot of interesting architectural decisions, but the biggest problem I had was finding which records had been touched in the past few months. It is easy enough to detect the changed tables, but at a field level I had to run a diff of the old and new copies and extract the changes using DiffMerge.

UPDATE table1 from table 2

I'm using a SQL Server staging area for my current ETL work and I find it easiest to update cleaned IDs using temporary tables that can be created and deleted as necessary. The syntax is simple and I use it in stored procs for persistent matched data and ad hoc queries for manual matching.

NOT IN sub-select query with NULL values

I was wondering why my NOT IN query was failing to return the expected amount of records, so I googled and found the 'SQL with Manoj' post at wordpress. If there are NULL values returned by your sub-select query then you need to handle the NULL values explicitly.