Dimension Load Using the MERGE Statement

The T-SQL MERGE statement was introduced in SQL Server 2008, and in many ways simplified common tasks such as dimension loading.  However, like most things in life there are a few gotchas to consider.  For example; the merge statement will ”…perform insert, update, or delete operations in a single statement…”, MSDN.  The keyword here is OR.  In other words each source record (based on USING clause) will result in one operation being performed.  This works great for dimension loads which include only SCD1 type attributes where we will either insert a new record or update existing records that have changed (upsert). 

Things get a little hairier when you need to update a dimension with SCD2 attributes.  In this case we need to perform two operations when a source record is matched with an existing dimension record.  The existing record needs to be expired, and a new record needs to be inserted (current).  No problem.  We can use the MERGE OUTPUT and wrap the merge statement with an INSERT statement.  By doing so we end up with a statement that will perform the two needed operations (UPDATE & INSERT).

There is another limitation that you should consider.  INSERT statements that have a nested MERGE statement such as described above will fail if the target of the INSERT statement is used on either side of a constraint.  It is very common to include such constraints when designing a dimensional model.  I’m not exactly sure if this is a bug or a behavior that is by design, but it is logged as a bug here…

Reference:  Microsoft Connect – 435031

To get around this little issue we can simply insert into a temporary table first, and use this temporary table to load the target dimension in a subsequent statement.  This does, of course, add some overhead to the load process. 

After taking all of the above into consideration and adding a few other needed features, a dimension load statement can quickly become very complex.  LeapFrogBI boils all of this complexity down to a very simple selection.  When deciding which dimension template to use consider the following definitions.

D1000 – Single Batch

Source must be limited to 1 dimension key version per load.  Only use if this is always the case.

D1001 – Multi Batch Load

Source can include multiple dimension key versions (each must have unique recordtime).

D1002 – Multi Batch Load w/ Constraints

Same as D1001 but adds handling for SCD2 attributes with constraints in place. 

D2000 – Existing Dimension

Use an existing dimension which is maintained outside of LFBI.  Result in a view being created which points to the existing dimension.

Below is an example of a D1002 dimension load merge statement with the the following functionality included:

  • Keep track of record counts for logging purposes
  • Handles hybrid dimensions (combination of SCD0, SCD1, and SCD2 attributes)
  • Accepts multiple versions of records with same dimension key (unique effective time required)
  • Overcomes insert with nested merge & constraint limitation described above

Dimension Load Using MERGE

  1. --Create load count variables
    DECLARE @Insert AS int = 0
    DECLARE @Upsert AS int = 0
    DECLARE @SCD2 AS int = 0
     
    --Create current load table variable
    --Holds dimension key and recorddate for current batch
    DECLARE @Current AS table
    (
    [ProductID] int,
    [SYS_Start] datetime2(7)
    );
     
    --Load source local temp table
    --Group by key and recorddate to ensure that duplicates are not included
    SELECT
    p227.[SYS_RecordTime] AS [SYS_Start],
    p227.[ModifiedDate] AS [ModifiedDate],
    p227.[ProductID] AS [ProductID],
    p227.[Name] AS [Name],
    p227.[ProductNumber] AS [ProductNumber],
    p227.[MakeFlag] AS [MakeFlag],
    p227.[FinishedGoodsFlag] AS [FinishedGoodsFlag],
    p227.[Color] AS [Color],
    p227.[SafetyStockLevel] AS [SafetyStockLevel],
    p227.[ReorderPoint] AS [ReorderPoint],
    p227.[StandardCost] AS [StandardCost],
    p227.[ListPrice] AS [ListPrice],
    p227.[Size] AS [Size],
    p227.[SizeUnitMeasureCode] AS [SizeUnitMeasureCode],
    p227.[WeightUnitMeasureCode] AS [WeightUnitMeasureCode],
    p227.[Weight] AS [Weight],
    p227.[DaysToManufacture] AS [DaysToManufacture],
    p227.[ProductLine] AS [ProductLine],
    p227.[Class] AS [Class],
    p227.[Style] AS [Style],
    p227.[ProductSubcategoryID] AS [ProductSubcategoryID],
    p227.[ProductModelID] AS [ProductModelID],
    p227.[SellStartDate] AS [SellStartDate],
    p227.[SellEndDate] AS [SellEndDate],
    p227.[DiscontinuedDate] AS [DiscontinuedDate],
    p227.[rowguid] AS [rowguid]
    INTO #Source
    FROM [austx01\ss2008r2].[ProtectedFields].[psa].[p_Product] AS p227
     
    --Create temp table which will hold SCD2 current version records
    --Temp table used to overcome SQL limitation related to insert with nested merge where constraints exist
    SELECT TOP 0 * INTO #SCD2Insert FROM [austx01\ss2008r2].[ProtectedFields].[mart].[d_Product2]
     
    --Load each batch.  Manual break when all records are loaded.
    WHILE 1=1
    BEGIN
     
    --Delete last load from source table variable (no deletes on first batch)
    DELETE FROM #Source
    FROM #Source s
    Inner Join @Current c ON
    s.[ProductID] = c.[ProductID] and
    s.[SYS_Start] = c.[SYS_Start]
     
    --Clear current table variable
    DELETE FROM @Current
        
    --Load current table variable  
    --Identify current batch (group by dimension key and return minimum recorddate).
    INSERT INTO @Current
    SELECT
    [ProductID] ,
    min([SYS_Start]) AS [SYS_Start]
    FROM #Source
    GROUP BY
    [ProductID]
         
    --Load dimension if there are records to process
    IF @@ROWCOUNT > 0
    BEGIN
     
    --Insert and SCD1 (Upsert)
    MERGE
    [austx01\ss2008r2].[ProtectedFields].[mart].[d_Product2] AS tgt
    USING
    (Select
    s.[SYS_Start],
    s.[ModifiedDate],
    s.[ProductID],
    s.[Name],
    s.[ProductNumber],
    s.[MakeFlag],
    s.[FinishedGoodsFlag],
    s.[Color],
    s.[SafetyStockLevel],
    s.[ReorderPoint],
    s.[StandardCost],
    s.[ListPrice],
    s.[Size],
    s.[SizeUnitMeasureCode],
    s.[WeightUnitMeasureCode],
    s.[Weight],
    s.[DaysToManufacture],
    s.[ProductLine],
    s.[Class],
    s.[Style],
    s.[ProductSubcategoryID],
    s.[ProductModelID],
    s.[SellStartDate],
    s.[SellEndDate],
    s.[DiscontinuedDate],
    s.[rowguid]
    FROM #Source s
    Inner Join @Current c
    ON
    s.[ProductID] = c.[ProductID] and
    s.[SYS_Start] = c.[SYS_Start]) AS src
    ON
    (tgt.[ProductID] = src.[ProductID] or (tgt.[ProductID] is null and src.[ProductID] is null))
    WHEN NOT MATCHED
    THEN /*Insert New Records*/
    INSERT
    ([SYS_Start],
    [SYS_End],
    [SYS_Current],
    [ModifiedDate],
    [ProductID],
    [Name],
    [ProductNumber],
    [MakeFlag],
    [FinishedGoodsFlag],
    [Color],
    [SafetyStockLevel],
    [ReorderPoint],
    [StandardCost],
    [ListPrice],
    [Size],
    [SizeUnitMeasureCode],
    [WeightUnitMeasureCode],
    [Weight],
    [DaysToManufacture],
    [ProductLine],
    [Class],
    [Style],
    [ProductSubcategoryID],
    [ProductModelID],
    [SellStartDate],
    [SellEndDate],
    [DiscontinuedDate],
    [rowguid])
    VALUES
    (src.[SYS_Start],
    '9999-12-31 23:59:59.9999999',
    'TRUE',
    src.[ModifiedDate],
    src.[ProductID],
    src.[Name],
    src.[ProductNumber],
    src.[MakeFlag],
    src.[FinishedGoodsFlag],
    src.[Color],
    src.[SafetyStockLevel],
    src.[ReorderPoint],
    src.[StandardCost],
    src.[ListPrice],
    src.[Size],
    src.[SizeUnitMeasureCode],
    src.[WeightUnitMeasureCode],
    src.[Weight],
    src.[DaysToManufacture],
    src.[ProductLine],
    src.[Class],
    src.[Style],
    src.[ProductSubcategoryID],
    src.[ProductModelID],
    src.[SellStartDate],
    src.[SellEndDate],
    src.[DiscontinuedDate],
    src.[rowguid])
    /*SCD1 Updates*/
    WHEN MATCHED AND
    tgt.[ModifiedDate] <> src.[ModifiedDate] or (tgt.[ModifiedDate] is null and src.[ModifiedDate] is not null) or (tgt.[ModifiedDate] is not null and src.[ModifiedDate] is null) or
    tgt.[Name] <> src.[Name] or (tgt.[Name] is null and src.[Name] is not null) or (tgt.[Name] is not null and src.[Name] is null) or
    tgt.[ProductNumber] <> src.[ProductNumber] or (tgt.[ProductNumber] is null and src.[ProductNumber] is not null) or (tgt.[ProductNumber] is not null and src.[ProductNumber] is null) or
    tgt.[MakeFlag] <> src.[MakeFlag] or (tgt.[MakeFlag] is null and src.[MakeFlag] is not null) or (tgt.[MakeFlag] is not null and src.[MakeFlag] is null) or
    tgt.[FinishedGoodsFlag] <> src.[FinishedGoodsFlag] or (tgt.[FinishedGoodsFlag] is null and src.[FinishedGoodsFlag] is not null) or (tgt.[FinishedGoodsFlag] is not null and src.[FinishedGoodsFlag] is null) or
    tgt.[Color] <> src.[Color] or (tgt.[Color] is null and src.[Color] is not null) or (tgt.[Color] is not null and src.[Color] is null) or
    tgt.[SafetyStockLevel] <> src.[SafetyStockLevel] or (tgt.[SafetyStockLevel] is null and src.[SafetyStockLevel] is not null) or (tgt.[SafetyStockLevel] is not null and src.[SafetyStockLevel] is null) or
    tgt.[Style] <> src.[Style] or (tgt.[Style] is null and src.[Style] is not null) or (tgt.[Style] is not null and src.[Style] is null) or
    tgt.[ProductSubcategoryID] <> src.[ProductSubcategoryID] or (tgt.[ProductSubcategoryID] is null and src.[ProductSubcategoryID] is not null) or (tgt.[ProductSubcategoryID] is not null and src.[ProductSubcategoryID] is null) or
    tgt.[ProductModelID] <> src.[ProductModelID] or (tgt.[ProductModelID] is null and src.[ProductModelID] is not null) or (tgt.[ProductModelID] is not null and src.[ProductModelID] is null) or
    tgt.[SellStartDate] <> src.[SellStartDate] or (tgt.[SellStartDate] is null and src.[SellStartDate] is not null) or (tgt.[SellStartDate] is not null and src.[SellStartDate] is null) or
    tgt.[SellEndDate] <> src.[SellEndDate] or (tgt.[SellEndDate] is null and src.[SellEndDate] is not null) or (tgt.[SellEndDate] is not null and src.[SellEndDate] is null) or
    tgt.[DiscontinuedDate] <> src.[DiscontinuedDate] or (tgt.[DiscontinuedDate] is null and src.[DiscontinuedDate] is not null) or (tgt.[DiscontinuedDate] is not null and src.[DiscontinuedDate] is null) or
    tgt.[rowguid] <> src.[rowguid] or (tgt.[rowguid] is null and src.[rowguid] is not null) or (tgt.[rowguid] is not null and src.[rowguid] is null)
    THEN
    UPDATE SET
    [SYS_LoadTime] = sysdatetime(),
    [ModifiedDate] = src.[ModifiedDate],
    [Name] = src.[Name],
    [ProductNumber] = src.[ProductNumber],
    [MakeFlag] = src.[MakeFlag],
    [FinishedGoodsFlag] = src.[FinishedGoodsFlag],
    [Color] = src.[Color],
    [SafetyStockLevel] = src.[SafetyStockLevel],
    [Style] = src.[Style],
    [ProductSubcategoryID] = src.[ProductSubcategoryID],
    [ProductModelID] = src.[ProductModelID],
    [SellStartDate] = src.[SellStartDate],
    [SellEndDate] = src.[SellEndDate],
    [DiscontinuedDate] = src.[DiscontinuedDate],
    [rowguid] = src.[rowguid]
    ;
    SELECT @Upsert += @@ROWCOUNT
    --
    /*SCD2 Updates*/
    --Load into temp table to overcome nested merge with constraint sql limitation
    INSERT INTO #SCD2Insert
    ([SYS_Start],
    [SYS_End],
    [SYS_Current],
    [ModifiedDate],
    [ProductID],
    [Name],
    [ProductNumber],
    [MakeFlag],
    [FinishedGoodsFlag],
    [Color],
    [SafetyStockLevel],
    [ReorderPoint],
    [StandardCost],
    [ListPrice],
    [Size],
    [SizeUnitMeasureCode],
    [WeightUnitMeasureCode],
    [Weight],
    [DaysToManufacture],
    [ProductLine],
    [Class],
    [Style],
    [ProductSubcategoryID],
    [ProductModelID],
    [SellStartDate],
    [SellEndDate],
    [DiscontinuedDate],
    [rowguid])
    SELECT
    [SYS_Start],
    [SYS_End],
    [SYS_Current],
    [ModifiedDate],
    [ProductID],
    [Name],
    [ProductNumber],
    [MakeFlag],
    [FinishedGoodsFlag],
    [Color],
    [SafetyStockLevel],
    [ReorderPoint],
    [StandardCost],
    [ListPrice],
    [Size],
    [SizeUnitMeasureCode],
    [WeightUnitMeasureCode],
    [Weight],
    [DaysToManufacture],
    [ProductLine],
    [Class],
    [Style],
    [ProductSubcategoryID],
    [ProductModelID],
    [SellStartDate],
    [SellEndDate],
    [DiscontinuedDate],
    [rowguid]
    FROM
    (
    MERGE
    [austx01\ss2008r2].[ProtectedFields].[mart].[d_Product2] AS tgt
    USING
    (Select
    s.[SYS_Start],
    s.[ModifiedDate],
    s.[ProductID],
    s.[Name],
    s.[ProductNumber],
    s.[MakeFlag],
    s.[FinishedGoodsFlag],
    s.[Color],
    s.[SafetyStockLevel],
    s.[ReorderPoint],
    s.[StandardCost],
    s.[ListPrice],
    s.[Size],
    s.[SizeUnitMeasureCode],
    s.[WeightUnitMeasureCode],
    s.[Weight],
    s.[DaysToManufacture],
    s.[ProductLine],
    s.[Class],
    s.[Style],
    s.[ProductSubcategoryID],
    s.[ProductModelID],
    s.[SellStartDate],
    s.[SellEndDate],
    s.[DiscontinuedDate],
    s.[rowguid]
    FROM #Source s
    Inner Join @Current c
    ON
    s.[ProductID] = c.[ProductID] and
    s.[SYS_Start] = c.[SYS_Start]) AS src
    ON
    (tgt.[ProductID] = src.[ProductID] or (tgt.[ProductID] is null and src.[ProductID] is null))
    WHEN MATCHED AND
    tgt.[SYS_Current] = 'TRUE' and
    tgt.[SYS_Start] < src.[SYS_Start] and
    (tgt.[ReorderPoint] <> src.[ReorderPoint] or (tgt.[ReorderPoint] is null and src.[ReorderPoint] is not null) or (tgt.[ReorderPoint] is not null and src.[ReorderPoint] is null) or
    tgt.[StandardCost] <> src.[StandardCost] or (tgt.[StandardCost] is null and src.[StandardCost] is not null) or (tgt.[StandardCost] is not null and src.[StandardCost] is null) or
    tgt.[ListPrice] <> src.[ListPrice] or (tgt.[ListPrice] is null and src.[ListPrice] is not null) or (tgt.[ListPrice] is not null and src.[ListPrice] is null) or
    tgt.[Size] <> src.[Size] or (tgt.[Size] is null and src.[Size] is not null) or (tgt.[Size] is not null and src.[Size] is null) or
    tgt.[SizeUnitMeasureCode] <> src.[SizeUnitMeasureCode] or (tgt.[SizeUnitMeasureCode] is null and src.[SizeUnitMeasureCode] is not null) or (tgt.[SizeUnitMeasureCode] is not null and src.[SizeUnitMeasureCode] is null) or
    tgt.[WeightUnitMeasureCode] <> src.[WeightUnitMeasureCode] or (tgt.[WeightUnitMeasureCode] is null and src.[WeightUnitMeasureCode] is not null) or (tgt.[WeightUnitMeasureCode] is not null and src.[WeightUnitMeasureCode] is null) or
    tgt.[Weight] <> src.[Weight] or (tgt.[Weight] is null and src.[Weight] is not null) or (tgt.[Weight] is not null and src.[Weight] is null) or
    tgt.[DaysToManufacture] <> src.[DaysToManufacture] or (tgt.[DaysToManufacture] is null and src.[DaysToManufacture] is not null) or (tgt.[DaysToManufacture] is not null and src.[DaysToManufacture] is null) or
    tgt.[ProductLine] <> src.[ProductLine] or (tgt.[ProductLine] is null and src.[ProductLine] is not null) or (tgt.[ProductLine] is not null and src.[ProductLine] is null) or
    tgt.[Class] <> src.[Class] or (tgt.[Class] is null and src.[Class] is not null) or (tgt.[Class] is not null and src.[Class] is null))
    THEN
    UPDATE SET
    [SYS_End] = Dateadd(nanosecond, -100, src.[SYS_Start]),
    [SYS_Current] = 'FALSE'
    OUTPUT
    $Action AS [MergeAction],
    src.[SYS_Start],
    '9999-12-31 23:59:59.9999999' as [SYS_End],
    'TRUE' as [SYS_Current],
    src.[ModifiedDate],
    src.[ProductID],
    src.[Name],
    src.[ProductNumber],
    src.[MakeFlag],
    src.[FinishedGoodsFlag],
    src.[Color],
    src.[SafetyStockLevel],
    src.[ReorderPoint],
    src.[StandardCost],
    src.[ListPrice],
    src.[Size],
    src.[SizeUnitMeasureCode],
    src.[WeightUnitMeasureCode],
    src.[Weight],
    src.[DaysToManufacture],
    src.[ProductLine],
    src.[Class],
    src.[Style],
    src.[ProductSubcategoryID],
    src.[ProductModelID],
    src.[SellStartDate],
    src.[SellEndDate],
    src.[DiscontinuedDate],
    src.[rowguid]
    ) AS MergeOutput
    WHERE
    [MergeAction] = 'Update'
    ;
     
    --Load from temp table to target (overcome nested merge with constraint SQL limitation)
    INSERT INTO [austx01\ss2008r2].[ProtectedFields].[mart].[d_Product2]
    ([SYS_Start],
    [SYS_End],
    [SYS_Current],
    [ModifiedDate],
    [ProductID],
    [Name],
    [ProductNumber],
    [MakeFlag],
    [FinishedGoodsFlag],
    [Color],
    [SafetyStockLevel],
    [ReorderPoint],
    [StandardCost],
    [ListPrice],
    [Size],
    [SizeUnitMeasureCode],
    [WeightUnitMeasureCode],
    [Weight],
    [DaysToManufacture],
    [ProductLine],
    [Class],
    [Style],
    [ProductSubcategoryID],
    [ProductModelID],
    [SellStartDate],
    [SellEndDate],
    [DiscontinuedDate],
    [rowguid])
    SELECT
    [SYS_Start],
    [SYS_End],
    [SYS_Current],
    [ModifiedDate],
    [ProductID],
    [Name],
    [ProductNumber],
    [MakeFlag],
    [FinishedGoodsFlag],
    [Color],
    [SafetyStockLevel],
    [ReorderPoint],
    [StandardCost],
    [ListPrice],
    [Size],
    [SizeUnitMeasureCode],
    [WeightUnitMeasureCode],
    [Weight],
    [DaysToManufacture],
    [ProductLine],
    [Class],
    [Style],
    [ProductSubcategoryID],
    [ProductModelID],
    [SellStartDate],
    [SellEndDate],
    [DiscontinuedDate],
    [rowguid]
    FROM #SCD2Insert
     
    SELECT @SCD2 += @@ROWCOUNT
     
    --Clear temp table prior to next batch load
    TRUNCATE TABLE #SCD2Insert
     
    END
    ELSE
    BEGIN
    --All source records are processed
    BREAK
    END
    END
    --
    SELECT @Insert AS [Insert], @Upsert AS [Upsert], @SCD2 AS [SCD2]

     

The good news for LeapFrogBI users is that you really don’t need to worry about this level of detail, but it is always good to know what is going on in case you ever need to troubleshoot issues.  In summary, MERGE is a great tool, but it also warrants special attention when used to perform dimension loads.