Profile Multi-Set Select Statements

At times we need to select from two or more sources and stage the selected source.  As an example, you may want to collect all historical records from an archive on the initial load, and collect only new records from an online source for all subsequent executions.


There are a number of ways to accomplish this.  The challenge comes when trying to get the SQLNCLI to detect the metadata correctly.  We have found that using If Else logic is the most problematic.


So, avoid doing this:

If [MyStat] = [InitialStatValue] 
  Select from Table1 
  Select … from Table2 


Instead, use set operations like this.

Select … from Table1 Where [MyStat] = [InitialStatValue] 
Select … from Table2 Where [MyStat] <> [InitialStatValue]


While both of these queries can return the exact same results, you will have a much better chance of correctly detecting the fields and data types using the latter option.  Full joins along with field selection based on the current statistic value can work as well, but would require more scripting.


The key to making this work is the creation of a statistic (typically in PSA) with an initial value that will never appear again.  Trigger on the initial value to do you initial load from an archive or other initial source.  All subsequent loads will be automatically redirected to the online source.  This simplifies your ETL by preventing the need to have duplicate stage & tables that will need to be combined through some type of transformation later in the data flow.