Top 3 Reasons to Always use Surrogate Keys in Data Warehousing

There are many things about data warehousing that are subjective.  The use of surrogate keys is clearly not one of them.  Let’s start with a quick refresher.


Surrogate Key:  A system generated value which uniquely identifies a record.  Surrogates keys have no business meaning.


Natural Key:  A value that represents a real world object.  Aka – Business Key.  Example; SSN is a natural key representing a person.


In dimensional models we have fact tables which contain foreign keys pointing to all related dimensions’ surrogate keys.  Therefore, the decision to use surrogates will affect the primary key of each dimension as well as the foreign key values in fact tables.  So, why do we always use surrogates as the primary key in dimensions?


1. Slow Changing Dimensions


It is often a requirement to track historical values of dimension records.  For example; a customer dimension may contain a zip code field.  This dimension has a natural key (aka business key) named CustomerID.  To keep track of the customer zip code throughout time we need to include multiple versions of this customer record in our customer dimension; one for each zip code value.  Using the CustomerID as our dimension primary key is simply not an option since it will not uniquely identify a dimension record.


2.  Changing Source Systems


Using natural keys tightly ties a data warehouse’s integrity to the stability of the source system.  While it is not common for a source system to change natural keys it does happen.  Specifically, when data is archived or when no available values remain, OLTP system will do a variety of things to handle the situation.  Probably more common is the need to source a dimension from multiple disparate systems.  In this case there is a chance that the same key value is used in both source systems making it no longer a candidate for our dimension’s primary key.


3.  Performance


Referring back to the two design goals of a dimensional model per Kimball; create an easy to understand data structure while providing fast query response times.  Fast response times often mean efficient joins between facts and dimensions.  The last thing we want to do is join on text or other meaningful values as this is not going to provide the best response times.  By using surrogate keys we can select a data type, integer, which offers high performance.


For these three reasons alone it is simply not suitable to use anything other than a surrogate key as the primary key of dimensions.  The good news for LeapFrogBI users is that this is one of many best practices that is automated requiring no developer time to implement.


What We Do

We provide SMBs with custom data analytics services.

Built to Spec
We design custom solutions.

Low Cost per User
Empower as many report users as you like.

You Own It
Everything from the data solution to reports is yours to keep.

No upfront costs and no long-term commitment.

Available 24/7
We monitor and maintain everything 7 days a week.

Unlimited access to a team of data experts.

Find out how 
our clients
are getting

10x ROIwithin 6 months


Become Part of the LeapFrogBI Community

Join our newsletter to receive our latest insights and articles on automated, customized reporting in the healthcare industry.


Have any questions? Reach out to us, we would be happy to answer.