Top 3 Data Warehousing Mistakes

Many organizations have built data warehouses successfully, and some have failed. There’s no reason at all for you to learn the hard way. I’m going to tell you what I believe are the top three reasons for data warehouse failure. Stick around.

1.  Not implementing a persistent staging area.

I’m not going to go into the technical details of exactly what is a PSA (persistent staging area). I’ll link to some content down below. But, what I will do is I’ll say this… a data warehouse is completely dependent on the systems of record that are used to load that data warehouse, and those systems of record are not under our control. They do things like archive data or simply take it offline. They overwrite values in place with no trace of the prior value, or they may even delete entire records. These things are not acceptable to a resilient data warehousing process. We need a reliable source and that reliable source is PSA.

When I have PSA in place, I can go the business and look at the requirements for that day. Then, I can build a model to fulfill those requirements. And in the future when those requirements change, no problem. I have PSA. I can fulfill those requirements to the extent possible with the data with no dependency whatsoever on the source systems.

2. Cowboy coding.

Data warehousing is all about taking thousands, tens of thousands, maybe more individual data elements from your systems of record and moving that data reliably into your data warehouse. Along the way you’re applying transformations. You might be integrating data you may be applying data cleansing logic. You may be applying business rules. If you do this using an ad hoc cowboy coding approach, you’re going to quickly end up with spaghetti code. And spaghetti code is extremely difficult to change. Oftentimes it’s faster to simply throw the code away and rebuild it from scratch than it is to try to figure out what’s going on in that code and modify it.

Well this is a situation that it’s simply going to kill your data warehouse. A data warehouse has to be able to keep pace with the changing business environment that it supports. It also has to be able to keep pace with the changes in the system record. Cowboy coding has no role to play. So, if you’re building a warehouse, established standards. Establish design patterns. And most importantly, enforce them.

3. Kitchen sink syndrome.

Now, I realize this isn’t a very technical term, but it is a very important one. When building a warehouse, we often have the tendency to load everything and the kitchen sink into the target model. And the data industry as a whole perpetuates this idea because we’re constantly hearing about how much data is being produced every second of every day. We have these exciting new technologies that are capable of crunching huge amounts of data in a very short amount of time which is awesome. But here’s the deal. The size of your warehouse does not define the success of your warehouse. The success of your warehouse should be defined only by how effectively it enables the business users to make decisions that have successful outcomes.

So really, the smaller the better. If you can implement a 100-megabyte warehouse and meet the needs of your business, then congratulations. You have just implemented a successful data warehouse. In fact, if a data warehouse is smaller it’s simpler to maintain. It costs less. It’s easier to extend. Your business users are going to thank you for a smaller data warehouse because it’s going to be simpler to navigate and it’s going to perform better than a monstrosity.

So do yourself a favor. Do your business a favor and build a warehouse that fulfills the needs today. When you have additional requirements, absolutely add additional data elements. For now, keep it simple. Your business users are going to thank you, and you’re going to help them see the forest through the trees.

Here is what I believe are the top three reasons for data warehouse failure. Really, I think I could summarize this whole article and just simply say that the largest contributor to data warehouse failure is the failure to plan for change.

The reason we’re not putting a persistent staging area in place is because we’re not considering how we’re going to support changing source systems and business requirements without the loss of data. The reason we’re using a cowboy coding approach is because we’re not thinking about how costly and time-consuming and difficult it’s going to be to modify that code, again, once our requirements change. The reason we’re taking a kitchen sink approach to data warehousing is because we’re trying to support all of today’s requirements as well as those requirements that we’re anticipating in the future, but really, we have no idea what those requirements might be.

So my opinion is, the largest contributor to data warehouse failure is the failure to plan for change. What do you think? Do you agree? Do you disagree? If you’re planning on implementing a data warehouse,  we would be happy to help.

Persistence Staging Area References:

SAP Definition of PSA –…

LeapFrogBI; “Why You Need PSA” –…