Why Data Warehouse Projects are a Bad Idea

A data warehouse offers the benefits of fact-based decision making, and these days nearly everyone agrees on its value. But data warehouse projects have an alarmingly high failure rate. In this video we explain why and offer a way you can succeed where others have failed.

You can find our post on data lakes here.

You can find our video series on dimensional modeling here.

Full Transcript:

If you’re thinking of starting a new data warehouse project then this video is for you. I’m Adam, with LeapFrogBI. We’ve been involved with lots of data warehouse projects over many years and I’m here to tell you that, in our opinion, they’re a bad idea. Yes, I’m a data warehouse consultant telling you that the very idea of data warehouse projects is bad. Stay tuned and in this video you’ll find out exactly why.

So you want to build a data warehouse. All right, well this is not going to be a how-to video. If you’re interested, we do have a good series on dimensional modeling. But what we’re going to talk about today is actually how to approach the process of developing a data warehouse and in particular what you need to know before you take on a data warehouse project.

Probably the most important thing you need to know about data warehouse projects is that the majority of them are not successful. So why don’t we start by looking at some data? Here are the hard, cold facts. In 2005 Gartner cautioned us that more than 50% of data warehouse projects would have limited acceptance. In 2012 a Dresner survey found only 41% of data warehouse projects were considered successful. As recently as 2015 Forrester claimed 64% of analytics users had trouble relating the data that was available to the business questions they were trying to answer. In other words, data warehousing has gotten, and continues to get, really bad marks.

So right about now you might be asking yourself whether or not this even applies to you. Maybe you’re not sure you need a data warehouse. Maybe you’re thinking about some type of big data solution like a data lake on Hadoop. I want to share one last data point with you which is this email I just got today here from Pentaho. It looks like Gartner is even less optimistic about big data solutions than they are about traditional data warehouse implementations, so that’s something to keep in mind. Also, if you are thinking about a data lake and you’d like to understand better the differences between a data warehouse and a data lake, we have this article here available on our website and I’ve linked to it below, so I’d recommend that.

Ok so let’s say you decide to move forward with the BI project anyway. What are the risks? Well for one thing, you’re putting your investment at risk. If the project isn’t successful you stand to lose a substantial sum of money. Also, let’s say the project fails, well that could impact you personally, let’s say, like your career. But most importantly what happens is, if you fail then there’s a huge lost opportunity because in this environment you can’t expect to be competitive without strong analytics at your fingertips. So if your project fails then you don’t recognize the benefits of fact-based decision-making and you can’t compete.

Ok, so this brings us to the fundamental question: why do BI projects fail? Lots has been written about this so let’s just take a quick rundown of the answers that are usually given. First of all, could it be a lack of knowledge? We hear this a lot but the truth of the matter is BI is a hard science. Data warehousing has been around for 25 years and there is literally a recipe for success here: how to build a dimensional model and implement it with complex integration scenarios for a positive outcome. We know it can be done, so it isn’t the recipe. How about the level of difficulty? Ok, these projects are complex. They involve a lot of actors; they can go wrong in a number of ways. But once again there’s nothing new here. It’s not like every time a data warehouse project is undertaken there’s a new reason for failure. In fact, it’s the same reasons again and again and again. So, if they’re well understood we should be able to navigate around them. Ok maybe the projects aren’t getting the attention they need? I’m sure in some cases that’s possible, but we also know that BI continues to be a top 5 priority or top 3 priority for most companies year after year after year. So BI projects are getting, in general, the executive buy-in and the funding that they need so that they should be successful.

So, if we have the recipe, and we know how to follow it, and we have all the necessary ingredients to make this (beautiful outcome), why is it that we keep ending up with this (a failed outcome)? The short answer is we’re a little bit confused about what it is we’re trying to make so we’re not choosing the best tool for the job. It’s like we’re trying to bake a cake but for some reason we think it’s a jello so we’re using a refrigerator instead of an oven. Except in our case we’re trying to make a BI system, and we think it’s a project when it isn’t a project at all. It’s actually a process. In fact BI systems are most successful when they’re developed using an iterative approach – but there’s a fundamental conflict here between iterating and trying to develop a BI system within the confines of a waterfall project. So there you have it. The real problem with BI projects is that they’re projects in the first place.

Here’s why agile BI is better than the traditional project approach to BI. Consider a typical scenario where a company wants to implement some type of reporting system. At the very beginning of the process very little is known. IT knows what data sources exist for systems of record, and they may or may not be aware of all the unofficial data sources that have been created by industrious business users. For their part, the business users often don’t know what they want because they haven’t had adequate access to data, so they don’t have a clear idea of what’s possible. Even if the business can define some requirements, IT won’t yet know what can and can’t be delivered because they must first understand the business requirements then compare them to the available source data to find any gaps. Effective discovery, it turns out, requires a significant back-and-forth exchange of information between the parties to arrive at the potential goals that can be supported and will provide the greatest business benefits. When you adopt a project methodology you’re forced to complete this entire discovery process up front, without the opportunity to test assumptions, and this often yields misunderstandings and missed information. The result is a technical design that will require significant amending throughout the life of the project.

With the project approach the team tries in vain to develop a project plan and estimate without the benefit of the important details that only come after significant discovery work is completed. This is one reason why BI projects so rarely go according to plan. Another reason is that the individual tasks in a BI project aren’t serial in nature – they don’t have linear dependencies. Instead, tasks often require several rounds of rework. So an individual task may start and stop several times, rendering traditional project planning tools useless. After discovery is compressed into the shortest possible timeframe and the project plan is completed and approved, development begins. Here the project approach has another major drawback. After discovery, it excludes end-users from the process until the entire system has been developed and is ready for testing. By then months have passed since discovery was conducted. During this time business priorities, and therefore reporting requirements, may have shifted. Similarly, new data may have become available, and source systems may have been upgraded or migrated. So even if the team succeeds in building a working solution it may no longer be what the business needs, or at the very least it will need further modification.

For all the reasons given, even if you try to force a project methodology on this process it will end up being somewhat iterative in nature. You’ll be adapting, reworking and issuing change requests the whole time. You’ll be having discovery conversations throughout the process all the way into the final validation. You’ll be constantly tweaking the order and timing of tasks in your project plan. This is because BI is not a project, it’s a process, and you can’t effectively shoehorn it into a project box. And, of course, the best methodology you can apply to an iterative process is going to be agile.

As soon as you accept this and adopt an agile methodology, everything goes smoothly. Users are involved constantly so there are no more surprises. It’s highly adaptable to changing requirements and data without unnecessary delays because each cycle brings a new opportunity to reevaluate everything we know and adjust the goals and strategy accordingly. If you aren’t already familiar with this methodology there will be a learning curve as you adopt and fine-tune the tools of the trade. You may also run into difficulty if you don’t automate ETL development and testing in order to keep cycle times short. And in the face of pressure from stakeholders you may face temptation to revert to an objective-driven rather than time-box-driven approach. So, while it may be difficult to adopt agile BI, and it’s going to take some focus and investment, companies that make this shift and begin treating BI like a program or a function, rather than a project, are experiencing better success. I hope you’ll give it a try.