Reporting tools have become pretty good, and companies buy them with the expectation that doing so will solve their reporting challenges. But is this really the case? Are they really up to the job? This is a conversation we frequently have with clients, and we hope it may benefit you.
If you’d like to learn more about centralized data solutions, a topic we discuss below, moe information can be found in these articles:
Reporting Tools Make Reporting Possible, Don't They?
For years companies have been buying reporting tools, like Cognos or Business Objects, thinking that they could solve their reporting problems with a single purchase. Today companies are buying newer tools, like Power BI and Tableau, with similar expectations.
So the question is, can you solve your reporting challenges with a single tool purchase? Are these tools really up to the job? This is one of the questions we hear most often in our practice, and today I’m going to answer it for you.
So here’s a question we get all the time and basically it comes down to this…
And before we can answer that question let’s take a quick look at what we mean when we say data preparation.
Most of us know reporting is hard, but many aren’t sure why. Usually it comes down to the simple fact that data in source systems is not what we’ll call “reporting ready,” and this can be for a number of reasons. One of them is that OLTP systems use a normalized data structure, but the types of queries that are needed for reports are very, very complex and require you to join a lot of data together. Therefore they work better on a denormalized data structure.
Another reason is that you usually want to report on things that don’t exist in the source system so you have to create new data elements or calculated fields before you can produce the report.
Another example is the need to filter out old or invalid data. As systems change over time, data may no longer be meaningful and you want to strip it out before it ends up in a report.
Lastly, you often need to join data not just from different tables within the same source system, but from entirely different source systems, in order to get a complete picture of the subject matter that you’re trying to report on.
For an overview of the most common data quality issues…
Net net, there’s challenging data preparation work that has to be done to produce accurate and meaningful reports. We have to do this data preparation, so the key question we need to ask is, what’s the best way?
Preparing data for reporting with Excel
The great majority of companies rely on Excel to do this work. It’s easy to use Excel, in a sense, because you already have it. And you can combine it with some sequel or access queries, and maybe PowerPoint, and you have some kind of reporting solution, voila!
But it’s not a great solution because what you’re doing is you’re essentially doing the data prep in the report itself, and this has disadvantages. For one thing, it’s a very slow process. Every time you want to create a report now you’ve got to go back and you’ve got to update the spreadsheets and the formulas and everything else, including the data in the spreadsheet that produces that report. It’s also prone to error. It’s very difficult to find and fix errors and formulas in Excel. And then lastly, the logic that you use to do your data transformation is now going to be trapped inside individual spreadsheets, making it very difficult to reuse. So there has to be a better way,k,.
Preparing data for reporting with Tableau or Power BI
Here’s what we’re being told. I took this from the Tableau website, but all of the modern reporting tool vendors say essentially the same thing. What they’re claiming is…
And if you have to do more extensive data transformation, then they support that too. You can use the tool to integrate, cleanse and organize data however necessary to make it easy on the end-user to develop reports.
In theory, the proposed solution looks something like this. You take the reporting tool, you point it directly at the source, and you do any necessary data transformation directly in the reporting tool.
Can it work? Yes, absolutely. If you can do this work with Excel you can certainly do it with a more sophisticated and more expensive tool like Tableau or Power BI.
I think a better question to ask is will it work?
Unfortunately, the reality often looks more like this. Individual departments and business analysts already have expertise in Excel. They probably already have historical data in Excel that may not be available anywhere else. And they probably have macros and formulas that do data transformation already in Excel. So when given the option to leverage all of that, or try to recreate it starting from scratch, what do they choose? Well I’ll tell you this, the number one data source feeding Tableau today is, you guessed it, Excel.
So where does that leave us? Let’s hear what the data warehousing community has to say about this. This is Margy Ross, one of the most influential thought leaders in data preparation for reporting. Her work has literally helped thousands of companies achieve success with reporting. To summarize, Margie says if you have to do data preparation to make the data useful to business people for analysis, you can centralize this effort in a data warehouse and do it once instead of trying to recreate the wheel in different departments, each doing it their own way.
What Margie is suggesting looks something like this. You build a custom centralized data solution, possibly a data warehouse, to automate the process of doing all the necessary data preparation, and then you can use one or more reporting tools of your choice to consume the already prepared data.
This is a proven solution, but it’s also not without its disadvantages. So let’s take a look at the pros and cons of each approach.
For an explanation off centralized data solutions …
Pros and cons of decentralized data preparation
If you’re going to decentralize, using only a reporting tool to do your data preparation, there are certain advantages that come with that decision:
- For one, you don’t need to get organizational buy-in or executive sponsorship, you just get the tool pointed at some data and start working.
- You also don’t need to invest in other software or hardware, just a reporting tool, that’s pretty much all you need.
- There’s also no need to gather other departments and get everyone on the same page about naming conventions or anything else. You’re just working in a silo.
- Lastly, this is definitely the fastest path to get to initial reports.
There are some disadvantages, however, and those include:
- Whatever work you do, it’s going to be in that reporting tool. In other words, you’re not going to be able to reuse that data preparation in other reporting tools very easily.
- You’re also going to be duplicating effort across departments because different departments may be using the same data and trying to do the same types of transformations.
- You’re also likely to experience, in the long run, much longer development times than you would with a centralized solution where all of that work is automated once and then reused.
- Another challenge may be lower usability depending on the expertise that you have in-house to do data preparation in the first place. This work is definitely possible in the reporting tools, but that’s not necessarily where you’re going to get the best end result.
- Lastly, and perhaps most concerning, if different folks are doing data preparation different ways then it’s very possible that you will end up with different answers to the same questions.
Pros and cons of centralized data preparation
Let’s take a look at what happens with a centralized data solution. First of all, the advantages:
- For one thing, because it’s centralized, you can use multiple reporting tools. Point as many tools as you want at it.
- Also, what you’re going to find is that it’s going to be much easier to maintain and modify the logic in a centralized data preparation solution, so this is going to lower your total development effort in the long run.
- It’s also most likely going to provide you with higher usability for report development.
- Importantly, it also eliminates the Excel burden.
- And perhaps most importantly, you’re going to end up with one version of the truth where you won’t see any longer conflicting answers to the same business questions.
But of course there’s some downside:
- Namely, it requires some kind of cross-organizational buy-in, and typically to be successful these types of efforts require true executive sponsorship.
- Also, you’re going to have to invest in additional software and hardware.
- Also, it requires a coordinated effort across the organization.
- And lastly, it may not be the fastest path to get started. In the long run it’s going to be more efficient, but in the short run it’s going to take longer to get to those first reports.
At the risk of stating the obvious I should also add that both of these approaches, in order to be done correctly, require significant expertise in both requirements and data analysis. So if you lack the skills necessary in your organization to do this work, then attempts to use either approach will likely fail without outside help.
So, should you use reporting tools for data prep?
In summary, data has to be prepared before it can be used for reporting. So what’s the best way to do that? If you want you can use reporting tools to do it — they’re a viable option. If you can do it in Excel you can certainly do it in a more expensive and sophisticated tool. But should you consider a centralized approach to the data preparation? I would say this: if you’re going to invest resources in reporting and trying to run your business with analytics, then you should absolutely consider centralizing the data preparation effort before you go down another road. I will tell you that we get a lot of calls from companies that have already purchased a reporting tool and they’re struggling to get the results they want with it. And nine times out of ten what we end up doing is coming in and helping them develop a custom centralized data preparation solution like a data warehouse. Your results may be different, so you know it depends on the circumstances, but I would seriously consider this before you go down another road.