How To Estimate the Effort to Build Reporting Solutions
I want to talk about a common challenge that we run into and that is estimating the effort required to build Reportopia solutions. You know, it’s not too hard to understand that anybody who is building a solution is going to want to know what the cost is associated with building that solution. It’s critical to have some understanding of the cost because, we’re trying to add value. In order to determine if we’re going to be able to add value or at least have a chance of adding value, we need to have some understanding of what the effort is going to be, the cost is going to be in comparison to the value that we’re adding.
If that comes out to a positive number, then we have something that is going to aid the business. If it doesn’t, well, then we don’t. So clearly estimating the effort in some manner is important. I’m going to go through some of the things that I think are important to consider when we’re talking about estimates. Now the request to get an estimate comes in a number of different ways. We might hear, how quickly can this be done? What’s the effort required? Can this be done today?
There are different ways that these estimates come across. Sometimes people will simply say, “Can I get an estimate for this unit of work?” Before I go into the details, I want to say that at LeapFrogBI, we don’t look at this as a project. “This” being the building of an automated reporting solution. It’s really a business function, but we certainly understand that individual tasks and features that we’re trying to create can be looked at as a unit of work.
We can certainly do our best at trying to estimate those individual units of work. We all know, estimates are just that, they’re estimates. If you bring your automobile to the mechanic and you drive in thinking that you need a break job, they’re going to give you an estimate on a break job. That estimate comes with assumptions. For example, someone having break problems assumes they need to change their brake pads. That may be what they go into the mechanics saying. “I need some new brake pads.” The mechanic will respond with an estimate for labor and cost of parts. Then the mechanic starts working and investigating the situation. They might find the person needs new calipers, new drums, or whatever it may be. There are things that are simply unknown, both to the person making the request and the person doing the estimate. They could make the estimate invalid at the end of the day. The same is true when we talk about building business intelligence solutions.
We are all doing our best to provide good faith estimates, both from our assumptions going into an estimate and the actual estimate itself. We’re basing that on empirical data that says typically it takes X amount of time to do this process. If everything is known at the beginning and accurate in our scope, well then, this project should also take X amount of time. Estimates are estimates. They could be off by 5% if we’re lucky or they could be off by 50%, 100%, or more if we’re not lucky. Sometimes we simply don’t know all of the variables required to get to an accurate estimate.
I will preface this and say that at LeapFrogBI we don’t do fixed bid projects. But if we were to do so, the first step we would have to go through is a thorough research process. We would have to not only estimate based on empirical knowledge of how long things take, but we must vet out each of the things that need to be done along the way. Basically, we’d have to have a project together to get the estimate because we can’t give an accurate estimate, especially a fixed bid if we don’t know all the details.
How does the system of record behave? Are the data elements available required to meet the request? There’s a lot of things that are unknown and you must first go through if we’re going to create a fixed bid and understand all those things. The reason we don’t do fixed bidding is that process is simply a waste of time. You’re going to spend way more money at the end of the day going through this estimating effort if you do it in the way I’m talking about as opposed to, frankly, having a little bit of a leap of faith. It does take some mutual trust between all the parties involved. But if you get out there, get the job done, take on little task at a time in a typical agile manner, things seem to run a lot more smoothly.
Estimating Feature Requests
I want to get into some examples of actual feature requests that we get regularly. I’m sure all of you either have had these requests or you will have these requests if you’re building solutions. I’m going to talk about the effort required to deliver on those requests. Hopefully by the end, I can give you some understanding of the different types of variables involved in estimating these different feature requests.
The Four Layers in Building a Report
One more thing I want to do before I jump into this is talk quickly about the layers in these solutions. We are going to get into a little bit of technical language here, but it is required because understanding effort means you’ve got to understand what needs to be done. It doesn’t do much good to say it’s going to take 10 hours to do something versus 1,000 hours. If you begin questioning that number and you’re thinking, why does it take so long to do A versus B. It’s important to understand the layers and it’s important to understand the testing process.
Most of these solutions are going to have at least three, maybe four layers in them. I’m talking about high level architectural layers. First, you’ve got a system of record. This is going to be your business application. In most cases, this is where we are collecting data. It is where data originates and this is going to become our data source.
Secondly, you will have your data warehouse layer. Now this is, of course, assuming that we’re building a data solution that includes the data warehouse. This doesn’t necessarily have to be the case. I’m not going to go through all the types of solutions, but let’s just assume we’re in a data warehousing scenario. We’ll pull data out of your systems of record, which is layer one, into a data warehouse, which is layer two.
The third layer and I don’t think all organizations do this, but at least we LeapFrogBI typically will do this across the board. We’ll have a semantic layer that’s downstream of the data warehouse. What this does is it gives us a buffer between the data warehouse and the reporting layer. This is where we might put in business logic. We might use our reference data to join in to decode certain values. It provides us with a lot of flexibility and the semantic layer is a typical layer in most architectures.
Then you have your front-end layer. This is your business intelligence tools as they’re called, Power BI, Tableau, and so on. In the front-end layer, you have several sub layers as well. I’m not going to go into that now, but let’s just call it the front-end layer. This is where we’ll do our final preparation so that we can get data visualizations in front of someone in the form of a report that someone can traverse in an ad hoc manner or so on.
Two Parallel Environments
Let’s talk about testing. In most scenarios, we’re running two parallel environments at a minimum. One environment is a development environment and the other environment is production. These are two completely isolated environments, meaning that there’s no dependencies across those two environments outside of the data sources and reference data. We do this so that we can have a production environment that isn’t going to have the volatility involved with ongoing development.
Simply put, we can do our development in the development environment without risking interrupting the production environment, which is in use on a daily basis. Once we’re happy with the results in development, test it thoroughly, then we can go through the process of promoting any features we put in development into production. The two environment type of architecture is critical in ensuring that we have a solution that is reliable. Your end users should be able to know that when they’re looking at reports that they’re fully validated and they’re going to be delivered reliably every day or whatever the interval might be.
Variables in Estimating Requests
With all of that said, I want to talk about a few examples of requests that you may have for your development teams and maybe give some insight into what it takes to complete those feature requests. Let’s start out with what I would consider a simple request. Let’s just suppose that you’re looking at your business intelligence front-end and you have a report, it’s already authored. What’s already authored? It’s already in use and you simply want to make a change to the visual layout of that report. Meaning, you might have a bar chart today and you want to change it to a line chart or you want to move visualization A over a little bit to the left and then widening visualization B. Something like that.
Basically nothing that requires any sort of data changes. It’s just front-end visualization manipulation. Could be color changes in a line chart. It could be adding a logo to a report. There’s a number of different ways to aesthetically change the way reports are laid down. So this type of request is pretty much one of the simplest to fulfill. The process that would need to be followed here is we would first go into the development version of that report and we would carry out whatever requirements we have. Again, we’d make that change. We’d change the bar chart to a line chart. We’d widen the visualization. We would add the logo, whatever it is. Then that would be delivered to the person that requested the change to test what was changed.
In this case, testing is simple. It’s just visually observing the report and see if it looks and feels the way it was requested. If it does, the test is passed. Now we can go through the process of promoting that to production. What we do when we promote, at least in a Power BI environment, we promote a Power BI report from development to prod.
When we do that, we’re taking what is called a PBIX file. And we are going to redirect that PBIX from consuming a development data source to a production data source. We save another copy of it in a production folder that we can use to ensure that we always have a production copy of the report. Then we just publish that out to the production environment. Then we do one final round of validation to make sure that nothing happened along the way.
We go out and look at that report in the service. Make sure it looks the way that we wanted to look. Send again the requester, the note saying, “This is now in production. Please review, let us know if there’s any issues.” So that’s a pretty simple process. The entire process end-to-end might take if you talk about actual development time, not the time it takes to wait for one person to test and get feedback and so on. If you talk about actual development time to do something like that, it could be as little as just an hour, even less to do something like this. Maybe if the requests are quite particular where we’re trying to get into individual fonts.
If you have a very detailed spec that’s trying to be followed, maybe it would take a little while to go through a lot of visuals and make aesthetic changes to them. But these types of requests are quick. Within a handful of hours in nearly all cases, this type of thing is done. So that’s an easy scenario, but you can still see that even in an easy scenario, we don’t just go into production and start making changes. That’s not the way this is typically done. It’s not the way we recommend doing it. We must go through a development process to make sure that we understood the requirements and that we delivered on the requirements correctly before we push something into production.
It’s just too easy for miscommunication to occur. And it isn’t worth risking exposing mistakes to a production environment when those things happen. It’s a lot better to do that in a controlled environment, such as development so we can catch those changes, We don’t push them to production. And ultimately when you push changes to production that are incorrect, we’re really diminishing trust. We’re making it more difficult for our end users to actually utilize and trust these reports that we’re working so hard to get out there. So we just don’t want to do that.
Let’s talk about something a little more complicated. If a business user needs to have a change in the way something is calculated. We have a metric, metric A and this metric today is calculated as however, it doesn’t really matter. We want to make a change to how that calculation is going to happen. This might seem extremely easy on the surface and it might be very easy or it could be very complicated. It really depends on what the change is and the calculation.
It depends on whether we have the needed data elements available to make the change quickly. It depends on whether the system of record has that data available. So there’s a lot of variables. I’m just going to go through kind of the simplest to the most complicated scenario to try to give some insight into the effort required to get this type of thing done. So, first of all, if we’re thinking about the layers in this solution, the calculation may be defined at pretty much any layer in the solution.
If the calculation is being done in the front-end, I am going to use Power BI. If Power BI is where the calculation is happening, meaning that all of the data required to perform the calculation is being fed into Power BI and then Power BI is actually doing the calculation. If that’s what we’re talking about, then the effort is likely a lot lower than any other scenario that I’ll talk about today. Because if we’re in the front-end, meaning all of the needed elements are available, and we’re just performing a calculation, we might be doing that in Power BI terms. It would be an M in Power Query. It could be done in a calculated column. It could be done in DAX.
But whatever the case is, all of that front-end work is typically pretty quick and a lot faster, which means it’s cheaper and it’s just faster delivery overall. In that case, it would be the same scenario I just mentioned for visualization changes, perform the change in development, do the testing, and then if it’s successful, promote it to production. Pretty quick. Now that may not be the case. The calculation might be in that semantic layer, which is one layer back from the front-end. And in the semantic layer, it could be pretty easy to perform the change or it could be pretty complicated.
But in semantic, we have some pretty good tools to do the job in most cases. Again, assuming that we have the data elements needed to actually make the calculation change, in that semantic area we can usually do that pretty quickly. I want to be careful when I say something like that because not everything is quick and easy. Some calculations might seem quick and easy, but they might involve things like merging data with different cardinalities together. It might mean that we need a new reference source. There’s a lot of scenarios.
Just if all the elements that we need are available in semantic, and we’re just making a simple calculation change with available data, that’s straightforward to do in semantic. And the same testing process would apply. Do it in development, do the testing. If it succeeds, then we promote it to production and then we validate those changes. Now, if the elements that we need are not yet in the data warehouse, meaning they have not been extracted from your systems of record, then we have a larger task at hand.
We have to actually find the elements that we need in the systems of record. If we’re doing integration across multiple data sources, we got to figure out how that is impacted if it’s impacted. Then we need to plum that information through the solution. So we need to bring it into the data warehouse. We got to put it into semantic. We got to get it into ultimately the front-end reporting tool. And of course, all of that needs to go through a testing process.
We’ve got to make sure that the work was done correctly in development, test it, promote to production, and then validate it. So that is quite a mouthful talking about doing a calculation change. Really, I could change the scope of that conversation and cover calculation changes, as well as adding new data elements. One thing I would say is depending on how you’re building your solution or how your development team is building it, you may have a persistent staging area in place.
This is often an area upstream of the data warehouse where we collect more information, I would say than what we initially have in scope. So, we might go into basically a table in the data source and instead of trimming off fields that we don’t need, we might include those fields when we’re doing our initial staging process. I mentioned that because that means that the process required to add the elements might not require us to go all the way back to the data source.
It really depends on how the solution was built, whether or not we had the foresight when the initial implementation was done to pull in additional elements, thinking that we might need them in the future or not. You can’t really knock the developer for doing it or not doing it. It’s just a matter of what was known at the time and how the solution was built to meet the initial requirements.
Adding a Data Source
Another scenario would be we need to add a new data source. This is I on the larger side of request. You have a business application that’s being used as the source for the data warehouse, and now we want to add a second business application or a third or fourth. This happens routinely. It could be through mergers and acquisitions. You have acquired another organization. A new organization is not using the same application that you’re using, so now you’ve got two applications.
In some cases, it’s the same application, meaning the same tool that’s being used. But now you have two instances of that tool. That’s one scenario or it could be two completely different tools just providing sort of the same functionality. They serve the same business purpose. In both cases, if you want to be able to see a consolidated view of your business, which most of us do, you will need to be able to extract and conform that information to some standard. That is the target model in your data warehouse. The target model of your data warehouse should be describing your business, not the source system.
If you built your data warehouse from system A, you should still have the same things, that you need in order to plum in system B. So when you go through this process of adding a new data source, there are a few things that are important to keep in mind. Let’s just talk about customers. Let’s say you’re working on a system that includes a list of customers. Well, the system that you built initially is going to have an ID that uniquely identifies every customer. It may just be a number. So number one represents customer John Doe. Number two represents customer whoever and so on. Now, we need to integrate system B. That system also has a list of customers in it and it also has a customer number 1 and a customer number 2. So clearly, we can’t have customer number 1 represent two different people. We need to have a way of identifying those two different people, regardless of which system they originated from.
This is where we often add some type of an identifier that represents the source. So now we have system A customer 1 as the unique identifier, which will not collide with system B customer 1. This is called a composite key and it’s something that’s typically required whenever we’re integrating multiple systems. I’m going through this because we are talking about adding this business application, which is serving a similar purpose and has overlapping data elements.
This process requires us to look at how everything has been keyed within the existing data warehouse. It may be that we had the foresight to know that you are in a phase of acquiring other organizations and you may have additional business applications serving the same role. In that case, we probably would’ve preceded this source identifier so we can have a composite key that identifies a customer uniquely whenever we built the system originally. Or maybe not. Maybe it was unexpected. In that case, we may have to go back and add those source IDs.
In both cases, this type of process of adding a new data source is going to be one of the most extensive development efforts that you have to undertake. That doesn’t mean it’s going to have to take a long time, but it might. It just really depends on what requirements we have. It also doesn’t mean that we’re going to be able to assume that the effort of system B being added to the solution is the same effort as system A, because we’d already completed building the solution on top of system A and that’s because they’re different systems.
It may be much faster because we’ve already got a template in place. We don’t have to do the target modeling, or it may be much slower. Maybe system B, it’s going to be challenging from simply the data acquisition standpoint. Maybe we have to work with an API that’s simply not designed for the purpose that we are using it for, data warehousing. There’s a lot of things that can happen and it’s very difficult to estimate.
Fixed Bids = Paying More
I mentioned earlier, if we were going to provide a fixed bid for such work, we would have to do the work to really understand what the effort is going to be. In that case, we would be able to give an accurate fixed bid. The only alternative, when you’re talking about fixed bidding something, is to take on the risk. So as a vendor, that is the role that LeapFrogBI typically plays. If we were going to fix bid something, we would have to take on the risk of that estimate that we would have to do, which we based our fixed bid on. We must take on the risk if that estimate is wrong.
Most of the time we’re going to be right, which means that essentially, we’re having to add on the cost of insurance to our fixed bid, because there are unknowns that we must account for. So usually, the client is better off not asking for a fixed bid. If you ask for the fixed bid, you’re going to end up paying more. And frankly, you’re probably not going to have the results that you’re looking for.
Other Common Feature Requests
Let’s talk about a couple more types of feature requests that I think are common. One is, “I need to get a report out to certain people. Can you get this report out to these four people every day? I want these reports delivered by email.” This type of thing is quite simply done. It is front-end work only. It’s done simply in the reporting tool. And this is one scenario where we will not do the same thing in development. We won’t go through the development process because you don’t want these people getting a subscription from a development environment and a subscription from a production environment. You just want them getting the subscription once a day.
So, setting up a subscription is usually pretty straightforward. It only takes a few minutes to do in nearly every tool. Now, when we talk about subscriptions, it can get a little bit more complicated. How about row-level security? Sometimes you want people to get a subscription and you want the subscription to only contain the information relevant to the person receiving the subscription. You want regional manager A to get region A’s report, and region B, we go to region manager B and so on. That type of thing could be quick and easy, or it could take a substantial amount of work to get that done. This is where we’re saying we could either create reports essentially that are for region A and a separate report for region B and then subscribe to each other’s reports that are already filtered down to those regions. Or we can do something that enables the end-user to consume those reports and have the reports dynamically present the information relevant to that user. This is done by detecting which user has been logged into the system. If region A manager logs into the system, we now know who that person is. We can dynamically filter the report down based on that user’s role in the organization. This is where we get to row-level security.
Row-level security is where we’re going to take that row, where we’re going to apply some type of filter on top of that row so that we can filter the rows of data out that are not relevant to that person. This type of thing could get very complicated. It is an important feature for sure. It has a lot to do with how security is managed overall. It ensures that data that is sensitive doesn’t get into the hands of people that it should not. So, row-level security is a really important tool. It’s not the only way to handle this type of request, but as far as estimating the effort to implement such a thing, unfortunately can vary widely.
You could have a very simple scenario. Again, we talked about region A, region B. We are just going to create separate reports for each one and we’re going to give access to each report to the correct people. Something like that is quite simple to do. Of course, if you’re talking about 100 different rows and you’ve got to manage all the assignments of people to those rows, well, that’s a little more time intensive. It’s that type of thing that gets us to the need to do dynamic row-level security.
Implementing row-level security while not difficult, it can take a little bit of time. It also requires a unique type of testing process. You must be able to essentially log into the system as if we are one of the people, that let’s just say, we’re region A manager. We’ve got to be able to log in and act as if we are that person so we can see what they will see. That’s required because the whole purpose with implementing this row-level security is to actually make sure that people are only seeing the data that they’re required to see.
That’s been quite a few examples there. At the end of the day, I wish I could be more precise about how these estimates are done or even give you some solid numbers, frankly. I know that would be a lot more comfortable for everybody, but I think if you work with your development team over any period of time, you will naturally start to understand how long things take to complete.
You’re going to be able to work together and understand how long something took to complete. If you need another process done that is the same, you know how long it will take to complete. You also will get more familiar with the testing process of your organization so that you know how long that process will take. You will know if it is going to take at least a couple of days to get through this testing process unless we need to expedite it. And then of course, other things can be done to make it go faster.
One other thing I want to mention related to testing is hot fixes. So sometimes things actually don’t go right in the testing process and we get a bug in production. For Example, we’ve got a report that is simply incorrect in one way or another. It could be trivial or it could be a serious issue. In that case, we will typically go through a hot fix process and this is where we’re going to first fix the production environment. Second, push those changes down to the development environment.
The reason for this, maybe it’s obvious, is that if production is broken today, the first thing we want to do is get that fixed. If we can do that in the production environment quicker than going through the testing process in development and then promoting up to production, that is what we want. It may take us a day or two to get that done. The objective is to fix what’s broken in production first. This should be a pretty rare scenario. We shouldn’t be doing hot fixes every day, but at the end of the day if you’ve got something broken in production, then the priority is getting that fixed in production first.