Top 3 Reasons Why Metadata is Key to Data Warehousing Success
The primary goal of a data warehouse is to enable business users to make better decisions. Business users, however, often have little insight as to what is going on in the black box we call data warehousing. This leads to common head scratcher questions such as… Where is the data coming from? Why is it going to take 4 weeks to make one minor change? Why doesn’t my BI report match my source system report?
Metadata (data about data) has the power to make or break a data warehouse. Like most innovation, a metadata driven approach to data warehousing solves common challenges by taking the process to a higher level. Lanterns did a good job of lighting a room, but they also took a lot of work to maintain and were the cause of many fires. Why not flip a switch instead?
1. Business users do not (want to) understand your code.
Not only do business users not understand your code, but they also don’t have the time or desire to try. This presents a major problem. Requirements are coming from business users, but a business user has no way of verifying that a process is implemented properly outside of validating the resulting dataset. Needless to say, this leave plenty of room for errors related to miscommunication.
DW systems commonly include some level of business metadata which describes the data access layer in business friendly terms. On the other hand, business users often have little insight into what transformations have occurred between the source system and the presentation layer. By providing metadata that describes the data flow and execution process, business users can easily understand the full meaning of data. With understanding comes willful involvement, eased communication, and an overall improved process. Without understanding comes finger pointing, mistrust, and lack of confidence that “this whole data warehousing thing is the right solution”.
2. Low level development is a recipe for failure.
Some of the most skilled people I know are developers. They spend years studying and practicing their trade. It is only natural for developers to solve problems by… well, developing. When building a data warehouse most of the development is concentrated on the process that extracts data, transforms it, and loads the target model (ETL). Developers often choose to write custom apps in a familiar language, hand code stored procedures, or use generic data integration tools to create ETL processes.
None of these low level development methods are understood by business users leading to issues previously discussed. Additionally, low level development processes are slow to create, are difficult to adjust to changing requirements, are often faster to recreate than to transfer to a new developer, are error prone, and are often full of inefficiencies.
Metadata driven ETL is a clear solution to these challenges. The concept is very simple. Instead of developing the ETL process we define it in easily understood terms. The definition (metadata) is then read by a metadata engine which generate the required code. The metadata engine become a knowledge repository that automates common tasks, enforces user standards, and enables developers to respond to changing requirements with ease. Just as importantly, business users can easily read and understand the definitions which is essential if we expect business user to be involved in the data warehousing process.
3. Ongoing monitoring should not cost a fortune.
Data warehousing typically includes huge cascading arrays of dependencies. When something goes wrong the symptoms can show up in a number of downstream processes. The ability to determine when a problem has occurred, identify the source of the issue, and resolve the issue quickly is clearly very important. Properly deployed metadata has the ability to make ongoing monitoring and issue resolution an efficient process.
All ETL processes should include event logging which stores information such as the time a process started, lapsed time, row counts, and any related error messages. Real time execution status and historical events should be stored as process related metadata. The method chosen to create such metadata should be systematic. While low level development method can create logs just as well as any other method, why waist developer time & resources manually creating a process that can be fully automated using a metadata driven approach? Assuming the ETL process was also created using a metadata driven approach, modifications can be completed quickly by changing the project definition.
If you are building a new data warehouse or finding that your existing data warehousing effort is becoming terribly difficult to maintain or extend, then consider taking a metadata driven approach. LeapFrogBI and other DW tools are based on these concepts and can save you a lot of time, money, and heartache. More importantly, these tools will allow you to deliver the decision support system you need to your business users quickly, accurately, and with confidence that you will be able to keep up with changing business requirements without breaking the bank.