A recent survey conducted by CrowdFlower and summarized on Forbes found data scientists spend most of their time massaging rather than modeling or mining data for insights. It seems 79% of their time is spent either accessing or preparing data, leaving only 21% for everything else. Far from being a new problem, this same issue has […]
The “data lake”, a catchy new buzzword in analytics circles, has many people wondering if they still need a data warehouse. You may have heard that you can run analysis directly against the data lake, and that’s true. This quickly leads to the question, why build a data warehouse when you can have a data […]
Metadata has the power to make or break a data warehouse. Like most innovations, a metadata-driven approach to data warehousing solves common challenges by taking the process to a higher level.
Data warehousing success depends on properly designed ETL. In this short video we walk through the foundation design pattern step-by-step.
What exactly is a data warehouse? Why create a data warehouse? This short video provides non-technical answers that are easily understood by anyone.
Some of the most challenging data warehousing situations come in the form of external data mashups. Because the term “data mashup” has taken on a number of meanings over the past few years, I’ll clarify how the term is used in this post. External Data Mashup Source Data Description Systems of record are not in […]
There are many ways to deal with records that have been deleted from a source system. The first decision is to determine if the deletion needs to be reflected in the target model. Then we need to figure out how we will know when a delete has occurred and identify the deleted records. For an […]
Anyone who has been involved with data warehousing knows that there are plenty of things that can go wrong. Mistakes can be made when researching the data sources, collecting requirements, designing a dimensional model, etc. Assuming all of the analyst and modeling work has been done perfectly, we still need to be sure that the […]
Data warehousing has a clear set of objectives, such as data persistence, a single easy-to-navigate data model, fast query performance, etc. While it is not the role of the data warehouse to mimic the data in source systems, the data warehouse clearly must account for changes in the source system. Three types of changes […]
Agile software development has become the preferred methodology of countless teams worldwide. The principles behind the agile manifesto serve as the rule book for those of us who claim to be part of this paradigm shift. A quick read through these 12 principles will make it very obvious that agile is about much more than tools. Agile […]
Data warehouses provide organizations with a knowledge base that is relied upon by decision makers. ETL (extract, transform, load) is the process that is responsible for ensuring the data warehouse is reliable, accurate, and up to date. This post presents a design pattern that forms the foundation for ETL processes. What are the goals? Before jumping […]
Building anything from houses to software applications requires that we make decisions about how to efficiently convert raw materials into the desired product. Building a home in 1492 started by cutting down raw trees. Today builders use dimensional lumber or prebuilt partitions. Efficiency is gained by streamlining many of the tasks related to building with […]