A recent survey conducted by CrowdFlower and summarized on Forbes found data scientists spend most of their time massaging rather than modeling or mining data for insights. It seems 79% of their time is spent either accessing or preparing data, leaving only 21% for everything else.
Far from being a new problem, this same issue has been reported routinely for years. A survey last year by Trifacta reported that 80% of the work in any data project is wrangling the data. In 2014 The New York Times ran a story titled “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights,” which reported that 50-80% of an analyst’s time is spent “collecting and preparing unruly digital data, before it can be explored for useful nuggets.” In a 2013 Lavastorm survey, 41% of analysts reported that their single biggest challenge was accessing or integrating data sets. I didn’t look further, but you get the idea. I suspect this problem has been around as long as people have been doing analysis.
To recap, the average data analyst spends roughly four days of every week organizing data, leaving only one day per week to conduct analysis and develop reports.
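For anyone who wants to check that arithmetic, here is a minimal sketch. It assumes a five-day work week (my assumption, not a survey figure) and uses the 79% share from the CrowdFlower survey; the function name days_per_week is made up for illustration.

```python
def days_per_week(share, workweek_days=5):
    """Convert a share of working time into days of a work week."""
    return share * workweek_days

# Share of time spent accessing or preparing data (CrowdFlower figure cited above)
prep_share = 0.79

prep_days = days_per_week(prep_share)          # time spent on data preparation
analysis_days = days_per_week(1 - prep_share)  # time left for everything else

print(f"Preparing data: {prep_days:.2f} days per week")
print(f"Everything else: {analysis_days:.2f} days per week")
```

Swapping in the 50-80% range reported by The New York Times gives anywhere from two and a half to four days per week on preparation, under the same five-day assumption.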
If you create analytics, or if you rely on them to do your job, this directly affects your ability to succeed, and therefore your company’s success as well. There are actually two distinct costs here. First, there is the direct cost of staff hours, approximately equal to 80% of the salaries of any data analysts employed by your company. Chances are good that these direct costs extend to managers and others in your organization who also spend some portion of their time trying to transform data into insights. But all of these direct costs combined most likely pale in comparison to the second category: indirect costs.
So what are these indirect costs? When an organization can’t devote sufficient resources to actual analysis, the result is a lag between when information is needed and when it becomes available. Unfortunately, business can’t wait, and this lag often means decisions must be made before all the facts are in. The odds of making poor decisions go up as the amount of data available for decision support goes down. Poor decisions lead to a lack of operational and marketing optimization, which is another way of saying that available profit goes unrealized. Conversely, superior analytics leads to superior decisions, which in turn lead to maximum profit. As IBM found in their Global CFO Study, companies utilizing advanced analytics to drive decisions (“value integrators”) had twice the revenue growth and 20 times the EBITDA of competitors using gut feel to make decisions.
If you’re running a business in today’s competitive environment you obviously can’t afford to throw away 80% of your analyst resource investment and give up the opportunity to double your revenue growth. But identifying the problem is only a first step. Before this critical problem can be solved we must look beyond the downstream impacts and begin to understand the upstream causes.
(This is part one of a three-part series illuminating the challenges of data analytics and describing a proven solution to the problem. Read part two: “5 Reasons Why Your Data Analyst Can’t Analyze Data”.)