Mo Data, Mo Problems: 7 Inconvenient Truths About Big Data

February 14, 2013

Type the term “big data” into Google, and you’ll indeed get big results. Almost a billion of them.

By my rough estimate, about a third of those results will be startups that profess to be changing the world with big data. Another third will be VCs, OpenView included, quoted in TechCrunch about their latest big data investment. The remainder will be Fortune 500 executives explaining how big data is their number one priority for 2013.

In short, everyone wants data, and the bigger the better.

As a member of the research and analytics team at OpenView Labs, it’s my job to translate data, big or small, into insights for our portfolio companies. The buzz-wordiness of the big data movement and its ability to universally inspire awe among business executives has certainly helped raise interest in our services. But it has also engendered deep misconceptions about big data, principal among them that more data is always better. This mentality can actually impede companies rather than enlighten them.

Surrounded by this behavior, I feel it’s my civic duty to issue a warning about the limitations and dangers of big data. Before you fire all of your mid-level executives and replace them with Hadoop, consider the following seven inconvenient truths:

1) Data ≠ Knowledge. Data x Analysis = Knowledge

Given a sufficiently sophisticated analyst with the right tools and enough time, a big data set can be a treasure trove of insights. But data on its own is just numbers, and numbers can’t run a company. This is a topic I touched on previously in a blog post about Big Analytics, but what I didn’t mention is that the opposite is also true: no matter how many advanced degrees and fancy tools your data scientist has, he or she can’t give you a shred of insight without the right data. Both inputs have to be in place for a project to succeed.

2) Data and Analysis Compete for Valuable Resources

Regardless of how you choose to assemble, store, and clean your data, large data sets are expensive and time-consuming. Given a limited budget (and what budget isn’t?), building big data will suck resources away from the analysis of that same data set. Since Data x Analysis = Knowledge, shortchanging either input will cripple your ability to make data-driven decisions.

3) Data Always Seems Important. Often It Isn’t

Let’s say you want to collect data on prospective customers to help your salespeople give more personalized pitches. More data can’t hurt, right?

Actually, it can.

In a fascinating study conducted at Princeton and Stanford, psychologists discovered that the quest to assemble relatively unimportant information can actually distract us from the few relevant facts. This “addiction to data” inhibits our decision-making. So even if you do have the maiden name of every prospect’s mother, you may not want to give it to your salespeople: over-sensitivity to that data point may distract them from the information that can actually help their pitch.

4) Small Data Can Have a Huge Impact

Another misconception widely held by executives is that you need big data to make data-driven decisions. Often, a little data is plenty.

Small data sets won’t always give you the full picture, but the most impactful conclusions often require the least data. That’s because the more glaring a discrepancy in a data set, the smaller the sample size needed to reach statistical significance. So don’t wait until you have a ‘complete data set’ to begin making decisions.
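To make that concrete, here’s a quick back-of-the-envelope power calculation (my own illustrative sketch, not from the post or any particular study), using the standard normal-approximation formula for a two-sample comparison of means. The effect sizes are hypothetical:

```python
from math import ceil
from statistics import NormalDist  # Python 3.8+

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sample test of means,
    via the normal-approximation formula
        n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2
    where d is Cohen's d, the standardized effect size."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

# A glaring discrepancy (d = 0.8) needs far less data than a subtle one (d = 0.2)
print(sample_size_per_group(0.2))  # 393 per group
print(sample_size_per_group(0.5))  # 63 per group
print(sample_size_per_group(0.8))  # 25 per group
```

The pattern is the point: quadruple the effect size and the required sample shrinks by a factor of sixteen, which is why a small data set can be plenty when the difference you’re measuring is large.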

5) There’s No Such Thing as a Complete Data Set

In the previous paragraph, ‘complete data set’ is in quotes because no such thing exists. The world contains an infinite amount of data, and almost all of it is irrelevant to whatever you’re trying to measure. Attempting to quantify the effect of everything on everything else won’t just waste a lot of effort; it’s also likely to land you with a Type I error (a false positive), such as the Super Bowl Indicator. Don’t bother collecting data unless you think there’s a strong possibility it will be relevant.
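The Super Bowl Indicator is exactly this kind of false positive: test enough irrelevant predictors against an outcome and a few will look “significant” by chance alone. A toy simulation (entirely hypothetical numbers, sketched for illustration) shows how easily it happens:

```python
import random
from math import comb

random.seed(42)

YEARS = 20          # years of market history
N_PREDICTORS = 1000  # irrelevant coin-flip "indicators" to test

def p_value(matches, n=YEARS):
    """Two-sided binomial p-value for matching the market
    `matches` out of `n` years under a fair coin."""
    extreme = min(matches, n - matches)
    tail = sum(comb(n, k) for k in range(extreme + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Random "up/down" market years; predictors are equally random
market = [random.choice([0, 1]) for _ in range(YEARS)]

false_positives = 0
for _ in range(N_PREDICTORS):
    predictor = [random.choice([0, 1]) for _ in range(YEARS)]
    matches = sum(p == m for p, m in zip(predictor, market))
    if p_value(matches) < 0.05:
        false_positives += 1

# Even though every predictor is pure noise, a few dozen of the
# thousand will typically clear the p < 0.05 bar by luck alone
print(false_positives)
```

None of these predictors contains any information, yet dozens of them would “pass” a naive significance screen. That is the Super Bowl Indicator in miniature: test everything against everything and chance will hand you a story.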

6) Granularity Often Makes Decision Making Harder, Not Easier

Suppose you’re a B2B company trying to decide which industry makes the best customers. Given the option of multiple levels of granularity in the SIC code system, many executives will opt for the most granular choice.

This is usually a mistake. The objective of analysis is to boil down an impossibly complex world into a digestible set of approximations. Always choosing the most granular option works against this, making it easier to miss the forest for the trees.

7) Big Data is Useless if You Can’t Communicate it Clearly to the End User

Hopefully my stats and econometrics professors aren’t reading this, because if they are, they might fail me retroactively. But from my perspective, clarity in how you present your analysis is often much more important than the precision of your model or the completeness of your data set. A logistic model may be more accurate than a linear one, but if you can’t explain it to a layman, it’s not likely to have any impact on your organization. Business people don’t trust black boxes, and they won’t follow advice when they don’t understand the rationale behind it.
*****
If there’s one piece of advice I have for executives looking to bring big data to their companies, it’s to focus on the insights, not the size of the data set. Often, much more can be learned from a small, targeted data set that’s properly collected, cleaned, and analyzed than from a hulking one that’s dirty, old, and overly granular. Big data can be a star for your organization, but only with the right supporting cast of technology and manpower.

Behavioral Data Analyst

Nick is a Behavioral Data Analyst at <a href="https://www.betterment.com/">Betterment</a>. Previously he analyzed OpenView portfolio companies and their target markets to help them focus on opportunities for profitable growth.