The Challenge Of Figuring Out The Right Big Data Questions


Howard Baldwin, Contributor

May 4, 2015

You know the great thing about databases? You ask them a question – or, to be more technical, you generate a query – and the answer comes back. That’s highly simplified, of course, because for your query to be successful, it has to at least match the data therein.

But what if you have so much data that you’re not even sure what question to ask? Typically, you know what you don’t know, and that’s why you ask a question. But what if you don’t know what you don’t know? It takes real insight to admit that you have no insight; I’ve worked with people in my career who didn’t have this kind of insight, and they’re really no fun to interact with.

That’s the challenge that Crossmark faces every day – not dim-bulb people but rather so much data that they’re not always sure what they’re looking at. Crossmark labels itself as a sales and marketing services company for the consumer goods industry, but that really doesn’t begin to tell the story of its big data challenge.

Crossmark has 30,000 employees in North America, Australia, and New Zealand who visit retail locations weekly on behalf of its clients, which could be the retailers themselves or the manufacturers. These employees collect lots and lots of data: point-of-sale data; where are the products in the store; how much shelf space do they have; how much are the competitive products selling for; what does the display look like. Part of the work falls into the category of competitive intelligence; part of it falls into the category of compliance – if the manufacturer paid for a special endcap display, is the retailer complying?

That’s just the in-store data. There’s also the external PESTLE data; the acronym stands for political, economic, social, technological, legal, and environmental. For example, in the environmental category, it’s demand for snow shovels in Massachusetts and demand for suntan lotion in California. Lest you think this is a new term, it was originally coined in 1967.

That’s a lot of detailed data from a lot of places. Alexandros Siskos, Crossmark’s vice-president of analytics and insights, refers to it as “big data, small insights,” adding that Nielsen Media Research has outsourced some of its data collection work to Crossmark at stores it considers statistically significant.

As Rob Saker, Crossmark’s chief data officer, notes, “I’m not going to use the trope line that we’ve always been a big data company. We may have been generating big data for a long period, but we weren’t capitalizing on it.”

Now they are. Why? Several reasons. First, because so much data leads to noise, static, and other confusion. But also because demographics are always changing.

Siskos cites just one of Crossmark’s challenges: the Hispanic market in North America is the fastest-growing demographic group, with $1.6 trillion in potential purchasing power. “If a store has issued a loyalty card to someone named Maricella, and the market basket has feminine hygiene products, diapers, and incontinence products, who is Maricella?” he asks. “If you know it’s a multigenerational household, is she the mother or the grandmother?” It’s a trick question. It turned out that Maricella is the youngest generation, and the diapers were actually for one of her grandparents.

“Maricella is what we call a two-hundred-percenter,” says Siskos. “She’s a millennial who’s 100% Mexican at home and 100% American at school or the mall. She behaves like any other kid in that age group.” But if you assume Maricella is the mother instead of the daughter, you run the risk of missing the target. It’s even more difficult, notes Siskos, because the Hispanic market isn’t monolithic. You can’t use something Mexican like mariachis to market to Puerto Ricans, because you run the risk of not only not connecting, but actually being insulting.

Complicated? You bet. Inscrutable? Not anymore. “We had to figure out a way to fluidly and fluently walk through all the data,” says Siskos. Before Saker came in, Crossmark was using lots of tools and defaulting to Excel spreadsheets to rationalize the data. That was painful. “We would run out of space in terms of columns and rows, or Excel would hang because we’d run out of memory.” Crossmark deployed software from Alteryx for ad hoc analysis to ease its pain (it also uses Tableau for data visualization, among other tools, including Microsoft Analytics Platform System and Hadoop).

What Saker loves about the ad hoc tools Crossmark has deployed is the speed they give him to tackle “what-if” questions. “I can go from idea to concept in a day with Alteryx, rather than taking weeks using tools for ETL, database queries, and statistical analysis. It’s great for testing concepts.” The software isn’t built for day-to-day analysis, the queries that you run over and over again, but it’s “terrific for ad hoc testing.”

“These technologies are great because they cut out the wait time. It allows us to do the what-if and actually filter out the noise to get different views of the business,” says Siskos. Sometimes working with clients is like walking into the middle of a movie, where you only have a vague idea of what’s going on. “Sometimes our clients can’t even articulate the question, but Alteryx lets us shine a light in places we didn’t know existed.”

Here’s where the question of not knowing what you don’t know comes in. “An expert is someone who knows all the answers and doesn’t need to ask questions. Big data opens up that concept, and made us realize that what’s key is not being an expert, but actually asking the right questions. Big data allows us to ask different, clarifying, probing questions. Our goal is not to be the expert, but be the best question asker.”

The takeaway here is simple: don’t be discouraged by cascades of data. Consider the value you might derive from asking preliminary questions just to help you understand what you’re looking at. If you can identify the Maricellas in your demographic, it’ll make a big difference.

This article was written by Howard Baldwin from Forbes and was legally licensed through the NewsCred publisher network.
