Social media platforms have become a dominant source of data for governments, corporations and academics studying human society. Yet in the rush toward ever-more sophisticated algorithms and visualizations for analyzing social media trends, we are ignoring two critical questions: how well social media actually reflects societal trends, and how to actually use all of the analyses we produce.
Perhaps one of the most dominant themes of the “big data” era is the tendency of data scientists to grab every shiny new dataset or tool and derive meaning from it without spending the time needed to understand the underlying biases or nuances that might affect the questions being asked of it. Immensely sophisticated algorithms are increasingly wrapped in turnkey, user-friendly environments that make it easy for non-technical users, or even technical users unfamiliar with a particular technique, to apply them without understanding the algorithm’s constraints or the limitations of the datasets and parameters involved. Even when software implementations warn of improper parameters or incompatible datasets, those warnings are frequently lost in long software pipelines, and the end user is happily delivered a result regardless of whether it is actually meaningful.
Today’s big data world is driven more by computer scientists than by domain experts, which has created a situation in which technological sophistication is frequently prioritized over methodological soundness and accuracy. Writing for Wired in 2014, I noted that one of the primary forces behind the limitations of current sentiment mining tools is that such systems have largely been built within the computer science disciplines, using assessment techniques that have remained essentially unchanged since the first sentiment mining systems were built more than half a century ago for punch card computers. Yet such systems are now finding use in policing, where an innocent tweet about a card game called “rage” can raise one’s police threat score.
Yet perhaps the greatest threat to today’s social media analytics is that in a world of astounding algorithms processing billions of data points to create stunning visuals, we rarely pause to ask whether the results are meaningful. For example, I once sat through a briefing to defense and policy officials that purported to summarize the views of the Syrian population across the country. Amidst claims of analyzing tens of billions of data points to generate what were truly beautiful visualizations, I was the only one in the room to ask where the data came from. The answer was that they had keyword searched tens of billions of tweets to locate English-language posts that contained certain emotional words and were also GPS-tagged. Yet in Syria at the time, Facebook was a more popular communications platform, few communications were GPS-tagged, and most were written in local Arabic dialects rather than fluent English.
This example also raises another critical limitation of many social media analyses: the size of the sample analyzed, rather than the size of the sample searched. While the authors keyword searched tens of billions of tweets, their final sample of matching English-language, GPS-tagged tweets containing the desired emotional keywords and sent from within Syria was extremely small. When interpreting a social media analysis, it is important to look not at the number of starting data points searched (which tends to be what is most often reported), but rather at the number of data points actually incorporated into the final analysis (which tends to be vastly lower).
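To make that attrition concrete, consider a back-of-the-envelope sketch. Every filter pass-rate below is a hypothetical illustration chosen for the example, not a figure from the actual briefing, but the shape of the funnel is the point: each successive filter multiplies away most of the headline number.

```python
# Hypothetical filter pass-rates illustrating sample attrition.
# None of these figures come from the Syria briefing; they are
# assumptions chosen only to show the shape of the problem.
searched = 20_000_000_000          # tweets keyword-searched (the headline number)

filters = {
    "sent from within Syria": 0.0005,  # locating posts to one country
    "GPS-tagged":             0.02,    # very few tweets carry GPS coordinates
    "written in English":     0.05,    # most Syrian posts are in Arabic dialects
    "contains emotion words": 0.10,    # matches the emotional keyword list
}

remaining = searched
for name, rate in filters.items():
    remaining = int(remaining * rate)
    print(f"after '{name}' filter: {remaining:,}")

# The number reported is 'searched'; the number that determines
# what the analysis can actually support is the final 'remaining'.
print(f"searched {searched:,} tweets; analyzed {remaining:,}")
```

Under these illustrative rates, twenty billion searched tweets yield a final sample of only a thousand posts, which is the figure that should drive any claim about the Syrian population.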
There are also many questions about the representativeness of social media and its penetration into each geography and demographic around the world. As I explored last August, Twitter has largely stalled in its global growth and has failed to expand substantially beyond the geographic footprint it had four years ago. This can create critical blind spots, such as in the Yemen conflict, where a Scud missile fired into Saudi Arabia was covered heavily by social media users in Saudi Arabia but little by Yemeni users, due to differing levels of social media penetration. In this case, social media offered a rich and detailed view of the launch through the eyes of Saudi Arabian citizens, but at the cost of capturing just half of the conflict. Few analyses of social media take the time to study its penetration into each community or geography of interest, or to adjust or normalize their findings to account for the underlying nuances and biases.
The social media platforms themselves often provide conflicting accounts of their penetration or use ever-changing metrics. Tinder made headlines when its CEO claimed in an interview that the service had “80 million users worldwide and 1.8 billion swipes per day,” while the company’s SEC filings stated it had only 9.6 million daily active users and 1.4 billion daily profile swipes. Twitter has so constantly evolved the way it reports growth and user engagement that it caught the attention of the SEC last year. Airbnb raised concerns earlier this month over the potential filtering of its listings in a report designed to assuage concerns over its use in New York City. The bottom line is that even publicly traded companies redefine their metrics of growth so frequently, or provide such conflicting information, that it can be difficult to robustly assess their penetration into an area of interest.
Analyses drawn from social media can often yield results wildly off from reality, even in social-media-saturated markets such as the United States. In the hours leading up to the Iowa caucuses, Bernie Sanders led Hillary Clinton in Facebook mentions by 73% to 25%, yet when the votes were tallied they were nearly tied. In 2012, Twitter produced an interactive visualization of tweets about candidates Barack Obama and Mitt Romney that showed Obama dominating engagement in Southern states that ultimately were won by Romney. Similarly, Facebook recently released by-county data showing Sanders sweeping the entire country on the Democratic side, with Trump dominating overall. While those prognostications may ultimately come true, they also reflect the fact that Facebook’s demographics are heavily skewed, and they raise the question of what precisely “liking” a candidate’s page means.
In the 2012 election, Facebook allowed users to tell their friends that they had voted, with more than 9 million users taking advantage. On the surface, the data appeared to show that Obama overwhelmingly turned out women voters, with a two-to-one ratio of women declaring they had voted compared with men, and more Democratic than Republican voters. Yet when Facebook explored the data further, it noted that “women are disproportionately more likely to share in general on Facebook”; in fact, women shared twice as much as men across all topics. In short, the two-to-one ratio of women to men voters was simply an artifact of two-to-one sharing in general on Facebook, not a reflection of the election itself. Democrats were also much more likely than Republicans to share with their friends that they had voted. Age also played a role: among Democrats, those aged 18-44 were the most likely to share that they had voted, while among Republicans it was those over the age of 65.
While every platform has its biases, the problem here is that only Facebook itself has the ability to assess the biases of its platform, and it does not publish those statistics together in a single machine-friendly dataset that is regularly updated. Think about it for a moment: a researcher wishing to understand whether Obama turned out women voters in large numbers would only have the ability to search for posts matching particular search criteria. The final analysis would show that women voters outnumbered men by two to one. An ordinary researcher would not have the ability to go back and compute the total volume of all posts by women versus men across all of Facebook on all topics to discover that this is simply an artifact of Facebook itself, rather than a seminal finding about the election.
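The correction Facebook’s researchers applied amounts to a simple rate normalization: divide each group’s count of “I voted” posts by that group’s baseline posting volume before comparing. The counts below are illustrative placeholders, not the study’s actual figures; they are chosen only so the two-to-one artifact cancels out cleanly.

```python
# Illustrative counts only; not the actual 2012 Facebook figures.
voted_posts = {"women": 2_000_000, "men": 1_000_000}          # "I voted" posts
baseline_posts = {"women": 200_000_000, "men": 100_000_000}   # all posts, all topics

# Naive comparison: women appear to have voted twice as often as men.
naive_ratio = voted_posts["women"] / voted_posts["men"]

# Normalized comparison: what share of each group's overall activity
# was an "I voted" post? This is the step an outside researcher cannot
# perform without platform-wide baseline statistics.
rate = {g: voted_posts[g] / baseline_posts[g] for g in voted_posts}
normalized_ratio = rate["women"] / rate["men"]

print(naive_ratio)       # the apparent 2:1 signal
print(normalized_ratio)  # the signal after removing the sharing artifact
```

With these numbers, the naive ratio is 2.0 while the normalized ratio is 1.0: the entire apparent gender gap is explained by the baseline sharing difference, which is exactly the artifact Facebook identified.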
While it is possible to approximate aspects of gender bias by using portfolios of searches on different topics, no one but Facebook’s own researchers has the ability to properly normalize results generated from the platform. It is thus impossible to know whether results derived from it, such as Sanders’ enormous lead over Clinton in Iowa, were due to a last-minute change in sentiment as voters stood in line at the polls, or whether Facebook users in Iowa were heavily skewed toward the particular demographics that favor Sanders.
Twitter in particular has become one of the primary data sources used for societal study because of its machine-friendly API access that allows users to directly ingest the live Twitter stream for analysis. Yet, in many areas of the world Twitter is not the primary conversation platform for social communication. In China, for example, Weibo would likely be a far better source than Twitter. Facebook offers basic API access, but not on the scale of Twitter’s interface, and, most critically, the majority of Facebook content is not public access like Twitter’s. In fact, the public square experiment of social media appears to be rapidly fading.
Even those users that remain on a given social platform may change the way in which they use it to express themselves over time. In fact, the representativeness of a social media platform may be inversely related to its ubiquity in society. As one college student put it in an interview with CNBC, “Millennials post less now and about less trivial things. Perhaps this is because we’re older and more mature, but it’s also because Facebook’s omnipresence has given everyone we’ve ever known access to our lives. This has forced us to create and maintain a public persona: We’re less likely to broadcast all of our lives authentically, and more likely to engage in selective sharing.” She notes that “the presence of parents, teachers and future employers on Facebook” has necessitated far more selective sharing and that newer platforms like Instagram with its “filters and photo manipulation [have] create[d] a photo-sharing network that is more visually perfect — and unreal — than ever before.”
Social media analytics has become so commoditized today that nearly every tool on the market offers the ability to enter a set of keywords and get back a volume timeline, a basic sentiment timeline and a word cloud of top hashtags and keywords. Yet how exactly should such tools be used? In the absence of platform-wide statistics like those enumerated by Facebook in its 2012 study, it is impossible to normalize or adjust results for the underlying biases of each platform. At the same time, social media may not always offer the best data source for a given question. Monitoring for disease outbreaks in the forested regions of Guinea or public sentiment in war-ravaged areas of Yemen may not necessarily be questions that are most amenable to social media insight. Even a question like consumer preferences among young social-savvy college students in the United States might be biased by underlying changes in how that demographic uses the platform, even if they are heavily represented among its users.
There is also the question of how social metrics correspond to reality. I once saw an analysis of Twitter that showed Justin Bieber as the most influential person in the world on the subject of Syria. While a tweet by Bieber to his tens of millions of followers will no doubt be widely read, it is unlikely that his musings on the Syrian peace process will suddenly sway the warring factions and yield overnight peace. In fact, this is a common limitation of many social analyses: the lack of connection between social reality and physical reality. A person who is highly influential in the Twitter conversation around a particular topic may or may not wield any influence on that topic in the real world.
The question of how to use measures derived from social media data (or indeed any kind of data) is often overlooked in favor of bewitching visualizations. Generating a word cloud of the most frequently used hashtags appearing in conversation about Syria is not useful in itself – someone must actually use those results to effect change. More often today that word cloud will simply be copy-pasted into a PDF report as a pretty illustration on the cover, rather than used as part of a critical analysis of the insights it does or does not provide into the Syrian peace process.
As I have written many times before, for the power of data to truly reach its potential, the “big data” and analytics worlds must, as they grow up, move beyond their technological roots: involve domain experts, ask questions about and understand the biases of the data being analyzed, and move beyond pretty visuals to real decision making.
This article was written by Kalev Leetaru from Forbes and was legally licensed through the NewsCred publisher network.