Earlier this month the UK Conservative party won the British general election outright, taking a majority of 331 of the 650 parliamentary seats to Labour's 232. None of the major pollsters predicted this outcome: their research generally pointed to a hung parliament with a slight advantage for the Conservatives over Labour.
Why is this relevant to big data projects?
The similarities between a poll and a big data project may not be apparent; however, a poll is actually a “small” big data project. It ingests multi-form data (telephone and web surveys; political, socio-demographic and geographic information), distils it (produces a forecast within a confidence band) and enables decision-making (e.g. how to tweak a political campaign).
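To make the "forecast within a confidence band" step concrete, here is a minimal sketch of how a pollster turns raw survey responses into a point estimate with a margin of error. The counts are hypothetical, and this uses the standard normal-approximation interval for a proportion:

```python
import math

def poll_forecast(support_count, sample_size, z=1.96):
    """Point estimate and ~95% confidence band for one party's vote share."""
    p = support_count / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)  # normal approximation
    return p, (p - margin, p + margin)

# Hypothetical survey: 340 of 1,000 respondents back one party.
share, (low, high) = poll_forecast(340, 1000)
print(f"forecast {share:.1%}, band {low:.1%} to {high:.1%}")
```

With a sample of 1,000 the band is roughly ±3 points, which is exactly why a poll showing two parties a point or two apart cannot distinguish "slight advantage" from "outright majority".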
In fact, there are some important lessons to be learned from the polls debacle…
Don’t fall in love with black boxes
For example, not knowing exactly what bias is induced by “web polls”, in which the responding population is self-selected, may have significantly reduced the accuracy of the forecasts, even when sophisticated methodologies were used to correct for that bias.
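One common way to correct this kind of self-selection bias is post-stratification: re-weighting respondents so the sample's demographic mix matches the population's. The sketch below uses a single hypothetical stratification variable (age group) with made-up figures; a real pollster would weight on many variables at once, and the correction only works if the weighting variables actually explain the selection bias:

```python
# Minimal post-stratification sketch: re-weight a self-selected web panel so
# its age mix matches the population, then recompute a party's vote share.
# All figures are hypothetical.

# Share of each age group in the population vs. in the web sample
# (the panel over-represents younger, more online respondents).
population = {"18-34": 0.28, "35-54": 0.35, "55+": 0.37}
sample     = {"18-34": 0.45, "35-54": 0.35, "55+": 0.20}

# Observed support for one party within each age group of the sample.
support = {"18-34": 0.25, "35-54": 0.34, "55+": 0.42}

# Weight each group so the weighted sample mirrors the population.
weights = {g: population[g] / sample[g] for g in population}

raw = sum(sample[g] * support[g] for g in sample)
weighted = sum(sample[g] * weights[g] * support[g] for g in sample)

print(f"raw estimate {raw:.1%}, weighted estimate {weighted:.1%}")
```

Here the weighted estimate moves several points from the raw one, because the over-sampled young group supports the party less. The danger the post describes is the black-box case: if the self-selection correlates with something you did not weight on, the corrected number is still wrong, and you have no way to see it.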
The more realistic the expectations, the less likely someone will end up eating their hat.
Fail fast – succeed faster
For example, there is wide consensus that a large portion of electors made up their minds at the last minute and that turnout was lower than expected. This uncertainty could not be integrated into the models and contributed to the inaccuracy.
What to do to avoid these mistakes? Well, nothing…
Mistakes will happen. Be prepared to accept them, quickly understand the root cause of the issue and adapt your infrastructure to improve the output. Let the “Fail fast – succeed faster” mantra become part of your company’s culture.
And remember, your users are not expecting perfection – they are expecting you to get it right ASAP.
Note: This is the personal view of the author and does not reflect the views of Capgemini or its affiliates. Check out the original post here.