Data & Analytics Capgemini: Capping IT Off

Historical Data can make a big, hairy mess in Machine Learning

Author

Xavier Chelladurai

August 10, 2017

Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. It develops computer programs from historical data, called training data. Let us first consider some of the most successful examples of machine learning.

We have access to a large body of text published by governments, embassies, NGOs and the United Nations in English and hundreds of other languages. Machine translation tools such as Google Translate use this library of translated documents as training data. For example, English-German translation is trained on the English and German versions of the same documents. When a user wants to translate a specific text from English to German, the tool succeeds by choosing the right pair of sentences.
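The sentence-pair idea can be illustrated with a toy translation-memory lookup. This is a minimal sketch under stated assumptions: the three-sentence `PARALLEL_CORPUS` and the `translate` function are hypothetical, and picking the most similar stored English sentence with Python's `difflib.SequenceMatcher` is a deliberately simple stand-in, not how Google Translate actually works.

```python
from difflib import SequenceMatcher

# Hypothetical toy parallel corpus: aligned (English, German) sentence pairs,
# standing in for the translated government and UN documents described above.
PARALLEL_CORPUS = [
    ("The meeting is postponed.", "Die Sitzung ist verschoben."),
    ("The embassy is closed today.", "Die Botschaft ist heute geschlossen."),
    ("Thank you for your cooperation.", "Vielen Dank für Ihre Zusammenarbeit."),
]


def translate(sentence, corpus=PARALLEL_CORPUS):
    """Return the German side of the most similar English sentence in the corpus."""

    def similarity(pair):
        # Case-insensitive character-level similarity ratio in [0, 1].
        return SequenceMatcher(None, sentence.lower(), pair[0].lower()).ratio()

    _, best_german = max(corpus, key=similarity)
    return best_german


print(translate("The embassy is closed today"))  # Die Botschaft ist heute geschlossen.
```

A lookup like this only works when a close enough sentence already exists in the historical data, which is exactly the dependence on training data the article describes.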

IBM and Memorial Sloan Kettering are training Watson for Oncology using massive volumes of patient medical records from around the world. For each patient, the historical data holds detailed records of symptom parameter values and of the diagnosis and treatments given. Watson learns from this historical data of <Symptoms, Diagnosis> and <Diagnosis, Treatment> pairs.
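Learning from such records can be sketched, in its crudest form, as counting which diagnosis most often followed a given set of symptoms and which treatment most often followed that diagnosis. The records, labels and the `fit`/`predict` helpers below are hypothetical placeholders, not real medical data, and simple frequency counting is a swapped-in illustration, not Watson's actual method.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (symptoms, diagnosis, treatment).
# All labels are invented placeholders, not real medical data.
RECORDS = [
    (frozenset({"fever", "cough"}), "condition_A", "treatment_X"),
    (frozenset({"fever", "cough"}), "condition_A", "treatment_X"),
    (frozenset({"fever", "cough"}), "condition_B", "treatment_Y"),
    (frozenset({"headache"}), "condition_C", "treatment_Z"),
]


def fit(records):
    """Count <Symptoms, Diagnosis> and <Diagnosis, Treatment> co-occurrences."""
    diagnoses = defaultdict(Counter)   # symptom set -> diagnosis counts
    treatments = defaultdict(Counter)  # diagnosis -> treatment counts
    for symptoms, diagnosis, treatment in records:
        diagnoses[symptoms][diagnosis] += 1
        treatments[diagnosis][treatment] += 1
    return diagnoses, treatments


def predict(symptoms, diagnoses, treatments):
    """Return the most frequent past (diagnosis, treatment), or None if unseen."""
    counts = diagnoses.get(frozenset(symptoms))
    if not counts:
        return None  # no historical record for this exact symptom set
    diagnosis = counts.most_common(1)[0][0]
    treatment = treatments[diagnosis].most_common(1)[0][0]
    return diagnosis, treatment


diagnoses, treatments = fit(RECORDS)
print(predict({"cough", "fever"}, diagnoses, treatments))  # ('condition_A', 'treatment_X')
```

Note that the sketch can only repeat what the history contains: it returns None for an unseen symptom set and has no notion of why a diagnosis was made.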

The following story shows that learning purely from historical data can mislead us, with potentially dangerous consequences.

Every morning, a priest went to the temple with fruits, flowers and a jug of milk and opened it for the Morning Prayer. He would place these offerings on the steps of a pool inside the temple, bathe in the pool and then start the prayer. The temple had rats, and they began to disturb the fruits and the other offerings. After struggling with the rats' mischief for a few months, one of the devotees brought a cat to protect the offerings. The cat kept the rats under control, but after a few days it started its own mischief: it tried to drink the milk and made a mess.

In order to manage the issue, the following process was agreed upon.

  1. Priest enters the temple with fruits, flowers and milk
  2. Priest ties the cat to a pillar using a rope
  3. Priest takes a bath and then conducts the prayer
  4. At the end of the prayer, the cat is untied

Years rolled by and several priests came and went. The Morning Prayer grew in popularity, but the process above was strictly followed as mandatory practice. Tying the cat to a pillar became a traditional custom that had to be performed before the priest could bathe and pray.

One day, after several years, the cat died. The temple management committee and the priest were deeply saddened and declared: "We cannot perform the prayers without a new temple cat."
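The committee's reasoning can be mimicked by a naive pattern learner. In this minimal sketch (the step names, `HISTORY`, `learn_preconditions` and `goal_allowed` are all hypothetical), every step that always preceded the prayer in past logs is treated as a precondition, so the learner concludes that no prayer is possible without tying a cat.

```python
# Hypothetical logs of past mornings: each run is the ordered list of steps taken,
# with the same routine repeated for years, as in the story.
HISTORY = [["enter_with_offerings", "tie_cat", "bath", "prayer", "untie_cat"]] * 1000


def learn_preconditions(history, goal):
    """Return the steps that preceded `goal` in every historical run."""
    preceding = [set(run[: run.index(goal)]) for run in history if goal in run]
    return set.intersection(*preceding)


def goal_allowed(available_steps, preconditions):
    """Purely data-driven rule: allow the goal only if every learned precondition is available."""
    return preconditions <= set(available_steps)


preconditions = learn_preconditions(HISTORY, "prayer")
print(sorted(preconditions))  # ['bath', 'enter_with_offerings', 'tie_cat']

# The cat has died, so "tie_cat" is no longer an available step.
print(goal_allowed(["enter_with_offerings", "bath"], preconditions))  # False
```

Because the logs never record why the cat was tied (to guard the milk), the learner cannot tell a cause from a habit that merely co-occurred with the prayer.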

This story illustrates that learning and decision making based purely on patterns in historical data are not always successful and can lead to grave mistakes. Data scientists therefore have to consider the context and analyze the data in detail to understand cause and effect, so that machine learning is complete and successful. I shall elaborate on this in my next blog post.

 

This article was written by Xavier Chelladurai from Capgemini: Capping IT Off and was legally licensed through the NewsCred publisher network. Please direct all licensing questions to legal@newscred.com.
