Machine Learning Goes Open Source

Author

Matt Asay

August 8, 2014

The machines are taking over. Or they will, if we keep teaching machines to think for themselves. And we can’t seem to stop.

Two years back GigaOm’s Derrick Harris opined that “it’s difficult to imagine a new tech company launching that doesn’t at least consider using machine learning models to make its product or service more intelligent.” And that’s true. But engineers at Google, Twitter and new startups have largely been forced to roll their own machine learning libraries and systems. 

What’s been missing are open-source projects that provide essential building blocks for easily embedding machine learning into applications. The Apache Software Foundation has sought to change this with Apache Mahout, and now PredictionIO just raised $2.5 million in an effort to take open-source machine learning even further.

I sat down with PredictionIO founder Simon Chan to better understand the market and why open source matters in the complex world of machine learning.

Making Machine Learning Simple

ReadWrite: You call yourself the “MySQL of prediction.” What does that mean?

Simon Chan: Before the birth of MySQL, database management systems (think Oracle, DB2, etc.) were largely inaccessible to many developers and companies. Such systems are complex, expensive and proprietary. MySQL has rewritten the history of the relational database industry. It allows every website and application, regardless of size, to be powered by a database server.

The current world of machine learning is similar to the old days of the database industry. Machine learning is still inaccessible to most companies and developers. The cost of development and maintenance of machine learning infrastructure is extremely high. Companies like Google, LinkedIn and Twitter spend huge amounts of money to recruit data scientists.

PredictionIO, as MySQL did to the database industry, can be the machine learning server behind every application. It is 100% open source, developer-friendly and production-ready.

RW: Machine learning sounds great, but historically hasn’t worked as advertised, or it’s required extensive engineering resources to pull off. What does PredictionIO do differently?

SC: We believe that every prediction problem is unique; therefore, most black box machine learning solutions don’t work as planned. PredictionIO makes the life of developers easier by handling a lot of heavy lifting, such as algorithm evaluation and distributed deployment. 

It also comes with a number of built-in predictive engines for developers to use right away. But more importantly, PredictionIO is a customizable open-source product. This means that developers can optimize and improve the predictive engines whenever they need to.

Open-Sourcing The Machines

RW: You’re open source. How does this help? 

SC: We don’t believe in “black box” approaches to machine learning, as I noted. Open source allows developers and data scientists to contribute to the PredictionIO ecosystem.

PredictionIO is showcased by GitHub as one of the most popular open-source machine learning projects in the world, and thousands of developers are engaged in making it better. Currently, contributions include SDKs (e.g., for iOS, .NET, Node.js) and plugins (e.g., for Magento and Drupal), but we’re also seeing new engines and algorithms run on top of our infrastructure.

RW: How many companies truly need machine learning in their apps? What are some examples of how companies incorporate machine learning today?

SC: As far as we know, hundreds of applications are powered by PredictionIO now. And it’s just the beginning.

Le Tote, which sends personalized clothing to its subscribers, is using PredictionIO to discover customers’ fashion preferences. PerkHub manages enterprises’ employee perks programs and is using PredictionIO to personalize product recommendations in their weekly emails.

We’re also working on some exciting projects yet to be announced in domains such as mobile health and gaming with applications that include churn analysis and trend detection.

As Easy As MySQL

RW: How hard is this? Can average developers really make use of this, or do they need a PhD?

SC: If you can use MySQL, you can use PredictionIO.

RW: How do you plan to use your new funding?

SC: There are a lot of product features we want to develop. We are hiring. Developers and machine learning engineers who are passionate about building the industry-changing machine learning server should contact us.

Lead image by Flickr user pinguino k; other images courtesy of PredictionIO
