Computers are now almost as good at visual object recognition as people are.
The proof is in the ImageNet Large-Scale Visual Recognition Challenge, an annual contest for scientists attempting to develop a machine equivalent of human vision. Contestants must develop computer algorithms that can identify the objects that appear in a given set of images.
Historically, eyeless computers have had a hard time parsing millions of images and extracting, say, the 15 or so that feature zebras. But since 2012, computers in the challenge have been gaining on people, and are likely to surpass us in this department in a couple of years.
What changed in 2012? A team from the University of Toronto in Canada entered an algorithm called SuperVision, which used a deep convolutional neural network to sort the challenge's million-plus images into 1,000 separate classes.
Invented in the early 1980s, deep convolutional neural networks consist of multiple layers of artificial neurons arranged to mirror the way the human brain processes vision. As Moore’s Law made computing technology more powerful, these networks grew more capable of imitating the way biological neural networks operate.
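For readers curious what one of those layers actually does, here is a minimal sketch in plain Python. It is a toy illustration only, not the code of any contest entry: the function names, the tiny 6x6 "image," and the edge-detecting filter are all made up for this example. Real networks stack many such layers and learn their filters from data rather than having them written by hand.

```python
# Toy sketch of a single convolutional layer: convolution -> ReLU -> max pooling.
# Every name and value here is hypothetical and chosen only for illustration.

def conv2d(image, kernel):
    """Slide a small kernel over a 2D image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(fmap):
    """Keep positive responses, zero out negative ones."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping size x size max pooling; leftover edge cells are dropped."""
    return [
        [max(fmap[i + a][j + b] for a in range(size) for b in range(size))
         for j in range(0, len(fmap[0]) - size + 1, size)]
        for i in range(0, len(fmap) - size + 1, size)
    ]

def layer(image, kernel):
    """One layer: filter the image, apply ReLU, then shrink it by pooling."""
    return max_pool(relu(conv2d(image, kernel)))

# A 6x6 image whose left half is dark (0) and right half is bright (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hand-written filter that responds to a dark-to-bright vertical edge.
edge_kernel = [[-1, 1],
               [-1, 1]]

features = layer(image, edge_kernel)
print(features)
```

The output is a smaller grid that is nonzero only where the filter found the dark-to-bright edge. A deep network feeds grids like this into further layers, so that later layers can respond to combinations of edges and, eventually, to whole objects.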
Today, deep convolutional neural networks are even more impressive. This year’s ImageNet winner was GoogLeNet, an algorithm invented by a team of Google engineers that had only a 6.65% error rate, close to the human error rate for the same task.
As the Visual Web becomes a larger part of the Internet, machine vision is seeing wider use. Pinterest acquired VisualGraph, a machine vision company whose technology can pick out types of handbags and clothing in images. Imgur already uses a less precise form of machine vision, which detects the Impact font at the top and bottom of a meme, the lettering that marks the picture as an image macro.
For Google engineer Christian Szegedy, who worked on the GoogLeNet project, the technological applications for machine vision may far surpass the number of ways we use our human eyes. He wrote:
These technological advances will enable even better image understanding on our side and the progress is directly transferable to Google products such as photo search, image search, YouTube, self-driving cars, and any place where it is useful to understand what is in an image as well as where things are.
Photo courtesy of the Google Research Blog