YouTube’s next challenge: parsing video content for search results

Author

Jordan Novet

February 23, 2017

For Google, recognizing objects in photos is no longer a challenge. See Google Photos for proof. The next challenge is video. There’s more data to deal with, and videos are simply harder to summarize than images.

Not that Google is alone here — Facebook, Snap, and Twitter have been working to analyze video content.

But Google’s YouTube has long been called the world’s second-largest search engine, behind only Google Search. While text can help Google return YouTube search results, the raw content of the video itself still largely goes unused.

A few months ago Google gave a big gift to the research community: the YouTube-8M data set. Perhaps not coincidentally, today Google updated that data set. This matters because image recognition research has been propelled by the availability of open data, specifically Stanford’s ImageNet and Microsoft’s COCO. Artificial intelligence (AI) systems require data in order to become smarter, and these organizations have stepped up to provide that raw material.

Google doesn’t just want to advance the state of the art for the benefit of all, though. It also wants to improve its products — in the same way that it brought Smart Replies to Gmail and instant visual translations to Google Translate. Surely Google wants YouTube to be the best damn place to find a video that relates to your query.

“If it could [recognize] a video of a cow jumping over a moon, or a cat jumping over a fence, that would be really cool,” Google senior fellow Jeff Dean said today in a meeting with reporters at Google’s inaugural TensorFlow Dev Summit at company headquarters.

That would mean Google would no longer need to rely on metadata like descriptions and comments for searches, Dean said. The underlying technology could make for better video recommendations as well.

It’s not clear when YouTube might release the enhanced search capability.

Generally speaking, though, “video is maybe a few years behind where we are with images,” Dean said.

This article was written by Jordan Novet from VentureBeat and was legally licensed through the NewsCred publisher network.
