Diffbot, an API that relies on visual learning to parse web content, is a project we first covered last August. It has since garnered enough interest to raise up to $2 million in seed funding from investors who want to see this visual-learning robot come to fruition, since it can extract and analyze web content much the way humans do. Mike Tung, a graduate of Stanford University’s Artificial Intelligence program, envisions all content on the Internet being sorted into roughly 18 different “page types” (such as a home page, social networking profile, or review, among others), which can then be visually analyzed using layout and contextual cues.
Diffbot is then said to work like a human reader: it looks at a webpage and instantly identifies the important objects on the page while glossing over less important components such as advertising banners. Sounds good in theory, but it will certainly not be easy to pull off in practice.
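Diffbot has not published how its classifier actually works, but the core idea — scoring a page against a fixed set of page types using layout cues — can be illustrated with a toy sketch. Everything below is hypothetical: the page types, cue names, and weights are invented for illustration and are not Diffbot's.

```python
# Toy illustration only -- NOT Diffbot's actual model. It mimics the idea of
# assigning a page to one of several "page types" by scoring layout cues.
# All cue names and weights here are made up.

# Each hypothetical page type is described by weighted layout cues that a
# visual renderer might detect on the page.
PAGE_TYPE_CUES = {
    "article": {"large_text_block": 3, "byline": 2, "publish_date": 2},
    "product": {"price_tag": 3, "buy_button": 3, "image_gallery": 1},
    "profile": {"avatar_image": 3, "follower_count": 2, "bio_block": 2},
}

def classify_page(observed_features):
    """Return the page type whose cues best match the observed layout features."""
    def score(cues):
        # Sum the weights of every cue actually seen on the page.
        return sum(w for cue, w in cues.items() if cue in observed_features)
    return max(PAGE_TYPE_CUES, key=lambda t: score(PAGE_TYPE_CUES[t]))

# A page showing a price tag and a buy button scores highest as a product page.
print(classify_page({"price_tag", "buy_button", "large_text_block"}))
```

A real system would learn such weights from rendered pages rather than hand-code them, but the sketch captures the "score each page type against visual cues" framing described above.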