Thursday, May 17, 2012
Towards Hybrid Human-Machine Vision Systems: Image Annotation using Crowds, Experts and Machines
Peter Welinder, graduate student, Computation & Neural Systems
The amount of digital image and video data is growing at an ever faster rate. While "big data" holds the promise of leading science to new discoveries, raw image data is of little use in itself. To be analyzed statistically, the data must first be quantified and annotated. Today's automated methods are not adaptable or accurate enough to annotate much of the available data, and hiring experts is slow and expensive. Crowdsourcing, in which the work is distributed among thousands of non-expert annotators, provides a middle ground, but is still too expensive to scale to millions of images. Instead, we propose hybrid human-machine vision systems, in which the work of humans and machines is balanced to be as cost-effective and accurate as possible. With this goal in mind, we begin by characterizing different types of image annotations, including binary, multi-valued and continuous annotations. We present models for crowdsourcing annotations from expert and non-expert annotators (humans). By trading off the competence, bias and expertise of multiple annotators, we show that it is possible to achieve high-quality annotations with very few labels. We show that the number of labels required can be reduced further by actively choosing the best annotators to carry out most of the work. Finally, we study the problem of estimating the performance of automated classifiers (machines) when little ground truth is available.
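To make the idea of weighing annotators by competence concrete, here is a minimal Python sketch of one standard approach: a naive-Bayes weighted vote over binary labels. It is not the model presented in the talk. The function name `combine_binary_labels`, the annotator names and the accuracy values are all hypothetical, and the sketch assumes each annotator's accuracy is already known and that annotators err independently.

```python
import math


def combine_binary_labels(votes, accuracy, prior=0.5):
    """Return P(true label = 1) given binary votes from several annotators.

    votes:    dict mapping annotator name -> 0 or 1 (hypothetical input format)
    accuracy: dict mapping annotator name -> assumed probability of being
              correct, strictly between 0 and 1
    prior:    assumed prior probability that the true label is 1
    """
    # Start from the prior log-odds of the label being 1.
    log_odds = math.log(prior / (1.0 - prior))
    for annotator, label in votes.items():
        a = accuracy[annotator]
        # A more accurate annotator contributes a larger log-odds weight.
        weight = math.log(a / (1.0 - a))
        log_odds += weight if label == 1 else -weight
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


# One assumed expert outvotes two assumed weak annotators who disagree.
votes = {"expert": 1, "novice_a": 0, "novice_b": 0}
accuracy = {"expert": 0.95, "novice_a": 0.6, "novice_b": 0.6}
p_positive = combine_binary_labels(votes, accuracy)
print(f"P(label = 1) = {p_positive:.3f}")
```

In this sketch a single reliable annotator can outweigh a majority of unreliable ones, which is why fewer labels may suffice when annotator quality is taken into account. In practice the accuracies are unknown and would have to be estimated from the labels themselves.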