Final Report Summary - VISREC (Visual Recognition)
Progress has been made on a number of fronts, including:
(i) learning visual models on the fly to retrieve semantic entities in large-scale image and video collections starting from a text query; this has enabled visual retrieval of people (from faces), object categories (such as vehicles and animals), and object instances (such as particular buildings or paintings);
(ii) automatic identification of flower species and sculptures;
(iii) methods and models for detecting and localizing object categories in images, in particular reducing the level of supervision required when training such models; and
(iv) deep learning methods for recognizing object categories, text, and human actions and interactions (such as handshakes) in images and videos.
The outcomes of this research will impact any application where visual recognition is useful, and will enable entirely new applications: effortlessly searching and annotating home image and video collections based on their visual content; searching and annotating large commercial image and video archives (e.g. YouTube); and extending the class of images that can be used to query the web (in the manner of Google Goggles), and hence identifying their visual content.