Cognitive-Level Annotation using Latent Statistical Structure (EU ICT FP6)


This project is a collaboration of six leading European research teams in visual recognition, text understanding and machine learning. It aims at developing advanced machine learning algorithms that automatically discover people, objects and other scene elements that are present in images, video and associated text, and use them to structure and interpret scenes. Discovery occurs at three levels of abstraction: new individuals (specific people, objects, scenes and actions), new object classes and attributes, and new hierarchical and other relations between entities. Important foci of the HMDB-LIIR research are the development of machine learning algorithms for analyzing texts that require a minimum of supervision, and of algorithms for the multimodal processing of text and images, among which are probabilistic topic models.


The partners in the CLASS project are: K.U.Leuven (ESAT-Visics, Prof. Luc Van Gool and Prof. Tinne Tuytelaars), LEAR, France (Dr. Bill Triggs), INRIA-Grenoble, France (Dr. Cordelia Schmid), University of Oxford, UK (Prof. Andrew Zisserman), University of Helsinki, Finland (Prof. Wray Buntine and Prof. Petri Myllymäki), and Max-Planck Institute for Biological Cybernetics, Germany (Prof. Bernhard Schölkopf).


We studied several techniques for word sense disambiguation and for the detection of visual entities and their visual attributes in text (e.g., relying on association techniques from data mining and on metrics for semantic similarity applied to WordNet). Moreover, we have designed, implemented and tested different probabilistic models for the alignment of names and faces in news texts and their accompanying images. We have investigated discriminative and generative models for recognizing the semantic frames and roles in English sentences, with special attention to semi-supervised models. This research gave rise to the "Latent Words Language Model". We built a proof-of-concept demonstrator that interrogates images in which persons are pictured and a demonstrator that automatically commentates on broadcast soaps. We have also designed, implemented and evaluated a tool for multimodal segmentation of video news, and developed a multimodal news summarizer.

CLASS has been identified by the European Commission as an "excellent project".

Period: 2006-01-01 to 2009-06-30
Financed by: EU Sixth Framework Programme ICT, EU FP6-027978
Supervised by: Marie-Francine Moens
Staff: Wim De Smet, Koen Deschacht, Phi The Pham, Gert-Jan Poulisse
Contact: Koen Deschacht

