Call for participation
Artificial intelligence (AI) simulates intelligent tasks performed by humans. One fundamental task is the understanding and fusion of multiple inputs, for instance, when seeing the world around us, listening to speech and other sounds, and reading texts. Multimedia archives containing text, video, still images and audio (speech and other sounds) are quickly gaining importance (e.g., on the World Wide Web or privately held by broadcasting companies, publishing houses, libraries, museums, police and intelligence services, courtrooms, hospitals, etc.). Systems that access or mine this multimedia content should have the intellectual capability to relate the different information sources to each other, and to align and integrate the content. Indeed, more often than not the different modalities complement and disambiguate each other. Such technologies will form the basis for multimedia and cross-media information retrieval and mining.
This situation demands solutions for cross-media processing and content recognition, where some fundamental problems are emerging topics of interest in the research community. First, content recognition in the visual, textual or audio medium is improved by exploiting cross-modal co-occurrences, especially when many instances can reinforce each other. Content recognized in one medium (e.g., text) can serve as weak annotation for content to be learned in another medium (e.g., the visual medium). This makes it possible, for instance, to train a visual object recognition system for frequent objects without the need for manually labeled training data. Similarly, the visual medium can assist in the processing of textual sources. For instance, recognized visual actions might contribute to the ontological classification of certain verbs used in language. Gestures can complement speech as a visual reflection of the semantics of a discourse or conversation. There are many other examples. Content recognition also entails content linking across media, where we deal with problems of cross-document coreferencing (or alignment) of, for instance, entities (e.g., persons, objects, locations), of actions performed by entities, of events, and of temporal and spatial forms of expression. Moreover, the initially identified alignments might bootstrap additional cross-modal "translations".
The purpose of this workshop is to bring together researchers from computer vision, sound processing, human language technology, computational linguistics, artificial intelligence, machine learning, reasoning, information retrieval, cognitive science and application communities. The workshop will bring about fruitful discussions and ideas that will foster new interdisciplinary research avenues in artificial intelligence. It will encourage research into intelligent behavior and help unify methodologies.
The workshop is open to all members of the AI community. The number of participants is, however, limited to 40. We welcome original papers and posters. Submissions should show a clearly motivated interest or expertise in cross-media recognition, retrieval and mining, and discuss well-grounded ideas that contribute to the field. We aim for a balanced selection of papers from the different disciplines involved that clearly stress multimodal processing.
Topics of interest include, but are not limited to:
- Cross-media mining and categorization
- Cross-media search and question answering
- Cross-media summarization
- Cross-media linking of entities, attributes, objects, and actions
- Cross-media emotion detection and genre classification
- Alignment algorithms
- Image auto-annotation
- Image and video retrieval based on multimodal cues
- Recognition of semantic roles and frames in text, images and video
- Recognition of narratives in text and video
- Spatial and temporal recognition and resolution
- Multimodal discourse analysis
Papers should be submitted before March 27, 2009, 17:00 UTC+1.
Papers should not exceed 6 double-column pages. Please follow the IJCAI formatting instructions and use the supplied Word templates or LaTeX sources. Formatting guidelines can be found at the IJCAI 2009 website. The reviewing process will be double-blind, with each submission receiving at least three reviews. Papers will be selected for oral or poster sessions. All accepted papers (oral and poster) will be published in the workshop proceedings. We are also arranging a journal special issue for post-workshop publication of selected papers.