Workshop Program

March 28, 2007

Venue: College De Valk, Tiensestraat 41, B-3000 Leuven

10.30 - 11.15 Registration, coffee
11.15 - 11.30 Welcome by Prof. Dr. Ir. Ludo Froyen, Dean of the Faculty of Engineering, K.U.Leuven
11.30 - 12.30 Keynote session 1
  Beyond the Library - Organizing the World's Information
  Thomas Hofmann, Google Zurich, Switzerland
12.30 - 13.30 Coffee, sandwiches
13.30 - 14.30 Paper session 1
  Theoretical Benchmarks of Evaluation Methodologies in XML Retrieval
  Tobias Blanke and Mounia Lalmas
  PF/Tijah: Text Search in an XML Database System
  Djoerd Hiemstra (invited speaker)
14.30 - 15.30 Keynote session 2
  Challenges in Web Information Retrieval
  Nick Craswell, Microsoft Cambridge, UK
15.30 - 16.30 Coffee, poster session
  MuSeUM: Unified Access to the State of the Art
  Avi Arampatzis, Jaap Kamps, Marijn Koolen and Nir Nussbaum
  Towards Extracting XML Data Using A Template Schema
  Bassam Hammo, Munib Qutaishat and Majdi Bsaiso
  Text Mining: Dealing with Multiple Orthographies
  Anton Bryl
  The Cornetto Database: Architecture and User-Scenarios
  Piek Vossen, Katja Hofmann, Maarten de Rijke, Erik Tjong Kim Sang and Koen Deschacht
16.30 - 17.30 Keynote session 3
  But How Will They Use It? Thinking About Evaluation of Integrated Multilingual Multimedia Information Systems
  Douglas W. Oard, University of Maryland, USA
17.30 - 17.45 Best student paper award sponsored by WGI
19.30 - 23.00 Social event: Dinner at the Faculty Club

March 29, 2007

Venue: Kasteelpark Arenberg (close to the "Kantineplein"), B-3001 Heverlee

8.30 - 8.55 Registration, coffee
8.55 - 9.00 Welcome
9.00 - 10.00 Keynote session 4
  Efficient Visual Search and Automatic Annotation of Videos
  Josef Sivic, University of Oxford, UK
10.00 - 11.00 Paper session 2
  Applying Hash-based Indexing in Text-based Information Retrieval
  Benno Stein and Martin Potthast
  Development of a Mid-frequency Favoring Weighting Scheme for Document Clustering
  Joris D'hondt, Joris Vertommen, Dirk Cattrysse and Joost Duflou
11.00 - 11.15 Coffee
11.15 - 12.45 Paper session 3
  Content-based image retrieval via Multi-scale Graph-based Segmentation
  Iris Van Hamel, Ioanis Pratikakis, Cosmin Mihai and Hichem Sahli
  Memory Based Learning and the Interpretation of Numbers in Archaeological Reports
  Hans Paijmans and Sander Wubben
  Support for Decision Making: Electoral Search
  Valentin Jijkoun, Maarten Marx, Maarten de Rijke and Frank van Waveren
12.45 - 13.45 Coffee, sandwiches
13.45 - 14.00 Closing of the conference by the program chairs:
  Marie-Francine Moens, Katholieke Universiteit Leuven
  Tinne Tuytelaars, Katholieke Universiteit Leuven
  Arjen de Vries, Centrum voor Wiskunde en Informatica, Amsterdam




Keynote Speakers

Thomas Hofmann
Director of Engineering, Google Zurich European Engineering Centre, Switzerland
Talk title: Beyond the Library - Organizing the World's Information
Talk time: March 28 2007, 11:30

Talk abstract:
This talk will present challenges involved in building a Web search engine and will touch on questions of system and algorithm design, in particular as they involve large scale data processing and data mining.

About the speaker:
Prof. Dr. Thomas Hofmann received a Ph.D. in Computer Science from the University of Bonn in 1997. He then held postdoctoral positions at the Massachusetts Institute of Technology as well as the University of California at Berkeley and the International Computer Science Institute. He then moved to Brown University, where he became an Assistant Professor of Computer Science.
After returning from the USA, he became the director of the Fraunhofer Integrated Publication and Information Systems Institute IPSI in Darmstadt (Germany) and was at the same time appointed Professor of Intelligent Systems in the Computer Science Department at the Darmstadt University of Technology.
Currently, Thomas Hofmann is a Director of Engineering at Google (Zurich, Switzerland).
His research interests focus on Information Retrieval and Machine Learning, but also cover related areas like Data Mining, Pattern Recognition, Computer Vision, Natural Language Learning, Information Theory, and Computational Statistics.



Nick Craswell
Associate Researcher, Microsoft Research Cambridge, UK
Talk title: Challenges in Web Information Retrieval
Talk time: March 28 2007, 14:30

Talk abstract:
When building a Web search engine, we can benefit from core IR techniques, such as probabilistic ranking models and evaluation methods. But we also face problems that are not yet so well-studied in the field of IR. This talk explores several of these. For efficiency reasons, we need to crawl the web selectively. This raises an interesting query-independent ranking problem. We have large-scale logs of user behaviour. I will present a novel approach for dealing with sparsity of this data. We may also have relevance judgments for a large number of queries, as in the new TREC "million query" track, which allows for large-scale parameter tuning experiments. Each of these problems lends itself to data-driven solutions. The talk should thus give a flavour of the work that goes on in the area of commercial Web IR.

About the speaker:
Dr. Nick Craswell is currently an Associate Researcher at Microsoft Research Cambridge, in the Information Retrieval and Analysis Group. An up-to-date overview of Dr. Craswell's research interests, activities, and publications can be found on his homepage.



Douglas W. Oard
Associate Professor, University of Maryland, USA
Talk title: But How Will They Use It? Thinking About Evaluation of Integrated Multilingual Multimedia Information Systems
Talk time: March 28 2007, 16:30

Talk abstract:
Speech recognition and machine translation techniques are evolving rapidly, creating new opportunities to build systems to support information seeking in large collections of multilingual and multimedia content. Little is presently known, however, about how people would use such systems to accomplish real tasks. In such circumstances, designers naturally rely on their own judgment to decide how component capabilities should be optimized and how those components should be integrated. Once that's been done, the next step is to put the resulting system in the hands of users in order to learn what they do with it.
In this talk, I will describe what we have learned so far from such a process. I'll begin by describing Rosetta, an integrated system that supports search and display of live and archived news feeds in four languages by users who know only English. I'll then introduce several user study designs that we have tried, several focused on formative evaluation of the Rosetta system, and one that was designed to support a summative comparison of four systems. I'll illustrate each study design with some examples of what we have learned to date from that approach. The talk will conclude with a few remarks on what we see as next steps in this continuing process. This is joint work with research teams at IBM Research, the University of Pittsburgh, Carnegie Mellon University and the University of Maryland.

About the speaker:
Douglas Oard is an Associate Professor at the University of Maryland, College Park, with a joint appointment in the College of Information Studies and the Institute for Advanced Computer Studies. He holds a Ph.D. in Electrical Engineering from the University of Maryland, and his research interests center around the use of emerging technologies to support information seeking. His recent work has focused on interactive techniques for cross-language information retrieval, searching conversational media, and leveraging observable behavior to improve user modeling. Additional information is available at http://www.glue.umd.edu/~oard/.



Josef Sivic
Research Associate, University of Oxford, UK
Talk title: Efficient visual search and automatic annotation of videos
Talk time: March 29 2007, 9:00
The presentation can be freely downloaded [pdf size: 11 MB, requires Acrobat Reader]

Talk abstract:
Despite the recent success of text based search (e.g. Google), visual search in unannotated image and video collections remains a challenging problem. The imaged appearance of a particular object can change significantly due to changing camera viewpoint, illumination, or partial occlusion by other objects. Furthermore, there is additional variation, such as changing facial expression, when searching for particular people.
In the first part of the talk, I will describe our work on visual retrieval of particular objects and people in videos where the query is specified by an image of the object/person. A live visual search for objects and actors will be demonstrated on several feature length movies. In the second part of the talk, I will show how readily available text annotation such as movie shooting scripts and subtitles, together with visual facial appearance, can be used to automatically label occurrences of characters in TV or film footage.
Joint work with Andrew Zisserman and Mark Everingham.

About the speaker:
Josef Sivic is a Research Fellow in the Department of Engineering Science, University of Oxford, where he completed his PhD thesis dealing with efficient visual search of images and videos. His research interests are in visual search, object recognition and scene understanding applied to large image and video collections such as movie databases or photo sharing archives.





Invited Speakers

Djoerd Hiemstra
Department of Computer Science, Database Group, University of Twente, The Netherlands
Talk title: PF/Tijah - Text Search in an XML Database System
Talk time: March 28 2007, 14:00

Talk abstract:
PF/Tijah (Pathfinder/Tijah, pronounced "Pee Ef Teeja") is a flexible open source text search system that is integrated with an XML/XQuery database management system. PF/Tijah implements several built-in XQuery functions and is therefore fully compliant with XQuery 1.0. It has a number of unique selling points that distinguish it from other information retrieval systems:

  • PF/Tijah supports retrieving arbitrary parts of the textual data, not just documents.
  • PF/Tijah supports complex scoring and ranking of the retrieved results by means of structured queries that address both the document content and its structure.
  • PF/Tijah supports ad hoc result presentation by means of its query language. For instance, when searching for a special issue of a journal, it is easy to print any information from that retrieval result on the screen in a declarative way (i.e., by means of its query language, not by means of a general purpose programming language).
  • PF/Tijah supports text search combined with traditional database querying, including, for example, joins on values. For instance, one could search for employees from the financial department who also worked for the sales department and who sent an email about "tax refunds".
In the talk, Dr. Djoerd Hiemstra presents examples of PF/Tijah's use, explains some of the system internals, and discusses plans for future work. PF/Tijah is part of the open source release of MonetDB/XQuery. For more information, go to:
http://dbappl.cs.utwente.nl/pftijah

About the speaker:
See Dr. Djoerd Hiemstra's home page.




7th Dutch-Belgian Information Retrieval Workshop (DIR 2007)