Workshop Program
March 28, 2007
Venue: College De Valk, Tiensestraat 41, B-3000 Leuven
| 10.30 - 11.15 | Registration, coffee |
| 11.15 - 11.30 | Welcome by Prof. Dr. Ir. Ludo Froyen, Dean of the Faculty of Engineering, K.U.Leuven |
| 11.30 - 12.30 | Keynote session 1: Beyond the Library - Organizing the World's Information (Thomas Hofmann, Google Zurich, Switzerland) |
| 12.30 - 13.30 | Coffee, sandwiches |
| 13.30 - 14.30 | Paper session 1: Theoretical Benchmarks of Evaluation Methodologies in XML Retrieval (Tobias Blanke and Mounia Lalmas); PF/Tijah: Text Search in an XML Database System (Djoerd Hiemstra, invited speaker) |
| 14.30 - 15.30 | Keynote session 2: Challenges in Web Information Retrieval (Nick Craswell, Microsoft Cambridge, UK) |
| 15.30 - 16.30 | Coffee, poster session: MuSeUM: Unified Access to the State of the Art (Avi Arampatzis, Jaap Kamps, Marijn Koolen and Nir Nussbaum); Towards Extracting XML Data Using A Template Schema (Bassam Hammo, Munib Qutaishat and Majdi Bsaiso); Text Mining: Dealing with Multiple Orthographies (Anton Bryl); The Cornetto Database: Architecture and User-Scenarios (Piek Vossen, Katja Hofmann, Maarten de Rijke, Erik Tjong Kim Sang and Koen Deschacht) |
| 16.30 - 17.30 | Keynote session 3: But How Will They Use It? Thinking About Evaluation of Integrated Multilingual Multimedia Information Systems (Douglas W. Oard, University of Maryland, USA) |
| 17.30 - 17.45 | Best student paper award sponsored by WGI |
| 19.30 - 23.00 | Social event: Dinner at the Faculty Club |
March 29, 2007
Venue: Kasteelpark Arenberg (close to the "Kantineplein"), B-3001 Heverlee
| 8.30 - 8.55 | Registration, coffee |
| 8.55 - 9.00 | Welcome |
| 9.00 - 10.00 | Keynote session 4: Efficient Visual Search and Automatic Annotation of Videos (Josef Sivic, University of Oxford, UK) |
| 10.00 - 11.00 | Paper session 2: Applying Hash-based Indexing in Text-based Information Retrieval (Benno Stein and Martin Potthast); Development of a Mid-frequency Favoring Weighting Scheme for Document Clustering (Joris D'hondt, Joris Vertommen, Dirk Cattrysse and Joost Duflou) |
| 11.00 - 11.15 | Coffee |
| 11.15 - 12.45 | Paper session 3: Content-based image retrieval via Multi-scale Graph-based Segmentation (Iris Van Hamel, Ioanis Pratikakis, Cosmin Mihai and Hichem Sahli); Memory Based Learning and the Interpretation of Numbers in Archaeological Reports (Hans Paijmans and Sander Wubben); Support for Decision Making: Electoral Search (Valentin Jijkoun, Maarten Marx, Maarten de Rijke and Frank van Waveren) |
| 12.45 - 13.45 | Coffee, sandwiches |
| 13.45 - 14.00 | Closing of the conference by the program chairs: Marie-Francine Moens (Katholieke Universiteit Leuven), Tinne Tuytelaars (Katholieke Universiteit Leuven) and Arjen de Vries (Centrum voor Wiskunde en Informatica, Amsterdam) |
Keynote Speakers
Thomas Hofmann, Director of Engineering, Google Zurich European Engineering Centre, Switzerland
Talk title: Beyond the Library - Organizing the World's Information
Talk time: March 28, 2007, 11:30
Talk abstract:
This talk will present challenges involved in building a Web search
engine and will touch on questions of system and algorithm design, in
particular as they involve large scale data processing and data
mining.
About the speaker:
Prof. Dr. Thomas Hofmann received a Ph.D. in Computer Science from the
University of Bonn in 1997. He subsequently held postdoctoral positions at
the Massachusetts Institute of Technology as well as the University of
California at Berkeley and the International Computer Science Institute.
He then moved to Brown University, where he became an Assistant Professor
of Computer Science.
After returning from the USA, he became the director of the Fraunhofer
Integrated Publication and Information Systems Institute (IPSI) in Darmstadt
(Germany) and was at the same time appointed Professor of Intelligent
Systems in the Computer Science Department at the Darmstadt
University of Technology.
Currently, Thomas Hofmann is a Director of Engineering at
Google (Zurich, Switzerland).
His research interests focus on Information Retrieval and
Machine Learning, but also cover related areas such as Data Mining,
Pattern Recognition, Computer Vision, Natural Language Learning,
Information Theory, and Computational Statistics.
Nick Craswell, Associate Researcher, Microsoft Research Cambridge, UK
Talk title: Challenges in Web Information Retrieval
Talk time: March 28, 2007, 14:30
Talk abstract:
When building a Web search engine, we can benefit from core IR techniques, such
as probabilistic ranking models and evaluation methods. But we also face
problems that are not yet so well-studied in the field of IR. This talk
explores several of these. For efficiency reasons, we need to crawl the web
selectively. This raises an interesting query-independent ranking problem. We
have large-scale logs of user behaviour. I will present a novel approach for
dealing with sparsity of this data. We may also have relevance judgments for a
large number of queries, as in the new TREC "million query" track, which allows
for large-scale parameter tuning experiments. Each of these problems lends
itself to data-driven solutions. The talk should thus give a flavour of the work
that goes on in the area of commercial Web IR.
About the speaker:
Dr. Nick Craswell is currently an Associate Researcher at
Microsoft Research Cambridge,
in the Information Retrieval and Analysis Group.
An up-to-date overview of Dr. Craswell's research interests, activities, and publications
can be found on his homepage.
Douglas W. Oard, Associate Professor, University of Maryland, USA
Talk title: But How Will They Use It? Thinking About Evaluation of Integrated Multilingual Multimedia Information Systems
Talk time: March 28, 2007, 16:30
Talk abstract:
Speech recognition and machine translation techniques are evolving
rapidly, creating new opportunities to build systems to support
information seeking in large collections of multilingual and
multimedia content. Little is presently known, however, about how
people would use such systems to accomplish real tasks. In such
circumstances, designers naturally rely on their own judgment to
decide how component capabilities should be optimized and how those
components should be integrated. Once that's been done, the next step
is to put the resulting system in the hands of users in order to learn
what they do with it.
In this talk, I will describe what we have
learned so far from such a process. I'll begin by describing Rosetta,
an integrated system that supports search and display of live and
archived news feeds in four languages by users who know only English.
I'll then introduce several user study designs that we have tried,
several focused on formative evaluation of the Rosetta system, and one
that was designed to support a summative comparison of four systems.
I'll illustrate each study design with some examples of what we have
learned to date from that approach. The talk will conclude with a few
remarks on what we see as next steps in this continuing process. This
is joint work with research teams at IBM Research, the University of
Pittsburgh, Carnegie Mellon University and the University of Maryland.
About the speaker:
Douglas Oard is an Associate Professor at the University of Maryland,
College Park, with a joint appointment in the College of Information
Studies and the Institute for Advanced Computer Studies. He holds a
Ph.D. in Electrical Engineering from the University of Maryland, and
his research interests center around the use of emerging technologies
to support information seeking. His recent work has focused on
interactive techniques for cross-language information retrieval,
searching conversational media, and leveraging observable behavior to
improve user modeling. Additional information is available at
http://www.glue.umd.edu/~oard/.
Josef Sivic, Research Associate, University of Oxford, UK
Talk title: Efficient Visual Search and Automatic Annotation of Videos
Talk time: March 29, 2007, 9:00
The presentation can be freely downloaded (size: 11 MB, requires Acrobat Reader).
Talk abstract:
Despite the recent success of text-based search (e.g. Google),
visual search in unannotated image and video collections remains a
challenging problem. The imaged appearance of a particular object
can change significantly due to changing camera viewpoint,
illumination, or partial occlusion by other objects. Furthermore,
there is additional variation, such as changing facial expression,
when searching for particular people.
In the first part of the talk, I will describe our work on visual
retrieval of particular objects and people in videos where the
query is specified by an image of the object/person. A live visual
search for objects and actors will be demonstrated on several
feature length movies. In the second part of the talk, I will show
how readily available text annotation such as movie shooting scripts
and subtitles, together with visual facial appearance, can be used
to automatically label occurrences of characters in TV or film
footage.
Joint work with Andrew Zisserman and Mark Everingham.
About the speaker:
Josef Sivic is a Research Fellow in the Department of Engineering
Science, University of Oxford, where he completed his PhD thesis
dealing with efficient visual search of images and videos. His
research interests are in visual search, object recognition and
scene understanding applied to large image and video collections
such as movie databases or photo sharing archives.
Invited Speakers
Djoerd Hiemstra, Department of Computer Science, Database Group, University of Twente, The Netherlands
Talk title: PF/Tijah - Text Search in an XML Database System
Talk time: March 28, 2007, 14:00
Talk abstract:
PF/Tijah (Pathfinder/Tijah, pronounced "Pee Ef Teeja") is a flexible
open source text search system that is integrated with an XML/XQuery
database management system. PF/Tijah implements several built-in XQuery
functions and is therefore fully compliant with XQuery 1.0. It has a
number of unique selling points that distinguish it from other
information retrieval systems:
- PF/Tijah supports retrieving arbitrary parts of the textual data, not just documents.
- PF/Tijah supports complex scoring and ranking of the retrieved results by means of structured queries that address both the document content and its structure.
- PF/Tijah supports ad hoc result presentation by means of its query language. For instance, when searching for a special issue of a journal, it is easy to print any information from that retrieval result on the screen in a declarative way (i.e., by means of its query language, not by means of a general purpose programming language).
- PF/Tijah supports text search combined with traditional database querying, including for instance joins on values. For instance, one could search for employees from the financial department who also worked for the sales department and who sent an email about "tax refunds".
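The last point, combining a join on values with ranked text search, can be illustrated with a small sketch. This is not PF/Tijah code and does not use its XQuery functions. It is a plain Python illustration over toy in-memory data, and every name in it (`Employment`, `Email`, `score`, `search`, the sample records) is hypothetical. The scoring is a simple term count, standing in for whatever ranking model a real system would use.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Employment:
    employee: str
    department: str


@dataclass(frozen=True)
class Email:
    sender: str
    body: str


def score(text: str, query: str) -> int:
    """Toy relevance score: total occurrences of the query terms,
    or 0 if any query term is missing from the text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = [tokens.count(term) for term in query.lower().split()]
    return sum(counts) if all(counts) else 0


def search(employments, emails, query):
    """Employees who worked in both finance and sales (a join on values)
    and sent at least one email matching the query (text search),
    ranked by their best-scoring email."""
    finance = {e.employee for e in employments if e.department == "finance"}
    sales = {e.employee for e in employments if e.department == "sales"}
    candidates = finance & sales

    best = {}
    for mail in emails:
        if mail.sender not in candidates:
            continue
        s = score(mail.body, query)
        if s > 0:
            best[mail.sender] = max(best.get(mail.sender, 0), s)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(best, key=lambda name: (-best[name], name))


# Hypothetical sample data, for illustration only.
EMPLOYMENTS = [
    Employment("ann", "finance"), Employment("ann", "sales"),
    Employment("bob", "finance"),
    Employment("cy", "finance"), Employment("cy", "sales"),
    Employment("dee", "finance"), Employment("dee", "sales"),
]
EMAILS = [
    Email("ann", "Question about tax refunds: are refunds delayed?"),
    Email("bob", "Tax refunds overview"),
    Email("cy", "Lunch on Friday?"),
    Email("dee", "Tax refunds are processed."),
]

print(search(EMPLOYMENTS, EMAILS, "tax refunds"))
```

In a system such as the one described, both the structural condition and the text ranking would be stated in a single declarative query rather than hand-coded as above. The sketch only shows how the two kinds of condition interact.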
About the speaker:
See Dr. Djoerd Hiemstra's home page.



