Seminars

Academic year 2009-2010

  • 2010-06-01, 12.45-14.00, Room 5.001 of the Dept. of Computer Science, Celestijnenlaan 200A.
    Prof. Jie Tang. Tsinghua University
    "Web-based Social Network Mining"

    Abstract: The ubiquitous use of the Internet and computers enables people to participate and interact with each other in various web communities, such as forums, newsgroups, blogs and social websites. With the rapid expansion of the Social Web, it is crucial for us to understand the social dynamics involved in social relations as well as to predict future social trends. In this talk, I will introduce our research work on social network mining. In particular, I will introduce the academic search system Arnetminer.org (http://www.arnetminer.org), which aims to model and analyze the academic social network. In this system, we automatically extract over 64,000 researcher profiles, more than 3 million papers, and about 17 million citation relationships from the Web. Services such as expertise search, association search, course search, and topic analysis are provided. The system has been in operation on the Internet for more than three years. System logs show that its users come from more than 190 countries.

    Jie Tang is an associate professor at the Department of Computer Science and Technology, Tsinghua University. His main research interests include social network mining, text mining, statistical learning, and the semantic web. He has published over 60 research papers in major international journals and conferences, including KDD, IJCAI, SIGMOD, ACL, ISWC, TKDD, TKDE, JWS and JoDS. He is the principal investigator of a National High-tech R&D Program (863) project, an NSFC project, Chinese Young Faculty Research Funding, National 985 funding, and international collaborative projects with the University of Minnesota, IBM, Google, Nokia, Sogou, etc. He serves as Vice PC Chair of Web Intelligence 2010 and co-chair of the workshops SWSM’08-09, LDMTA’09 and FDM’09, and also serves as a PC member of more than 40 international conferences, including KDD, ACL, WWW, SIGIR and COLING. He serves as an editor of the Journal of Software, the Semantic Web Journal and the Journal of Advances in Information Technology, and as guest editor of the TKDD special issue on large-scale data mining and the TIST special issue on Computational Models of Collective Intelligence in the Social Web. Homepage: http://keg.cs.tsinghua.edu.cn/persons/tj/


Academic year 2008-2009

  • 2009-05-12, 17:30-18:30, Celestijnenlaan 200A, 3001 Heverlee, Room 200A.00.225
    Prof. Dr. Stephen Robertson, Microsoft Research Cambridge and City University London.
    "Probability of relevance is a slippery concept: The origins of the unified model"

    Abstract: In 1960, Maron and Kuhns proposed a probabilistic model for information retrieval, discussing the probability that a document is relevant to a user information need. In 1976, Sparck Jones and I put forward another model, using some similar concepts (including probability of relevance) but in a different way. But the two notions of probability of relevance turned out to be quite distinct and in some sense incompatible. Nevertheless, Maron, Cooper and I attempted a synthesis in 1981; other people have approached the same problem in different ways over the intervening years. There are different ways of thinking about the problem, but one useful way relates to the event spaces in which the probabilities are defined. In this talk I will discuss the event space view and some of the approaches to the problem.
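
    The 1976 Robertson/Sparck Jones model mentioned above is best known for its term relevance weight. The sketch below computes that weight in its common textbook form, with 0.5 added to each count to avoid zeros, and sums it over matching query terms as in the binary independence model. The function and parameter names are illustrative assumptions, and the code does not attempt the event-space analysis that the talk is about.

```python
import math


def rsj_weight(N: int, n: int, R: int = 0, r: int = 0) -> float:
    """Robertson/Sparck Jones relevance weight of a single term.

    N: number of documents in the collection
    n: number of documents that contain the term
    R: number of documents known to be relevant
    r: number of relevant documents that contain the term
    """
    return math.log(
        ((r + 0.5) / (R - r + 0.5))
        / ((n - r + 0.5) / (N - n - R + r + 0.5))
    )


def bim_score(doc_terms: set, query_terms: list, doc_freq: dict, N: int) -> float:
    """Binary independence score: sum the weights of query terms present in the document.

    doc_freq maps each query term to n; no relevance information is assumed (R = r = 0).
    """
    return sum(rsj_weight(N, doc_freq[t]) for t in query_terms if t in doc_terms)
```

    With R = r = 0 the weight reduces to log((N - n + 0.5) / (n + 0.5)), an IDF-like quantity; relevance feedback enters only through R and r.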

  • 2009-01-15, 12h00, Celestijnenlaan 200A, 05.155
    Theo Huibers. University of Twente, Human Media Interaction Group / Thaesis
    "Information Retrieval for Children"

    Abstract: In a world where the internet and technology play such an important role as they do today, it is absolutely necessary that children can assess the meaning of gathered information and engage with content in child-friendly ways. Hardly any currently available IR system aims to facilitate child-centric information access based on an understanding of the behaviour and needs of children. To achieve this goal, new research is required. We need to create information services that are tailored to the unique information needs of children and their intuitive styles of interaction. In this talk, Prof. Huibers will address the major research questions in this field and will also present some insight into current and future research on this topic. This talk will be given jointly with Hanna Jochmann.

  • 2008-12-12, 11h00, Celestijnenlaan 200A, 05.128
    Fabio Crestani. University of Lugano, Faculty of Informatics
    "From Linking Text to Linking Crimes: Information Retrieval, but not as you know it"

    Abstract: Information retrieval techniques have long been used to identify links between textual items for the automatic construction of hypertexts and electronic books, in which sought information could be accessed by browsing. While research work in this area has been steadily decreasing in recent years, some of the techniques developed in that context are proving very valuable in a number of new application areas. In this talk I present an approach to automatically linking textual items, based on a Language Model, that is used to prioritise criminal suspects in a police investigation. The model can easily be extended to take account of additional linking data, such as the geographical location of crimes or suspects' social networks. This would make it possible to browse large networks of investigative information automatically constructed from police archives.

    Slides
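
    One common way to link textual items with a language model is query likelihood: each candidate item (here, text associated with a suspect) is scored by the log-probability that its smoothed unigram model generates the text of a crime report. The sketch below uses Dirichlet smoothing; that smoothing choice, the function names and the toy records are assumptions for illustration, not details of the speaker's model.

```python
import math
from collections import Counter


def dirichlet_score(query, doc, coll_counts, coll_len, mu=2000.0):
    """log P(query | doc) under a Dirichlet-smoothed unigram language model."""
    tf = Counter(doc)
    dlen = len(doc)
    score = 0.0
    for term in query:
        p_coll = coll_counts.get(term, 0) / coll_len
        if p_coll == 0:
            continue  # term never seen in the collection: it cannot discriminate, skip it
        score += math.log((tf[term] + mu * p_coll) / (dlen + mu))
    return score


def rank_items(query, items, mu=2000.0):
    """Rank items (name -> token list) by how likely each is to have generated the query."""
    coll = Counter(t for toks in items.values() for t in toks)
    coll_len = sum(coll.values())
    scores = {name: dirichlet_score(query, toks, coll, coll_len, mu) for name, toks in items.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

    Larger values of mu lean more heavily on collection-wide statistics, which matters when individual records are short.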

  • 2008-12-05, 12h00, Celestijnenlaan 200A, 05.128
    Juan Carlos Gomez. K.U.Leuven, Legal Informatics and Information Retrieval (LIIR)
    "Information Retrieval and the Virtual Observatory"

    Abstract: Astronomy has become an enormously data-rich science. The cumulative data volume, now measured in hundreds of terabytes, is growing exponentially, alongside increases in data complexity and quality, and spans numerical data, catalogs, images, video, simulations, etc. This great richness of information poses substantial technical challenges, ranging from data access and manipulation to the sophisticated data mining and statistical analysis needed for its scientific exploration. Our current ability to fully exploit this data avalanche scientifically is limited by the existing tools and resources, and the problem is growing rapidly. The Virtual Observatory concept is the astronomy community's answer to these challenges. It represents an organized, coherent approach to the transition to a new, information-rich astronomy for the 21st century. In this talk I will address some of the major topics involving information retrieval within the Virtual Observatory concept and future research in this field.

  • 2008-11-27, 12h00, Celestijnenlaan 200A, 05.128
    Wim De Smet. K.U.Leuven, Legal Informatics and Information Retrieval (LIIR)
    "Graphical Models for Event Detection and Multimodal Learning"

    Abstract: Graphical models have recently gained popularity in the field of statistical learning as a powerful tool to represent and learn complex generative processes. They allow us to view sets of data as samples from a distribution that is defined by several parameters, which can be dependent or independent, visible or hidden. Learning these parameters is the key goal of graphical models, and it can become computationally intractable, depending on the complexity of the model. Several solutions have been proposed, employing different statistical strategies. Popular graphical models include, for example, Latent Dirichlet Allocation and the more advanced Pachinko Allocation Model. Learning strategies include Gibbs sampling and variational inference.
    In this seminar, I will give an introduction to graphical models, presenting popular examples and learning strategies. I will then focus on how they are applied in our current research in the field of multimedia processing. On the one hand, we used different models for textual event clustering, i.e. the comparison of identical stories in a set of news documents. On the other hand, we are researching new multimodal topic models that correlate words from text with visual features from video.

    Slides
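
    Gibbs sampling, one of the learning strategies named above, is commonly applied to Latent Dirichlet Allocation in its collapsed form: each word's topic is resampled from its conditional distribution given all other assignments. The following is a minimal, unoptimised sketch of that standard sampler; the function name and hyperparameter defaults are illustrative assumptions, and it is not the speaker's event-clustering or multimodal model.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for Latent Dirichlet Allocation.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (doc_topic, topic_word) count matrices after the final sweep.
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    doc_topic = [[0] * K for _ in docs]       # n_dk: tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]  # n_kw: times word w is assigned to topic k
    topic_total = [0] * K                     # n_k: tokens assigned to topic k
    z = []                                    # current topic of every token

    # Random initial assignment.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalised full conditional p(z_i = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the new assignment.
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word
```

    Topic-word distributions can then be estimated by normalising (topic_word[k][w] + beta) over the vocabulary for each topic k.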

  • 2008-11-06, 12h00, Celestijnenlaan 200A, 05.152
    Erik Boiy. K.U.Leuven, Legal Informatics and Information Retrieval (LIIR)
    "A Machine Learning Approach to Sentiment Analysis in Multilingual Web Texts"

    Abstract: Sentiment analysis, also called opinion mining, is a form of information extraction from text that is of growing research and commercial interest. Machine learning experiments on sentiment analysis were performed on English, Dutch and French blog, review and forum Web texts. Training used a set of example sentences that were manually annotated as positive, negative or neutral with regard to people's opinions about consumer products. Several classification models, some of which were configured in a cascaded pipeline, were learned and evaluated. Several problems had to be dealt with, such as the noisy character of the input texts, the attribution of the sentiment to a particular entity and the small size of the training set. As a result, active learning techniques were investigated for reducing the number of examples to be manually annotated. Opinion was identified as positive, negative or neutral with ca. 83% accuracy for English texts, and ca. 70% and 68% for Dutch and French texts respectively, based on term unigram features augmented with linguistic features. Overall, these experiments provided insights into the portability of the learned models across domains and languages.

    Slides
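
    To make the term-unigram setting concrete, the sketch below trains a multinomial Naive Bayes classifier over unigrams for the three classes positive, negative and neutral. Naive Bayes is a swapped-in baseline chosen for brevity; the abstract does not name the classifiers used, and the linguistic features, cascade and active learning are not modelled. The class name and training sentences are hypothetical.

```python
import math
from collections import Counter


class UnigramNB:
    """Multinomial Naive Bayes over lowercased term unigrams, with add-one smoothing."""

    def fit(self, sentences, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.term_counts = {c: Counter() for c in self.classes}
        for text, label in zip(sentences, labels):
            self.term_counts[label].update(text.lower().split())
        self.vocab = set().union(*self.term_counts.values())
        self.totals = {c: sum(self.term_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        tokens = text.lower().split()
        n = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            # log prior + sum of smoothed log likelihoods of the tokens
            score = math.log(self.class_counts[c] / n)
            for t in tokens:
                score += math.log((self.term_counts[c][t] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best
```

    For example, UnigramNB().fit(sentences, labels).predict("love this phone") returns the most probable of the three labels for a new sentence.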

  • 2008-10-16, 12h00, Celestijnenlaan 200A, 05.155
    Raquel Mochales Palau. K.U.Leuven, Legal Informatics and Information Retrieval (LIIR)
    "Automatic Argumentation Detection"

    Abstract: Argumentation detection is an important component of several Natural Language Processing (NLP) tasks, such as legal summarization, meeting tracking, or building intelligent systems that can understand and engage in an argument. The analysis of argumentation is a relatively new research area that deals with different NLP problems, such as coreference and ambiguity, and with issues closely related to argumentation theory, such as argument formalization. In this work we study different arguments in written discourse, focusing on legal documentation. We start by differentiating between argumentative and non-argumentative sections in text, and then concentrate on the detection of full argumentative structures. We propose models for detecting argumentative structures which integrate knowledge of legal argumentation and argument structures with linguistic and discourse characteristics. We investigate different techniques, moving from rule-based to machine learning techniques. To evaluate these models, we have manually annotated a training set of legal documents that is the first of its kind. In this talk, we present the basic theoretical background of this work, an inside view of the corpus annotation, and an initial experimental evaluation of our argument detection models.

    Slides

  • 2008-10-02, 12h00, Celestijnenlaan 200A, 05.155
    Leif Azzopardi. University of Glasgow, Information Retrieval Group
    "Accessibility in Transportation Planning / Retrievability in Information Retrieval"

    Abstract: In this talk, I will introduce the concept of accessibility from the field of transportation planning, and explain how it can be adopted within the context of Information Retrieval (IR). An analogy is drawn between the two fields, which motivates the development of document accessibility measures for IR systems. Considering the accessibility/retrievability of documents within a collection, given an IR system, provides a different perspective on the analysis and evaluation of such systems. In an example application of these new measures, we show how they can be used to inform the design and management of current and future IR systems.

    Slides
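
    A common formulation of retrievability scores a document d by summing, over a set of queries q, a query weight o_q times an indicator of whether the system ranks d within the top c results; inequality across the collection is then summarised with a Gini coefficient. The sketch below follows that cumulative-cutoff formulation as an assumption; the input format and function names are hypothetical.

```python
def retrievability(rankings, collection, c=10, query_weights=None):
    """Cumulative retrievability: r(d) = sum over q of o_q * [rank of d for q <= c].

    rankings: dict mapping each query to the ranked list of document ids it returns.
    collection: all document ids, so that never-retrieved documents get r(d) = 0.
    query_weights: optional dict of o_q values; every query weighs 1 by default.
    """
    r = {d: 0.0 for d in collection}
    for q, ranked in rankings.items():
        o_q = 1.0 if query_weights is None else query_weights.get(q, 0.0)
        for d in ranked[:c]:
            r[d] = r.get(d, 0.0) + o_q
    return r


def gini(values):
    """Gini coefficient of non-negative values: 0 for perfect equality, approaching 1
    as all retrievability concentrates in a single document."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # With xs in ascending order: G = sum_i (2i - n - 1) * x_i / (n * sum(xs)), i = 1..n
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)
```

    A high Gini value indicates that the system makes a few documents very easy to find while others are rarely or never retrieved.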


Academic year 2007-2008

  • 2008-05-22, 12h00, Celestijnenlaan 200A, 05.135
    Mirella Lapata. University of Edinburgh, Institute for Communicating and Collaborative Systems (ICCS), School of Informatics
    "Automatic Image Annotation Using Auxiliary Text Information"

    Abstract: As the number of image collections grows rapidly, so does the need to browse and search them. Recent years have witnessed significant progress in developing methods for image retrieval, many of which are query-based. Given a database of images, each annotated with keywords, the query is used to retrieve relevant pictures under the assumption that the annotations essentially capture their semantics. One stumbling block to the widespread use of query-based image retrieval systems is obtaining the keywords for the images. Since manual annotation is expensive, time-consuming and practically infeasible for large databases, there has been great interest in automating the image annotation process. Databases of images labeled with keywords are necessary for developing and evaluating image annotation models. In this work we exploit the vast resource of images available on the web. We create a database of pictures that are naturally embedded in news articles and propose to use their captions as a proxy for annotation keywords. Experimental results show that an image annotation model can be developed on this dataset alone, without the overhead of manual annotation. We also demonstrate that the news articles associated with the pictures can be used to boost image annotation performance.

    Slides

  • 2008-05-13
    Ronny Lempel. Yahoo! Research, Israel
    "Toward Task Completion Assistance in Web IR"

    Abstract: This talk will describe some of the challenges currently addressed by Web Information Retrieval systems, namely, search engines. The first generation of search engines closely resembled classic IR systems in their goal of returning relevant documents for a user's query. Those systems relied mostly on on-page content. The second generation of search engines added the links between Web pages to the mix, and differentiated between several types of information needs. Today's engines tap huge amounts of user-generated content and focus on helping users to complete tasks. This involves both helping users break down their tasks into sub-tasks and interpreting, aggregating and integrating diverse content so that users can more readily digest complex information. One particular example of this is multifaceted search, also known as guided navigation. Multifaceted search is a popular and intuitive interaction paradigm over metadata-rich content that allows users to digest and explore multidimensional data by combining free-text queries and navigational operations. The talk will explore some of the characteristics of multifaceted search and the requirements it poses on the data structures and algorithms of search engines.

    Ronny Lempel is an invited speaker in the course Text Based Information Retrieval (http://www.kuleuven.be/onderwijs/aanbod/syllabi/H02C8AE.htm) and will speak about Web information retrieval.

    Slides
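
    The interaction paradigm described above, a free-text query refined by navigational facet selections, can be illustrated with a brute-force sketch that filters items and then counts facet values among the matches. The item format and function name are assumptions; a real engine would derive these counts from index structures rather than by scanning every item, which is the kind of requirement the talk addresses.

```python
from collections import Counter


def faceted_search(items, query, selected=None):
    """Filter items by a free-text query plus facet selections; count facet values in the result.

    items: list of dicts, each with a "text" field and metadata fields used as facets.
    query: whitespace-separated terms; an item matches only if its text contains all of them.
    selected: dict of facet -> value chosen by navigation (e.g. {"brand": "A"}).
    """
    selected = selected or {}
    terms = query.lower().split()
    hits = []
    for item in items:
        words = set(item["text"].lower().split())
        if all(t in words for t in terms) and all(item.get(f) == v for f, v in selected.items()):
            hits.append(item)
    # Facet counts over the current result set drive the next navigation step.
    counts = {}
    for item in hits:
        for facet, value in item.items():
            if facet != "text":
                counts.setdefault(facet, Counter())[value] += 1
    return hits, counts
```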

  • 2008-05-08
    Bojana Dalbelo-Basic & Marko Tadic.
    "Morphological normalisation and collocation extraction"

    Abstract: Due to natural language morphology, words in a text appear in various morphological forms. For most information retrieval and text mining methods, this leads to a decrease in performance. Morphological variation may be reduced by morphological normalisation, which conflates the morphological variants of a word to a single representative form. A lexicon-based approach to normalisation allows for high normalisation precision, which may otherwise be difficult to achieve for morphologically complex languages. We will present a process for acquiring, from a raw corpus, an inflectional morphological lexicon suitable for inflectional normalisation, and compare the results with a Croatian morphological lexicon. Collocations are a linguistic phenomenon commonly defined as two or more words that appear together more often than by chance and whose meaning usually cannot be inferred from the meanings of their parts. As collocations have found many applications in natural language processing, information retrieval, and text mining, extracting them from large corpora has been the focus of many studies over the past few years. We will describe a method for extending collocation extraction measures. The morphological normalisation and collocation extraction measures presented here, and their combination on the document indexing task, are results of research carried out in the framework of the joint Croatian-Flanders project 'Computer Aided Document Indexing System for Accessing Legislation' (http://www.cs.kuleuven.be/~liir/projects.php?project=128).

    Slides
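
    The sketch below combines the two ideas in the abstract in their simplest form: lexicon-based normalisation that maps inflected forms to a lemma, followed by pointwise mutual information (PMI), one standard collocation measure, over adjacent word pairs. It does not implement the speakers' lexicon acquisition or their extension of collocation measures; the tiny English lexicon stands in for a Croatian one and is hypothetical.

```python
import math
from collections import Counter


def normalise(tokens, lexicon):
    """Replace each word form by its lemma from the lexicon; unknown forms are kept (lowercased)."""
    return [lexicon.get(t.lower(), t.lower()) for t in tokens]


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by PMI(x, y) = log2(P(x, y) / (P(x) * P(y))).

    Pairs seen fewer than min_count times are skipped, since PMI is unreliable for rare events.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni, n_bi = len(tokens), max(len(tokens) - 1, 1)
    scores = {}
    for (x, y), count in bigrams.items():
        if count >= min_count:
            p_xy = count / n_bi
            p_x, p_y = unigrams[x] / n_uni, unigrams[y] / n_uni
            scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

    Normalisation matters here because, without it, "court" and "courts" are counted as different words, so a pair such as ("supreme", "court") may never reach min_count.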

  • 2008-04-17
    Christina Lioma.
    "Applications of parts of speech to information retrieval"

    Abstract: Efforts to use linguistics in information retrieval (IR) were initiated in the 1980s and intensified in the 1990s, reporting performance benefits (see the overviews by Smeaton 1986 & 1999, Karlgren 1993, and Tait 2005). After that, these efforts decreased: baseline system performance improved, and the cost of linguistic processing was not worth the small benefits over the already improved baselines (Tait, 2005). At present, most research on linguistics for IR is geared towards domain-specific IR applications that seem to benefit more from linguistics, like question answering (Tait and Oakes 2006). Although such applications are important, they should not limit the scope of research into linguistics for IR. In this work, we present an alternative use of linguistics in IR, namely part-of-speech information, and show that it benefits the retrieval performance of general IR systems.

    Legal Notice: Documents/slides/audio files have been provided by the authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their work here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. This work may not be reposted without the explicit permission of the copyright holder.

    Slides

  • 2008-04-03
    Robert Villa. University of Glasgow, Information Retrieval Group
    "A study of awareness in video search"

    Abstract: While most video retrieval work assumes that a user is engaged in a search task alone, recent work such as that by Smeaton et al. (2006), Morris et al. (2007) and Adcock et al. (2007) suggests that there may be benefits in users collaborating to satisfy search tasks. One important aspect of collaboration is an individual's awareness of another's activity, which enables an "understanding of the activities of others" (Dourish et al. 1992). In this work we investigate the role of awareness and its effect on search behavior in multimedia retrieval, focusing on the scenario where two users are searching at the same time on the same task and, via the interface, can see the activity of the other user. The main research question asks: does awareness of another searcher aid a user when carrying out a multimedia search session? A user experiment was carried out using a competitive game scenario in order to study the effect of awareness under conditions conducive to its use.

    Slides

  • 2008-03-27
    Ilija Subašić.
    "Topical structure discovery in folksonomies"

    Abstract: In recent years, social bookmarking systems (tagging systems) have become some of the most popular applications on the Internet. The main idea of social bookmarking is to organize content in a loose fashion by allowing users to annotate content completely freely. Although tags are widely used, search engines based on them still need a lot of improvement. This work presents a way of combining the IR, semantic web and social web approaches to searching the Web by including general topic categories as part of the tagging system. In this way, the semantic and social web are presented in a unified framework for searching and indexing content. The work also shows an approach to ontology learning that creates a hierarchical network of tag associations. This network is created using association rule discovery. To enhance these networks, IR search engine results are used to evaluate the relevance of resources to a given topic. The results of this evaluation are used to modify the Apriori algorithm for association rule discovery. The association networks created by the modified Apriori algorithm are evaluated against the topic networks of the Open Directory Project (www.dmoz.org).

    Slides
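
    Association rule discovery over tags can be sketched with the classic Apriori algorithm, treating the set of tags on each bookmarked resource as one transaction. This is the unmodified algorithm: the relevance-based modification described in the talk is not reproduced, and the function names and example tags are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support=2):
    """Return {itemset: support} for every itemset found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Support counting: in how many transactions does each candidate occur?
        support = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: s for c, s in support.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune any candidate
        # that has an infrequent (k-1)-subset (the Apriori property).
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def association_rules(frequent, min_confidence=0.6):
    """Rules lhs -> rhs with confidence = support(lhs and rhs together) / support(lhs)."""
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, size)):
                confidence = sup / frequent[lhs]  # every subset of a frequent set is frequent
                if confidence >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), confidence))
    return rules
```

    Each resulting rule, such as programming -> code, can be read as a directed edge in a network of tag associations.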

  • 2008-03-13
    Koen Deschacht.
    "Automatic semantic frame detection using semi-supervised methods"

    Abstract: We have created novel semi-supervised methods for detecting semantic frames and recognizing the corresponding semantic roles in English sentences. We concentrate on semantic frames that describe typical actions of characters in a video transcript, with the defining characteristic that the actions, characters and other circumstances have to be visible in the described video. We have manually annotated a training set of video transcripts. Our models use approximate inference techniques that integrate probabilistic topic models with information about the syntactic structure of the sentence. Because these models perform poorly when learning in a completely unsupervised way, we turned to semi-supervised techniques. We implemented two different Markov Chain Monte Carlo sampling methods, i.e., a Gibbs sampler and a Metropolis-Hastings sampler, both of which sample from unlabeled data. We present the theoretical background, characteristics and use of our methods in the concrete setting of analyzing video transcripts of the television series "Buffy the Vampire Slayer". We conclude by discussing the performance of the unsupervised and semi-supervised models compared with a completely supervised method trained with a maximum entropy classifier, and discuss the value and limitations of the models.

    Slides
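
    The accept/reject step that defines a Metropolis-Hastings sampler is sketched below for a one-dimensional target with a symmetric Gaussian proposal. The speaker's samplers operate over frame and role assignments rather than a real-valued variable, so this only illustrates the mechanism; the function name and defaults are assumptions.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_samples, step=1.0, burn_in=1000, seed=0):
    """Random-walk Metropolis-Hastings sampler for a 1-D unnormalised density.

    log_target(x) returns log p(x) up to an additive constant. The Gaussian
    proposal is symmetric, so the acceptance ratio reduces to p(x') / p(x).
    """
    rng = random.Random(seed)
    x, lp = x0, log_target(x0)
    samples = []
    for i in range(burn_in + n_samples):
        proposal = x + rng.gauss(0.0, step)
        lp_new = log_target(proposal)
        # Accept with probability min(1, p(x') / p(x)), computed in log space;
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < lp_new - lp:
            x, lp = proposal, lp_new
        if i >= burn_in:
            samples.append(x)
    return samples
```

    Because only a ratio of target densities is needed, the normalising constant of p never has to be computed, which is what makes the method usable for complex posteriors.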

  • 2008-02-28
    Gerhard Paass. Coordinator of the EU-FP6 AntiPhish project (http://www.cs.kuleuven.be/~liir/projects.php?project=102).
    "Semantic annotation for a multimedia corpus"

    Abstract: Fraunhofer IAIS participates in a long-term research program called Theseus, sponsored by the German Government. The research program focuses on semantic technologies, which determine content (words, images, and sounds) not through conventional methods (e.g., combinations of letters) but by recognizing the meaning of content and placing it in its proper context. This talk will give an overview of our machine learning approaches to semantic indexing, e.g. named entity recognition and topic extraction. As a new development, we present results on the discrimination of WordNet supersenses by Markov models.

    Slides
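
    One way a Markov model can assign a WordNet supersense to each word is as a hidden Markov model decoded with the Viterbi algorithm, which finds the most probable label sequence. The sketch below assumes that setup, since the abstract does not describe the models in detail; the states, probabilities and the unk fallback for unseen words are made up for illustration.

```python
import math


def viterbi(tokens, states, start_p, trans_p, emit_p, unk=1e-6):
    """Most probable state sequence for a non-empty token list under an HMM (log space).

    start_p[s], trans_p[prev][s] and emit_p[s][word] are probabilities;
    words missing from emit_p[s] receive the small fallback probability unk.
    """
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s].get(tokens[0], unk)) for s in states}]
    back = [{}]
    for t in range(1, len(tokens)):
        V.append({})
        back.append({})
        for s in states:
            # Best predecessor state for s at position t.
            prev, score = max(
                ((p, V[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda ps: ps[1],
            )
            V[t][s] = score + math.log(emit_p[s].get(tokens[t], unk))
            back[t][s] = prev
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: V[-1][s])]
    for t in range(len(tokens) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```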

  • 2008-01-10, 12h00, 200A 00.225
    Tom Boyle.
    "Learning design and learning objects"

    Abstract:

    Duval et al. (2003) have stated that the purpose of learning objects is to increase the effectiveness of learning. The question is how this can be done. The first approach relies on international standards and specifications for metadata and software packaging. This supports the development of searchable repositories from which learning resources can be retrieved, downloaded and reused. In this approach, improvements in learning will come about through widespread reuse of learning resources. In this talk I will argue that this is not sufficient to bring about substantial improvements in learning. An alternative approach is to start with the educational or learning problem. Learning resources are designed, and iteratively developed using an Agile approach, to solve the educational problem. In this approach, design for pedagogical effectiveness is a central concern. This is matched by a concern to develop small, granular resources that are structured to be reusable, applying principles such as cohesion and decoupling. This approach is exemplified by the EASA award-winning learning objects for programming that we developed.

    However, this approach is very resource-intensive. It may be educationally effective on a small scale, but can it scale up to provide widespread impact? Our work has moved to an emphasis on generative learning objects (GLOs). With GLOs, it is the pedagogical design, rather than the content, that provides the basis for reuse. We have extracted the pedagogical designs from successful learning objects and embedded these design patterns in an authoring tool. The authoring tool can be used to create many different learning objects based on the same successful pedagogical pattern. This approach puts learning design at the centre of successful learning objects. The GLO authoring tool will be demonstrated in the talk. The objects created by the GLO tool can be packaged, with metadata added in the standard ways.

    The talk will culminate by returning to the central question - how do we improve the effectiveness of learning? It will place the GLO approach within the wider framework of learning design, including the IMS LD specification, and recent work carried out in the JISC “Design for Learning” programme in the UK. From this emerges a perspective on the central relationship between learning objects and learning design.

    Slides


