| Home > Publications > PhD theses > Informatics (CW) |
CW 2006_08
Remko Tronçon
Techniques for more efficient ILP data mining engines
Advisor(s): Bart Demoen and Gerda Janssens
Abstract
The goal of data mining is to find rules (or hypotheses) that describe non-trivial relations, patterns or properties of large quantities of data, thus helping in understanding the data better. Inductive Logic Programming (ILP) is a relational data mining technique based on first order logic. Logic provides a powerful yet natural formalism for representing knowledge, allowing ILP to learn concepts that cannot be learned using less powerful data mining techniques. However, because of its high expressivity, the space of all possible hypotheses is also very complex, due to which the search for good hypotheses becomes a complex task.
One of the most important factors in the execution of ILP algorithms is the engine underlying the algorithm. This engine is responsible for evaluating candidate hypotheses (or queries) on the data, and provides primitives to the ILP algorithm for guiding the evaluation of queries. In this work, we present different techniques for optimizing the engines used by ILP algorithms.
We combine two existing, independent, and successful optimization techniques for query evaluation: the once transformation, which aims to avoid redundant execution within a single query, and query packs, which avoid redundancy in the execution of multiple queries.
The general approach to query evaluation is to compile the query to a more efficient version instead of executing the query directly. We study alternatives to this approach, and propose a more performant compilation technique, together with a lazy variant that only compiles parts of queries as they are needed.
Analysis and debugging of query execution is an important part of the development of more efficient query execution techniques. We present a trace-based technique for debugging and analyzing the execution step of ILP algorithms.
We present a study of trading off extra memory for execution time on different levels of ILP execution. These techniques include predicate tabling and program specialization, together with more ILP algorithm-specific techniques.
Libridoc 389 / text.pdf (1.4M) / mailto: dtai team
