CW 476

Siegfried Nijssen and Elisa Fromont
Mining optimal decision trees from itemset lattices

Abstract

We investigate the problem of finding optimal decision trees under constraints. Because the discovery of such trees has high theoretical complexity, until now no efforts have been made to create an algorithm for use on real-world datasets, and it is unknown how well heuristic tree learners approximate the true optimum. We present an algorithm which in practice is capable of finding optimal decision trees in many datasets, and use this algorithm to evaluate the performance of heuristic decision tree learners. The key idea behind our algorithm is the relation between constraints on decision trees and constraints on itemsets. We propose to exploit lattices of itemsets, from which we can extract various types of optimal decision trees in a linear scan. We give several optimizations to efficiently build these lattices and show that the accuracies of optimal trees compete with C4.5 accuracies, which confirms the common assumption that heuristic decision tree learners approximate the optimum sufficiently well.

report.pdf (313K) / mailto: E. Fromont