HKU Research  The University of Hong Kong
Department of Computer Science

 

30 Aug 2002

Applying Pruning Techniques to Single-Class Emerging Substring Mining
Speaker: Sarah CHAN

 

Abstract

In a sequence database, an emerging substring (ES) of a class is a substring that occurs more frequently in that class than in other classes. ESs are important to sequence classification because they capture significant contrasts between data classes and provide insights for the construction of sequence classifiers.

We propose a suffix tree-based framework for mining ESs, and study the effectiveness of applying various pruning techniques at different stages of our ES mining algorithm. We consider three basic techniques: pruning with the support threshold, pruning with the growth rate threshold, and pruning with the length threshold. The combined effects of these pruning methods are also covered. Experiments show that if the target class is small relative to the whole database, which is the normal scenario in single-class ES mining, most of the pruning techniques achieve a considerable performance gain.
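To make the three thresholds concrete, the following is a minimal sketch in Python. It is not the suffix tree-based framework of the talk: it enumerates candidate substrings naively, and it assumes definitions the abstract does not spell out, namely that support is the fraction of sequences in a class containing the substring, and that growth rate is the ratio of target-class support to support in the other classes (infinite when the latter is zero). The function names, parameter names, and sample data are all hypothetical.

```python
from math import inf


def substrings_up_to(seq, max_len):
    """Return the distinct substrings of seq with length 1..max_len."""
    return {seq[i:i + n]
            for n in range(1, max_len + 1)
            for i in range(len(seq) - n + 1)}


def support(sub, sequences):
    """Assumed definition: fraction of sequences containing sub at least once."""
    if not sequences:
        return 0.0
    return sum(sub in s for s in sequences) / len(sequences)


def mine_emerging_substrings(target, others, min_support, min_growth, max_len):
    """Naive single-class ES mining with the three thresholds (illustrative only)."""
    # Length threshold: never generate candidates longer than max_len.
    candidates = set()
    for seq in target:
        candidates |= substrings_up_to(seq, max_len)

    result = {}
    for sub in candidates:
        s_target = support(sub, target)
        # Support threshold: discard before scanning the other classes.
        if s_target < min_support:
            continue
        s_other = support(sub, others)
        # Assumed definition: growth rate is the ratio of the two supports.
        growth = inf if s_other == 0 else s_target / s_other
        # Growth rate threshold: keep only strongly contrasting substrings.
        if growth >= min_growth:
            result[sub] = (s_target, growth)
    return result


# Hypothetical data: a small target class against a larger remainder.
target_class = ["abcab", "abd", "cab"]
other_classes = ["bcd", "dda", "ccb", "abx"]
found = mine_emerging_substrings(target_class, other_classes,
                                 min_support=0.6, min_growth=3.0, max_len=2)
for sub, (supp, growth) in sorted(found.items()):
    print(sub, supp, growth)
```

In this sketch the support check runs first because it only needs the target class, which is the small one in the single-class setting; the larger remainder of the database is scanned only for candidates that survive it.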


Comment?  Send to dbgroup@cs.hku.hk