Abstract
In a sequence database, an emerging substring (ES) of a class is a
substring that occurs more frequently in that class than in other
classes. ESs are important to sequence classification because they capture
significant contrasts between data classes and provide insights for the
construction of sequence classifiers.
We propose a suffix tree-based framework for mining ESs and study the
effectiveness of applying various pruning techniques at different stages
of our ES mining algorithm. We consider three basic techniques: pruning
with the support threshold, pruning with the growth rate threshold, and
pruning with the length threshold. The combined effects of these pruning
methods are also covered. Experiments show that if the target class is
small relative to the whole database, which is the normal scenario in
single-class ES mining, most of the pruning techniques achieve
considerable performance gains.
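
The three thresholds can be illustrated with a minimal Python sketch. It is not the suffix tree-based algorithm itself: it enumerates substrings by brute force, and it rests on assumptions the abstract does not state. Support is taken to be the fraction of sequences in a class that contain the substring, growth rate is the ratio of target-class support to support in the other classes, and the length threshold is treated as a cap on substring length. The function and parameter names are hypothetical.

```python
from math import inf


def substrings(seq, max_len):
    """Return the distinct substrings of seq with length 1..max_len."""
    return {
        seq[i:j]
        for i in range(len(seq))
        for j in range(i + 1, min(i + max_len, len(seq)) + 1)
    }


def support(sub, sequences):
    """Fraction of sequences containing sub at least once (assumed definition)."""
    return sum(sub in s for s in sequences) / len(sequences)


def mine_emerging_substrings(target, background, min_support, min_growth, max_len):
    """Brute-force illustration of the three thresholds, not the suffix tree method.

    target     -- sequences of the target class
    background -- sequences of all other classes
    Returns a dict mapping each ES to (target support, growth rate).
    """
    # Length threshold (assumed to be an upper bound): limits the candidates.
    candidates = set()
    for seq in target:
        candidates |= substrings(seq, max_len)

    result = {}
    for sub in candidates:
        supp_t = support(sub, target)
        # Support threshold: skip substrings that are rare in the target class.
        if supp_t < min_support:
            continue
        supp_b = support(sub, background)
        # Growth rate threshold: infinite when the substring is absent elsewhere.
        growth = inf if supp_b == 0 else supp_t / supp_b
        if growth >= min_growth:
            result[sub] = (supp_t, growth)
    return result
```

For example, calling `mine_emerging_substrings(["abcab", "abd", "cab"], ["bcd", "dca", "bbd", "acd"], 0.6, 2.5, 3)` keeps only the substrings that are both common in the three target sequences and comparatively rare in the four background sequences. A suffix tree would avoid rescanning every sequence for every candidate, which is where applying the pruning at different stages becomes relevant.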