HKU Research  The University of Hong Kong
Department of Computer Science

 

01 February 2002

Efficient Mining of Emerging Patterns and Emerging Substrings
Sarah Chan

 

Abstract

Emerging patterns (EPs) are itemsets whose supports change significantly from one dataset to another; they were introduced by Dong and Li to capture multi-attribute contrasts between data classes, or trends over time. EPs are potentially useful for analysis, and have been used in building powerful classifiers. The efficient mining of EPs is a challenging problem, since naive algorithms are too costly. Efficient border-based algorithms have been proposed to discover and store EPs and their variants, such as jumping emerging patterns (JEPs) and the most expressive jumping emerging patterns (MEJEPs). Experiments show that EP-based classifiers such as CAEP, the JEP-Classifier and DeEPs have consistently good predictive accuracy, and that they almost always outperform C4.5 and CBA.
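
To make the definition concrete, the following is a minimal sketch of the naive approach that the abstract describes as too costly. It assumes the usual growth-rate formulation: the growth rate of an itemset from dataset D1 to dataset D2 is its support in D2 divided by its support in D1, taken as infinite when the itemset occurs only in D2 (a JEP) and as zero when it occurs in neither. The function names and the threshold parameter `rho` are illustrative assumptions, not part of the talk, and the border-based algorithms are not shown here.

```python
from fractions import Fraction
from itertools import combinations
from math import inf


def support(itemset, dataset):
    """Fraction of transactions in `dataset` that contain every item of `itemset`."""
    if not dataset:
        return Fraction(0)
    hits = sum(1 for transaction in dataset if itemset <= transaction)
    return Fraction(hits, len(dataset))


def growth_rate(itemset, d1, d2):
    """Growth rate of `itemset` from d1 to d2 (assumed definition, see lead-in)."""
    s1, s2 = support(itemset, d1), support(itemset, d2)
    if s1 == 0:
        # Absent from d1: zero if also absent from d2, otherwise a "jump".
        return Fraction(0) if s2 == 0 else inf
    return s2 / s1


def naive_emerging_patterns(d1, d2, rho):
    """Enumerate every non-empty itemset and keep those with growth rate >= rho.

    This visits 2^n - 1 candidates for n distinct items, which is why the
    naive method does not scale.
    """
    items = sorted(set().union(*d1, *d2))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            rate = growth_rate(candidate, d1, d2)
            if rate >= rho:
                result[candidate] = rate
    return result


# Hypothetical toy data: each transaction is a set of single-letter items.
d1 = [frozenset("ab"), frozenset("ac"), frozenset("bc"), frozenset("c")]
d2 = [frozenset("ab"), frozenset("abc"), frozenset("ab"), frozenset("b")]
patterns = naive_emerging_patterns(d1, d2, rho=2)
```

In this toy data the itemset {a, b, c} never occurs in `d1` but does occur in `d2`, so its growth rate is infinite and it would count as a JEP under the assumed definition.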

Emerging substrings (ESs) in sequence databases are analogous to EPs in itemset databases. However, due to the marked differences between itemsets and sequences, techniques for extracting EPs cannot be easily modified to extract ESs. For example, although the border approach can be used to mine jumping emerging substrings (JESs), it is not applicable to general ESs. This makes the efficient mining of ESs an even greater challenge. A brute-force method that uses merged suffix trees to store the information of all ESs discovered has been introduced to mine general ESs.
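
As a rough illustration of what mining general ESs involves, the sketch below enumerates substrings by brute force. It is not the merged suffix tree method mentioned above: it swaps in plain hash-based counting, which is simpler but uses far more memory on long sequences. It also assumes that the support of a substring is the fraction of sequences containing it as a contiguous run, and that the growth rate is defined as for EPs. All names and the threshold `rho` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from math import inf


def substring_counts(sequences):
    """Map each distinct substring to the number of sequences that contain it."""
    counts = Counter()
    for seq in sequences:
        # Collect each substring once per sequence so repeats are not double counted.
        distinct = {
            seq[i:j]
            for i in range(len(seq))
            for j in range(i + 1, len(seq) + 1)
        }
        counts.update(distinct)
    return counts


def naive_emerging_substrings(d1, d2, rho):
    """Return substrings whose growth rate from d1 to d2 is at least rho (rho > 0).

    Only substrings that occur in d2 can have a non-zero growth rate,
    so only those are examined.
    """
    c1, c2 = substring_counts(d1), substring_counts(d2)
    result = {}
    for sub, n2 in c2.items():
        n1 = c1.get(sub, 0)
        if n1 == 0:
            rate = inf  # occurs only in d2: a jumping emerging substring
        else:
            rate = Fraction(n2, len(d2)) / Fraction(n1, len(d1))
        if rate >= rho:
            result[sub] = rate
    return result


# Hypothetical toy sequence databases.
d1 = ["abc", "bcd"]
d2 = ["abd", "abx"]
emerging = naive_emerging_substrings(d1, d2, rho=2)
```

Here "ab" appears in one of two sequences in `d1` and in both sequences in `d2`, giving a growth rate of 2, while "bd" appears only in `d2` and would be a JES under the assumed definition.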

Read the Presentation Slides...

Referred Papers


Comment?  Send to dbgroup@cs.hku.hk