|
Abstract
Emerging patterns (EPs) are itemsets whose supports change significantly from one dataset to another; they were introduced by
Dong and Li to capture multi-attribute contrasts between data classes, or trends over time. EPs are potentially useful for
analysis, and have been used in building powerful classifiers. The efficient mining of EPs is a challenging problem, since
naive algorithms are too costly. Efficient border-based algorithms have been proposed to discover and store EPs and their
variants such as jumping emerging patterns (JEPs) and the most expressive jumping emerging patterns (MEJEPs). Experiments
show that EP-based classifiers such as CAEP, the JEP-Classifier and DeEPs have consistently good predictive accuracy, and they
almost always outperform C4.5 and CBA.
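
To make the notion of a "significant change in support" concrete, the following is a minimal sketch of the naive approach, assuming the usual growth-rate formulation: the growth rate of an itemset from dataset D1 to dataset D2 is supp2/supp1, taken as 0 when both supports are 0 and as infinity when only supp1 is 0. An itemset is then treated as an EP when its growth rate reaches a threshold rho, and as a JEP when the growth rate is infinite. The function names and the threshold parameter `rho` are illustrative choices, not taken from the presentation. The exhaustive enumeration over all itemsets also shows why naive algorithms are too costly: the number of candidates is exponential in the number of items.

```python
from itertools import combinations
from math import inf


def support(itemset, dataset):
    """Fraction of transactions in `dataset` that contain every item of `itemset`."""
    if not dataset:
        return 0.0
    return sum(1 for t in dataset if itemset <= t) / len(dataset)


def growth_rate(itemset, d1, d2):
    """Growth rate of `itemset` from d1 to d2 (0 if absent in both, inf if absent only in d1)."""
    s1, s2 = support(itemset, d1), support(itemset, d2)
    if s1 == 0:
        return 0.0 if s2 == 0 else inf
    return s2 / s1


def naive_emerging_patterns(d1, d2, rho):
    """Enumerate every non-empty itemset and keep those with growth rate >= rho.

    Exponential in the number of distinct items; for illustration only.
    """
    items = sorted(set().union(*d1, *d2))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            gr = growth_rate(x, d1, d2)
            if gr >= rho:
                result[x] = gr
    return result


def jumping_emerging_patterns(d1, d2):
    """JEPs: itemsets that occur in d2 but never in d1 (infinite growth rate)."""
    return {x for x, gr in naive_emerging_patterns(d1, d2, inf).items() if gr == inf}
```

On a toy pair of datasets such as `d1 = [{"a","b"}, {"a"}, {"b","c"}, {"c"}]` and `d2 = [{"a","b"}, {"a","b","c"}, {"a","c"}, {"a"}]`, calling `naive_emerging_patterns(d1, d2, 2)` keeps `{a}` and `{a, b}` (support doubles) together with the two JEPs `{a, c}` and `{a, b, c}`. Border-based algorithms avoid this enumeration by representing such collections through their minimal and maximal itemsets.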
Emerging substrings (ESs) in sequence databases are analogous to EPs in itemset databases. However, due to the marked
differences between the two, techniques for extracting EPs cannot be easily modified to extract ESs. For example, although
the border approach can be used to mine jumping emerging substrings (JESs), it is not applicable to general ESs. This makes
the efficient mining of ESs an even greater challenge. To mine general ESs, a brute-force method has been introduced that
uses merged suffix trees to store information about all ESs discovered.
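
As a rough illustration of what brute-force ES mining involves, the sketch below enumerates every substring of the target database and computes its growth rate directly. It deliberately swaps the merged suffix tree for a plain Python set and repeated scans, so it shows the definition rather than the data structure used in the presentation. It also assumes that the support of a substring is the fraction of sequences that contain it contiguously; the function names and the threshold `rho` are hypothetical.

```python
from math import inf


def substrings(seq):
    """All non-empty contiguous substrings of `seq`."""
    return {seq[i:j] for i in range(len(seq)) for j in range(i + 1, len(seq) + 1)}


def substring_support(sub, database):
    """Fraction of sequences in `database` that contain `sub` contiguously."""
    if not database:
        return 0.0
    return sum(1 for s in database if sub in s) / len(database)


def naive_emerging_substrings(d1, d2, rho):
    """Substrings of d2 whose growth rate from d1 to d2 is at least rho.

    Candidates come from d2 only, so their support in d2 is always positive
    and a zero support in d1 means an infinite growth rate.
    """
    candidates = set()
    for s in d2:
        candidates |= substrings(s)
    result = {}
    for sub in candidates:
        s1 = substring_support(sub, d1)
        s2 = substring_support(sub, d2)
        gr = inf if s1 == 0 else s2 / s1
        if gr >= rho:
            result[sub] = gr
    return result


def jumping_emerging_substrings(d1, d2):
    """JESs: substrings that occur in d2 but in no sequence of d1."""
    return {sub for sub, gr in naive_emerging_substrings(d1, d2, inf).items() if gr == inf}
```

With `d1 = ["abc", "bcd"]` and `d2 = ["abd", "bda"]`, the substrings `a` and `d` have growth rate 2, while `bd`, `da`, `abd` and `bda` never occur in `d1` and are therefore JESs. Each candidate here triggers a full scan of both databases; a merged suffix tree would instead store the counts for both databases in one shared structure.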
|