6 Apr 2004
Mining Periodic Patterns with Gap Requirement from Sequences
Speaker: ZHANG Minghua
Abstract
We study a problem of mining frequently occuring
periodic patterns with a gap requirement from sequences.
Given a character sequence $S = s_1 s_2 ... s_L$ of length $L$
and a pattern $P = p_1 p_2 ... p_l$ of length $l$,
we consider $P$ a frequently occurring pattern
in $S$ if the probability of observing $P$ given a randomly picked
length-$l$ subsequence of $S$ exceeds a certain threshold.
In many applications, particularly those related to bioinformatics,
interesting patterns are periodic with a gap requirement.
That is to say, the characters in $P$ should match subsequences
of $S$ of the form $s_i s_{i+g^1} ... s_{i+g^1+g^2 +
... + g^{l-1}}$, where all the $g^j$'s fall within a narrow range.
The matching characters in $S$ are thus separated by gaps of more or less
the same size. We show the complexity of the mining problem and
discuss why traditional mining algorithms are computationally
infeasible. We propose practical algorithms for solving the problem and evaluate
their effectiveness.
Read the Presentation
Slides...
Referred Papers
|