3 Feb 2004
Mining Frequent Itemsets over Data Streams
Speaker: Bill LIN
Abstract
Discovering frequent itemsets is one of the key operations in data mining
algorithms. In modern emerging applications, data take the form of
continuous data streams, rather than finite static datasets. With limited
memory resources, however, traditional data mining algorithms may not be
able to dynamically find all itemsets whose frequencies are not less than
a user-specified threshold over such streams. Also, short response times
and small memory footprints are the basic requirements for stream data
mining. As the first well-known solution, Lossy Counting can find frequent
itemsets and compute their counts in a single pass with a priori error
guarantees. However, the drawback of this approach is that sometimes the
number of generated intermediate itemsets with very low frequencies may
become extremely large. In this study, we are trying to give an improved
algorithm for finding frequent itemsets over data streams and achieve
better space efficiencies compared to the original Lossy Counting
algorithm. Essentially, our algorithm is able to keep all its data
structures into main memory and thus should response faster than Lossy
Counting algorithm.
|