HKU CS-Database Research Group

The University of Hong Kong
Department of Computer Science

home

db seminars

3 Feb 2004

Mining Frequent Itemsets over Data Streams

Speaker: Bill LIN

Abstract

Discovering frequent itemsets is one of the key operations in data mining algorithms. In modern emerging applications, data take the form of continuous data streams, rather than finite static datasets. With limited memory resources, however, traditional data mining algorithms may not be able to dynamically find all itemsets whose frequencies are not less than a user-specified threshold over such streams. Also, short response times and small memory footprints are the basic requirements for stream data mining. As the first well-known solution, Lossy Counting can find frequent itemsets and compute their counts in a single pass with a priori error guarantees. However, the drawback of this approach is that sometimes the number of generated intermediate itemsets with very low frequencies may become extremely large. In this study, we are trying to give an improved algorithm for finding frequent itemsets over data streams and achieve better space efficiencies compared to the original Lossy Counting algorithm. Essentially, our algorithm is able to keep all its data structures into main memory and thus should response faster than Lossy Counting algorithm.

	Back to the top
	Comment? Send to dbgroup@cs.hku.hk