HKU CS-Database Research Group

The University of Hong Kong
Department of Computer Science

home

db seminars

18 June 2003

Efficient Data Reduction Methods for Online Association Rule Discovery

Speaker: Ivy TONG

Abstract

Classical data mining algorithms require one or more computationally intensive passes over the entire database and can be prohibitively slow. One effective method for dealing with this ever-worsening scalability problem is to run the algorithms on a small sample of the data. In this presentation, I will introduce and compare two data-reduction algorithms for producing such a sample. These algorithms, called FAST and EA, are tailored to "count" data applications such as association rule mining. The algorithms are similar in that both attempt to produce a sample whose "distance" from the complete database is minimal. They differ greatly in the way that they greedily search through the exponential number of possible samples. FAST uses random sampling together with trimming of outlier transactions. EA repeated and deterministically halves the data to obtain the final sample. Experiments show that EA is more expensive to run than FAST, but yields more accurate results for a given sample size. The user can then trade off speed and accuracy by selecting appropriate method based on the specific problem under consideration. Lastly I will show how the EA algorithm can be adapted to provide data reduction schemes for streaming data systems.

Read the Presentation Slides...

Referred Papers

	Back to the top
	Comment? Send to dbgroup@cs.hku.hk