HKU Research  The University of Hong Kong
Department of Computer Science
Feature
home
current research
people
publications
HKU CS

 

18 June 2003

Efficient Data Reduction Methods for Online Association Rule Discovery
Line
Speaker: Ivy TONG

Abstract

Classical data mining algorithms require one or more computationally intensive passes over the entire database and can be prohibitively slow. One effective method for dealing with this ever-worsening scalability problem is to run the algorithms on a small sample of the data. In this presentation, I will introduce and compare two data-reduction algorithms for producing such a sample. These algorithms, called FAST and EA, are tailored to "count" data applications such as association rule mining. The algorithms are similar in that both attempt to produce a sample whose "distance" from the complete database is minimal. They differ greatly in the way that they greedily search through the exponential number of possible samples. FAST uses random sampling together with trimming of outlier transactions. EA repeated and deterministically halves the data to obtain the final sample. Experiments show that EA is more expensive to run than FAST, but yields more accurate results for a given sample size. The user can then trade off speed and accuracy by selecting appropriate method based on the specific problem under consideration. Lastly I will show how the EA algorithm can be adapted to provide data reduction schemes for streaming data systems.

Read the Presentation Slides...

Referred Papers

Back to the top

Comment?  Send to dbgroup@cs.hku.hk