18 June 2003
Efficient Data Reduction Methods for Online Association Rule Discovery
Speaker: Ivy TONG
Abstract
Classical data mining algorithms require one or more computationally
intensive passes over the entire database and can be prohibitively slow.
One effective method for dealing with this ever-worsening scalability
problem is to run the algorithms on a small sample of the data. In this
presentation, I will introduce and compare two data-reduction algorithms
for producing such a sample. These algorithms, called FAST and EA, are
tailored to "count" data applications such as association rule mining.
The algorithms are similar in that both attempt to produce a sample whose
"distance" from the complete database is minimal. They differ greatly in
the way that they greedily search through the exponential number of
possible samples. FAST uses random sampling together with trimming of
outlier transactions. EA repeated and deterministically halves the data to
obtain the final sample. Experiments show that EA is more expensive to run
than FAST, but yields more accurate results for a given sample size. The
user can then trade off speed and accuracy by selecting appropriate method
based on the specific problem under consideration. Lastly I will show how
the EA algorithm can be adapted to provide data reduction schemes for
streaming data systems.
Read the Presentation
Slides...
Referred Papers
|