Abstract
Data classification has been studied extensively for many years, but most of that work
focused on accuracy rather than performance. As a result, early classification algorithms
do not scale well to very large datasets. This seminar discusses several approaches to
improving the performance of classification on large datasets. The Interval Classifier (IC)
tries to reduce the size of decision trees; SLIQ minimizes I/O cost and eliminates redundant
sorting and data scanning; SPRINT reduces the size of memory-resident data and extends to
parallel execution; and GAC-RDB uses existing state-of-the-art RDBMS technology for
efficient decision rule discovery.
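To make "eliminates redundant sorting" concrete, here is a minimal sketch of the presorting idea: a numeric attribute is sorted once, and the best binary split is then found in a single linear scan while class counts are updated incrementally. It is not the SLIQ implementation; it keeps everything in memory and omits SLIQ's class list and disk-resident attribute lists. The Gini index is the split criterion used in the SLIQ and SPRINT papers, but the function names (`gini`, `best_split`) and the toy records are assumptions made for illustration.

```python
from collections import Counter


def gini(counts, total):
    """Gini impurity of a class-count mapping covering `total` records."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_split(sorted_pairs):
    """Scan one presorted list of (value, label) pairs exactly once.

    Returns (threshold, weighted_gini) for the best binary split,
    or (None, inf) if all values are equal.
    """
    n = len(sorted_pairs)
    right = Counter(label for _, label in sorted_pairs)  # everything starts on the right
    left = Counter()
    best = (None, float("inf"))
    for i in range(n - 1):
        value, label = sorted_pairs[i]
        # Move record i from the right partition to the left one.
        left[label] += 1
        right[label] -= 1
        next_value = sorted_pairs[i + 1][0]
        if value == next_value:
            continue  # no threshold can separate equal values
        n_left = i + 1
        n_right = n - n_left
        score = (n_left * gini(left, n_left) + n_right * gini(right, n_right)) / n
        if score < best[1]:
            best = ((value + next_value) / 2, score)
    return best


# Hypothetical toy data: (age, class label).
records = [(23, "high"), (45, "low"), (31, "high"), (52, "low"), (38, "low")]
presorted = sorted(records)  # sorted once, up front, rather than at every tree node
threshold, score = best_split(presorted)
```

On this toy data the scan picks the midpoint between ages 31 and 38, which separates the two classes cleanly. A tree builder could reuse the same presorted list at every level instead of re-sorting each node's subset, which is the saving the abstract refers to.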
Read the Presentation
Slides...
Referred Papers