HKU Research  The University of Hong Kong
Department of Computer Science and Information Systems

08 February 2002

Several approaches to improving the performance of classifying very large datasets
Kevin Yip

 

Abstract

Data classification has been studied extensively for many years, but most of that work focused on accuracy rather than performance. As a result, early classification algorithms do not scale well to very large datasets. This seminar discusses several approaches to improving the performance of large-dataset classification: the Interval Classifier (IC) tries to reduce the size of decision trees; SLIQ minimizes I/O cost and eliminates redundant sorting and data scanning; SPRINT reduces the size of memory-resident data and extends to parallel execution; and GAC-RDB uses existing state-of-the-art RDBMS technology for efficient decision rule discovery.
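
To make the idea of eliminating redundant sorting and data scanning concrete, here is a minimal Python sketch. It is not the SLIQ or SPRINT implementation; it only illustrates the general technique of sorting a numeric attribute once and then evaluating every candidate split point in a single pass using running class counts. The function names (gini, best_split) and the choice of the Gini index as the split criterion are assumptions made for this illustration.

```python
from collections import Counter


def gini(counts, total):
    """Gini impurity of a class-count table holding `total` records."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_split(values, labels):
    """Find the best binary split 'value <= threshold' on one numeric attribute.

    Returns (threshold, weighted_gini), or (None, inf) if no split exists.
    """
    n = len(values)
    # Sort the attribute once; every candidate split reuses this order.
    order = sorted(range(n), key=lambda i: values[i])

    left = Counter()          # class counts of records at or below the split
    right = Counter(labels)   # class counts of records above the split
    best = (None, float("inf"))

    # Single scan: move one record from right to left at each step.
    for pos, i in enumerate(order[:-1]):
        left[labels[i]] += 1
        right[labels[i]] -= 1

        next_value = values[order[pos + 1]]
        if values[i] == next_value:
            continue  # cannot split between two equal attribute values

        n_left = pos + 1
        n_right = n - n_left
        score = (n_left * gini(left, n_left) + n_right * gini(right, n_right)) / n
        if score < best[1]:
            best = ((values[i] + next_value) / 2, score)

    return best
```

For example, best_split([1, 2, 3, 4], ["a", "a", "b", "b"]) picks the midpoint between 2 and 3 as the threshold, since that split separates the two classes completely. The cost is one sort plus one linear scan per attribute, rather than a re-sort or re-scan for each candidate threshold.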

Comment?  Send to dbgroup@cs.hku.hk