HKU CS-Database Research Group

The University of Hong Kong
Department of Computer Science and Information System

08 February 2002

Several approaches to improving the performance of classifying very large datasets

Kevin Yip

Abstract

Data classification has been studied extensively for many years, but most of that work focused on accuracy rather than performance. As a result, early classification algorithms do not scale well to very large datasets. In this seminar, several approaches to improving the performance of classification on very large datasets will be discussed. The Interval Classifier (IC) tries to reduce the size of decision trees; SLIQ minimizes I/O cost and eliminates redundant sorting and data scanning; SPRINT reduces the size of memory-resident data and extends to parallel execution; and GAC-RDB uses existing state-of-the-art RDBMS technology for efficient decision rule discovery.
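To make the idea of "eliminating redundant sorting and data scanning" concrete, here is a minimal Python sketch of the gini-index split evaluation that SLIQ and SPRINT both build on. It assumes a numeric attribute list already sorted once by value and holding (value, class label) pairs. A single scan then keeps class histograms for the "below" and "above" sides and scores every candidate threshold, so no re-sorting happens at each tree node. The function names `gini` and `best_numeric_split` are hypothetical and do not come from the papers. The sketch also leaves out the disk-resident lists, the class list (SLIQ) and the parallel partitioning (SPRINT).

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def gini(counts: Counter, total: int) -> float:
    """Gini index of a partition, computed from its class histogram."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_numeric_split(
    attribute_list: Sequence[Tuple[float, Hashable]],
) -> Tuple[Optional[float], float]:
    """Return (threshold v, weighted gini) for the best test "A <= v".

    Assumes attribute_list holds (value, class_label) pairs that were
    sorted by value once, up front (hypothetical pre-sorted list).
    The list is scanned a single time; returns (None, gini of the whole
    set) if no split improves on leaving the node unsplit.
    """
    n = len(attribute_list)
    below: Counter = Counter()
    above: Counter = Counter(label for _, label in attribute_list)
    best_threshold: Optional[float] = None
    best_score = gini(above, n)  # baseline: no split at all

    for i, (value, label) in enumerate(attribute_list):
        # Move one record from the "above" histogram to the "below" one.
        below[label] += 1
        above[label] -= 1
        n_below = i + 1
        n_above = n - n_below
        if n_above == 0:
            break  # splitting after the last record is not a real split
        if attribute_list[i + 1][0] == value:
            continue  # cannot place a threshold between equal values
        score = (n_below / n) * gini(below, n_below) + (n_above / n) * gini(
            above, n_above
        )
        if score < best_score:
            best_threshold, best_score = value, score

    return best_threshold, best_score


if __name__ == "__main__":
    # Hypothetical (age, risk) records, sorted by age once.
    data = [(17, "high"), (20, "high"), (23, "high"),
            (32, "low"), (43, "high"), (68, "low")]
    print(best_numeric_split(data))
```

On the sample data the best test is `age <= 23`, with a weighted gini of 2/9 (about 0.222). Because only the two histograms change during the scan, each candidate threshold is scored in time proportional to the number of classes. That is why pre-sorting the attribute list once pays off over re-sorting at every node.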

	Comment? Send to dbgroup@cs.hku.hk