HKU Research  The University of Hong Kong
Department of Computer Science
Feature
home
current research
people
publications
HKU CS

 

28 Nov 2003

Fast Vertical Mining Using Diffsets
Line
Speaker: Zhang Minghua

Abstract

A number of vertical mining algorithms have been proposed recently for association mining, which have shown to be very effective and usually outperform horizontal approaches. The main advantage of the vertical format is support for fast frquency counting via intersection operations on transaction ids (tids) and automatic pruning of irrelevant data. The main problem with these approaches is when intermediate results of vertical tid lists become too large for memory, thus affecting the algorithm scalability.

In this paper we present a novel vertical data representation called Diffset, that only keeps track of differences in the tids of a candidate pattern from its generating frequent patterns. We show that diffsets drastically cut down the size of memory required to store intermediate results. We show how diffsets, when incorporated into previous vertical mining methods, increase the performance significantly.

Back to the top

Comment?  Send to dbgroup@cs.hku.hk