28 Nov 2003
Fast Vertical Mining Using Diffsets
Speaker: Zhang Minghua
Abstract
A number of vertical mining algorithms have recently been proposed for
association mining; they have been shown to be very effective and usually
outperform horizontal approaches. The main advantage of the vertical
format is its support for fast frequency counting via intersection
operations on transaction ids (tids) and automatic pruning of irrelevant
data. The main problem with these approaches arises when the intermediate
results of vertical tid lists become too large for memory, which limits
the scalability of the algorithm.
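
To make the vertical format concrete, here is a minimal sketch of support counting by tid-list intersection. The toy database and the helper names `to_vertical` and `support` are assumptions made for illustration only; they do not come from the paper.

```python
# Hypothetical toy database in horizontal format: tid -> items bought.
transactions = {
    1: {"A", "C", "D"},
    2: {"A", "B", "C"},
    3: {"B", "C"},
    4: {"A", "C"},
}


def to_vertical(db):
    """Invert a horizontal database into item -> set of tids (a tid list)."""
    tidsets = {}
    for tid, items in db.items():
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def support(itemset, tidsets):
    """Support of an itemset = size of the intersection of its items' tid lists."""
    items = iter(itemset)
    common = set(tidsets.get(next(items), set()))
    for item in items:
        common &= tidsets.get(item, set())
    return len(common)


tidsets = to_vertical(transactions)
support_ac = support({"A", "C"}, tidsets)  # tids 1, 2 and 4 contain both items
```

Each intermediate intersection has to be kept while longer patterns are generated from it, which is where the memory pressure described above comes from.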
In this paper we present a novel vertical data representation called
Diffset, which keeps track only of the differences between the tids of a
candidate pattern and those of its generating frequent patterns. We show
that diffsets drastically cut down the amount of memory required to store
intermediate results. We also show that diffsets, when incorporated into
previous vertical mining methods, significantly improve their performance.
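
The diffset idea can be sketched as follows. This is a minimal illustration, assuming the usual formulation in which the diffset of a pattern PX holds the tids that contain the prefix P but not PX; the abstract itself does not spell out the update rules, and the toy tid lists and function names here are hypothetical.

```python
# Hypothetical toy tid lists: item -> ids of the transactions containing it.
tidsets = {
    "A": {1, 2, 4, 5},
    "B": {2, 3, 5},
    "C": {1, 2, 3, 4},
}


def diffset_from_tidsets(t_x, t_y):
    """d(XY) = t(X) - t(Y): tids that contain X but not Y."""
    return t_x - t_y


def diffset_from_diffsets(d_px, d_py):
    """d(PXY) = d(PY) - d(PX), for patterns PX and PY sharing the prefix P."""
    return d_py - d_px


# 2-itemsets with prefix A: the support drops by the size of the diffset.
sup_a = len(tidsets["A"])
d_ab = diffset_from_tidsets(tidsets["A"], tidsets["B"])
d_ac = diffset_from_tidsets(tidsets["A"], tidsets["C"])
sup_ab = sup_a - len(d_ab)
sup_ac = sup_a - len(d_ac)

# 3-itemset ABC (prefix A, X = B, Y = C), computed from diffsets alone.
d_abc = diffset_from_diffsets(d_ab, d_ac)
sup_abc = sup_ab - len(d_abc)

# Cross-check against plain tid-list intersection.
sup_abc_direct = len(tidsets["A"] & tidsets["B"] & tidsets["C"])
```

In this toy data the diffsets at each level are no larger than the corresponding tid lists, which is the effect the abstract relies on; how large the saving is on real data depends on the dataset.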