19 Dec 2003
Inverted Matrix: Efficient Discovery of Frequent Items
in Large Datasets in the Context of Interactive Mining
Speaker: Ivy TONG
Abstract
Existing association rule mining algorithms suffer from
many problems when mining massive transactional datasets. One
major problem is the high memory dependency: either the gigantic
data structure built is assumed to fit in main memory, or the
recursive mining process is too voracious in memory resources.
Another major impediment is the repetitive and interactive
nature of any knowledge discovery process. To tune parameters,
many runs of the same algorithms are necessary leading to the
building of these huge data structures time and again.
The paper proposes a new disk-based association rule mining
algorithm called Inverted Matrix, which achieves its efficiency
by applying three new ideas. First, transactional data is
converted into a new database layout called Inverted Matrix
that prevents multiple scanning of the database during the
mining phase, in which finding frequent patterns could be
achieved in less than a full scan with random access. Second,
for each frequent item, a relatively small independent tree is
built summarizing co-occurrences. Finally, a simple and non-
recursive mining process reduces the memory requirements as
minimum candidacy generation and counting is needed.
Experimental studies reveal that the Inverted Matrix approach
outperform FP-Tree especially in mining very large transactional
databases with a very large number of unique items. The random
access disk-based approach is particularly advantageous
in a repetitive and interactive setting.
Read the Presentation
Slides...
Referred Papers
|