HKU Research  The University of Hong Kong
Department of Computer Science and Information Systems

 

8 Mar 2002

Clustering XML Documents for Query Performance Enhancement
WANG Lian

 

Abstract

Storing XML documents in relational tables is an established practice. However, it fragments the documents, and queries must then perform a large number of joins, which seriously degrades query performance. If the collection contains documents of different structures, we show that a proper clustering of the documents alleviates the problem. To achieve a good clustering, we propose S-GRACE, a hierarchical clustering algorithm for semi-structured data that clusters documents according to their XML structures. We also propose the notion of a structure graph (s-graph), which facilitates the definition of a distance metric applicable between documents as well as between clusters of documents.
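
As a rough illustration of the s-graph idea, the sketch below represents each document by its set of parent-child element edges and compares two such sets by their overlap. The abstract does not give the exact definitions, so the edge-set representation, the function names s_graph and distance, and the formula 1 - |E1 ∩ E2| / max(|E1|, |E2|) are assumptions made for this example and may differ from what S-GRACE actually uses. XML attributes are ignored for brevity.

```python
import xml.etree.ElementTree as ET


def s_graph(xml_text):
    """Return the set of (parent_tag, child_tag) edges in one document.

    Assumption: the structure graph is approximated by element edges only.
    """
    root = ET.fromstring(xml_text)
    edges = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node:
            edges.add((node.tag, child.tag))
            stack.append(child)
    return edges


def distance(g1, g2):
    """Assumed overlap-based distance between two edge sets, in [0, 1]."""
    if not g1 and not g2:
        return 0.0
    return 1.0 - len(g1 & g2) / max(len(g1), len(g2))


# Hypothetical DBLP-like records, invented for illustration.
article_a = "<article><title>A</title><author>X</author><journal>J</journal></article>"
article_b = "<article><title>C</title><author>Z</author><author>W</author></article>"
paper = "<inproceedings><title>B</title><author>Y</author><booktitle>P</booktitle></inproceedings>"

d_same = distance(s_graph(article_a), s_graph(article_b))  # two of three edges shared
d_diff = distance(s_graph(article_a), s_graph(paper))      # no edges shared
print(d_same, d_diff)
```

Because the distance depends only on edge sets, the same function can be applied to a cluster by taking the union of its members' edge sets, which matches the abstract's remark that the metric applies between clusters as well as between documents.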

Our experiments with real data such as the DBLP database show that S-GRACE can discover clusters that are not easily spotted by manual inspection.



Comment?  Send to dbgroup@cs.hku.hk