HKU Research  The University of Hong Kong
Department of Computer Science

 

2 Jan 2004

The Semi-Supervised Clustering Problem
Speaker: Kevin YIP

Abstract

A very popular data mining task is to partition objects into different groups such that the objects in each group share some unique properties. The task has long been formulated as two different computational problems: clustering and classification. In clustering (unsupervised learning), all objects are unlabeled and the grouping is based on a similarity function. In contrast, classification (supervised learning) takes as input a set of labeled training objects to build a classifier for assigning labels to new unlabeled objects. The grouping of new objects is mainly based on the patterns learnt from the training set.

In many real situations, clustering alone does not yield satisfactory results, yet there is little or no labeled data for building a classifier. In such situations, a small amount of readily accessible domain knowledge can produce a surprisingly large improvement in clustering accuracy. This learning paradigm is now commonly known as semi-supervised clustering.

In this talk, I will discuss the motivation for this new technique by describing its potential applications in various domains, with special emphasis on why traditional clustering and classification techniques may not work, or may give poorer results, in such cases. I will then go through the recent studies on the topic and compare the proposed approaches along three dimensions:

  • The kind of input knowledge being considered
  • When the knowledge is supplied to the clustering algorithm
  • How the knowledge affects the clustering process

I will also suggest some possible directions for future work, particularly in the database and bioinformatics domains.
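
To make the three dimensions of comparison concrete, here is a minimal sketch of one simple possibility. It is not taken from the talk or from the referred papers: the function names, the toy data, and the choice of seeding are all assumptions made for illustration. In this sketch the input knowledge is a handful of labeled example points ("seeds"), the knowledge is supplied once before clustering starts, and it affects the process only by fixing the number of clusters and the initial centroids of an otherwise ordinary k-means run.

```python
# Hypothetical sketch of seed-initialized k-means (illustrative names only).
from math import dist


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def seeded_kmeans(points, seeds, iterations=20):
    """Cluster `points`; `seeds` maps a cluster id to a few example points.

    The seeds are the only domain knowledge used: they fix the number of
    clusters and the initial centroids. After that, plain k-means runs.
    """
    centroids = {cid: mean(examples) for cid, examples in seeds.items()}
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(centroids, key=lambda cid: dist(p, centroids[cid]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for cid in centroids:
            members = [p for p, a in zip(points, assignment) if a == cid]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[cid] = mean(members)
    return assignment, centroids


# Toy data (assumed): two well-separated groups, one labeled seed per group.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
seeds = {"a": [(0.0, 0.0)], "b": [(5.0, 5.0)]}
labels, centers = seeded_kmeans(points, seeds)
```

On this toy data the first three points end up in cluster "a" and the last three in cluster "b". Other designs answer the three questions differently, for example by taking pairwise must-link or cannot-link constraints as input and enforcing them during every assignment step rather than only at initialization.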


    Comment?  Send to dbgroup@cs.hku.hk