HKU CS-Database Research Group

The University of Hong Kong
Department of Computer Science

2 Jan 2004

The Semi-Supervised Clustering Problem

Speaker: Kevin YIP

Abstract

A very popular data mining task is to partition objects into different groups such that the objects in each group share some unique properties. The task has long been formulated as two different computational problems: clustering and classification. In clustering (unsupervised learning), all objects are unlabeled and the grouping is based on a similarity function. In contrast, classification (supervised learning) takes as input a set of labeled training objects to build a classifier for assigning labels to new unlabeled objects. The grouping of new objects is mainly based on the patterns learnt from the training set.

There are many real situations where clustering alone does not yield satisfactory results, yet there is little or no labeled data for building a classifier. In such situations, even a small amount of accessible domain knowledge can give a surprisingly large boost in clustering accuracy. This learning paradigm is now commonly known as semi-supervised clustering.

In this talk, I will discuss the motivation for this new technique by addressing its potential applications in various domains, with a special emphasis on explaining why traditional clustering and classification techniques may not work or may give poorer results in such cases. I will then review recent studies on the topic and compare the proposed approaches in terms of:

The kind of input knowledge being considered

When the knowledge is supplied to the clustering algorithm

How the knowledge affects the clustering process

I will also suggest some possible directions for future work, particularly in the database and bioinformatics domains.

Read the Presentation Slides...

Referred Papers

	Comment? Send to dbgroup@cs.hku.hk