Abstract
Clustering is a powerful technique for discovering unknown patterns in
data, but the effectiveness of traditional clustering algorithms can
degrade sharply when the dataset dimensionality is high. A new
branch of clustering algorithms has emerged in recent years that focuses
on finding projected clusters defined in certain subspaces of the
original dimension space. Because they can identify the attributes
relevant to each cluster, these algorithms may perform better on
high-dimensional data.
A potential application of projected clustering algorithms is the
analysis of genomic data, which can contain thousands of attributes.
In this talk, I will describe some previous approaches to solving the
projected clustering problem, my current work in the area, and my
experience in using these algorithms to analyse transcriptome and codon
usage datasets. Various possible future research directions will also be
discussed.