20 Apr 2004
Mining Concise Summarizations of Probable Categories in Large Databases
Speaker: HO Wai Shing
Abstract
Summarization is an important tool for analyzing a large database. Given a
database of objects, each associated with a class label, it is interesting
to summarize the objects’ properties according to their class labels. For
example, in a medical consensus database, we store the attributes (e.g.,
age, job, smoking habit) of people and label each person according to
whether he or she has lung cancer. To investigate the factors leading to
lung cancer, we are interested in the summarizations (or subgroups) of
the data in which the probability of having lung cancer is much higher
than the overall average. One probable summarization we may find is "the
probability of having lung cancer is 2.78 times higher than the overall
average if the person smokes".
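As a rough illustration of the quantity in the example above, the sketch below computes, for each value of one attribute, the ratio between the probability of the class label within that subgroup and the overall probability. It is only a minimal sketch under stated assumptions: the function name probability_ratios, the field names smoker and lung_cancer, and the toy records are hypothetical, and this is not the algorithm presented in the talk.

```python
from collections import Counter


def probability_ratios(records, attribute, label_key="lung_cancer"):
    """Map each value of `attribute` to P(label | value) / P(label).

    Hypothetical helper for illustration only; `records` is a list of
    dicts whose `label_key` entry is a boolean class label.
    """
    if not records:
        raise ValueError("records must not be empty")

    # Overall probability of the positive class label.
    overall = sum(bool(r[label_key]) for r in records) / len(records)
    if overall == 0:
        raise ValueError("no positive labels, so the ratio is undefined")

    totals = Counter()     # number of records per attribute value
    positives = Counter()  # number of positive records per attribute value
    for r in records:
        totals[r[attribute]] += 1
        positives[r[attribute]] += int(bool(r[label_key]))

    return {
        value: (positives[value] / totals[value]) / overall
        for value in totals
    }


# Made-up toy data, not taken from any real database.
records = (
    [{"smoker": True, "lung_cancer": True}] * 3
    + [{"smoker": True, "lung_cancer": False}] * 1
    + [{"smoker": False, "lung_cancer": True}] * 1
    + [{"smoker": False, "lung_cancer": False}] * 5
)

ratios = probability_ratios(records, "smoker")
```

A subgroup whose ratio is well above 1 (here, the smokers in the toy data) is the kind of candidate summarization described above. Enumerating and ranking such subgroups efficiently over many attributes is the harder problem and is not attempted in this sketch.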
In this talk, I will discuss the definition of the problem, some
properties we have found about it, and the algorithms we have developed
to solve it. Preliminary experimental results will also be presented.