HKU Research  The University of Hong Kong
Department of Computer Science
current research


20 Apr 2004

Mining Concise Summarizations of Probable Categories in Large Databases
Speaker: HO Wai Shing


Summarization is an important tool for analyzing a large database. Given a database of objects, each associated with a class label, it is interesting to summarize the objects' properties according to their class labels. For example, in a medical census database, we store the attributes (e.g., age, job, smoking habit) of people and label each person according to whether they have lung cancer or not. In order to investigate the factors leading to lung cancer, we are interested in the summarizations (or sub-groups) of the data in which the probability of having lung cancer is much higher than the overall average. A probable summarization we can find may be "the probability of having lung cancer is 2.78 times higher than the overall average if the person smokes".
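To make the quantity in the example concrete, the following is a minimal sketch of how such a ratio could be computed for one candidate sub-group. It is not the speaker's algorithm: the function name `probability_ratio`, the field names, and the toy records are all hypothetical, and it assumes "times higher than the overall average" is read as the ratio of the sub-group's class probability to the overall class probability.

```python
from fractions import Fraction

# Hypothetical toy records, invented purely for illustration (not data from the talk).
rows = [
    {"smokes": True,  "lung_cancer": True},
    {"smokes": True,  "lung_cancer": True},
    {"smokes": True,  "lung_cancer": False},
    {"smokes": True,  "lung_cancer": False},
    {"smokes": False, "lung_cancer": True},
    {"smokes": False, "lung_cancer": False},
    {"smokes": False, "lung_cancer": False},
    {"smokes": False, "lung_cancer": False},
    {"smokes": False, "lung_cancer": False},
    {"smokes": False, "lung_cancer": False},
]


def probability_ratio(rows, predicate, label_key="lung_cancer"):
    """Return P(label | predicate) / P(label) as an exact fraction.

    Assumption: a sub-group is described by a boolean predicate over a row.
    """
    if not rows:
        raise ValueError("no records")
    overall = Fraction(sum(r[label_key] for r in rows), len(rows))
    subgroup = [r for r in rows if predicate(r)]
    if not subgroup or overall == 0:
        raise ValueError("ratio undefined for an empty sub-group or zero overall rate")
    within = Fraction(sum(r[label_key] for r in subgroup), len(subgroup))
    return within / overall


# Sub-group "smokes": 2 of 4 smokers are labelled positive, versus 3 of 10 overall.
smoker_ratio = probability_ratio(rows, lambda r: r["smokes"])
print(float(smoker_ratio))
```

A sub-group would be worth reporting when this ratio is well above 1. Enumerating candidate sub-groups efficiently in a large database is the harder part of the problem and is not covered by this sketch.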

In this talk, I will discuss the definition of the problem, some properties we have found about it, and the algorithms we have developed to solve it. Preliminary experimental results will also be presented.
