HKU CS-Database Research Group

The University of Hong Kong
Department of Computer Science

home

db seminars

14 Jun 2007

The Impact of Duality in Data Representation Problems

Speaker: Panagiotis Karras

Abstract

Data representation problems, such as summarization or privacy preservation, require the optimization of one parameter in the represented data set under a bound on one or more other parameters. For example, in the data synopsis problem, an error metric should be minimized under a constraint on the space used; in the privacy preservation probelm, a utility metric should be minimized under a constraint on the revealed sensitive information. Classical approaches to these problems have treated them in a direct manner, developing algorithms that straightforwardly strive to detect the optimal solution. Unfortunately, this approach often results in high time- and space complexities, while some problems are polynomially intractable with it. In this talk, we present selections of our research on an altervative methodology, which solves these problems through a lucrative application of the solutions to their dual counterparts. These dual counterparts reverse the relationship between the optimized and the bounded parameters; as we show, they are considerably easier to solve. Given the monotonic relationship between the parameters involved, a binary search procedure can then detect an optimal solution to a primal problem by searching the space of its dual counterpart. Our survey starts with histogram construction, where our approach reduces the time complexity for building an optimal B-bucket histogram for a maximum-error metric from O(n B log2 n) to O(n log(epsilon)). Then we briefly discuss the restricted Haar wavelet synopsis case, and the application of our technique on building unrestricted Haar and Haar+ synopses with approximation guarantees; in this case, it brings both a time and and space complexity advantage. We then show how our technique can facilitate the computation of optimal one-dimesnional partitionings under the l-diversity privacy preservation model. We conclude our talk with a presentation of our solution to the longest-prefix-match Compact Hierarchical Histogram partitioning problem for a maximum-error metric. We demonstrate that this problem, previously considered to be polynomially intractable, is solvable in low-polynomial time through our approach.

Read the Presentation Slides...

	Back to the top
	Comment? Send to dbgroup@cs.hku.hk