Abstract
Obtaining fast and good quality approximations to data distributions is a
problem of central interest to database management. A variety of popular
database applications including approximate querying, similarity searching
and data mining in most application domains, rely on such good quality
approximations. Histogram based approximation is a very popular method in
database theory and practice to succinctly represent a data distribution
in a space efficient manner.
In this talk, I will introduce some papers that gave algorithms on how to
construct query independent histograms over data stream in order to give
good quality approximate answers to queries. I will skech 3 algorithms:
the optimal histogram construction algorithm, agglomerative (1+/epsilon)-
approximation histogram construction algorithm and fixed window histogram
construction algorithm.
At last, I will also show some experimental results performed on the last
algorithm.