Abstract
XML is gaining widespread use as a format for data exchange and
storage on the WWW. Queries over XML data require accurate selectivity
estimates for path expressions in order to choose good query execution plans.
Selectivity estimation for XML path expressions is usually based on
summary statistics about the structure of the underlying XML repository.
All previous methods require an off-line scan of the XML repository to
collect these statistics. In this talk, I will describe XPathLearner, a
method for estimating the selectivity of the most commonly used types of path
expressions without looking at the XML data. XPathLearner gathers and
refines the statistics on-line using query feedback, which makes it
especially suited to queries in Internet-scale applications, where the
underlying XML repositories are likely to be inaccessible or too large to
be scanned entirely. Besides the on-line property, XPathLearner also has
two other novel features: (a) XPathLearner is workload-aware in
collecting the statistics and thus can be dramatically more accurate than
the more costly off-line methods under tight memory constraints, and (b)
XPathLearner automatically adjusts the statistics using query feedback
when the underlying XML data change. The accuracy of XPathLearner,
measured empirically on several real data sets, is also shown.
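The abstract does not say how query feedback is turned into statistics. As a rough illustration only, the Python sketch below assumes a first-order Markov model over simple child-axis paths such as /a/b/c, estimating f(a/b/c) as f(a/b) * f(b/c) / f(b), and assumes a multiplicative correction applied after each query using the observed result count. The class name FeedbackPathStats, the parameters rate and prior, and the update rule itself are hypothetical and are not taken from XPathLearner; the point is only to show how statistics can be refined from feedback without ever scanning the data.

```python
from collections import Counter, defaultdict


class FeedbackPathStats:
    """Hypothetical on-line path-selectivity summary (illustrative sketch only).

    Assumes a first-order Markov model over child-axis paths such as /a/b/c:
        f(a/b/c) ~= f(a/b) * f(b/c) / f(b)
    Every count starts at `prior` and is refined only from observed query
    results (feedback); the XML data itself is never read.
    """

    EPS = 1e-9  # floor that keeps every statistic strictly positive

    def __init__(self, rate=0.5, prior=1.0):
        self.rate = rate                        # 0 < rate <= 1: step size of each correction
        self.tag = defaultdict(lambda: prior)   # f(t): number of elements tagged t
        self.edge = defaultdict(lambda: prior)  # f(a/b): number of (parent a, child b) pairs

    @staticmethod
    def _tags(path):
        tags = [t for t in path.split("/") if t]
        if not tags:
            raise ValueError("empty path")
        return tags

    def estimate(self, path):
        tags = self._tags(path)
        if len(tags) == 1:
            return self.tag[tags[0]]
        est = self.edge[(tags[0], tags[1])]
        for i in range(1, len(tags) - 1):
            # Markov step: expected tags[i+1] children per tags[i] element
            est *= self.edge[(tags[i], tags[i + 1])] / self.tag[tags[i]]
        return est

    def feedback(self, path, actual):
        """Move the statistics so the estimate for `path` approaches `actual`."""
        tags = self._tags(path)
        ratio = max(actual, self.EPS) / max(self.estimate(path), self.EPS)
        if len(tags) == 1:
            self.tag[tags[0]] *= ratio ** self.rate
            return
        # Spread the correction over the path's edges so that the estimate is
        # scaled by exactly ratio**rate; an edge occurring m times on the path
        # gets exponent rate*m/norm, where norm is the sum of squared counts.
        edges = Counter(zip(tags, tags[1:]))
        norm = sum(m * m for m in edges.values())
        for e, m in edges.items():
            self.edge[e] *= ratio ** (self.rate * m / norm)


if __name__ == "__main__":
    stats = FeedbackPathStats(rate=1.0)
    stats.feedback("/site/people", 120)  # the query returned 120 nodes
    print(stats.estimate("/site/people"))  # 120.0
```

The multiplicative update keeps every count positive and, with rate=1.0, makes the estimate for the path just queried match its observed count. A correction for one path also shifts estimates for other paths that share its edges. A version operating under tight memory limits would additionally have to decide which statistics to keep; this sketch simply keeps all of them.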