|
Abstract
Many organizations today have databases that grow without limit at a rate
of several million records per day. Mining these continuous data streams
is a new research topic. Most statistical and machine-learning algorithms
assume that training data is a random sample drawn from a stationary
distribution. Unfortunately, this assumption is errorneous since the
underlying concept often changes over time. Although some algorithms have
been proposed for learning time-changing concepts, they generally do not
scale well to the large databases or data streams.
In this talk, I will present an efficient algorithm CVFDT
(Concept-adapting Very Fast Decision Tree learner) for mining decision
trees from continuously-changing data streams. CVFDT is able to maintain a
decision tree up-to-date with a window of examples, using a small constant
amount of time for each example but with guaranteed accuracy.
Read the Presentation
Slides...
Referred Papers
|