HKU Research  The University of Hong Kong
Department of Computer Science
Feature
home
current research
people
publications
HKU CS

 

04 Jan 2007

Reference-Based Indexing of Sequence Databases
Line
Speaker: Angela SIU Wing Yan

Abstract

We consider the problem of similarity search in a very large sequence database with edit distance as the similarity measure. Given limited main memory, our goal is to develop a reference-based index that reduces the number of costly edit distance computations in order to answer a query. The idea in reference-based indexing is to select a small set of reference sequences that serve as a surrogate for the other sequences in the database. We consider two novel strategies for selecting references as well as a new strategy for assiging references to database sequencs. Our experimental results show that our selection and assignmnet methods far outperform competitive methods. For example, our methods prune up to 20 times as many sequences as the Omni method, and as many as 30 times as many sequences as frequency vectors. Our methods also scale nicely for databases containing many and/or very long sequences.

Read the Presentation Slides...

Back to the top

Comment?  Send to dbgroup@cs.hku.hk