Abstract
DNA sequences hold the code of life for every living organism. Currently,
biologists are interested in finding similar patterns in DNA sequences.
The whole human genome is about 3 Gbp in size, i.e. roughly 3 billion
characters. The sequences can be considered as strings over an alphabet of
four characters: A, C, G and T. Our goal is to find a practical approach to
indexing the human genome so that biologists can perform approximate and
exact matching on the 3 Gbp DNA sequence efficiently.
In this talk, I will present my research on using suffix array and suffix
tree structures on a PC cluster to index the human genome. Our experiments
show that the suffix array performs better than the suffix tree, although
the suffix tree has lower theoretical run-time complexity. I will also
present techniques for partitioning the data or the index structure so
that they fit the PC cluster model.
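As a rough illustration of the ideas above, the following is a minimal sketch of exact matching with a suffix array, plus one possible way to partition the sequence across nodes. It is not the method presented in the talk: the function names, the naive construction, and the overlap-based partitioning scheme (with its hypothetical `max_pattern_len` bound) are assumptions made for the example, and the "nodes" are simulated by a simple loop.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted order."""
    # Naive construction by sorting suffixes directly; suitable only for a
    # toy example, not for a genome-sized input.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_exact(text, sa, pattern):
    """Return the sorted start positions where pattern occurs in text."""
    m = len(pattern)
    # Binary search for the first suffix whose m-character prefix >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Binary search for the first suffix whose m-character prefix > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def partition(text, num_parts, max_pattern_len):
    """Split text into (offset, chunk) pairs, one per hypothetical node.

    Assumption: each chunk is extended by max_pattern_len - 1 characters so
    that a match crossing a chunk boundary is still contained in one chunk.
    """
    size = max(1, -(-len(text) // num_parts))  # ceiling division
    overlap = max_pattern_len - 1
    return [(off, text[off:off + size + overlap])
            for off in range(0, len(text), size)]


def search_partitioned(text, pattern, num_parts=3, max_pattern_len=8):
    """Search each chunk with its own suffix array and merge the results."""
    if len(pattern) > max_pattern_len:
        raise ValueError("pattern is longer than the overlap allows")
    hits = set()  # a set removes matches reported twice in overlap regions
    for off, chunk in partition(text, num_parts, max_pattern_len):
        sa = build_suffix_array(chunk)
        hits.update(off + pos for pos in find_exact(chunk, sa, pattern))
    return sorted(hits)
```

For example, `search_partitioned("ACGTACGTTACG", "GTT", num_parts=3, max_pattern_len=4)` finds the occurrence that straddles the first chunk boundary, because the overlap keeps it whole inside the second chunk.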
Read the Presentation
Slides...
Referred Papers