Fast inexact mapping using advanced tree exploration on backward search methods
Fecha
Autores
Tárraga Giménez, Joaquín
Medina Castelló, Ignacio
Dopazo Blazquez, Joaquin
Directores
Handle
https://riunet.upv.es/handle/10251/52360
Cita bibliográfica
Salavert Torres, J.; Tomás Domínguez, AE.; Tárraga Giménez, J.; Medina Castelló, I.; Dopazo Blazquez, J.; Blanquer Espert, I. (2015). Fast inexact mapping using advanced tree exploration on backward search methods. BMC Bioinformatics. 16(18):1-11. https://doi.org/10.1186/s12859-014-0438-3
Titulación
Resumen
Background: Short sequence mapping methods for Next Generation Sequencing consist on a combination of
seeding techniques followed by local alignment based on dynamic programming approaches. Most seeding
algorithms are based on backward search alignment, using the Burrows Wheeler Transform, the Ferragina and
Manzini Index or Suffix Arrays. All these backward search algorithms have excellent performance, but their
computational cost highly increases when allowing errors. In this paper, we discuss an inexact mapping algorithm
based on pruning strategies for search tree exploration over genomic data.
Results: The proposed algorithm achieves a 13x speed-up over similar algorithms when allowing 6 base errors,
including insertions, deletions and mismatches. This algorithm can deal with 400 bps reads with up to 9 errors in a
high quality Illumina dataset. In this example, the algorithm works as a preprocessor that reduces by 55% the number
of reads to be aligned. Depending on the aligner the overall execution time is reduced between 20–40%.
Conclusions: Although not intended as a complete sequence mapping tool, the proposed algorithm could be used
as a preprocessing step to modern sequence mappers. This step significantly reduces the number reads to be aligned,
accelerating overall alignment time. Furthermore, this algorithm could be used for accelerating the seeding step of
already available sequence mappers. In addition, an out-of-core index has been implemented for working with large
genomes on systems without expensive memory configurations.
Palabras clave
Sequence mapping, Burrows-wheeler, FM-Index, Suffix array
ISSN
1471-2105
ISBN
Fuente
BMC Bioinformatics
DOI
10.1186/s12859-014-0438-3
Enlaces relacionados
Código de Proyecto
Patrocinadores
Agradecimientos
The authors would like to thank the Universitat Politecnica de Valencia (Spain) in the frame of the grant "High-performance tools for the alignment of genetic sequences using graphic accelerators (GPGPUs)/Herramientas de altas prestaciones para el alineamiento de secuencias geneticas mediante el uso de aceleradores graficos (GPGPUs)", research program PAID-06-11, code 2025.