- -

Fast inexact mapping using advanced tree exploration on backward search methods

RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia

Compartir/Enviar a

Citas

Estadísticas

  • Estadisticas de Uso

Fast inexact mapping using advanced tree exploration on backward search methods

Mostrar el registro sencillo del ítem

Ficheros en el ítem

dc.contributor.author Salavert Torres, José es_ES
dc.contributor.author Tomás Domínguez, Andrés Enrique es_ES
dc.contributor.author Tárraga Giménez, Joaquín es_ES
dc.contributor.author Medina Castelló, Ignacio es_ES
dc.contributor.author Dopazo Blazquez, Joaquin es_ES
dc.contributor.author Blanquer Espert, Ignacio es_ES
dc.date.accessioned 2015-06-26T18:11:39Z
dc.date.available 2015-06-26T18:11:39Z
dc.date.issued 2015-01-28
dc.identifier.issn 1471-2105
dc.identifier.uri http://hdl.handle.net/10251/52360
dc.description.abstract Background: Short sequence mapping methods for Next Generation Sequencing consist on a combination of seeding techniques followed by local alignment based on dynamic programming approaches. Most seeding algorithms are based on backward search alignment, using the Burrows Wheeler Transform, the Ferragina and Manzini Index or Suffix Arrays. All these backward search algorithms have excellent performance, but their computational cost highly increases when allowing errors. In this paper, we discuss an inexact mapping algorithm based on pruning strategies for search tree exploration over genomic data. Results: The proposed algorithm achieves a 13x speed-up over similar algorithms when allowing 6 base errors, including insertions, deletions and mismatches. This algorithm can deal with 400 bps reads with up to 9 errors in a high quality Illumina dataset. In this example, the algorithm works as a preprocessor that reduces by 55% the number of reads to be aligned. Depending on the aligner the overall execution time is reduced between 20–40%. Conclusions: Although not intended as a complete sequence mapping tool, the proposed algorithm could be used as a preprocessing step to modern sequence mappers. This step significantly reduces the number reads to be aligned, accelerating overall alignment time. Furthermore, this algorithm could be used for accelerating the seeding step of already available sequence mappers. In addition, an out-of-core index has been implemented for working with large genomes on systems without expensive memory configurations. es_ES
dc.description.sponsorship The authors would like to thank the Universitat Politecnica de Valencia (Spain) in the frame of the grant "High-performance tools for the alignment of genetic sequences using graphic accelerators (GPGPUs)/Herramientas de altas prestaciones para el alineamiento de secuencias geneticas mediante el uso de aceleradores graficos (GPGPUs)", research program PAID-06-11, code 2025. en_EN
dc.language Inglés es_ES
dc.publisher BioMed Central es_ES
dc.relation.ispartof BMC Bioinformatics es_ES
dc.rights Reconocimiento (by) es_ES
dc.subject Sequence mapping es_ES
dc.subject Burrows-wheeler es_ES
dc.subject FM-Index es_ES
dc.subject Suffix array es_ES
dc.subject.classification CIENCIAS DE LA COMPUTACION E INTELIGENCIA ARTIFICIAL es_ES
dc.subject.classification LENGUAJES Y SISTEMAS INFORMATICOS es_ES
dc.title Fast inexact mapping using advanced tree exploration on backward search methods es_ES
dc.type Artículo es_ES
dc.identifier.doi 10.1186/s12859-014-0438-3
dc.relation.projectID info:eu-repo/grantAgreement/UPV//PAID-06-11-2025/ es_ES
dc.rights.accessRights Abierto es_ES
dc.contributor.affiliation Universitat Politècnica de València. Instituto de Instrumentación para Imagen Molecular - Institut d'Instrumentació per a Imatge Molecular es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Salavert Torres, J.; Tomás Domínguez, AE.; Tárraga Giménez, J.; Medina Castelló, I.; Dopazo Blazquez, J.; Blanquer Espert, I. (2015). Fast inexact mapping using advanced tree exploration on backward search methods. BMC Bioinformatics. 16(18):1-11. https://doi.org/10.1186/s12859-014-0438-3 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion http://dx.doi.org/10.1186/s12859-014-0438-3 es_ES
dc.description.upvformatpinicio 1 es_ES
dc.description.upvformatpfin 11 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 16 es_ES
dc.description.issue 18 es_ES
dc.relation.senia 285402
dc.identifier.pmid 25626517 en_EN
dc.identifier.pmcid PMC4384339 en_EN
dc.contributor.funder Universitat Politècnica de València es_ES
dc.description.references Li H, Homer N. A survey of sequence alignment algorithms for next-generation sequencing. Brief Biol. 2010; 11(5):473–83. es_ES
dc.description.references Smith TF, Waterman MS. Identification of common molecular subsequences. J Mol Biol. 1981; 147:195–7. es_ES
dc.description.references Gotoh O. An improved algorithm for matching biological sequences. J Mol Biol. 1982; 162:705–8. es_ES
dc.description.references Durbin R, Eddy SR, Krogh A, Mitchison G. Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press: Cambridge; 1998. [ http://books.google.es/books?id=R5P2GlJvigQC ] es_ES
dc.description.references Ferragina P, Manzini G. Indexing compressed text. J ACM. 2005; 52(4):552–81. doi:10.1145/10820361082039 es_ES
dc.description.references Burrows M, Wheeler DJ. A block-sorting lossless data compression algorithm. Technical Report 124. (SRC Digital, DEC Palo Alto); May 1994 es_ES
dc.description.references Manzini G. An analysis of the burrows-wheeler transform. In: Proceedings of the Tenth Annual ACM-SIAM Symposium on Discrete Algorithms. NY: ACM-SIAM: 1999. p. 669–77. es_ES
dc.description.references Ferragina P, Manzini G. Opportunistic data structures with applications. In: FOCS: 2000. p. 390–398. es_ES
dc.description.references Li H, Durbin R. Fast and accurate short read alignment with burrows-wheeler transform. Bioinformatics. 2009; 25(14):1754–1760. es_ES
dc.description.references Li R, Yu C, Li Y, Lam T-W, Yiu S-M, Kristiansen K, et al. Soap2: an improved ultrafast tool for short read alignment. Bioinformatics. 2009; 25(15):1966–1967. doi:10.1093/bioinformatics/btp336. es_ES
dc.description.references Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009; 10:(R25). es_ES
dc.description.references Luo R, Wong T, Zhu J, Liu C-M, Zhu X, Wu E, et al. Soap3-dp: Fast, accurate and sensitive gpu-based short read aligner. PLoS ONE. 2013; 8(5):65632. doi:10.1371/journal.pone.0065632 es_ES
dc.description.references Liu Y, Schmidt B. Long read alignment based on maximal exact match seeds. Bioinformatics. 2012; 28(18):318–324. doi:10.1093/bioinformatics/bts414 es_ES
dc.description.references Klus P, Lam S, Lyberg D, Cheung M, Pullan G, McFarlane I, et al. Barracuda - a fast short read sequence aligner using graphics processing units. BMC Res Notes. 2012; 5(1):27. doi:10.1186/1756-0500-5-27 es_ES
dc.description.references Salavert J, Blanquer I, Andrés T, Vicente H, Ignacio M, Joaquín T, et al. Using gpus for the exact alignment of short-read genetic sequences by means of the burrows-wheeler transform. IEEE/ACM Trans Comput Biol Bioinf. 2012; 9(4):1245–56. doi:10.1109/TCBB.2012.49 es_ES
dc.description.references Xin Y, Liu B, Min B, Li WXY, Cheung RCC, Fong AS, et al. Parallel architecture for {DNA} sequence inexact matching with burrows-wheeler transform. Microelectron J. 2013; 44(8):670–82. doi:10.1016/j.mejo.2013.05.004 es_ES
dc.description.references Manber U, Myers G. Suffix arrays: A new method for on-line string searches. In: Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms. SODA ’90Philadelphia, PA, USA: Society for Industrial and Applied Mathematics: 1990. p. 319–327. http://dl.acm.org/citation.cfm?id=320176.320218 es_ES
dc.description.references Abouelhoda MI, Kurtz S, Ohlebusch E. The enhanced suffix array and its applications to genome analysis. In: Proc. Workshop on Algorithms in Bioinformatics, in Lecture Notes in Computer Science,Heidelberger, Berlin: Springer: 2002. p. 449–63. es_ES
dc.description.references Vyverman M, De Baets B, Fack V, Dawyndt P. essamem: finding maximal exact matches using enhanced sparse suffix arrays. Bioinformatics. 2013; 29(6):802–4. doi:10.1093/bioinformatics/btt042 es_ES
dc.description.references Oguzhan Kulekci M, Hon W-K, Shah R, Scott Vitter J, Xu B. Psi-ra: a parallel sparse index for genomic read alignment. BMC Genomics. 2011; 12(Suppl 2):7. doi:10.1186/1471-2164-12-S2-S7 es_ES
dc.description.references Sadakane K. New text indexing functionalities of the compressed suffix arrays. J Algorithms. 2003; 48(2):294–313. doi:10.1016/S0196-6774(03)00087-7 es_ES
dc.description.references Liu C-M, Wong T, Wu E, Luo R, Yiu S-M, Li Y, et al. Soap3: ultra-fast gpu-based parallel alignment tool for short reads. Bioinformatics. 2012; 28(6):878–9. doi:10.1093/bioinformatics/bts061. http://bioinformatics.oxfordjournals.org/content/28/6/878.full.pdf+html es_ES
dc.description.references Lam TW, Li R, Tam A, Wong S, Wu E, Yiu SM. High throughput short read alignment via bi-directional bwt. In: IEEE International Conference On Bioinformatics and Biomedicine, 2009. BIBM ’09.,Washington, D.C., USA: IEEE Computer Society Press: 2009. p. 31–6. doi:10.1109/BIBM.2009.42 es_ES
dc.description.references Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010; 26(5):589–95. doi:10.1093/bioinformatics/btp698 es_ES
dc.description.references Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Meth. 2012; 9(4):357–9. doi:10.1038/nmeth.1923 es_ES
dc.description.references Mu JC, Jiang H, Kiani A, Mohiyuddin M, Asadi NB, Wong WH. Fast and accurate read alignment for resequencing. Bioinformatics. 2012; 28(18):2366–73. doi:10.1093/bioinformatics/bts450 es_ES
dc.description.references Ning Z, Cox AJ, Mullikin JC. Ssaha: A fast search method for large dna databases. Genome Res. 2001; 11(10):1725–9. doi:10.1101/gr.194201 es_ES
dc.description.references Marco-Sola S, Sammeth M, Guigo R, Ribeca P. The GEM mapper: fast, accurate and versatile alignment by filtration. Nat Meth. 2012; 9(12):1185–8. doi:10.1038/nmeth.2221 es_ES
dc.description.references Sadakane K. A library for compressed full-text indexes. https://code.google.com/p/csalib/ (2010) es_ES
dc.description.references Mäkinen V, Navarro G, Sadakane K. Advantages of backward searching; efficient secondary memory and distributed implementation of compressed suffix arrays. In: Proceedings of the 15th International Conference on Algorithms and Computation. ISAAC’04,Berlin, Heidelberg: Springer: 2004. p. 681–92. doi:10.1007/978-3-540-30551-4_59. http://dx.doi.org/10.1007/978-3-540-30551-4_59 es_ES
dc.description.references Puglisi SJ, Smyth WF, Turpin AH. A taxonomy of suffix array construction algorithms. ACM Comput Surv. 2007; 39(2). doi:10.1145/1242471.1242472 es_ES
dc.description.references Okanohara D, Sadakane K. A linear-time burrows-wheeler transform using induced sorting. In: Karlgren J, Tarhio J, Hyyrö H, editors. String Processing and Information Retrieval. Lecture Notes in Computer Science, vol. 5721. Heidelberg, Berlin: Springer: 2009. p. 90–101. es_ES
dc.description.references Grossi R, Vitter J. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SICOMP: SIAM J Comput. 2005; 35(2):378–407. es_ES


Este ítem aparece en la(s) siguiente(s) colección(ones)

Mostrar el registro sencillo del ítem