
Ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using Next Generation Sequence

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia

dc.contributor.author Blanca Postigo, José Miguel es_ES
dc.contributor.author Pascual Bañuls, Laura es_ES
dc.contributor.author Ziarsolo Areitioaurtena, Pello es_ES
dc.contributor.author Nuez Viñals, Fernando es_ES
dc.contributor.author Cañizares Sales, Joaquín es_ES
dc.date.accessioned 2013-04-16T07:34:09Z
dc.date.available 2013-04-16T07:34:09Z
dc.date.issued 2011
dc.identifier.issn 1471-2164
dc.identifier.uri http://hdl.handle.net/10251/27868
dc.description.abstract Background: The possibilities offered by next generation sequencing (NGS) platforms are revolutionizing biotechnological laboratories. Moreover, the combination of NGS sequencing and affordable high-throughput genotyping technologies is facilitating the rapid discovery and use of SNPs in non-model species. However, this abundance of sequences and polymorphisms creates new software needs. To fulfill these needs, we have developed a powerful, yet easy-to-use application. Results: The ngs_backbone software is a parallel pipeline capable of analyzing Sanger, 454, Illumina and SOLiD (Sequencing by Oligonucleotide Ligation and Detection) sequence reads. Its main supported analyses are: read cleaning, transcriptome assembly and annotation, read mapping and single nucleotide polymorphism (SNP) calling and selection. In order to build a truly useful tool, the software development was paired with a laboratory experiment. All public tomato Sanger EST reads plus 14.2 million Illumina reads were employed to test the tool and predict polymorphism in tomato. The cleaned reads were mapped to the SGN tomato transcriptome obtaining a coverage of 4.2 for Sanger and 8.5 for Illumina. 23,360 single nucleotide variations (SNVs) were predicted. A total of 76 SNVs were experimentally validated, and 85% were found to be real. Conclusions: ngs_backbone is a new software package capable of analyzing sequences produced by NGS technologies and predicting SNVs with great accuracy. In our tomato example, we created a highly polymorphic collection of SNVs that will be a useful resource for tomato researchers and breeders. The software developed along with its documentation is freely available under the AGPL license and can be downloaded from http://bioinf.comav.upv.es/ngs_backbone/ or http://github.com/JoseBlanca/franklin. es_ES
dc.language English es_ES
dc.publisher BioMed Central es_ES
dc.relation.ispartof BMC Genomics es_ES
dc.rights Attribution (by) es_ES
dc.subject Framework es_ES
dc.subject Discovery es_ES
dc.subject Transcriptome es_ES
dc.subject Lycopersicon-Esculentum es_ES
dc.subject.classification GENETICS es_ES
dc.title Ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using Next Generation Sequence es_ES
dc.type Article es_ES
dc.identifier.doi 10.1186/1471-2164-12-285
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Department of Biotechnology es_ES
dc.description.bibliographicCitation Blanca Postigo, JM.; Pascual Bañuls, L.; Ziarsolo Areitioaurtena, P.; Nuez Viñals, F.; Cañizares Sales, J. (2011). Ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using Next Generation Sequence. BMC Genomics. 12:1-8. doi:10.1186/1471-2164-12-285 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion http://www.biomedcentral.com/1471-2164/12/285 es_ES
dc.description.upvformatpinicio 1 es_ES
dc.description.upvformatpfin 8 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 12 es_ES
dc.relation.senia 205505
dc.identifier.pmid 21635747 en_EN
dc.identifier.pmcid PMC3124440 en_EN
dc.description.references Metzker ML: Sequencing technologies - the next generation. Nature Reviews Genetics. 2010, 11 (1): 31-46. 10.1038/nrg2626. es_ES
dc.description.references 454 sequencing. [ http://www.454.com/ ] es_ES
dc.description.references Illumina Inc. [ http://www.illumina.com/ ] es_ES
dc.description.references Flicek P, Birney E: Sense from sequence reads: methods for alignment and assembly (vol 6, pg S6, 2009). Nature Methods. 2010, 7 (6): 479-479. es_ES
dc.description.references Chevreux B, Pfisterer T, Drescher B, Driesel AJ, Muller WEG, Wetter T, Suhai S: Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Research. 2004, 14 (6): 1147-1159. 10.1101/gr.1917404. es_ES
dc.description.references Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009, 25 (14): 1754-1760. 10.1093/bioinformatics/btp324. es_ES
dc.description.references Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology. 2009, 10 (3): es_ES
dc.description.references Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup: The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009, 25 (16): 2078-2079. 10.1093/bioinformatics/btp352. es_ES
dc.description.references 1000 Genomes. A deep Catalog of Human Genetic Variation. [ http://1000genomes.org/wiki/doku.php?id=1000_genomes:analysis:vcf4.0 ] es_ES
dc.description.references The seqanswers internet forum. [ http://seqanswers.com/ ] es_ES
dc.description.references Blankenberg D, Taylor J, Schenck I, He JB, Zhang Y, Ghent M, Veeraraghavan N, Albert I, Miller W, Makova KD, Ross CH, Nekrutenko A: A framework for collaborative analysis of ENCODE data: Making large-scale analyses biologist-friendly. Genome Research. 2007, 17 (6): 960-964. 10.1101/gr.5578007. es_ES
dc.description.references CloVR Automated Sequence Analysis from Your Desktop. [ http://clovr.org/ ] es_ES
dc.description.references Papanicolaou A, Stierli R, Ffrench-Constant RH, Heckel DG: Next generation transcriptomes for next generation genomes using est2assembly. BMC Bioinformatics. 2009, 10: es_ES
dc.description.references Applied Biosystems by life technologies. [ http://www.appliedbiosystems.com/absite/us/en/home/applications-technologies/solid-next-generation-sequencing.html ] es_ES
dc.description.references Wall PK, Leebens-Mack J, Chanderbali AS, Barakat A, Wolcott E, Liang HY, Landherr L, Tomsho LP, Hu Y, Carlson JE, Ma H, Schuster SC, Soltis DE, Soltis PS, Altman N, dePamphilis CW: Comparison of next generation sequencing technologies for transcriptome characterization. BMC Genomics. 2009, 10: es_ES
dc.description.references Murchison EP, Tovar C, Hsu A, Bender HS, Kheradpour P, Rebbeck CA, Obendorf D, Conlan C, Bahlo M, Blizzard CA, Pyecroft S, Kreiss A, Kellis M, Stark A, Harkins TT, Marshall Graves JA, Woods GM, Hannon GJ, Papenfuss AT: The Tasmanian Devil Transcriptome Reveals Schwann Cell Origins of a Clonally Transmissible Cancer. Science. 2010, 327 (5961): 84-87. 10.1126/science.1180616. es_ES
dc.description.references Parchman TL, Geist KS, Grahnen JA, Benkman CW, Buerkle CA: Transcriptome sequencing in an ecologically important tree species: assembly, annotation, and marker discovery. BMC Genomics. 2010, 11: es_ES
dc.description.references Babik W, Stuglik M, Qi W, Kuenzli M, Kuduk K, Koteja P, Radwan J: Heart transcriptome of the bank vole (Myodes glareolus): towards understanding the evolutionary variation in metabolic rate. BMC Genomics. 2010, 11: 390. 10.1186/1471-2164-11-390. es_ES
dc.description.references Miller JC, Tanksley SD: RFLP analysis of phylogenetic relationships and genetic variation in the genus Lycopersicon. Theoretical and Applied Genetics. 1990, 80 (4): 437-448. es_ES
dc.description.references Williams CE, St Clair DA: Phenetic relationships and levels of variability detected by restriction fragment length polymorphism and random amplified polymorphic DNA analysis of cultivated and wild accessions of Lycopersicon esculentum. Genome. 1993, 36 (3): 619-630. 10.1139/g93-083. es_ES
dc.description.references Rick CM: Tomato, Lycopersicon esculentum (Solanaceae). Evolution of crop plants. Edited by: Simmonds NW. 1976, London: Longman Group, 268-273. es_ES
dc.description.references Labate JA, Baldo AM: Tomato SNP discovery by EST mining and resequencing. Molecular Breeding. 2005, 16 (4): 343-349. 10.1007/s11032-005-1911-5. es_ES
dc.description.references Yano K, Watanabe M, Yamamoto N, Maeda F, Tsugane T, Shibata D: Expressed sequence tags (EST) database of a miniature tomato cultivar, Micro-Tom. Plant and Cell Physiology. 2005, 46: S139-S139. es_ES
dc.description.references Jimenez-Gomez JM, Maloof JN: Sequence diversity in three tomato species: SNPs, markers, and molecular evolution. BMC Plant Biology. 2009, 9: es_ES
dc.description.references Yang WC, Bai XD, Kabelka E, Eaton C, Kamoun S, van der Knaap E, Francis D: Discovery of single nucleotide polymorphisms in Lycopersicon esculentum by computer aided analysis of expressed sequence tags. Molecular Breeding. 2004, 14 (1): 21-34. es_ES
dc.description.references Van Deynze A, Stoffel K, Buell CR, Kozik A, Liu J, van der Knaap E, Francis D: Diversity in conserved genes in tomato. BMC Genomics. 2007, 8: es_ES
dc.description.references Sim SC, Robbins MD, Chilcott C, Zhu T, Francis DM: Oligonucleotide array discovery of polymorphisms in cultivated tomato (Solanum lycopersicum L.) reveals patterns of SNP variation associated with breeding. BMC Genomics. 2009, 10: es_ES
dc.description.references Bioinformatics at COMAV. [ http://bioinf.comav.upv.es/ngs_backbone/index.html ] es_ES
dc.description.references Broad institute. [ http://www.broadinstitute.org/igv ] es_ES
dc.description.references Bioinformatics at COMAV. [ http://bioinf.comav.upv.es/ngs_backbone/install.html ] es_ES
dc.description.references Github social coding. [ http://github.com/JoseBlanca/franklin ] es_ES
dc.description.references Chou HH, Holmes MH: DNA sequence quality trimming and vector removal. Bioinformatics. 2001, 17 (12): 1093-1104. 10.1093/bioinformatics/17.12.1093. es_ES
dc.description.references Picard. [ http://picard.sourceforge.net/index.shtml ] es_ES
dc.description.references McKenna A, Hanna M, Banks E, Sivachenko A, Citulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA: The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research. 2010, 20: 1297-1303. 10.1101/gr.107524.110. es_ES
dc.description.references Sol Genomics Network. [ ftp://ftp.solgenomics.net/ ] es_ES
dc.description.references NCBI Genbank. [ http://www.ncbi.nlm.nih.gov/genbank/ ] es_ES
dc.description.references Gundry CN, Vandersteen JG, Reed GH, Pryor RJ, Chen J, Wittwer CT: Amplicon melting analysis with labeled primers: A closed-tube method for differentiating homozygotes and heterozygotes. Clinical Chemistry. 2003, 49 (3): 396-406. 10.1373/49.3.396. es_ES

