Long-Read RNA-Seq: Quality Control and Benchmarking

Pardo Palacios, Francisco José

doi:10.4995/Thesis/10251/212027

Files in this item

Name: Pardo - Long-Read ...
Size: 46.79 MB
Format: PDF

Name: indice_de_tesis.pdf
Size: 102.7 KB
Format: PDF

Name: Resumen en castellano ...
Size: 77.68 KB
Format: PDF

dc.contributor.advisor	Conesa Cegarra, Ana	es_ES
dc.contributor.advisor	Tarazona Campos, Sonia	es_ES
dc.contributor.author	Pardo Palacios, Francisco José	es_ES
dc.date.accessioned	2024-11-20T14:15:55Z
dc.date.available	2024-11-20T14:15:55Z
dc.date.created	2024-10-11
dc.date.issued	2024-11-18	es_ES
dc.identifier.uri	http://hdl.handle.net/10251/212027
dc.description.abstract	[EN] This thesis presents the use of long-read sequencing to overcome the limitations associated with conventional RNA-Seq, introducing significant innovations in this field. Long-read sequencing enables the capture of full-length transcripts and the detection of novel splicing variants. It improves accuracy compared to short-read sequencing because it does not require read assembly, a step that can produce chimeric isoforms. As part of this work, the SQANTI3 tool has been designed and developed for the evaluation and filtering of transcriptomes. SQANTI3 classifies long-read transcript models into structural categories based on their splice junctions (SJs) and annotates a wide variety of quality features, such as the presence of non-canonical SJs or the reliability of Transcription Start and Termination Sites (TSS and TTS) as assessed with orthogonal data. It also includes an artifact filtering module based on machine learning or user-defined rules, as well as a "rescue" module to prevent the loss of complete genes due to excessive filtering. Finally, SQANTI3 integrates the functional annotation of transcriptomes with isoAnnot Lite, facilitating the analysis of isoform expression changes and their functional implications. SQANTI3 was used in challenges 1 and 3 of the Long-read RNA-seq Genome Annotation Assessment Project (LRGASP), an international and multicenter effort to benchmark bioinformatics tools for long-read RNA-Seq data. These challenges focused on the correct identification of transcripts in well-annotated organisms (challenge 1) and in non-model organisms with limited prior information (challenge 3). LRGASP provided participants with data from different sequencing technologies and protocols, and participants submitted the results obtained with their bioinformatics tools. These results were evaluated and compared using SQANTI3, highlighting the differences in transcriptomes obtained from the same sample depending on the data and methods used. In summary, the work in this thesis emphasizes the importance that long-read RNA-Seq can have in the future and how SQANTI3 is and will continue to be a key tool for the evaluation and improvement of transcriptome quality.	es_ES
dc.description.sponsorship	The project is supported by the following grants: Pew Charitable Trust, NIGMS R35GM138122, NHGRI R21HG011280, Spanish Ministry of Science PID2020-119537RB-I00, NIGMS R35GM142647, NIGMS R35GM133569, NHGRI U41HG007234, NHGRI F31HG010999 and UM1 HG009443, NHGRI R01HG008759 and R01HG011469, NHGRI R01HG007182, NHGRI UM1HG009402, NHMRC Investigator Grant GNT2017257, Comunitat Valenciana Grant ACIF/2018/290, Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation, Grant No. 2019-002443, an institutional fund from the Department of Biomedical Informatics, The Ohio State University, an institutional fund from the Department of Computational Medicine and Bioinformatics, University of Michigan, SPBU 73023672, AMED 22kk0305013h9903, 23kk0305024h0001, Wellcome Trust [WT222155/Z/20/Z], and European Molecular Biology Laboratory. We acknowledge the support of the Spanish Ministry of Science and Innovation to the EMBL partnership, Centro de Excelencia Severo Ochoa, and CERCA Programme / Generalitat de Catalunya, and the support of the German Federal Ministry of Education and Research with the grant 161L0242A. This work has also been funded by NIH grant R21HG011280, by the Spanish Ministry of Science grants BES-2016-076994 and PID2020-119537RB-I00, and by the Comunitat Valenciana grant ACIF/2018/290.	es_ES
dc.format.extent	290	es_ES
dc.language	English	es_ES
dc.publisher	Universitat Politècnica de València	es_ES
dc.relation	info:eu-repo/grantAgreement/GVA//ACIF%2F2018%2F290/	es_ES
dc.relation	info:eu-repo/grantAgreement/GVA//ACIF%2F2019%2F239/	es_ES
dc.relation	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-119537RB-I00/ES/INTEGRACION DE DATOS MULTI-OMICOS PARA LA INFERENCIA DE MODELOS MULTI-CAPA DE ENFERMEDAD/	es_ES
dc.rights	All rights reserved	es_ES
dc.subject	Genetic sequencing	es_ES
dc.subject	Ribonucleic acid (RNA)	es_ES
dc.subject	Long-read genetic sequencing	es_ES
dc.subject	PacBio Sequencer	es_ES
dc.subject	Long-read RNA-seq Genome Annotation Assessment Project (LRGASP)	es_ES
dc.subject	RNA sequencing	es_ES
dc.subject	Oxford Nanopore	es_ES
dc.subject	SQANTI3	es_ES
dc.subject	Long-read sequencing	es_ES
dc.subject.classification	STATISTICS AND OPERATIONS RESEARCH	es_ES
dc.title	Long-Read RNA-Seq: Quality Control and Benchmarking	es_ES
dc.type	Doctoral thesis	es_ES
dc.identifier.doi	10.4995/Thesis/10251/212027	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Estadística e Investigación Operativa Aplicadas y Calidad - Departament d'Estadística i Investigació Operativa Aplicades i Qualitat	es_ES
dc.description.bibliographicCitation	Pardo Palacios, FJ. (2024). Long-Read RNA-Seq: Quality Control and Benchmarking [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/212027	es_ES
dc.description.accrualMethod	TESIS	es_ES
dc.type.version	info:eu-repo/semantics/acceptedVersion	es_ES
dc.relation.pasarela	TESIS\12606	es_ES
dc.contributor.funder	Ministry of Education, Culture, Sports, Science and Technology, Japan	es_ES
dc.contributor.funder	Bundesministerium für Bildung und Forschung, Germany	es_ES
dc.contributor.funder	Generalitat Valenciana	es_ES
dc.contributor.funder	National Institute of General Medical Sciences, USA	es_ES
dc.contributor.funder	Agencia Estatal de Investigación	es_ES

This item appears in the following collection(s)

Doctoral theses [5389]

RiuNet: Institutional Repository of the Universitat Politècnica de València