Evaluation and improvement of quality control methods for long read-defined transcriptomes

Estevan Morió, Eva

Files in this item

Name: Estevan - Evaluation ...

Size: 7.667Mb

Format: PDF

Name: (2)_Estevan - ...

Size: 1.238Mb

Format: PDF

dc.contributor.advisor	Forment Millet, José Javier	es_ES
dc.contributor.advisor	Conesa Cegarra, Ana	es_ES
dc.contributor.advisor	Arzalluz Luque, Ángeles	es_ES
dc.contributor.author	Estevan Morió, Eva	es_ES
dc.date.accessioned	2022-09-06T13:54:02Z
dc.date.available	2022-09-06T13:54:02Z
dc.date.created	2022-07-15
dc.date.issued	2022-09-06	es_ES
dc.identifier.uri	http://hdl.handle.net/10251/185371
dc.description.abstract	[ES] Las tecnologías de secuenciación de alto rendimiento de transcriptomas mediante lecturas largas han facilitado el descubrimiento de nuevos transcritos. No obstante, dichas tecnologías tienen una tasa de error muy superior a las basadas en lecturas cortas, por lo que requieren herramientas que permitan caracterizar estas variantes noveles y filtrar las que son falsos positivos. De esta necesidad nace SQANTI3 (Structural and Quality Annotation of Novel Transcript Isoforms), un software para el análisis de transcriptomas construidos a partir de lecturas largas. SQANTI3 toma un conjunto de datos de transcritos, junto con la anotación del genoma y, si están disponibles, otros datos ortogonales (expresión, validación de los extremos 3' y 5', etc.), para devolver un transcriptoma corregido. Asimismo, la herramienta proporciona un amplio conjunto de descriptores de las isoformas y sus sitios de splicing, que se analizan más a fondo en varias gráficas de diagnóstico. SQANTI3 incorpora un clasificador basado en inteligencia artificial (MLfilter) que discrimina, de manera automatizada, los transcritos que pueden considerarse verdaderas isoformas de los potenciales artefactos. Dicho filtro se basa en un algoritmo de 'random forest', que aporta múltiples ventajas al análisis transcriptómico, entre ellas evita el uso de umbrales establecidos manualmente para cada variable descriptora. Sin embargo, como todo modelo de machine learning, se trata de una caja negra, es decir, se desconoce lo que pasa entre la entrada de datos y la salida de una predicción. Comparando distintas combinaciones de parámetros de entrenamiento del MLfilter, hemos caracterizado su funcionamiento y hemos establecido una serie de guías para optimizar la definición de los datos de entrenamiento en función del tipo de datos de partida. 
Concretamente, se ha evaluado la adecuación del set de transcritos tomados como verdaderos positivos por el clasificador, así como las variables más relevantes para obtener una buena clasificación artefacto-isoforma. Además, hemos detectado errores evitables, como el "overfitting" o sobreajuste. Todo esto contribuirá a unas mejores prácticas por parte de la gran comunidad de usuarios que emplean SQANTI3 para refinar sus transcriptomas y abrirá la puerta a futuras investigaciones para mejorar el MLfilter.	es_ES
dc.description.abstract	[EN] High-throughput long-read transcriptome sequencing technologies have facilitated the discovery of novel transcripts. Nevertheless, these technologies have a much higher error rate than those based on short reads, and therefore specific tools are required to characterise these novel variants and filter out false positives. SQANTI3 (Structural and Quality Annotation of Novel Transcript Isoforms), a software tool for analysing long read-based transcriptomes, was born out of this need. SQANTI3 takes a transcriptome, together with the genome annotation and, if available, other orthogonal data (expression, validation of the 3' and 5' ends, etc.), to return a corrected transcriptome. The tool also provides a wide set of descriptors of the isoforms and their splice junctions, which are further analysed in several diagnostic plots. SQANTI3 includes an artificial intelligence-based classifier (MLfilter) that automatically discriminates transcripts that can be considered true isoforms from potential artifacts. This filter is based on a random forest algorithm, which brings multiple advantages to transcriptomic analysis, such as avoiding the use of manually set thresholds for each descriptor variable. However, like any machine learning model, it is a black box, meaning that what happens between the input data and the predicted output is unknown. By comparing different parameter setups, we have characterised the MLfilter's performance for two transcriptome datasets and established guidelines to optimise the choice of training data according to the input data type. Specifically, we have evaluated the adequacy of the set of transcripts taken as true positives by the classifier, as well as the most relevant variables to obtain a good artifact-isoform classification. We have also detected avoidable biases such as overfitting. 
Ultimately, this work will help define best practices for the large community of researchers who use SQANTI3 to refine their transcriptomes and will allow further research to improve the MLfilter.	es_ES
dc.format.extent	76	es_ES
dc.language	English	es_ES
dc.publisher	Universitat Politècnica de València	es_ES
dc.rights	Attribution - NonCommercial - ShareAlike (by-nc-sa)	es_ES
dc.subject	Transcriptómica	es_ES
dc.subject	Lecturas largas	es_ES
dc.subject	Isoformas	es_ES
dc.subject	Transcriptomics	es_ES
dc.subject	Long-reads	es_ES
dc.subject	Isoforms	es_ES
dc.subject.classification	BIOCHEMISTRY AND MOLECULAR BIOLOGY	es_ES
dc.subject.other	Grado en Biotecnología-Grau en Biotecnologia	es_ES
dc.title	Evaluation and improvement of quality control methods for long read-defined transcriptomes	es_ES
dc.title.alternative	Evaluación y mejora de métodos de control de calidad de transcriptomas elaborados a partir de lecturas largas	es_ES
dc.title.alternative	Evaluació i millora de mètodes de control de qualitat de transcriptomes elaborats a partir de lectures llargues	es_ES
dc.type	Final degree project/thesis	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Biotecnología - Departament de Biotecnologia	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Escuela Técnica Superior de Ingeniería Agronómica y del Medio Natural - Escola Tècnica Superior d'Enginyeria Agronòmica i del Medi Natural	es_ES
dc.description.bibliographicCitation	Estevan Morió, E. (2022). Evaluation and improvement of quality control methods for long read-defined transcriptomes. Universitat Politècnica de València. http://hdl.handle.net/10251/185371	es_ES
dc.description.accrualMethod	TFGM	es_ES
dc.relation.pasarela	TFGM\149663	es_ES

This item appears in the following collection(s)

ETSIAMN - Academic works [3284]
Escuela Técnica Superior de Ingeniería Agronómica y del Medio Natural

RiuNet: Institutional Repository of the Universitat Politècnica de València