Automatic Speech Recognition and Machine Translation with Deep Neural Networks for Open Educational Resources, Parliamentary Contents and Broadcast Media

Garcés Díaz-Munío, Gonzalo Vicente

doi:10.4995/Thesis/10251/212454


Files in this item

Name: Garces - Automatic ...
Size: 2.076Mb
Format: PDF

Name: contents.pdf
Size: 88.31Kb
Format: PDF

Name: abstract.pdf
Size: 119.2Kb
Format: PDF

dc.contributor.advisor	Civera Saiz, Jorge	es_ES
dc.contributor.advisor	Juan Císcar, Alfonso	es_ES
dc.contributor.author	Garcés Díaz-Munío, Gonzalo Vicente	es_ES
dc.date.accessioned	2024-11-29T08:52:12Z
dc.date.available	2024-11-29T08:52:12Z
dc.date.created	2024-10-15
dc.date.issued	2024-11-25	es_ES
dc.identifier.uri	http://hdl.handle.net/10251/212454
dc.description.abstract	[EN] In the last decade, automatic speech recognition (ASR) and machine translation (MT) have improved enormously through the use of constantly evolving deep neural network (DNN) models. Whereas at the beginning of the 2010s pre-DNN ASR and MT systems could already tackle some real-life applications successfully, such as offline video lecture transcription and translation, in the 2020s much more challenging applications are within reach, such as live broadcast media subtitling. Over the same period, media accessibility for everyone, including deaf and hard-of-hearing people, has been given more and more importance. ASR and MT, in their current state, are powerful tools for increasing the coverage of accessibility measures such as subtitles, transcriptions and translations, and for providing multilingual access to all types of content. In this PhD thesis, we present research results on ASR and MT based on DNNs in three very active domains: open educational resources, parliamentary contents and broadcast media. Regarding open educational resources (OER), we first present work on the evaluation and post-editing of ASR and MT with intelligent interaction approaches, carried out in the framework of the EU project transLectures: Transcription and Translation of Video Lectures. The results obtained confirm that the intelligent interaction approach can make post-editing automatic transcriptions and translations even more cost-effective. Then, in the context of the subsequent EU project X5gon, we present research on developing DNN-based neural MT (NMT) systems and on making the most of larger MT corpora through automatic data filtering. This work resulted in NMT systems that ranked among the best in an international MT evaluation campaign, and we show how these new NMT systems improved the quality of multilingual subtitles in real OER scenarios.
In the likewise growing domain of language technologies for parliamentary contents, we describe research on speech data curation techniques for streaming ASR in the context of European Parliament debates. This research resulted in the release of Europarl-ASR, a new, large speech corpus for streaming ASR system training and evaluation, as well as for the benchmarking of speech data curation techniques. Finally, we present work in a domain at the edge of the state of the art for ASR and MT: the live subtitling of broadcast media, in the context of the 2020-2023 R&D collaboration agreement between the Valencian public broadcaster À Punt and the Universitat Politècnica de València for real-time computer-assisted subtitling of media contents. This research has resulted in the deployment of high-quality, low-latency, real-time streaming ASR systems for a less-spoken language (Catalan) and a widely spoken language (Spanish) in a real broadcast use case.	es_ES
dc.description.sponsorship	The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 287755 (transLectures), Competitiveness and Innovation Framework Programme (CIP) under grant agreement no. 621030 (EMMA), Horizon 2020 research and innovation programme under grant agreements no. 761758 (X5gon) and no. 952215 (TAILOR), and EU4Health Programme 2021–2027 as part of Europe’s Beating Cancer Plan under grant agreements no. 101056995 (INTERACT-EUROPE) and no. 101129375 (INTERACT-EUROPE 100); from the Government of Spain’s research projects iTrans2 (ref. TIN2009-14511, MICINN/ERDF EU), MORE (ref. TIN2015-68326-R, MINECO/ERDF EU), Multisub (ref. RTI2018-094879-B-I00, MCIN/AEI/10.13039/501100011033 ERDF “A way of making Europe”), and XLinDub (ref. PID2021-122443OB-I00, MCIN/AEI/10.13039/501100011033 ERDF “A way of making Europe”); from the Generalitat Valenciana’s “R&D collaboration agreement between the Corporació Valenciana de Mitjans de Comunicació (À Punt Mèdia) and the Universitat Politècnica de València (UPV) for real-time computer assisted subtitling of audiovisual contents based on artificial intelligence”, and research project Classroom Activity Recognition (PROMETEO/2019/111); and from the Universitat Politècnica de València’s PAID-01-17 R&D support programme. This work uses data from the RTVE 2018 and 2020 Databases. This set of data has been provided by RTVE Corporation to help develop Spanish-language speech technologies.	es_ES
dc.format.extent	202	es_ES
dc.language	English	es_ES
dc.publisher	Universitat Politècnica de València	es_ES
dc.relation	info:eu-repo/grantAgreement///PROMETEO%2F2019%2F111//CLASSROOM ACTIVITY RECOGNITION/	es_ES
dc.relation	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2021-122443OB-I00/ES/DOBLAJE AUTOMATICO CROSLINGUE EN TIEMPO REAL DE CONTENIDO EDUCATIVO Y PARLAMENTARIO/	es_ES
dc.relation	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-094879-B-I00/ES/SUBTITULACION MULTILINGUE DE CLASES DE AULA Y SESIONES PLENARIAS/	es_ES
dc.relation	info:eu-repo/grantAgreement/MICINN//TIN2009-14511/ES/Traduccion De Textos Y Transcripcion De Voz Interactivas/	es_ES
dc.relation	info:eu-repo/grantAgreement/MINECO//TIN2015-68326-R/ES/RECURSOS MULTILINGUES ABIERTOS PARA EDUCACION/	es_ES
dc.rights	All rights reserved	es_ES
dc.subject	Automatic Speech Recognition	es_ES
dc.subject	Neural Machine Translation	es_ES
dc.subject	Streaming	es_ES
dc.subject	Deep Neural Networks	es_ES
dc.subject	Open Educational Resources	es_ES
dc.subject	Parliamentary contents	es_ES
dc.subject	Live broadcast media subtitling	es_ES
dc.subject	Speech data filtering	es_ES
dc.subject	Speech data verbatimization	es_ES
dc.subject	Europarl-ASR speech corpus	es_ES
dc.subject.classification	LENGUAJES Y SISTEMAS INFORMATICOS	es_ES
dc.title	Automatic Speech Recognition and Machine Translation with Deep Neural Networks for Open Educational Resources, Parliamentary Contents and Broadcast Media	es_ES
dc.type	Doctoral thesis	es_ES
dc.identifier.doi	10.4995/Thesis/10251/212454	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/EU/H2020/952215/EU/Foundations of Trustworthy AI - Integrating Reasoning, Learning and Optimization/TAILOR	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/EC/H2020/761758/EU/X5gon: Cross Modal, Cross Cultural, Cross Lingual, Cross Domain, and Cross Site Global OER Network/X5gon	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/EC/287755/EU/Transcription and Translation of Video Lectures/TRANSLECTURES	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació	es_ES
dc.description.bibliographicCitation	Garcés Díaz-Munío, GV. (2024). Automatic Speech Recognition and Machine Translation with Deep Neural Networks for Open Educational Resources, Parliamentary Contents and Broadcast Media [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/212454	es_ES
dc.description.accrualMethod	TESIS	es_ES
dc.type.version	info:eu-repo/semantics/acceptedVersion	es_ES
dc.relation.pasarela	TESIS\12900	es_ES
dc.contributor.funder	Universitat Politècnica de València	es_ES
dc.contributor.funder	Generalitat Valenciana	es_ES
dc.contributor.funder	Agencia Estatal de Investigación	es_ES
dc.contributor.funder	Ministerio de Economía y Competitividad	es_ES
dc.contributor.funder	European Commission	es_ES
dc.subject.ods	04.- Ensure inclusive and equitable quality education and promote lifelong learning opportunities for all	es_ES
dc.subject.ods	10.- Reduce inequality within and among countries	es_ES
dc.subject.ods	16.- Promote peaceful and inclusive societies for sustainable development, provide access to justice for all and build effective, accountable and inclusive institutions at all levels	es_ES

This item appears in the following collection(s)

Doctoral theses [5399]


RiuNet: Institutional Repository of the Universitat Politècnica de València
