Transcription of Spanish Historical Handwritten Documents with Deep Neural Networks

Granell, Emilio; Chammas, Edgard; Likforman-Sulem, Laurence; Martínez-Hinarejos, Carlos-D.; Mokbel, Chafic; Cirstea, Bogdan-Ionut

doi:10.3390/jimaging4010015

Files in this item

Name: jimaging-04-00015 ...

Size: 1.194Mb

Format: PDF

Description: Publisher's version

dc.contributor.author	Granell, Emilio	es_ES
dc.contributor.author	Chammas, Edgard	es_ES
dc.contributor.author	Likforman-Sulem, Laurence	es_ES
dc.contributor.author	Martínez-Hinarejos, Carlos-D.	es_ES
dc.contributor.author	Mokbel, Chafic	es_ES
dc.contributor.author	Cirstea, Bogdan-Ionut	es_ES
dc.date.accessioned	2019-05-18T20:39:02Z
dc.date.available	2019-05-18T20:39:02Z
dc.date.issued	2018	es_ES
dc.identifier.uri	http://hdl.handle.net/10251/120670
dc.description.abstract	[EN] The digitization of historical handwritten document images is important for the preservation of cultural heritage. Moreover, the transcription of text images obtained from digitization is necessary to provide efficient information access to the content of these documents. Handwritten Text Recognition (HTR) has become an important research topic in the areas of image and computational language processing that allows us to obtain transcriptions from text images. State-of-the-art HTR systems are, however, far from perfect. One difficulty is that they have to cope with image noise and handwriting variability. Another difficulty is the presence of a large number of Out-Of-Vocabulary (OOV) words in ancient historical texts. A solution to this problem is to use external lexical resources, but such resources might be scarce or unavailable given the nature and the age of such documents. This work proposes a solution to avoid this limitation. It consists of combining a powerful optical recognition system, which copes with image noise and variability, with a language model based on sub-lexical units, which models OOV words. Such a language modeling approach reduces the size of the lexicon while increasing the lexicon coverage. Experiments are first conducted on the publicly available Rodrigo dataset, which contains the digitization of an ancient Spanish manuscript, with a recognizer based on Hidden Markov Models (HMMs). They show that sub-lexical units outperform word units in terms of Word Error Rate (WER), Character Error Rate (CER) and OOV word accuracy rate. This approach is then applied to deep net classifiers, namely Bidirectional Long Short-Term Memory networks (BLSTMs) and Convolutional Recurrent Neural Nets (CRNNs). Results show that CRNNs outperform HMMs and BLSTMs, reaching the lowest WER and CER for this image dataset and significantly improving OOV recognition.	es_ES
dc.description.sponsorship	Work partially supported by projects READ: Recognition and Enrichment of Archival Documents - 674943 (European Union's H2020) and CoMUN-HaT: Context, Multimodality and User Collaboration in Handwritten Text Processing - TIN2015-70924-C2-1-R (MINECO/FEDER), and a DGA-MRIS (Direction Generale de l'Armement - Mission pour la Recherche et l'Innovation Scientifique) scholarship.	es_ES
dc.language	English	es_ES
dc.publisher	MDPI AG	es_ES
dc.relation.ispartof	Journal of imaging	es_ES
dc.rights	Attribution (by)	es_ES
dc.subject	Character-level language model	es_ES
dc.subject	Historical handwritten transcription	es_ES
dc.subject	Out-of-vocabulary word recognition	es_ES
dc.subject	Word structure retrieval	es_ES
dc.subject.classification	COMPUTER LANGUAGES AND SYSTEMS	es_ES
dc.title	Transcription of Spanish Historical Handwritten Documents with Deep Neural Networks	es_ES
dc.type	Article	es_ES
dc.identifier.doi	10.3390/jimaging4010015	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/EC/H2020/674943/EU/Recognition and Enrichment of Archival Documents/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/MINECO//TIN2015-70924-C2-1-R/ES/CONTEXTO, MULTIMODALIDAD Y COLABORACION DEL USUARIO EN PROCESADO DE TEXTO MANUSCRITO/	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació	es_ES
dc.description.bibliographicCitation	Granell, E.; Chammas, E.; Likforman-Sulem, L.; Martínez-Hinarejos, C.; Mokbel, C.; Cirstea, B. (2018). Transcription of Spanish Historical Handwritten Documents with Deep Neural Networks. Journal of imaging. 4(1). https://doi.org/10.3390/jimaging4010015	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	http://doi.org/10.3390/jimaging4010015	es_ES
dc.description.upvformatpinicio	15	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	4	es_ES
dc.description.issue	1	es_ES
dc.identifier.eissn	2313-433X	es_ES
dc.relation.pasarela	S\350247	es_ES
dc.contributor.funder	Ministerio de Economía, Industria y Competitividad	es_ES
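
The abstract reports results as Word Error Rate (WER) and Character Error Rate (CER) without defining them. As a minimal sketch, assuming the common definition (edit distance between reference and hypothesis divided by the reference length), the two metrics could be computed as below. The function names and the sample line are illustrative assumptions: the sample text is not taken from the Rodrigo dataset, and this record does not state which scoring tool the authors used.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of insertions, deletions and
    substitutions needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance normalised by the number of reference units."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_units, hyp_units) / len(ref_units)


def wer(reference, hypothesis):
    """Word Error Rate: units are whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character Error Rate: units are characters, spaces included."""
    return error_rate(list(reference), list(hypothesis))


# Hypothetical example line (an assumption for illustration only).
reference = "en el nombre de dios"
hypothesis = "en el nonbre de dios"

print(f"WER = {wer(reference, hypothesis):.2f}")
print(f"CER = {cer(reference, hypothesis):.2f}")
```

A single wrong character makes the whole word count as an error, so one substitution costs one word out of five in WER but only one character out of twenty in CER. This is why the two rates are usually reported side by side.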

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia
