Multimodal output combination for transcribing historical handwritten documents

Granell Romero, Emilio; Martínez-Hinarejos, Carlos-D.

doi:10.1007/978-3-319-23192-1_21

Files in this item

Name: paper62.pdf
Size: 500.9Kb
Format: PDF
Description: Author's version

Name: 92560021.pdf
Size: 845.3Kb
Format: PDF
Description: Publisher's version

dc.contributor.author	Granell Romero, Emilio	es_ES
dc.contributor.author	Martínez-Hinarejos, Carlos-D.	es_ES
dc.date.accessioned	2016-06-13T09:46:27Z
dc.date.available	2016-06-13T09:46:27Z
dc.date.issued	2015-08-25
dc.identifier.isbn	978-3-319-23117-4
dc.identifier.issn	1611-3349
dc.identifier.issn	0302-9743
dc.identifier.uri	http://hdl.handle.net/10251/65730
dc.description	The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-23192-1_21	es_ES
dc.description.abstract	Transcription of digitalised historical documents is an interesting task in the document analysis area. This transcription can be achieved by using Handwritten Text Recognition (HTR) on digitalised pages or by using Automatic Speech Recognition (ASR) on the dictation of contents. Moreover, another option is using both systems in a multimodal combination to obtain a draft transcription, given that combining the outputs of different recognition systems will generally improve the recognition accuracy. In this work, we present a new combination method based on Confusion Networks. We check its effectiveness for transcribing a Spanish historical book. Results on both unimodal combination with different optical (for HTR) and acoustic (for ASR) models, and multimodal combination, show a relative reduction of Word and Character Error Rate of 14.3% and 16.6%, respectively, over the HTR baseline.	es_ES
dc.description.sponsorship	Work partially supported by European Union -7th FP, under grant 600707 (tranScriptorium), and by the Spanish MEC under projects STraDA (TIN2012-37475-C02-01), Active2Trans (TIN2012-31723), and SmartWays (RTC-2014-1466-4).	es_ES
dc.format.extent	15	es_ES
dc.language	English	es_ES
dc.publisher	Springer	es_ES
dc.relation.ispartof	Computer Analysis of Images and Patterns	es_ES
dc.relation.ispartofseries	Lecture Notes in Computer Science;9256
dc.rights	All rights reserved	es_ES
dc.subject	Document analysis and transcription	es_ES
dc.subject	Handwritten text recognition	es_ES
dc.subject	Automatic speech recognition	es_ES
dc.subject	Confusion Networks combination	es_ES
dc.subject	Recognition outputs combination	es_ES
dc.subject.classification	LENGUAJES Y SISTEMAS INFORMATICOS	es_ES
dc.title	Multimodal output combination for transcribing historical handwritten documents	es_ES
dc.type	Book chapter	es_ES
dc.type	Conference paper	es_ES
dc.identifier.doi	10.1007/978-3-319-23192-1_21
dc.relation.projectID	info:eu-repo/grantAgreement/EC/FP7/600707/EU/tranScriptorium/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/MINECO//TIN2012-37475-C02-01/ES/SEARCH IN TRANSCRIBED MANUSCRIPTS AND DOCUMENT AUGMENTATION/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/MINECO//TIN2012-31723/ES/INTERACCION ACTIVA PARA TRANSCRIPCION DE HABLA Y TRADUCCION/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/MINECO//RTC-2014-1466-4Q4618002BC.VALENCIANA/ES/SMART WAYS - DESARROLLO DE UNA PLATAFORMA TECNOLÓGICA ORIENTADA A LA EFICIENCIA DE LOS RECURSOS EN EL CAMPO DE LAS NUEVAS TECNOLOGÍAS INTERNET OF THINGS/	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació	es_ES
dc.description.bibliographicCitation	Granell Romero, E.; Martínez-Hinarejos, C. (2015). Multimodal output combination for transcribing historical handwritten documents. In Computer Analysis of Images and Patterns. Springer. 246-260. https://doi.org/10.1007/978-3-319-23192-1_21	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.conferencename	16th International Conference on Computer Analysis of Images and Patterns (CAIP 2015)	es_ES
dc.relation.conferencedate	September 2-4, 2015	es_ES
dc.relation.conferenceplace	Valletta, Malta	es_ES
dc.relation.publisherversion	http://link.springer.com/chapter/10.1007/978-3-319-23192-1_21	es_ES
dc.description.upvformatpinicio	246	es_ES
dc.description.upvformatpfin	260	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.relation.senia	292948	es_ES
dc.contributor.funder	European Commission	es_ES
dc.contributor.funder	Ministerio de Ciencia e Innovación	es_ES

RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia