
Multimodal output combination for transcribing historical handwritten documents

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia


dc.contributor.author Granell Romero, Emilio es_ES
dc.contributor.author Martínez-Hinarejos, Carlos-D. es_ES
dc.date.accessioned 2016-06-13T09:46:27Z
dc.date.available 2016-06-13T09:46:27Z
dc.date.issued 2015-08-25
dc.identifier.isbn 978-3-319-23117-4
dc.identifier.issn 1611-3349
dc.identifier.issn 0302-9743
dc.identifier.uri http://hdl.handle.net/10251/65730
dc.description The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-23192-1_21 es_ES
dc.description.abstract Transcription of digitalised historical documents is an interesting task in the document analysis area. This transcription can be achieved by using Handwritten Text Recognition (HTR) on digitalised pages or by using Automatic Speech Recognition (ASR) on the dictation of contents. Another option is to use both systems in a multimodal combination to obtain a draft transcription, given that combining the outputs of different recognition systems will generally improve the recognition accuracy. In this work, we present a new combination method based on Confusion Networks. We check its effectiveness for transcribing a Spanish historical book. Results on both unimodal combination with different optical (for HTR) and acoustic (for ASR) models, and multimodal combination, show a relative reduction of Word and Character Error Rate of 14.3% and 16.6%, respectively, over the HTR baseline. es_ES
dc.description.sponsorship Work partially supported by the European Union 7th FP, under grant 600707 (tranScriptorium), and by the Spanish MEC under projects STraDA (TIN2012-37475-C02-01), Active2Trans (TIN2012-31723), and SmartWays (RTC-2014-1466-4). es_ES
dc.format.extent 15 es_ES
dc.language English es_ES
dc.publisher Springer es_ES
dc.relation.ispartof Computer Analysis of Images and Patterns es_ES
dc.relation.ispartofseries Lecture Notes in Computer Science;9256
dc.rights All rights reserved es_ES
dc.subject Document analysis and transcription es_ES
dc.subject Handwritten text recognition es_ES
dc.subject Automatic speech recognition es_ES
dc.subject Confusion Networks combination es_ES
dc.subject Recognition outputs combination es_ES
dc.subject.classification LENGUAJES Y SISTEMAS INFORMATICOS es_ES
dc.title Multimodal output combination for transcribing historical handwritten documents es_ES
dc.type Book chapter es_ES
dc.type Conference paper es_ES
dc.identifier.doi 10.1007/978-3-319-23192-1_21
dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/600707/EU/tranScriptorium/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//TIN2012-37475-C02-01/ES/SEARCH IN TRANSCRIBED MANUSCRIPTS AND DOCUMENT AUGMENTATION/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//TIN2012-31723/ES/INTERACCION ACTIVA PARA TRANSCRIPCION DE HABLA Y TRADUCCION/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//RTC-2014-1466-4Q4618002BC.VALENCIANA/ES/SMART WAYS - DESARROLLO DE UNA PLATAFORMA TECNOLÓGICA ORIENTADA A LA EFICIENCIA DE LOS RECURSOS EN EL CAMPO DE LAS NUEVAS TECNOLOGÍAS INTERNET OF THINGS/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Granell Romero, E.; Martínez-Hinarejos, C. (2015). Multimodal output combination for transcribing historical handwritten documents. In Computer Analysis of Images and Patterns. Springer. 246-260. https://doi.org/10.1007/978-3-319-23192-1_21 es_ES
dc.description.accrualMethod S es_ES
dc.relation.conferencename 16th International Conference on Computer Analysis of Images and Patterns (CAIP 2015) es_ES
dc.relation.conferencedate September 2-4, 2015 es_ES
dc.relation.conferenceplace Valletta, Malta es_ES
dc.relation.publisherversion http://link.springer.com/chapter/10.1007/978-3-319-23192-1_21 es_ES
dc.description.upvformatpinicio 246 es_ES
dc.description.upvformatpfin 260 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.relation.senia 292948 es_ES
dc.contributor.funder European Commission es_ES
dc.contributor.funder Ministerio de Ciencia e Innovación es_ES
dc.description.references Alabau, V., Martínez-Hinarejos, C.D., Romero, V., Lagarda, A.L.: An iterative multimodal framework for the transcription of handwritten historical documents. Pattern Recognition Letters 35, 195–203 (2014) es_ES
dc.description.references Bertolami, R., Halter, B., Bunke, H.: Combination of multiple handwritten text line recognition systems with a recursive approach. In: Proc. Int. Conf. Frontiers Handwriting Recognition, pp. 61–65 (2006) es_ES
dc.description.references Bisani, M., Ney, H.: Bootstrap estimates for confidence intervals in ASR performance evaluation. In: Proc. of Int. Conf. on Acoustics, Speech and Signal Processing, vol. 1, pp. 409–412 (2004) es_ES
dc.description.references Collobert, R., Bengio, S., Mariéthoz, J.: Torch: a modular machine learning software library. Tech. rep., IDIAP-RR 02–46, IDIAP (2002) es_ES
dc.description.references Dreuw, P., Jonas, S., Ney, H.: White-space models for offline Arabic handwriting recognition. In: Proc. of Int. Conf. on Pattern Recognition, pp. 1–4 (2008) es_ES
dc.description.references Hermansky, H., Ellis, D.P., Sharma, S.: Tandem connectionist feature extraction for conventional HMM systems. In: Proc. of Int. Conf. Acoustics, Speech and Signal Processing, vol. 3, pp. 1635–1638 (2000) es_ES
dc.description.references Ishimaru, S., Nishizaki, H., Sekiguchi, Y.: Effect of confusion network combination on speech recognition system for editing. In: Proc. of APSIPA Annual Summit and Conf., vol. 4, pp. 1–4 (2011) es_ES
dc.description.references Johnson, D.: ICSI Quicknet soft package (2004). http://www1.icsi.berkeley.edu/Speech/qn.html es_ES
dc.description.references Kneser, R., Ney, H.: Improved backing-off for m-gram language modeling. In: Proc. of Int. Conf. Acoustics, Speech and Signal Processing, vol. 1, pp. 181–184 (1995) es_ES
dc.description.references Krishnamurthy, H.K.: Study of algorithms to combine multiple automatic speech recognition (ASR) system outputs. Master’s thesis, Department of Electrical and Computer Engineering (2009). http://hdl.handle.net/2047/d10019273 es_ES
dc.description.references Luján-Mares, M., Tamarit, V., Alabau, V., Martínez-Hinarejos, C.D., Pastor i Gadea, M., Sanchis, A., Toselli, A.H.: iATROS: a speech and handwritting recognition system. In: V Jornadas en Tecnologías del Habla (VJTH2008), pp. 75–78 (2008) es_ES
dc.description.references Moreno, A., Poch, D., Bonafonte, A., Lleida, E., Llisterri, J., Mariño, J.B., Nadeu, C.: Albayzin speech database: design of the phonetic corpus. In: Proc. of EuroSpeech 1993, pp. 175–178 (1993) es_ES
dc.description.references Netpbm home page. http://netpbm.sourceforge.net/ es_ES
dc.description.references Plamondon, R., Srihari, S.N.: On-Line and Off-Line Handwriting Recognition: A Comprehensive Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(1), 63–84 (2000) es_ES
dc.description.references Romero, V., Leiva, L.A., Toselli, A.H., Vidal, E.: Interactive multimodal transcription of text images using a web-based demo system. In: Proc. of Conf. on Intelligent User Interfaces, pp. 477–478 (2009) es_ES
dc.description.references Serrano, N., Castro, F., Juan, A.: The RODRIGO Database. In: Proc. of Language Resources and Evaluation Conference, pp. 2709–2712 (2010) es_ES
dc.description.references Stolcke, A.: SRILM - an extensible language modeling toolkit. In: Proc. Interspeech, pp. 901–904 (2002) es_ES
dc.description.references Woodruff, P., Dupont, S.: Bimodal combination of speech and handwriting for improved word recognition. In: Proc. of EUSIPCO 2005, pp. 1918–1921 (2005) es_ES
dc.description.references Xue, J., Zhao, Y.: Improved confusion network algorithm and shortest path search from word lattice. In: Proc. of Int. Conf. in Acoustics, Speech and Signal Processing, vol. 1, pp. 853–856 (2005) es_ES
dc.description.references Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D., Liu, X., Moore, G., Odell, J., Ollason, D., Povey, D., et al.: The HTK book (for HTK version 3.4). Cambridge University Eng. Dept. (2006) es_ES
dc.description.references Zhai, C., Lafferty, J.: A study of smoothing methods for language models applied to information retrieval. Transactions on Information Systems 22(2), 179–214 (2004) es_ES
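
The abstract in this record reports its results as relative reductions of Word Error Rate (WER) and Character Error Rate (CER) over the HTR baseline. The sketch below shows how such figures are commonly computed. It assumes the standard definition of WER and CER as edit distance divided by reference length, which the record itself does not spell out. The function names are illustrative and do not come from the paper, and no error rates from the paper are reproduced here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions and
    deletions needed to turn `hyp` into `ref`.
    """
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance against an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalised by the reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(ref_text, hyp_text):
    """Word Error Rate: error rate over whitespace-separated tokens."""
    return error_rate(ref_text.split(), hyp_text.split())


def cer(ref_text, hyp_text):
    """Character Error Rate: error rate over individual characters."""
    return error_rate(list(ref_text), list(hyp_text))


def relative_reduction(baseline, improved):
    """Relative error reduction of `improved` with respect to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline error rate must be non-zero")
    return (baseline - improved) / baseline
```

Under these assumptions, a hypothetical system that lowers a baseline error rate from 0.40 to 0.30 achieves a relative reduction of 25%, that is, `relative_reduction(0.40, 0.30)`. These numbers are made up for illustration only.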

