dc.contributor.author | Granell Romero, Emilio | es_ES |
dc.contributor.author | Martínez-Hinarejos, Carlos-D. | es_ES |
dc.date.accessioned | 2016-06-13T09:46:27Z | |
dc.date.available | 2016-06-13T09:46:27Z | |
dc.date.issued | 2015-08-25 | |
dc.identifier.isbn | 978-3-319-23117-4 | |
dc.identifier.issn | 1611-3349 | |
dc.identifier.issn | 0302-9743 | |
dc.identifier.uri | http://hdl.handle.net/10251/65730 | |
dc.description | The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-23192-1_21 | es_ES |
dc.description.abstract | Transcription of digitalised historical documents is an interesting task in the document analysis area. This transcription can be achieved by using Handwritten Text Recognition (HTR) on digitalised pages or by using Automatic Speech Recognition (ASR) on a dictation of the contents. Another option is to use both systems in a multimodal combination to obtain a draft transcription, given that combining the outputs of different recognition systems generally improves recognition accuracy. In this work, we present a new combination method based on Confusion Networks. We check its effectiveness for transcribing a Spanish historical book. Results on both unimodal combination with different optical (for HTR) and acoustic (for ASR) models, and multimodal combination, show a relative reduction of Word and Character Error Rate of 14.3% and 16.6%, respectively, over the HTR baseline. | es_ES |
dc.description.sponsorship | Work partially supported by the European Union 7th FP, under grant 600707 (tranScriptorium), and by the Spanish MEC under projects STraDA (TIN2012-37475-C02-01), Active2Trans (TIN2012-31723), and SmartWays (RTC-2014-1466-4). | es_ES |
dc.format.extent | 15 | es_ES |
dc.language | English | es_ES |
dc.publisher | Springer | es_ES |
dc.relation.ispartof | Computer Analysis of Images and Patterns | es_ES |
dc.relation.ispartofseries | Lecture Notes in Computer Science;9256 | |
dc.rights | All rights reserved | es_ES |
dc.subject | Document analysis and transcription | es_ES |
dc.subject | Handwritten text recognition | es_ES |
dc.subject | Automatic speech recognition | es_ES |
dc.subject | Confusion Networks combination | es_ES |
dc.subject | Recognition outputs combination | es_ES |
dc.subject.classification | COMPUTER LANGUAGES AND SYSTEMS | es_ES |
dc.title | Multimodal output combination for transcribing historical handwritten documents | es_ES |
dc.type | Book chapter | es_ES |
dc.type | Conference paper | es_ES |
dc.identifier.doi | 10.1007/978-3-319-23192-1_21 | |
dc.relation.projectID | info:eu-repo/grantAgreement/EC/FP7/600707/EU/tranScriptorium/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/MINECO//TIN2012-37475-C02-01/ES/SEARCH IN TRANSCRIBED MANUSCRIPTS AND DOCUMENT AUGMENTATION/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/MINECO//TIN2012-31723/ES/INTERACCION ACTIVA PARA TRANSCRIPCION DE HABLA Y TRADUCCION/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/MINECO//RTC-2014-1466-4Q4618002BC.VALENCIANA/ES/SMART WAYS - DESARROLLO DE UNA PLATAFORMA TECNOLÓGICA ORIENTADA A LA EFICIENCIA DE LOS RECURSOS EN EL CAMPO DE LAS NUEVAS TECNOLOGÍAS INTERNET OF THINGS/ | es_ES |
dc.rights.accessRights | Open | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació | es_ES |
dc.description.bibliographicCitation | Granell Romero, E.; Martínez-Hinarejos, C. (2015). Multimodal output combination for transcribing historical handwritten documents. In Computer Analysis of Images and Patterns. Springer. 246-260. https://doi.org/10.1007/978-3-319-23192-1_21 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.conferencename | 16th International Conference on Computer Analysis of Images and Patterns (CAIP 2015) | es_ES |
dc.relation.conferencedate | September 2-4, 2015 | es_ES |
dc.relation.conferenceplace | Valletta, Malta | es_ES |
dc.relation.publisherversion | http://link.springer.com/chapter/10.1007/978-3-319-23192-1_21 | es_ES |
dc.description.upvformatpinicio | 246 | es_ES |
dc.description.upvformatpfin | 260 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.relation.senia | 292948 | es_ES |
dc.contributor.funder | European Commission | es_ES |
dc.contributor.funder | Ministerio de Ciencia e Innovación | es_ES |