- -

Character-Based Handwritten Text Recognition of Multilingual Documents

RiuNet: Institutional repository of the Polithecnic University of Valencia

Share/Send to

Cited by

Statistics

Character-Based Handwritten Text Recognition of Multilingual Documents

Show simple item record

Files in this item

dc.contributor.author Del Agua Teba, Miguel Angel es_ES
dc.contributor.author Serrano Martinez Santos, Nicolas es_ES
dc.contributor.author Civera Saiz, Jorge es_ES
dc.contributor.author Juan Císcar, Alfonso es_ES
dc.date.accessioned 2014-01-27T13:38:21Z
dc.date.issued 2012
dc.identifier.isbn 978-3-642-35292-8 (on line)
dc.identifier.isbn 978-3-642-35291-1 (print)
dc.identifier.issn 1865-0929
dc.identifier.uri http://hdl.handle.net/10251/35180
dc.description.abstract [EN] An effective approach to transcribe handwritten text documents is to follow a sequential interactive approach. During the supervision phase, user corrections are incorporated into the system through an ongoing retraining process. In the case of multilingual documents with a high percentage of out-of-vocabulary (OOV) words, two principal issues arise. On the one hand, a minor yet important matter for this interactive approach is to identify the language of the current text line image to be transcribed, as a language dependent recognisers typically performs better than a monolingual recogniser. On the other hand, word-based language models suffer from data scarcity in the presence of a large number of OOV words, degrading their estimation and affecting the performance of the transcription system. In this paper, we successfully tackle both issues deploying character-based language models combined with language identification techniques on an entire 764-page multilingual document. The results obtained significantly reduce previously reported results in terms of transcription error on the same task, but showed that a language dependent approach is not effective on top of character-based recognition of similar languages. es_ES
dc.description.sponsorship The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n◦ 287755. Also supported by the Spanish Government (MIPRCV ”Consolider Ingenio 2010”, iTrans2 TIN2009-14511, MITTRAL TIN2009-14633-C03-01 and FPU AP2007-0286) and the Generalitat Valenciana (Prometeo/2009/014).
dc.language Inglés es_ES
dc.publisher Springer Verlag (Germany) es_ES
dc.relation info:eu-repo/grantAgreement/MICINN//TIN2009-14511/ES/Traduccion De Textos Y Transcripcion De Voz Interactivas/ es_ES
dc.relation MICINN/TIN2009- 14633-C03-01 es_ES
dc.relation info:eu-repo/grantAgreement/Generalitat Valenciana//PROMETEO09%2F2009%2F014/ES/Adaptive learning and multimodality in pattern recognition (Almapater)/ es_ES
dc.relation MICINN/FPU/AP2007-0286
dc.relation.ispartof Communications in Computer and Information Science es_ES
dc.rights Reserva de todos los derechos es_ES
dc.subject Machine Learning es_ES
dc.subject HTR es_ES
dc.subject Handwritten Text Recognition es_ES
dc.subject Multilingual es_ES
dc.subject Character es_ES
dc.subject.classification CIENCIAS DE LA COMPUTACION E INTELIGENCIA ARTIFICIAL es_ES
dc.subject.classification LENGUAJES Y SISTEMAS INFORMATICOS es_ES
dc.title Character-Based Handwritten Text Recognition of Multilingual Documents es_ES
dc.type Artículo es_ES
dc.embargo.lift 10000-01-01
dc.embargo.terms forever es_ES
dc.identifier.doi 10.1007/978-3-642-35292-8_20
dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/287755/EU/Transcription and Translation of Video Lectures/ es_ES
dc.rights.accessRights Abierto es_ES
dc.contributor.affiliation Universitat Politècnica de València. Instituto Universitario Mixto Tecnológico de Informática - Institut Universitari Mixt Tecnològic d'Informàtica es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Del Agua Teba, MA.; Serrano Martinez Santos, N.; Civera Saiz, J.; Juan Císcar, A. (2012). Character-Based Handwritten Text Recognition of Multilingual Documents. Communications in Computer and Information Science. 328:187-196. https://doi.org/10.1007/978-3-642-35292-8_20 es_ES
dc.description.accrualMethod S es_ES
dc.relation.conferencename Spanish Speech Technology Workshop/Iberian SLTech Workshop es_ES
dc.relation.conferencedate NOV 21-23, 2012 es_ES
dc.relation.conferenceplace Madrid, SPAIN es_ES
dc.relation.publisherversion http://dx.doi.org/10.1007/978-3-642-35292-8_20 es_ES
dc.description.upvformatpinicio 187 es_ES
dc.description.upvformatpfin 196 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 328 es_ES
dc.relation.senia 241898
dc.contributor.funder European Commission
dc.contributor.funder Ministerio de Ciencia e Innovación
dc.contributor.funder Generalitat Valenciana
dc.relation.references Graves, A., Liwicki, M., Fernandez, S., Bertolami, R., Bunke, H., Schmidhuber, J.: A novel connectionist system for unconstrained handwriting recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 31(5), 855–868 (2009) es_ES
dc.relation.references Serrano, N., Tarazón, L., Pérez, D., Ramos-Terrades, O., Juan, A.: The GIDOC prototype. In: Proc. of the 10th Int. Workshop on Pattern Recognition in Information Systems (PRIS 2010), Funchal, Portugal, pp. 82–89 (2010) es_ES
dc.relation.references Serrano, N., Pérez, D., Sanchis, A., Juan, A.: Adaptation from Partially Supervised Handwritten Text Transcriptions. In: Proc. of the 11th Int. Conf. on Multimodal Interfaces and the 6th Workshop on Machine Learning for Multimodal Interaction (ICMI-MLMI 2009), Cambridge, MA, USA, pp. 289–292 (2009) es_ES
dc.relation.references Serrano, N., Sanchis, A., Juan, A.: Balancing error and supervision effort in interactive-predictive handwriting recognition. In: Proc. of the Int. Conf. on Intelligent User Interfaces (IUI 2010), Hong Kong, China, pp. 373–376 (2010) es_ES
dc.relation.references Serrano, N., Giménez, A., Sanchis, A., Juan, A.: Active learning strategies in handwritten text recognition. In: Proc. of the 12th Int. Conf. on Multimodal Interfaces and the 7th Workshop on Machine Learning for Multimodal Interaction (ICMI-MLMI 2010), Beijing, China, vol. (86) (November 2010) es_ES
dc.relation.references Pérez, D., Tarazón, L., Serrano, N., Castro, F., Ramos-Terrades, O., Juan, A.: The GERMANA database. In: Proc. of the 10th Int. Conf. on Document Analysis and Recognition (ICDAR 2009), Barcelona, Spain, pp. 301–305 (2009) es_ES
dc.relation.references del Agua, M.A., Serrano, N., Juan, A.: Language Identification for Interactive Handwriting Transcription of Multilingual Documents. In: Vitrià, J., Sanches, J.M., Hernández, M. (eds.) IbPRIA 2011. LNCS, vol. 6669, pp. 596–603. Springer, Heidelberg (2011) es_ES
dc.relation.references Ghosh, D., Dube, T., Shivaprasad, P.: Script Recognition: A Review. IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI) 32(12), 2142–2161 (2010) es_ES
dc.relation.references Bisani, M., Ney, H.: Open vocabulary speech recognition with flat hybrid models. In: Proc. of the European Conf. on Speech Communication and Technology, pp. 725–728 (2005) es_ES
dc.relation.references Szoke, I., Burget, L., Cernocky, J., Fapso, M.: Sub-word modeling of out of vocabulary words in spoken term detection. In: IEEE Spoken Language Technology Workshop, SLT 2008, pp. 273–276 (December 2008) es_ES
dc.relation.references Brakensiek, A., Rottl, J., Kosmala, A., Rigoll, G.: Off-Line handwriting recognition using various hybrid modeling techniques and character N-Grams. In: 7th International Workshop on Frontiers in Handwritten Recognition, pp. 343–352 (2000) es_ES
dc.relation.references Zamora, F., Castro, M.J., España, S., Gorbe, J.: Unconstrained offline handwriting recognition using connectionist character n-grams. In: The 2010 International Joint Conference on Neural Networks (IJCNN), pp. 1–7 (July 2010) es_ES
dc.relation.references Marti, U.V., Bunke, H.: The IAM-database: an English sentence database for off-line handwriting recognition. IJDAR, 39–46 (2002) es_ES
dc.relation.references Schultz, T., Kirchhoff, K.: Multilingual Speech Processing (2006) es_ES
dc.relation.references Stolcke, A.: SRILM – an extensible language modeling toolkit. In: Proc. of ICSLP 2002, pp. 901–904 (September 2002) es_ES
dc.relation.references Rybach, D., Gollan, C., Heigold, G., Hoffmeister, B., Lööf, J., Schlüter, R., Ney, H.: The RWTH aachen university open source speech recognition system. In: Interspeech, Brighton, U.K., pp. 2111–2114 (September 2009) es_ES
dc.relation.references Efron, B., Tibshirani, R.J.: An Introduction to Bootstrap. Chapman & Hall/CRC (1994) es_ES


This item appears in the following Collection(s)

Show simple item record