dc.contributor.author | Del Agua Teba, Miguel Angel | es_ES |
dc.contributor.author | Serrano Martinez Santos, Nicolas | es_ES |
dc.contributor.author | Civera Saiz, Jorge | es_ES |
dc.contributor.author | Juan Císcar, Alfonso | es_ES |
dc.date.accessioned | 2014-01-27T13:38:21Z | |
dc.date.issued | 2012 | |
dc.identifier.isbn | 978-3-642-35292-8 (online) | |
dc.identifier.isbn | 978-3-642-35291-1 (print) | |
dc.identifier.issn | 1865-0929 | |
dc.identifier.uri | http://hdl.handle.net/10251/35180 | |
dc.description.abstract | [EN] An effective approach to transcribing handwritten text documents is to follow a sequential interactive approach. During the supervision phase, user corrections are incorporated into the system through an ongoing retraining process. In the case of multilingual documents with a high percentage of out-of-vocabulary (OOV) words, two principal issues arise. On the one hand, a minor yet important matter for this interactive approach is to identify the language of the current text line image to be transcribed, as language-dependent recognisers typically perform better than a monolingual recogniser. On the other hand, word-based language models suffer from data scarcity in the presence of a large number of OOV words, which degrades their estimation and affects the performance of the transcription system. In this paper, we successfully tackle both issues by deploying character-based language models combined with language identification techniques on an entire 764-page multilingual document. The results significantly reduce the transcription error previously reported on the same task, but show that a language-dependent approach is not effective on top of character-based recognition of similar languages. | es_ES |
dc.description.sponsorship | The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 287755. Also supported by the Spanish Government (MIPRCV "Consolider Ingenio 2010", iTrans2 TIN2009-14511, MITTRAL TIN2009-14633-C03-01 and FPU AP2007-0286) and the Generalitat Valenciana (Prometeo/2009/014). | |
dc.language | English | es_ES |
dc.publisher | Springer Verlag (Germany) | es_ES |
dc.relation.ispartof | Communications in Computer and Information Science | es_ES |
dc.rights | All rights reserved | es_ES |
dc.subject | Machine Learning | es_ES |
dc.subject | HTR | es_ES |
dc.subject | Handwritten Text Recognition | es_ES |
dc.subject | Multilingual | es_ES |
dc.subject | Character | es_ES |
dc.subject.classification | COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE | es_ES |
dc.subject.classification | COMPUTER LANGUAGES AND SYSTEMS | es_ES |
dc.title | Character-Based Handwritten Text Recognition of Multilingual Documents | es_ES |
dc.type | Article | es_ES |
dc.embargo.lift | 10000-01-01 | |
dc.embargo.terms | forever | es_ES |
dc.identifier.doi | 10.1007/978-3-642-35292-8_20 | |
dc.relation.projectID | info:eu-repo/grantAgreement/EC/FP7/287755/EU/Transcription and Translation of Video Lectures/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/MEC//AP2007-02869/ES/AP2007-02869/ | |
dc.relation.projectID | info:eu-repo/grantAgreement/MICINN//TIN2009-14511/ES/Traduccion De Textos Y Transcripcion De Voz Interactivas/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/MICINN//TIN2009-14633-C03-01/ES/Multimodal Interaction For Text Transcription With Adaptive Learning/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/Generalitat Valenciana//PROMETEO09%2F2009%2F014/ES/Adaptive learning and multimodality in pattern recognition (Almapater)/ | es_ES |
dc.rights.accessRights | Open | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Instituto Universitario Mixto Tecnológico de Informática - Institut Universitari Mixt Tecnològic d'Informàtica | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació | es_ES |
dc.description.bibliographicCitation | Del Agua Teba, MA.; Serrano Martinez Santos, N.; Civera Saiz, J.; Juan Císcar, A. (2012). Character-Based Handwritten Text Recognition of Multilingual Documents. Communications in Computer and Information Science. 328:187-196. https://doi.org/10.1007/978-3-642-35292-8_20 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.conferencename | Spanish Speech Technology Workshop/Iberian SLTech Workshop | es_ES |
dc.relation.conferencedate | NOV 21-23, 2012 | es_ES |
dc.relation.conferenceplace | Madrid, SPAIN | es_ES |
dc.relation.publisherversion | http://dx.doi.org/10.1007/978-3-642-35292-8_20 | es_ES |
dc.description.upvformatpinicio | 187 | es_ES |
dc.description.upvformatpfin | 196 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 328 | es_ES |
dc.relation.senia | 241898 | |
dc.contributor.funder | European Commission | |
dc.contributor.funder | Ministerio de Ciencia e Innovación | |
dc.contributor.funder | Generalitat Valenciana | |
dc.description.references | Graves, A., Liwicki, M., Fernandez, S., Bertolami, R., Bunke, H., Schmidhuber, J.: A novel connectionist system for unconstrained handwriting recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 31(5), 855–868 (2009) | es_ES |
dc.description.references | Serrano, N., Tarazón, L., Pérez, D., Ramos-Terrades, O., Juan, A.: The GIDOC prototype. In: Proc. of the 10th Int. Workshop on Pattern Recognition in Information Systems (PRIS 2010), Funchal, Portugal, pp. 82–89 (2010) | es_ES |
dc.description.references | Serrano, N., Pérez, D., Sanchis, A., Juan, A.: Adaptation from Partially Supervised Handwritten Text Transcriptions. In: Proc. of the 11th Int. Conf. on Multimodal Interfaces and the 6th Workshop on Machine Learning for Multimodal Interaction (ICMI-MLMI 2009), Cambridge, MA, USA, pp. 289–292 (2009) | es_ES |
dc.description.references | Serrano, N., Sanchis, A., Juan, A.: Balancing error and supervision effort in interactive-predictive handwriting recognition. In: Proc. of the Int. Conf. on Intelligent User Interfaces (IUI 2010), Hong Kong, China, pp. 373–376 (2010) | es_ES |
dc.description.references | Serrano, N., Giménez, A., Sanchis, A., Juan, A.: Active learning strategies in handwritten text recognition. In: Proc. of the 12th Int. Conf. on Multimodal Interfaces and the 7th Workshop on Machine Learning for Multimodal Interaction (ICMI-MLMI 2010), Beijing, China, vol. (86) (November 2010) | es_ES |
dc.description.references | Pérez, D., Tarazón, L., Serrano, N., Castro, F., Ramos-Terrades, O., Juan, A.: The GERMANA database. In: Proc. of the 10th Int. Conf. on Document Analysis and Recognition (ICDAR 2009), Barcelona, Spain, pp. 301–305 (2009) | es_ES |
dc.description.references | del Agua, M.A., Serrano, N., Juan, A.: Language Identification for Interactive Handwriting Transcription of Multilingual Documents. In: Vitrià, J., Sanches, J.M., Hernández, M. (eds.) IbPRIA 2011. LNCS, vol. 6669, pp. 596–603. Springer, Heidelberg (2011) | es_ES |
dc.description.references | Ghosh, D., Dube, T., Shivaprasad, P.: Script Recognition: A Review. IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI) 32(12), 2142–2161 (2010) | es_ES |
dc.description.references | Bisani, M., Ney, H.: Open vocabulary speech recognition with flat hybrid models. In: Proc. of the European Conf. on Speech Communication and Technology, pp. 725–728 (2005) | es_ES |
dc.description.references | Szoke, I., Burget, L., Cernocky, J., Fapso, M.: Sub-word modeling of out of vocabulary words in spoken term detection. In: IEEE Spoken Language Technology Workshop, SLT 2008, pp. 273–276 (December 2008) | es_ES |
dc.description.references | Brakensiek, A., Rottland, J., Kosmala, A., Rigoll, G.: Off-Line handwriting recognition using various hybrid modeling techniques and character N-Grams. In: 7th International Workshop on Frontiers in Handwriting Recognition, pp. 343–352 (2000) | es_ES |
dc.description.references | Zamora, F., Castro, M.J., España, S., Gorbe, J.: Unconstrained offline handwriting recognition using connectionist character n-grams. In: The 2010 International Joint Conference on Neural Networks (IJCNN), pp. 1–7 (July 2010) | es_ES |
dc.description.references | Marti, U.V., Bunke, H.: The IAM-database: an English sentence database for off-line handwriting recognition. IJDAR, 39–46 (2002) | es_ES |
dc.description.references | Schultz, T., Kirchhoff, K.: Multilingual Speech Processing (2006) | es_ES |
dc.description.references | Stolcke, A.: SRILM – an extensible language modeling toolkit. In: Proc. of ICSLP 2002, pp. 901–904 (September 2002) | es_ES |
dc.description.references | Rybach, D., Gollan, C., Heigold, G., Hoffmeister, B., Lööf, J., Schlüter, R., Ney, H.: The RWTH Aachen University open source speech recognition system. In: Interspeech, Brighton, U.K., pp. 2111–2114 (September 2009) | es_ES |
dc.description.references | Efron, B., Tibshirani, R.J.: An Introduction to the Bootstrap. Chapman & Hall/CRC (1994) | es_ES |
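The abstract combines character-based language models with language identification but gives no implementation details. The following is a minimal Python sketch of that idea under stated assumptions: a character trigram model with add-k smoothing (a simple stand-in; the record's reference list includes SRILM, but the smoothing actually used is not stated here), and language identification by choosing the language model that assigns a text line the highest log-probability. The class `CharNGramLM`, the function `identify_language`, the boundary symbols and the toy training strings are all hypothetical, and the sketch scores text only, not line images.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # hypothetical line-boundary symbols


class CharNGramLM:
    """Character n-gram language model with add-k smoothing (illustrative sketch only)."""

    def __init__(self, n=3, k=0.1):
        self.n = n
        self.k = k
        self.ngrams = Counter()    # counts of (history..., next symbol)
        self.contexts = Counter()  # counts of histories
        self.vocab = {EOS}

    def _symbols(self, line):
        # Pad with n-1 start symbols so every character has a full-length history.
        return [BOS] * (self.n - 1) + list(line) + [EOS]

    def train(self, lines):
        for line in lines:
            syms = self._symbols(line)
            self.vocab.update(line)  # adds each character of the line
            for i in range(self.n - 1, len(syms)):
                ctx = tuple(syms[i - self.n + 1:i])
                self.ngrams[ctx + (syms[i],)] += 1
                self.contexts[ctx] += 1
        return self

    def logprob(self, line):
        syms = self._symbols(line)
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen characters
        total = 0.0
        for i in range(self.n - 1, len(syms)):
            ctx = tuple(syms[i - self.n + 1:i])
            num = self.ngrams[ctx + (syms[i],)] + self.k
            den = self.contexts[ctx] + self.k * v
            total += math.log(num / den)
        return total


def identify_language(line, models):
    """Return the language whose character LM gives the line the highest log-probability."""
    return max(models, key=lambda lang: models[lang].logprob(line))


if __name__ == "__main__":
    # Hypothetical toy data; real models would be trained on transcribed lines per language.
    models = {
        "es": CharNGramLM().train(["la casa de la ciudad"]),
        "de": CharNGramLM().train(["das haus der stadt"]),
    }
    print(identify_language("la ciudad", models))  # → es
```

Working at the character level means any word, including an OOV word, decomposes into symbols the model has already seen, which is the data-scarcity argument the abstract makes against word-based models.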