
Handwriting recognition in historical documents using very large vocabularies

RiuNet: Institutional repository of the Polytechnic University of Valencia

dc.contributor.author Frinken, Volkmar es_ES
dc.contributor.author Fischer, Andreas es_ES
dc.contributor.author Martínez-Hinarejos, Carlos-D. es_ES
dc.date.accessioned 2017-02-20T09:43:22Z
dc.date.available 2017-02-20T09:43:22Z
dc.date.issued 2013-08
dc.identifier.isbn 978-1-4503-2115-0
dc.identifier.uri http://hdl.handle.net/10251/78056
dc.description © ACM 2013. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in HIP '13 Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing, http://dx.doi.org/10.1145/2501115.2501116 es_ES
dc.description.abstract Language models are used in automatic transcription systems to resolve ambiguities. This is done by limiting the vocabulary of words that can be recognized and by estimating the n-gram probabilities of the words in the given text. In the context of historical documents, non-standardized spelling and the limited amount of written text pose a substantial problem both for selecting the recognizable vocabulary and for computing the word probabilities. In this paper, we propose, for the transcription of historical Spanish text, keeping the n-gram corpus limited to a sample of the target text while expanding the vocabulary with words gathered from external resources. We analyze the performance of such a transcription system with external vocabularies of different sizes and demonstrate the applicability of the approach and a significant increase in recognition accuracy when using up to 300 thousand external words. es_ES
dc.format.extent 6 es_ES
dc.language English es_ES
dc.publisher ACM es_ES
dc.relation European project FP7-PEOPLE-2008-IAPP/ 230653 es_ES
dc.relation European Research Council’s Advanced Grant/ ERC-2010-AdG 20100407 es_ES
dc.relation Spanish R&D projects/ TIN2009-14633-C03-03, RYC-2009-05031, TIN2011-24631, TIN2012-37475-C02-02 es_ES
dc.relation MITTRAL/ TIN2009-14633-C03-01 es_ES
dc.relation Active2Trans/ TIN2012-31723 es_ES
dc.relation Swiss National Science Foundation fellowship/ PBBEP2 141453 es_ES
dc.rights All rights reserved es_ES
dc.subject Historical documents es_ES
dc.subject Handwriting recognition es_ES
dc.subject Language modeling es_ES
dc.subject.classification Computer Languages and Systems es_ES
dc.title Handwriting recognition in historical documents using very large vocabularies es_ES
dc.type Conference paper es_ES
dc.identifier.doi 10.1145/2501115.2501116
dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7230653 es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica es_ES
dc.description.bibliographicCitation Frinken, V.; Fischer, A.; Martínez-Hinarejos, C. (2013). Handwriting recognition in historical documents using very large vocabularies. ACM. doi:10.1145/2501115.2501116 es_ES
dc.description.accrualMethod Senia es_ES
dc.relation.conferencename 2nd International Workshop on Historical Document Imaging and Processing es_ES
dc.relation.conferencedate August 24, 2013 es_ES
dc.relation.conferenceplace Washington, DC, USA es_ES
dc.relation.publisherversion http://dx.doi.org/10.1145/2501115.2501116 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.relation.senia 259932 es_ES

