Frinken, V.; Fischer, A.; Martínez-Hinarejos, C. (2013). Handwriting recognition in historical documents using very large vocabularies. ACM. https://doi.org/10.1145/2501115.2501116
Please use this identifier to cite or link to this item: http://hdl.handle.net/10251/78056
Title:
|
Handwriting recognition in historical documents using very large vocabularies
|
Author:
|
Frinken, Volkmar
Fischer, Andreas
Martínez-Hinarejos, Carlos-D.
|
UPV Unit:
|
Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica
|
Issued date:
|
|
Abstract:
|
Language models are used in automatic transcription systems
to resolve ambiguities. This is done by limiting the vocabulary
of words that can be recognized and by estimating
the n-gram probabilities of the words in the given text. In
the context of historical documents, non-unified spelling
and the limited amount of written text pose a substantial
problem for selecting the recognizable vocabulary and
for computing the word probabilities. In this
paper we propose, for the transcription of historical Spanish
text, to keep the corpus for the n-gram model limited to a sample
of the target text but to expand the vocabulary with words
gathered from external resources. We analyze the performance
of such a transcription system with different sizes of
external vocabularies and demonstrate the applicability of the
approach and the significant increase in recognition accuracy
obtained by using up to 300 thousand external words.
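The idea summarized in the abstract (n-gram statistics taken only from a sample of the target text, vocabulary enlarged with external words) can be illustrated with a toy sketch. This is not the authors' system: the bigram order, the add-alpha smoothing, the function name `build_bigram_model`, and the example tokens are all assumptions chosen for illustration, since this record does not describe the actual estimation method.

```python
from collections import Counter


def build_bigram_model(sample_tokens, external_words, alpha=1.0):
    """Toy bigram model (illustrative assumption, not the paper's method).

    Counts come only from `sample_tokens`; the recognizable vocabulary is
    the union of the sample words and `external_words`. Add-alpha smoothing
    gives external words a small non-zero probability.
    """
    vocab = set(sample_tokens) | set(external_words)
    # How often each word occurs as the first element of a bigram.
    history_counts = Counter(sample_tokens[:-1])
    bigram_counts = Counter(zip(sample_tokens, sample_tokens[1:]))
    vocab_size = len(vocab)

    def prob(prev, word):
        # Words outside the expanded vocabulary cannot be recognized at all.
        if word not in vocab:
            return 0.0
        numerator = bigram_counts[(prev, word)] + alpha
        denominator = history_counts[prev] + alpha * vocab_size
        return numerator / denominator

    return vocab, prob


# Made-up example data: a tiny "target text" sample and two external words.
sample = "en la villa de madrid en la corte".split()
external = ["ciudad", "reyno"]
vocab, prob = build_bigram_model(sample, external)

print(prob("la", "villa"))   # seen in the sample
print(prob("la", "ciudad"))  # only in the external vocabulary
print(prob("la", "xyz"))     # outside the vocabulary
```

In this sketch the external words never change the counts; they only enlarge the set of words that can receive probability mass, which mirrors the separation between corpus and vocabulary described in the abstract.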
|
Subjects:
|
Historical documents, Handwriting recognition, Language modeling
|
Copyrights:
|
All rights reserved
|
ISBN:
|
978-1-4503-2115-0
|
DOI:
|
10.1145/2501115.2501116
|
Publisher:
|
ACM
|
Publisher version:
|
http://dx.doi.org/10.1145/2501115.2501116
|
Conference name:
|
2nd International Workshop on Historical Document Imaging and Processing
|
Conference place:
|
Washington, DC, USA
|
Conference date:
|
August 24, 2013
|
Project ID:
|
info:eu-repo/grantAgreement/EC/FP7/230653/EU/Administrative Document Automate Optimization/
info:eu-repo/grantAgreement/MICINN//TIN2009-14633-C03-03/ES/Extraccion De Conocimiento De Imagenes De Documentos Con Contenidos Heterogeneos/
info:eu-repo/grantAgreement/EC/FP7/269796/EU/Five Centuries of Marriages/
info:eu-repo/grantAgreement/SNSF//PBBEP2_141453/CH/Bootstrapping Handwriting Recognition Systems for Historical Documents/
info:eu-repo/grantAgreement/MICINN//RYC-2009-05031/ES/RYC-2009-05031/
info:eu-repo/grantAgreement/MICINN//TIN2011-24631/ES/TEXTO EN LA CIUDAD - COMPRENSION CENTRADA EN HUMANOS DE TEXTO EN ESCENAS/
info:eu-repo/grantAgreement/MINECO//TIN2012-37475-C02-02/ES/RECONOCIMIENTO CONTEXTUAL EN DOCUMENTOS ANTIGUOS/
info:eu-repo/grantAgreement/MICINN//TIN2009-14633-C03-01/ES/Multimodal Interaction For Text Transcription With Adaptive Learning/
info:eu-repo/grantAgreement/MINECO//TIN2012-31723/ES/INTERACCION ACTIVA PARA TRANSCRIPCION DE HABLA Y TRADUCCION/
ERC/2010-AdG-20100407
|
Description:
|
© ACM 2013. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in HIP '13 Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing, http://dx.doi.org/10.1145/2501115.2501116
|
Thanks:
|
This work has been supported by the European project FP7-PEOPLE-2008-IAPP: 230653, the European Research Council’s Advanced Grant ERC-2010-AdG 20100407, the Spanish R&D projects TIN2009-14633-C03-03, RYC-2009-05031, TIN2011-24631, TIN2012-37475-C02-02, MITTRAL (TIN2009-14633-C03-01), and Active2Trans (TIN2012-31723), as well as the Swiss National Science Foundation fellowship project PBBEP2_141453.
|
Type:
|
Conference paper
|