Romero Gómez, V.; Fornes, A.; Vidal Ruiz, E.; Sánchez Peiró, JA. (2016). Using the MGGI Methodology for Category-based Language Modeling in Handwritten Marriage Licenses Books. IEEE. https://doi.org/10.1109/ICFHR.2016.0069
Please use this identifier to cite or link to this item: http://hdl.handle.net/10251/87633
Title:
|
Using the MGGI Methodology for Category-based Language Modeling in Handwritten Marriage Licenses Books
|
Authors:
|
Romero Gómez, Verónica
Fornes, Alicia
Vidal Ruiz, Enrique
Sánchez Peiró, Joan Andreu
|
UPV unit:
|
Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació
Universitat Politècnica de València. Departamento de Estadística e Investigación Operativa Aplicadas y Calidad - Departament d'Estadística i Investigació Operativa Aplicades i Qualitat
|
Release date:
|
|
Abstract:
|
Handwritten marriage licenses books have been
used for centuries by ecclesiastical and secular institutions
to register marriages. The information contained in these
historical documents is useful for demography studies and
genealogical research, among others. Despite the generally
simple structure of the text in these documents, automatic transcription
and semantic information extraction are difficult due
to the distinct and evolving vocabulary, which is composed
mainly of proper names that change over time. In previous
work we studied the use of category-based language models to
both improve automatic transcription accuracy and make the
extraction of semantic information easier. Here we analyze
the main causes of the semantic errors observed in previous
results and apply a Grammatical Inference technique known
as MGGI to improve the semantic accuracy of the language
model obtained. Using this language model, full handwritten
text recognition experiments have been carried out, with results
supporting the interest of the proposed approach.
|
Keywords:
|
Handwritten Documents, Information extraction, Language modeling, MGGI, Categories-based language model
|
Usage rights:
|
All rights reserved
|
Source:
|
|
DOI:
|
10.1109/ICFHR.2016.0069
|
Publisher:
|
IEEE
|
Publisher's version:
|
http://ieeexplore.ieee.org/document/7814085/
|
Conference title:
|
15th International Conference on Frontiers in Handwriting Recognition (ICFHR 2016)
|
Conference location:
|
Shenzhen, China
|
Conference date:
|
October 23-26, 2016
|
Project code:
|
info:eu-repo/grantAgreement/EC/H2020/674943/EU/Recognition and Enrichment of Archival Documents/
info:eu-repo/grantAgreement/MINECO//TIN2015-70924-C2-1-R/ES/CONTEXTO, MULTIMODALIDAD Y COLABORACION DEL USUARIO EN PROCESADO DE TEXTO MANUSCRITO/
info:eu-repo/grantAgreement/MINECO//TIN2015-70924-C2-2-R/ES/CONTEXTUALIZACION DE CONTENIDOS EN EL RECONOCIMIENTO DE IMAGENES DE DOCUMENTOS DE ARCHIVOS/
info:eu-repo/grantAgreement/EC/FP7/269796/EU/Five Centuries of Marriages/
info:eu-repo/grantAgreement/MINECO//RYC-2014-16831/ES/RYC-2014-16831/
|
Description:
|
© 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
|
Acknowledgements:
|
This work has been partially supported through the European Union’s H2020 grant READ (Ref: 674943), the European project ERC-2010-AdG-20100407-269796, the MINECO/FEDER, UE projects TIN2015-70924-C2-1-R and TIN2015-70924-C2-2-R, and the Ramón y Cajal Fellowship RYC-2014-16831.
|
Type:
|
Conference paper
|