
Transcribing a 17th-century botanical manuscript: Longitudinal evaluation of document layout detection and interactive transcription

RiuNet: Institutional Repository of the Universitat Politècnica de València


dc.contributor.author Toselli, Alejandro Héctor es_ES
dc.contributor.author Leiva, Luis A. es_ES
dc.contributor.author Bordes-Cabrera, Isabel es_ES
dc.contributor.author Hernández-Tornero, Celio es_ES
dc.contributor.author Bosch Campos, Vicente es_ES
dc.contributor.author Vidal, Enrique es_ES
dc.date.accessioned 2020-10-06T03:32:03Z
dc.date.available 2020-10-06T03:32:03Z
dc.date.issued 2018-04 es_ES
dc.identifier.issn 2055-7671 es_ES
dc.identifier.uri http://hdl.handle.net/10251/151164
dc.description.abstract [EN] We present a process for cost-effective transcription of cursive handwritten text images that has been tested on a 1,000-page 17th-century book about botanical species. The process comprised two main tasks, namely: (1) preprocessing: page layout analysis, text line detection, and extraction; and (2) transcription of the extracted text line images. Both tasks were carried out with semiautomatic procedures, aimed at incrementally minimizing user correction effort, by means of computer-assisted line detection and interactive handwritten text recognition technologies. The contribution derived from this work is three-fold. First, we provide a detailed human-supervised transcription of a relatively large historical handwritten book, ready to be searchable, indexable, and accessible to cultural heritage scholars as well as the general public. Second, we have conducted the first longitudinal study to date on interactive handwriting text recognition, for which we provide a very comprehensive user assessment of the real-world performance of the technologies involved in this work. Third, as a result of this process, we have produced a detailed transcription and document layout information (i.e. high-quality labeled data) ready to be used by researchers working on automated technologies for document analysis and recognition. es_ES
dc.description.sponsorship This work is supported by the European Commission through the EU projects HIMANIS (JPICH program, Spanish, grant Ref. PCIN-2015-068) and READ (Horizon-2020 program, grant Ref. 674943); and the Universitat Politècnica de València (grant number SP20130189). This work was also part of the Valorization and I+D+i Resources program of VLC/CAMPUS and has been funded by the Spanish MECD as part of the International Excellence Campus program. es_ES
dc.language English es_ES
dc.publisher Oxford University Press es_ES
dc.relation.ispartof Digital Scholarship in the Humanities es_ES
dc.rights All rights reserved es_ES
dc.subject Handwriting recognition es_ES
dc.subject Images es_ES
dc.subject Models es_ES
dc.subject.classification Statistics and Operations Research es_ES
dc.subject.classification Computer Languages and Systems es_ES
dc.title Transcribing a 17th-century botanical manuscript: Longitudinal evaluation of document layout detection and interactive transcription es_ES
dc.type Article es_ES
dc.identifier.doi 10.1093/llc/fqw064 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/674943/EU/Recognition and Enrichment of Archival Documents/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/UPV//SP20130189/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Estadística e Investigación Operativa Aplicadas y Calidad - Departament d'Estadística i Investigació Operativa Aplicades i Qualitat es_ES
dc.description.bibliographicCitation Toselli, AH.; Leiva, LA.; Bordes-Cabrera, I.; Hernández-Tornero, C.; Bosch Campos, V.; Vidal, E. (2018). Transcribing a 17th-century botanical manuscript: Longitudinal evaluation of document layout detection and interactive transcription. Digital Scholarship in the Humanities. 33(1):173-202. https://doi.org/10.1093/llc/fqw064 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1093/llc/fqw064 es_ES
dc.description.upvformatpinicio 173 es_ES
dc.description.upvformatpfin 202 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 33 es_ES
dc.description.issue 1 es_ES
dc.relation.pasarela S\338508 es_ES
dc.contributor.funder Universitat Politècnica de València es_ES
dc.description.references Bazzi, I., Schwartz, R., & Makhoul, J. (1999). An omnifont open-vocabulary OCR system for English and Arabic. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(6), 495-504. doi:10.1109/34.771314 es_ES
dc.description.references Causer, T., Tonra, J., & Wallace, V. (2012). Transcription maximized; expense minimized? Crowdsourcing and editing The Collected Works of Jeremy Bentham. Literary and Linguistic Computing, 27(2), 119-137. doi:10.1093/llc/fqs004 es_ES
dc.description.references Ramel, J. Y., Leriche, S., Demonet, M. L., & Busson, S. (2007). User-driven page layout analysis of historical printed books. International Journal of Document Analysis and Recognition (IJDAR), 9(2-4), 243-261. doi:10.1007/s10032-007-0040-6 es_ES
dc.description.references Romero, V., Fornés, A., Serrano, N., Sánchez, J. A., Toselli, A. H., Frinken, V., … Lladós, J. (2013). The ESPOSALLES database: An ancient marriage license corpus for off-line handwriting recognition. Pattern Recognition, 46(6), 1658-1669. doi:10.1016/j.patcog.2012.11.024 es_ES
dc.description.references Romero, V., Toselli, A. H., & Vidal, E. (2012). Multimodal Interactive Handwritten Text Transcription. Series in Machine Perception and Artificial Intelligence. doi:10.1142/8394 es_ES
dc.description.references Toselli, A. H., Romero, V., Pastor, M., & Vidal, E. (2010). Multimodal interactive transcription of text images. Pattern Recognition, 43(5), 1814-1825. doi:10.1016/j.patcog.2009.11.019 es_ES
dc.description.references Toselli, A. H., Vidal, E., Romero, V., & Frinken, V. (2016). HMM word graph based keyword spotting in handwritten document images. Information Sciences, 370-371, 497-518. doi:10.1016/j.ins.2016.07.063 es_ES
dc.description.references Bunke, H., Bengio, S., & Vinciarelli, A. (2004). Offline recognition of unconstrained handwritten texts using HMMs and statistical language models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6), 709-720. doi:10.1109/tpami.2004.14 es_ES

