
The IBEM dataset: A large printed scientific image dataset for indexing and searching mathematical expressions

RiuNet: Institutional Repository of the Universitat Politècnica de València


dc.contributor.author Anitei, Dan es_ES
dc.contributor.author Sánchez Peiró, Joan Andreu es_ES
dc.contributor.author Benedí Ruiz, José Miguel es_ES
dc.contributor.author Noya García, Ernesto es_ES
dc.date.accessioned 2023-12-18T19:03:33Z
dc.date.available 2023-12-18T19:03:33Z
dc.date.issued 2023-08 es_ES
dc.identifier.issn 0167-8655 es_ES
dc.identifier.uri http://hdl.handle.net/10251/200851
dc.description.abstract [EN] Searching for information in printed scientific documents is a challenging problem that has recently received special attention from the Pattern Recognition research community. Mathematical expressions are complex elements that appear in scientific documents, and developing techniques for locating and recognizing them requires the preparation of datasets that can be used as benchmarks. Most current techniques for dealing with mathematical expressions are based on Machine Learning techniques which require a large amount of annotated data. These datasets must be prepared with ground-truth information for automatic training and testing. However, preparing large datasets with ground-truth is a very expensive and time-consuming task. This paper introduces the IBEM dataset, consisting of scientific documents that have been prepared for mathematical expression recognition and searching. This dataset consists of 600 documents, more than 8200 page images with more than 160000 mathematical expressions. It has been automatically generated from the LaTeX version of the documents and can be enlarged easily. The ground-truth includes the position at the page level and the LaTeX transcript for mathematical expressions both embedded in the text and displayed. This paper also reports a baseline classification experiment with mathematical symbols and a baseline experiment of Mathematical Expression Recognition performed on the IBEM dataset. These experiments aim to provide some benchmarks for comparison purposes so that future users of the IBEM dataset can have a baseline framework. es_ES
dc.description.sponsorship This work has been partially supported by MCIN/AEI/10.13039/501100011033 under the grant PID2020-116813RB-I00; the Generalitat Valenciana under the FPI grant CIACIF/2021/313; and by the support of the Valencian Graduate School and Research Network of Artificial Intelligence. es_ES
dc.language English es_ES
dc.publisher Elsevier es_ES
dc.relation.ispartof Pattern Recognition Letters es_ES
dc.rights Attribution - NonCommercial - NoDerivatives (by-nc-nd) es_ES
dc.subject Mathematical expression dataset es_ES
dc.subject Mathematical expression recognition es_ES
dc.subject Mathematical expression retrieval es_ES
dc.subject Mathematical symbols classification es_ES
dc.subject.classification COMPUTER LANGUAGES AND SYSTEMS es_ES
dc.title The IBEM dataset: A large printed scientific image dataset for indexing and searching mathematical expressions es_ES
dc.type Article es_ES
dc.identifier.doi 10.1016/j.patrec.2023.05.033 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-116813RB-I00/ES/SEARCHING IN THE SIMANCA ARCHIVE/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/GENERALITAT VALENCIANA//CIACIF%2F2021%2F313//Indexación y búsqueda de expresiones matemáticas basada en redes neuronales profundas para colecciones masivas de imágenes de documentos científicos/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica es_ES
dc.description.bibliographicCitation Anitei, D.; Sánchez Peiró, JA.; Benedí Ruiz, JM.; Noya García, E. (2023). The IBEM dataset: A large printed scientific image dataset for indexing and searching mathematical expressions. Pattern Recognition Letters. 172:29-36. https://doi.org/10.1016/j.patrec.2023.05.033 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1016/j.patrec.2023.05.033 es_ES
dc.description.upvformatpinicio 29 es_ES
dc.description.upvformatpfin 36 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 172 es_ES
dc.relation.pasarela S\494886 es_ES
dc.contributor.funder GENERALITAT VALENCIANA es_ES
dc.contributor.funder AGENCIA ESTATAL DE INVESTIGACION es_ES
dc.contributor.funder Instituto Valenciano de Investigación en Inteligencia Artificial es_ES

