The IBEM dataset: A large printed scientific image dataset for indexing and searching mathematical expressions

Anitei, Dan; Sánchez Peiró, Joan Andreu; Benedí Ruiz, José Miguel; Noya García, Ernesto

doi:10.1016/j.patrec.2023.05.033

Files in this item

Name: AniteiSanchezBenedi ...

Size: 1.720Mb

Format: PDF

Description: Publisher's version

dc.contributor.author	Anitei, Dan	es_ES
dc.contributor.author	Sánchez Peiró, Joan Andreu	es_ES
dc.contributor.author	Benedí Ruiz, José Miguel	es_ES
dc.contributor.author	Noya García, Ernesto	es_ES
dc.date.accessioned	2023-12-18T19:03:33Z
dc.date.available	2023-12-18T19:03:33Z
dc.date.issued	2023-08	es_ES
dc.identifier.issn	0167-8655	es_ES
dc.identifier.uri	http://hdl.handle.net/10251/200851
dc.description.abstract	[EN] Searching for information in printed scientific documents is a challenging problem that has recently received special attention from the Pattern Recognition research community. Mathematical expressions are complex elements that appear in scientific documents, and developing techniques for locating and recognizing them requires the preparation of datasets that can be used as benchmarks. Most current techniques for dealing with mathematical expressions are based on Machine Learning techniques which require a large amount of annotated data. These datasets must be prepared with ground-truth information for automatic training and testing. However, preparing large datasets with ground-truth is a very expensive and time-consuming task. This paper introduces the IBEM dataset, consisting of scientific documents that have been prepared for mathematical expression recognition and searching. This dataset consists of 600 documents, more than 8200 page images with more than 160000 mathematical expressions. It has been automatically generated from the LaTeX version of the documents and can be enlarged easily. The ground-truth includes the position at the page level and the LaTeX transcript for mathematical expressions both embedded in the text and displayed. This paper also reports a baseline classification experiment with mathematical symbols and a baseline experiment of Mathematical Expression Recognition performed on the IBEM dataset. These experiments aim to provide some benchmarks for comparison purposes so that future users of the IBEM dataset can have a baseline framework.	es_ES
dc.description.sponsorship	This work has been partially supported by MCIN/AEI/10.13039/501100011033 under the grant PID2020-116813RB-I00; the Generalitat Valenciana under the FPI grant CIACIF/2021/313; and by the support of the Valencian Graduate School and Research Network of Artificial Intelligence.	es_ES
dc.language	English	es_ES
dc.publisher	Elsevier	es_ES
dc.relation.ispartof	Pattern Recognition Letters	es_ES
dc.rights	Attribution - NonCommercial - NoDerivatives (by-nc-nd)	es_ES
dc.subject	Mathematical expression dataset	es_ES
dc.subject	Mathematical expression recognition	es_ES
dc.subject	Mathematical expression retrieval	es_ES
dc.subject	Mathematical symbols classification	es_ES
dc.subject.classification	COMPUTER LANGUAGES AND SYSTEMS	es_ES
dc.title	The IBEM dataset: A large printed scientific image dataset for indexing and searching mathematical expressions	es_ES
dc.type	Article	es_ES
dc.identifier.doi	10.1016/j.patrec.2023.05.033	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-116813RB-I00/ES/SEARCHING IN THE SIMANCA ARCHIVE/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/GENERALITAT VALENCIANA//CIACIF%2F2021%2F313//Indexación y búsqueda de expresiones matemáticas basada en redes neuronales profundas para colecciones masivas de imágenes de documentos científicos/	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica	es_ES
dc.description.bibliographicCitation	Anitei, D.; Sánchez Peiró, JA.; Benedí Ruiz, JM.; Noya García, E. (2023). The IBEM dataset: A large printed scientific image dataset for indexing and searching mathematical expressions. Pattern Recognition Letters. 172:29-36. https://doi.org/10.1016/j.patrec.2023.05.033	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	https://doi.org/10.1016/j.patrec.2023.05.033	es_ES
dc.description.upvformatpinicio	29	es_ES
dc.description.upvformatpfin	36	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	172	es_ES
dc.relation.pasarela	S\494886	es_ES
dc.contributor.funder	GENERALITAT VALENCIANA	es_ES
dc.contributor.funder	AGENCIA ESTATAL DE INVESTIGACION	es_ES
dc.contributor.funder	Instituto Valenciano de Investigación en Inteligencia Artificial	es_ES
dc.contributor.funder	Universitat Politècnica de València

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia
