Development of a scalable database for recognition of printed mathematical expressions

Anitei, Dan

Files in this item

Name: Anitei - Desarrollo ...

Size: 2.297Mb

Format: PDF


dc.contributor.advisor	Sánchez Peiró, Joan Andreu	es_ES
dc.contributor.advisor	Benedí Ruiz, José Miguel	es_ES
dc.contributor.author	Anitei, Dan	es_ES
dc.date.accessioned	2020-09-18T17:08:20Z
dc.date.available	2020-09-18T17:08:20Z
dc.date.created	2020-07-14
dc.date.issued	2020-09-18	es_ES
dc.identifier.uri	http://hdl.handle.net/10251/150390
dc.description.abstract	[ES] Searching for information in printed scientific documents is a challenging problem that has recently received special attention from the Pattern Recognition research community. Mathematical Expressions are complex elements that appear in scientific documents, and developing techniques to locate and recognize them requires preparing data sets that can be used as benchmarks. Most current techniques for dealing with Mathematical Expressions are based on Pattern Recognition and Machine Learning techniques, and therefore these data sets have to be prepared with ground-truth information for automatic training and testing. However, preparing large data sets is very costly and time-consuming. This project introduces a data set of scientific documents that has been prepared for recognizing and searching Mathematical Expressions. This data set has been generated automatically from the LaTeX version of the documents and can consequently be enlarged easily. The ground-truth includes the position at page level, the LaTeX version of the Mathematical Expressions, both embedded in the text and displayed, and the sequence of symbols, represented as Unicode code points, used to define these expressions. Based on this data set, statistics have been extracted, such as the total number and type of expressions, the average number of expressions per document, and the frequency distributions of the whole set of expressions. This document also introduces a mathematical symbol classification experiment that can be used as a starting point.	es_ES
dc.description.abstract	[EN] Searching for information in printed scientific documents is a challenging problem that has recently received special attention from the Pattern Recognition research community. Mathematical Expressions are complex elements that appear in scientific documents, and developing techniques for locating and recognizing them requires the preparation of data sets that can be used as benchmarks. Most current techniques for dealing with Mathematical Expressions are based on Pattern Recognition and Machine Learning techniques, and therefore these data sets have to be prepared with ground-truth information for automatic training and testing. However, preparing large data sets with ground-truth is a very expensive and time-consuming task. This project introduces a data set of scientific documents that has been prepared for Mathematical Expression recognition and searching. This data set has been automatically generated from the LaTeX version of the documents and can consequently be enlarged easily. The ground-truth includes the position at page level, the LaTeX version of the Mathematical Expressions, both embedded in the text and displayed, and the sequence of mathematical symbols, represented as Unicode code points, used to define these expressions. Based on this data set, statistics such as the total number and type of expressions, the average number of expressions per document, and their frequency distribution were extracted. A baseline classification experiment with mathematical symbols from this data set is also reported in this paper.	es_ES
dc.format.extent	74	es_ES
dc.language	English	es_ES
dc.publisher	Universitat Politècnica de València	es_ES
dc.rights	Attribution - ShareAlike (by-sa)	es_ES
dc.subject	Printed mathematical expressions	es_ES
dc.subject	Data set	es_ES
dc.subject	Machine-learning	es_ES
dc.subject	Mathematical expressions	es_ES
dc.subject	Ground-truth	es_ES
dc.subject	Pattern recognition	es_ES
dc.subject	Convolutional neural networks	es_ES
dc.subject	LaTeX	es_ES
dc.subject.classification	COMPUTER LANGUAGES AND SYSTEMS	es_ES
dc.subject.other	Bachelor's Degree in Informatics Engineering (Grado en Ingeniería Informática-Grau en Enginyeria Informàtica)	es_ES
dc.title	Development of a scalable database for recognition of printed mathematical expressions	es_ES
dc.type	Final degree project/Bachelor's thesis	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/GVA//GR00-167/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/AEI//TIN2017-91452-EXP/ES/INDEXACION Y BUSQUEDA DE EXPRESIONES MATEMATICAS A GRAN ESCALA EN CORPUS MASIVOS DE DOCUMENTOS IMPRESOS/	es_ES
dc.rights.accessRights	Open access	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica	es_ES
dc.description.bibliographicCitation	Anitei, D. (2020). Development of a scalable database for recognition of printed mathematical expressions. Universitat Politècnica de València. http://hdl.handle.net/10251/150390	es_ES
dc.description.accrualMethod	TFGM	es_ES
dc.relation.pasarela	TFGM\124360	es_ES
dc.contributor.funder	Agencia Estatal de Investigación	es_ES
dc.contributor.funder	Generalitat Valenciana	es_ES

This item appears in the following collection(s)

ETSINF - Trabajos académicos [5174]
Escola Tècnica Superior d'Enginyeria Informàtica

RiuNet: Institutional Repository of the Universitat Politècnica de València
