
Egocentric video description based on temporally-linked sequences

RiuNet: Institutional Repository of the Universitat Politècnica de València



dc.contributor.author Bolaños, Marc es_ES
dc.contributor.author Peris-Abril, Álvaro es_ES
dc.contributor.author Casacuberta Nolla, Francisco es_ES
dc.contributor.author Soler, Sergi es_ES
dc.contributor.author Radeva, Petia es_ES
dc.date.accessioned 2020-04-29T07:04:13Z
dc.date.available 2020-04-29T07:04:13Z
dc.date.issued 2018-01 es_ES
dc.identifier.issn 1047-3203 es_ES
dc.identifier.uri http://hdl.handle.net/10251/141941
dc.description.abstract [EN] Egocentric vision consists of acquiring images throughout the day from a first-person point of view using wearable cameras. The automatic analysis of this information allows us to discover daily patterns that can improve the quality of life of the user. A natural topic that arises in egocentric vision is storytelling, that is, how to understand and tell the story lying behind the pictures. In this paper, we tackle storytelling as an egocentric sequence description problem. We propose a novel methodology that exploits information from temporally neighboring events, matching precisely the nature of egocentric sequences. Furthermore, we present a new method for multimodal data fusion consisting of a multi-input attention recurrent network. We also release the EDUB-SegDesc dataset. This is the first dataset for egocentric image sequence description, consisting of 1339 events with 3991 descriptions, from 55 days acquired by 11 people. Finally, we show that our proposal outperforms classical attentional encoder-decoder methods for video description. es_ES
dc.description.sponsorship This work was partially funded by TIN2015-66951-C2, SGR 1219, CERCA, Grant 20141510 (Marató TV3), PrometeoII/2014/030 and the R-MIPRCV network (TIN2014-54728-REDC). Petia Radeva is partially funded by ICREA Academia'2014. Marc Bolaños is partially funded by an FPU fellowship. We gratefully acknowledge the support of NVIDIA Corporation with the donation of a Titan X GPU used for this research. The funders had no role in the study design, data collection, analysis, or preparation of the manuscript. es_ES
dc.language English es_ES
dc.publisher Elsevier es_ES
dc.relation.ispartof Journal of Visual Communication and Image Representation es_ES
dc.rights Attribution - NonCommercial - NoDerivatives (by-nc-nd) es_ES
dc.subject Egocentric vision es_ES
dc.subject Video description es_ES
dc.subject Deep learning es_ES
dc.subject Multi-modal learning es_ES
dc.subject.classification LENGUAJES Y SISTEMAS INFORMATICOS es_ES
dc.title Egocentric video description based on temporally-linked sequences es_ES
dc.type Article es_ES
dc.identifier.doi 10.1016/j.jvcir.2017.11.022 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//TIN2015-66951-C2/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AGAUR//2014-SGR-1219/ES/Computer Vision at the Universitat de Barcelona es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//TIN2014-54728-REDC/ES/RED DE EXCELENCIA MULTIMODAL INTERACTION IN PATTERN RECOGNITION AND COMPUTER VISION/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/GVA//PROMETEOII%2F2014%2F030/ES/Adaptive learning and multimodality in machine translation and text transcription/ es_ES
dc.rights.accessRights Open access es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Bolaños, M.; Peris-Abril, Á.; Casacuberta Nolla, F.; Soler, S.; Radeva, P. (2018). Egocentric video description based on temporally-linked sequences. Journal of Visual Communication and Image Representation. 50:205-216. https://doi.org/10.1016/j.jvcir.2017.11.022 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1016/j.jvcir.2017.11.022 es_ES
dc.description.upvformatpinicio 205 es_ES
dc.description.upvformatpfin 216 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 50 es_ES
dc.relation.pasarela S\349208 es_ES
dc.contributor.funder Generalitat Valenciana es_ES
dc.contributor.funder Universitat de Barcelona es_ES
dc.contributor.funder Centres de Recerca de Catalunya es_ES
dc.contributor.funder Ministerio de Economía y Competitividad es_ES
dc.contributor.funder Institució Catalana de Recerca i Estudis Avançats es_ES
dc.contributor.funder Agencia de Gestión de Ayudas Universitarias y de Investigación
