
Semantic Scene Understanding with Large Language Models on Unmanned Aerial Vehicles

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia

dc.contributor.author de Curtò, J. es_ES
dc.contributor.author de Zarzà, I. es_ES
dc.contributor.author Tavares De Araujo Cesariny Calafate, Carlos Miguel es_ES
dc.date.accessioned 2024-10-23T18:08:22Z
dc.date.available 2024-10-23T18:08:22Z
dc.date.issued 2023-02 es_ES
dc.identifier.uri http://hdl.handle.net/10251/210789
dc.description.abstract [EN] Unmanned Aerial Vehicles (UAVs) are able to provide instantaneous visual cues and a high level of data throughput that could be further leveraged to address complex tasks, such as semantically rich scene understanding. In this work, we built on the use of Large Language Models (LLMs) and Visual Language Models (VLMs), together with a state-of-the-art detection pipeline, to provide thorough zero-shot literary text descriptions of UAV scenes. The generated texts achieve a median Gunning Fog grade level in the range of 7-12. Applications of this framework could be found in the film industry, and it could enhance user experience in theme parks or in the advertising sector. We demonstrate a low-cost, highly efficient, state-of-the-art practical implementation of microdrones in a well-controlled and challenging setting, in addition to proposing the use of standardized readability metrics to assess LLM-enhanced descriptions. es_ES
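The abstract reports readability as a median Gunning Fog grade level. As a rough illustration of that metric (not the authors' implementation, which this record does not describe), the sketch below computes the standard Gunning Fog index, 0.4 × (words/sentences + 100 × complex words/words), and takes the median over a set of generated descriptions. It assumes a simplified definition of a complex word (three or more syllables, estimated by counting vowel groups) and does not apply Gunning's exclusions for proper nouns, compound words, or common suffixes; the function names `count_syllables`, `gunning_fog`, and `median_fog` are hypothetical.

```python
import re
import statistics


def count_syllables(word: str) -> int:
    """Heuristic syllable count (assumption): groups of consecutive vowels,
    minus one for a silent trailing 'e' (but not for '-le' endings)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def gunning_fog(text: str) -> float:
    """Gunning Fog index: 0.4 * (words/sentences + 100 * complex_words/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    # Simplification: any word with three or more estimated syllables is "complex".
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))


def median_fog(descriptions: list[str]) -> float:
    """Median Gunning Fog grade level over a non-empty list of descriptions."""
    scores = [gunning_fog(d) for d in descriptions]
    return statistics.median(scores)
```

For example, `median_fog(["The cat sat.", "Cats nap a lot.", "A dog."])` returns approximately 1.2. A vowel-group heuristic keeps the sketch dependency-free, but it miscounts some words (for instance, it estimates two syllables for "aerial"), so scores will differ from those of a dictionary-based tool.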
dc.description.sponsorship This work is supported by the HK Innovation and Technology Commission (InnoHK Project CIMDA). We acknowledge the support of Universitat Politècnica de València; R&D project PID2021-122580NB-I00, funded by MCIN/AEI/10.13039/501100011033 and ERDF. es_ES
dc.language English es_ES
dc.publisher MDPI AG es_ES
dc.relation.ispartof Drones es_ES
dc.rights Attribution (BY) es_ES
dc.subject Scene understanding es_ES
dc.subject Large language models es_ES
dc.subject Visual language models es_ES
dc.subject CLIP es_ES
dc.subject GPT-3 es_ES
dc.subject YOLOv7 es_ES
dc.subject UAV es_ES
dc.subject.classification COMPUTER ARCHITECTURE AND TECHNOLOGY es_ES
dc.title Semantic Scene Understanding with Large Language Models on Unmanned Aerial Vehicles es_ES
dc.type Article es_ES
dc.identifier.doi 10.3390/drones7020114 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2021-122580NB-I00/ES/SISTEMAS INTELIGENTES DE SENSORIZACION PARA ECOSISTEMAS, ESPACIOS URBANOS Y MOVILIDAD SOSTENIBLE/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica es_ES
dc.description.bibliographicCitation De Curtò, J.; De Zarzà, I.; Tavares De Araujo Cesariny Calafate, CM. (2023). Semantic Scene Understanding with Large Language Models on Unmanned Aerial Vehicles. Drones. 7(2). https://doi.org/10.3390/drones7020114 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.3390/drones7020114 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 7 es_ES
dc.description.issue 2 es_ES
dc.identifier.eissn 2504-446X es_ES
dc.relation.pasarela S\482432 es_ES
dc.contributor.funder AGENCIA ESTATAL DE INVESTIGACION es_ES
dc.contributor.funder European Regional Development Fund es_ES
dc.contributor.funder Universitat Politècnica de València es_ES

