
LLM-Informed Multi-Armed Bandit Strategies for Non-Stationary Environments

RiuNet: Institutional Repository of the Universitat Politècnica de València


Simple item record


dc.contributor.author de Curtò, J. es_ES
dc.contributor.author de Zarzà, Irene es_ES
dc.contributor.author Roig, Gemma es_ES
dc.contributor.author Cano, Juan-Carlos es_ES
dc.contributor.author Manzoni, Pietro es_ES
dc.contributor.author Tavares De Araujo Cesariny Calafate, Carlos Miguel es_ES
dc.date.accessioned 2024-05-02T18:08:38Z
dc.date.available 2024-05-02T18:08:38Z
dc.date.issued 2023-07 es_ES
dc.identifier.uri http://hdl.handle.net/10251/203936
dc.description.abstract [EN] In this paper, we introduce an innovative approach to handling the multi-armed bandit (MAB) problem in non-stationary environments, harnessing the predictive power of large language models (LLMs). Recognizing that traditional bandit strategies, including epsilon-greedy and upper confidence bound (UCB), may struggle in the face of dynamic changes, we propose a strategy informed by LLMs that offers dynamic guidance on exploration versus exploitation, contingent on the current state of the bandits. We bring forward a new non-stationary bandit model with fluctuating reward distributions and illustrate how LLMs can be employed to guide the choice of bandit amid this variability. Experimental outcomes demonstrate the potential of our LLM-informed strategy, showing its adaptability to the fluctuating nature of the bandit problem while maintaining competitive performance against conventional strategies. This study provides key insights into the capabilities of LLMs in enhancing decision-making processes in dynamic and uncertain scenarios. es_ES
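
To make the setup described in the abstract concrete, the sketch below implements a non-stationary K-armed bandit whose mean rewards drift over time, together with a strategy that consults an advisor for an "explore" or "exploit" signal before each pull. It is a minimal illustration under stated assumptions, not the paper's implementation: the Gaussian random-walk reward model, the llm_advice stand-in (a simple volatility heuristic used here in place of an actual GPT-3.5-turbo query), and all thresholds and window sizes are assumptions of this sketch.

    # Minimal sketch of an LLM-informed non-stationary bandit strategy.
    # Assumptions (not from the paper): random-walk reward means, a
    # volatility heuristic in place of the LLM call, window/threshold values.
    import numpy as np

    rng = np.random.default_rng(0)

    class NonStationaryBandit:
        """K-armed bandit whose mean rewards drift each step."""

        def __init__(self, k: int = 5, drift: float = 0.05):
            self.means = rng.normal(0.0, 1.0, size=k)
            self.drift = drift

        def pull(self, arm: int) -> float:
            # Every arm's mean takes a small random-walk step, so the
            # identity of the best arm changes over time.
            self.means += rng.normal(0.0, self.drift, size=self.means.shape)
            return rng.normal(self.means[arm], 1.0)

    def llm_advice(recent_rewards: list) -> str:
        # Stand-in for querying an LLM (e.g. GPT-3.5-turbo) with a summary
        # of recent rewards and asking whether to explore or exploit. A
        # noisy recent window suggests the environment has shifted, so the
        # heuristic recommends exploration. The 1.2 threshold is arbitrary.
        return "explore" if np.std(recent_rewards) > 1.2 else "exploit"

    def run_llm_informed(bandit: NonStationaryBandit,
                         steps: int = 1000, window: int = 20) -> float:
        k = len(bandit.means)
        counts = np.zeros(k)
        values = np.zeros(k)
        history, total = [], 0.0
        for _ in range(steps):
            if len(history) >= window and llm_advice(history[-window:]) == "explore":
                arm = int(rng.integers(k))        # explore: random arm
            else:
                arm = int(np.argmax(values))      # exploit: best estimate
            reward = bandit.pull(arm)
            counts[arm] += 1
            values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
            history.append(reward)
            total += reward
        return total

    if __name__ == "__main__":
        print(f"Cumulative reward: {run_llm_informed(NonStationaryBandit()):.1f}")

The same loop structure accommodates the epsilon-greedy and UCB baselines named in the abstract; they differ only in how the arm is selected, which makes the advisor signal easy to compare against fixed exploration schedules.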
dc.description.sponsorship We acknowledge the support of Universitat Politècnica de València: R&D project PID2021-122580NB-I00, funded by MCIN/AEI/10.13039/501100011033 and ERDF. We thank the following funding sources from GOETHE-University Frankfurt am Main: DePP (Dezentrale Planung von Platoons im Straßengüterverkehr mit Hilfe einer KI auf Basis einzelner LKW), the Center for Data Science & AI, and xAIBiology. es_ES
dc.language English es_ES
dc.publisher MDPI AG es_ES
dc.relation.ispartof Electronics es_ES
dc.rights Attribution (by) es_ES
dc.subject Multi-armed bandit es_ES
dc.subject Non-stationary environments es_ES
dc.subject Large language models es_ES
dc.subject AI strategy optimization es_ES
dc.subject GPT-3.5-turbo es_ES
dc.subject QLoRA es_ES
dc.subject.classification COMPUTER ARCHITECTURE AND TECHNOLOGY es_ES
dc.title LLM-Informed Multi-Armed Bandit Strategies for Non-Stationary Environments es_ES
dc.type Article es_ES
dc.identifier.doi 10.3390/electronics12132814 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2021-122580NB-I00/ES/SISTEMAS INTELIGENTES DE SENSORIZACION PARA ECOSISTEMAS, ESPACIOS URBANOS Y MOVILIDAD SOSTENIBLE/ es_ES
dc.rights.accessRights Open access es_ES
dc.contributor.affiliation Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica es_ES
dc.description.bibliographicCitation De Curtò, J.; De Zarzà, I.; Roig, G.; Cano, J.; Manzoni, P.; Tavares De Araujo Cesariny Calafate, CM. (2023). LLM-Informed Multi-Armed Bandit Strategies for Non-Stationary Environments. Electronics. 12(13). https://doi.org/10.3390/electronics12132814 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.3390/electronics12132814 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 12 es_ES
dc.description.issue 13 es_ES
dc.identifier.eissn 2079-9292 es_ES
dc.relation.pasarela S\495927 es_ES
dc.contributor.funder AGENCIA ESTATAL DE INVESTIGACION es_ES
dc.contributor.funder Goethe-Universität Frankfurt am Main es_ES

