Show simple item record
dc.contributor.author | de Curtò, J. | es_ES |
dc.contributor.author | de Zarzà, Irene | es_ES |
dc.contributor.author | Roig, Gemma | es_ES |
dc.contributor.author | Cano, Juan-Carlos | es_ES |
dc.contributor.author | Manzoni, Pietro | es_ES |
dc.contributor.author | Tavares De Araujo Cesariny Calafate, Carlos Miguel | es_ES |
dc.date.accessioned | 2024-05-02T18:08:38Z | |
dc.date.available | 2024-05-02T18:08:38Z | |
dc.date.issued | 2023-07 | es_ES |
dc.identifier.uri | http://hdl.handle.net/10251/203936 | |
dc.description.abstract | [EN] In this paper, we introduce an innovative approach to handling the multi-armed bandit (MAB) problem in non-stationary environments, harnessing the predictive power of large language models (LLMs). With the realization that traditional bandit strategies, including epsilon-greedy and upper confidence bound (UCB), may struggle in the face of dynamic changes, we propose a strategy informed by LLMs that offers dynamic guidance on exploration versus exploitation, contingent on the current state of the bandits. We bring forward a new non-stationary bandit model with fluctuating reward distributions and illustrate how LLMs can be employed to guide the choice of bandit amid this variability. Experimental outcomes illustrate the potential of our LLM-informed strategy, demonstrating its adaptability to the fluctuating nature of the bandit problem, while maintaining competitive performance against conventional strategies. This study provides key insights into the capabilities of LLMs in enhancing decision-making processes in dynamic and uncertain scenarios. | es_ES |
dc.description.sponsorship | We acknowledge the support of Universitat Politècnica de València: R&D project PID2021-122580NB-I00, funded by MCIN/AEI/10.13039/501100011033 and ERDF. We thank the following funding sources from GOETHE-University Frankfurt am Main: DePP (Dezentrale Planung von Platoons im Straßengüterverkehr mit Hilfe einer KI auf Basis einzelner LKW), Center for Data Science & AI, and xAIBiology. | es_ES |
dc.language | English | es_ES |
dc.publisher | MDPI AG | es_ES |
dc.relation.ispartof | Electronics | es_ES |
dc.rights | Attribution (by) | es_ES |
dc.subject | Multi-armed bandit | es_ES |
dc.subject | Non-stationary environments | es_ES |
dc.subject | Large language models | es_ES |
dc.subject | AI strategy optimization | es_ES |
dc.subject | GPT-3.5-turbo | es_ES |
dc.subject | QLoRA | es_ES |
dc.subject.classification | ARQUITECTURA Y TECNOLOGIA DE COMPUTADORES | es_ES |
dc.title | LLM-Informed Multi-Armed Bandit Strategies for Non-Stationary Environments | es_ES |
dc.type | Article | es_ES |
dc.identifier.doi | 10.3390/electronics12132814 | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2021-122580NB-I00/ES/SISTEMAS INTELIGENTES DE SENSORIZACION PARA ECOSISTEMAS, ESPACIOS URBANOS Y MOVILIDAD SOSTENIBLE/ | es_ES |
dc.rights.accessRights | Open access | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica | es_ES |
dc.description.bibliographicCitation | De Curtò, J.; De Zarzà, I.; Roig, G.; Cano, J.; Manzoni, P.; Tavares De Araujo Cesariny Calafate, CM. (2023). LLM-Informed Multi-Armed Bandit Strategies for Non-Stationary Environments. Electronics. 12(13). https://doi.org/10.3390/electronics12132814 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | https://doi.org/10.3390/electronics12132814 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 12 | es_ES |
dc.description.issue | 13 | es_ES |
dc.identifier.eissn | 2079-9292 | es_ES |
dc.relation.pasarela | S\495927 | es_ES |
dc.contributor.funder | AGENCIA ESTATAL DE INVESTIGACION | es_ES |
dc.contributor.funder | Goethe-Universität Frankfurt am Main | es_ES |
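As a companion to the abstract above, the following is a minimal Python sketch of the setting it describes: a non-stationary multi-armed bandit with drifting reward means, epsilon-greedy and UCB baselines, and an "LLM-informed" epsilon-greedy variant whose exploration rate follows external advice. The NonStationaryBandit class, the llm_advised_epsilon rule, and all parameter values are illustrative assumptions; a real system would replace llm_advised_epsilon with a prompt to a model such as GPT-3.5-turbo, and this sketch is not the authors' implementation.

    # Hypothetical sketch only: the advice rule stands in for an LLM call.
    import numpy as np

    rng = np.random.default_rng(0)

    class NonStationaryBandit:
        """K arms whose Gaussian reward means drift by a random walk each step."""
        def __init__(self, k=5, drift=0.05):
            self.means = rng.normal(0.0, 1.0, size=k)
            self.drift = drift

        def pull(self, arm):
            reward = rng.normal(self.means[arm], 1.0)
            # Non-stationarity: every mean moves a little after each pull.
            self.means += rng.normal(0.0, self.drift, size=self.means.size)
            return reward

    def epsilon_greedy(q, counts, t, epsilon=0.1):
        if rng.random() < epsilon:
            return int(rng.integers(len(q)))      # explore
        return int(np.argmax(q))                  # exploit

    def ucb(q, counts, t, c=2.0):
        if 0 in counts:
            return int(np.argmin(counts))         # play each arm once first
        return int(np.argmax(q + c * np.sqrt(np.log(t + 1) / counts)))

    def llm_advised_epsilon(q, counts, t):
        """Placeholder for querying an LLM with the current bandit state.

        A real system would serialize (q, counts, t) into a prompt and parse
        the suggested exploration rate; here we simply explore more when the
        value estimates are close together (the best arm is ambiguous).
        """
        gap = np.max(q) - np.median(q)
        return 0.3 if gap < 0.5 else 0.05

    def run(policy, steps=2000, k=5):
        bandit = NonStationaryBandit(k)
        q = np.zeros(k)                 # estimated arm values
        counts = np.zeros(k)            # pull counts
        total = 0.0
        for t in range(steps):
            if policy == "llm":
                arm = epsilon_greedy(q, counts, t,
                                     epsilon=llm_advised_epsilon(q, counts, t))
            elif policy == "ucb":
                arm = ucb(q, counts, t)
            else:
                arm = epsilon_greedy(q, counts, t)
            r = bandit.pull(arm)
            counts[arm] += 1
            q[arm] += (r - q[arm]) / counts[arm]  # incremental mean update
            total += r
        return total

    for name in ("egreedy", "ucb", "llm"):
        print(name, round(run(name), 1))

Running the script prints the cumulative reward of each policy over 2000 steps; the ambiguity-based advice rule is used only so the sketch runs offline without an API key, while the surrounding loop shows where LLM guidance would plug into the explore/exploit decision.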