dc.contributor.author | Martínez-Plumed, Fernando | es_ES |
dc.contributor.author | Hernández-Orallo, José | es_ES |
dc.date.accessioned | 2021-07-09T03:31:40Z | |
dc.date.available | 2021-07-09T03:31:40Z | |
dc.date.issued | 2020-06 | es_ES |
dc.identifier.issn | 2475-1502 | es_ES |
dc.identifier.uri | http://hdl.handle.net/10251/169021 | |
dc.description.abstract | [EN] With the purpose of better analyzing the result of artificial intelligence (AI) benchmarks, we present two indicators on the side of the AI problems, difficulty and discrimination, and two indicators on the side of the AI systems, ability and generality. The first three are adapted from psychometric models in item response theory (IRT), whereas generality is defined as a new metric that evaluates whether an agent is consistently good at easy problems and bad at difficult ones. We illustrate how these key indicators give us more insight into the results of two popular benchmarks in AI, the Arcade Learning Environment (Atari 2600 games) and the General Video Game AI competition, and we include some guidelines to estimate and interpret these indicators for other AI benchmarks and competitions. | es_ES |
dc.description.sponsorship | This work was supported by the U.S. Air Force Office of Scientific Research under Award FA9550-17-1-0287; in part by the EU (FEDER) and the Spanish MINECO under Grant TIN2015-69175-C4-1-R; and in part by the Generalitat Valenciana PROMETEOII/2015/013. The work of F. Martínez-Plumed was supported by INCIBE (Ayudas para la excelencia de los equipos de investigación avanzada en ciberseguridad), the European Commission, JRC's Centre for Advanced Studies, HUMAINT project (Expert Contract CT-EX2018D335821-101), and UPV PAID-06-18 Ref. SP20180210. The work of J. Hernández-Orallo was supported in part by Salvador de Madariaga grant (PRX17/00467) from the Spanish MECD, in part by the BEST Grant (BEST/2017/045) from the GVA for research stays at the CFI, and in part by the FLI grant RFP2-152. | es_ES |
dc.language | English | es_ES |
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) | es_ES |
dc.relation.ispartof | IEEE Transactions on Games | es_ES |
dc.rights | All rights reserved | es_ES |
dc.subject | Artificial intelligence | es_ES |
dc.subject | Games | es_ES |
dc.subject | Benchmark testing | es_ES |
dc.subject | Task analysis | es_ES |
dc.subject | Adaptation models | es_ES |
dc.subject | Guidelines | es_ES |
dc.subject | Indexes | es_ES |
dc.subject | Artificial intelligence (AI) benchmarks | es_ES |
dc.subject | AI evaluation | es_ES |
dc.subject | Generality | es_ES |
dc.subject | Item response theory (IRT) | es_ES |
dc.subject.classification | LANGUAGES AND COMPUTER SYSTEMS | es_ES |
dc.title | Dual Indicators to Analyse AI Benchmarks: Difficulty, Discrimination, Ability and Generality | es_ES |
dc.type | Article | es_ES |
dc.identifier.doi | 10.1109/TG.2018.2883773 | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/INCIBE//INCIBEI-2015-27345/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/EC//CT-EX2018D335821-101/EU//HUMAINT/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/UPV//SP20180210/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/MECD//PRX17%2F00467/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/GVA//BEST%2F2017%2F045/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/FLI//RFP2-152/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/UPV//PAID-06-18/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/AFOSR//FA9550-17-1-0287/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/MINECO//TIN2015-69175-C4-1-R/ES/SOLUCIONES EFECTIVAS BASADAS EN LA LOGICA/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/GVA//PROMETEOII%2F2015%2F013/ES/SmartLogic: Logic Technologies for Software Security and Performance/ | es_ES |
dc.rights.accessRights | Open | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació | es_ES |
dc.description.bibliographicCitation | Martínez-Plumed, F.; Hernández-Orallo, J. (2020). Dual Indicators to Analyse AI Benchmarks: Difficulty, Discrimination, Ability and Generality. IEEE Transactions on Games. 12(2):121-131. https://doi.org/10.1109/TG.2018.2883773 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | https://doi.org/10.1109/TG.2018.2883773 | es_ES |
dc.description.upvformatpinicio | 121 | es_ES |
dc.description.upvformatpfin | 131 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 12 | es_ES |
dc.description.issue | 2 | es_ES |
dc.relation.pasarela | S\386859 | es_ES |
dc.contributor.funder | European Commission | es_ES |
dc.contributor.funder | Generalitat Valenciana | es_ES |
dc.contributor.funder | Future of Life Institute | es_ES |
dc.contributor.funder | European Regional Development Fund | es_ES |
dc.contributor.funder | Instituto Nacional de Ciberseguridad | es_ES |
dc.contributor.funder | Universitat Politècnica de València | es_ES |
dc.contributor.funder | Air Force Office of Scientific Research | es_ES |
dc.contributor.funder | Ministerio de Economía y Competitividad | es_ES |
dc.contributor.funder | Ministerio de Educación, Cultura y Deporte | es_ES |
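The abstract states that difficulty, discrimination and ability are adapted from item response theory (IRT). As a minimal sketch, assuming the standard two-parameter logistic (2PL) IRT model (this record does not say which parameterization the paper actually uses), the probability that a system of ability theta solves an item of difficulty b and discrimination a is 1 / (1 + exp(-a(theta - b))), and a system's ability can be estimated from its pass/fail results by maximum likelihood. The function names `icc`, `log_likelihood` and `estimate_ability` and the grid-search estimator are hypothetical choices for illustration; generality is not sketched because its exact definition is not given in this record.

```python
import math
from typing import Sequence, Tuple

# Hypothetical sketch of the standard 2PL IRT model (an assumption, not
# necessarily the paper's exact formulation).

def icc(theta: float, b: float, a: float) -> float:
    """Item characteristic curve: P(success | ability theta) for an item
    with difficulty b and discrimination a."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta: float,
                   items: Sequence[Tuple[float, float]],
                   responses: Sequence[int]) -> float:
    """Log-likelihood of 0/1 responses given ability theta.
    items holds (difficulty, discrimination) pairs."""
    eps = 1e-12  # clamp probabilities to avoid log(0)
    ll = 0.0
    for (b, a), r in zip(items, responses):
        p = min(max(icc(theta, b, a), eps), 1.0 - eps)
        ll += math.log(p) if r else math.log(1.0 - p)
    return ll


def estimate_ability(items: Sequence[Tuple[float, float]],
                     responses: Sequence[int],
                     lo: float = -4.0, hi: float = 4.0,
                     steps: int = 801) -> float:
    """Maximum-likelihood ability via a simple grid search over [lo, hi]
    (a hypothetical estimator chosen for clarity, not efficiency)."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))


if __name__ == "__main__":
    # Three items of increasing difficulty, equal discrimination.
    items = [(-1.0, 1.0), (0.0, 1.0), (1.0, 1.0)]
    print(estimate_ability(items, [1, 1, 0]))
```

In this model an agent whose ability equals an item's difficulty has a 0.5 chance of solving it, and a larger discrimination makes the curve steeper around that point. If an agent solves every item, the likelihood keeps increasing with theta, so the grid search returns the upper bound `hi`; a real analysis would need a prior or a bounded estimate for that case.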