
Dual Indicators to Analyse AI Benchmarks: Difficulty, Discrimination, Ability and Generality

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia


dc.contributor.author Martínez-Plumed, Fernando es_ES
dc.contributor.author Hernández-Orallo, José es_ES
dc.date.accessioned 2021-07-09T03:31:40Z
dc.date.available 2021-07-09T03:31:40Z
dc.date.issued 2020-06 es_ES
dc.identifier.issn 2475-1502 es_ES
dc.identifier.uri http://hdl.handle.net/10251/169021
dc.description.abstract [EN] With the purpose of better analyzing the result of artificial intelligence (AI) benchmarks, we present two indicators on the side of the AI problems, difficulty and discrimination, and two indicators on the side of the AI systems, ability and generality. The first three are adapted from psychometric models in item response theory (IRT), whereas generality is defined as a new metric that evaluates whether an agent is consistently good at easy problems and bad at difficult ones. We illustrate how these key indicators give us more insight on the results of two popular benchmarks in AI, the Arcade Learning Environment (Atari 2600 games) and the General Video Game AI competition, and we include some guidelines to estimate and interpret these indicators for other AI benchmarks and competitions. es_ES
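The abstract says that difficulty, discrimination and ability are adapted from item response theory (IRT). As a minimal sketch, assuming the standard two-parameter logistic (2PL) IRT model (this record does not give the paper's exact parameterisation), the probability that a system of ability theta solves a problem with discrimination a and difficulty b can be computed as below. The second function is a hypothetical stand-in for the idea behind generality ("consistently good at easy problems and bad at difficult ones"): it counts Guttman errors, a classical psychometric consistency check. It is not the generality metric defined in the paper, and the function names are illustrative only.

```python
import math


def icc_2pl(theta: float, a: float, b: float) -> float:
    """Standard 2PL item characteristic curve (assumed model).

    theta: ability of the AI system
    a: discrimination of the problem (item)
    b: difficulty of the problem (item)
    Returns the probability that the system solves the item.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def guttman_errors(difficulties: list[float], solved: list[bool]) -> int:
    """Hypothetical consistency proxy, NOT the paper's generality metric.

    Counts pairs (easier item, harder item) where the agent fails the
    easier item but solves the harder one. Zero means a perfect step
    pattern: everything solved up to some difficulty, nothing beyond.
    Assumes difficulties are distinct (ties are ordered arbitrarily).
    """
    # Reorder the agent's results from easiest to hardest item.
    order = sorted(range(len(difficulties)), key=lambda i: difficulties[i])
    pattern = [solved[i] for i in order]
    errors = 0
    for i in range(len(pattern)):
        for j in range(i + 1, len(pattern)):
            if not pattern[i] and pattern[j]:
                errors += 1
    return errors
```

For example, `icc_2pl(0.0, 1.0, 0.0)` returns 0.5 (ability equal to difficulty), and `guttman_errors([0.1, 0.5, 0.9], [False, True, True])` returns 2, because the easiest item was failed while both harder ones were solved.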
dc.description.sponsorship This work was supported by the U.S. Air Force Office of Scientific Research under Award FA9550-17-1-0287; in part by the EU (FEDER) and the Spanish MINECO under Grant TIN2015-69175-C4-1-R; and in part by the Generalitat Valenciana under Grant PROMETEOII/2015/013. The work of F. Martínez-Plumed was supported by INCIBE (Ayudas para la excelencia de los equipos de investigación avanzada en ciberseguridad, i.e., grants for the excellence of advanced cybersecurity research teams), the European Commission, JRC's Centre for Advanced Studies, the HUMAINT project (Expert Contract CT-EX2018D335821-101), and UPV PAID-06-18 Ref. SP20180210. The work of J. Hernández-Orallo was supported in part by a Salvador de Madariaga grant (PRX17/00467) from the Spanish MECD, in part by the BEST Grant (BEST/2017/045) from the GVA for research stays at the CFI, and in part by the FLI grant RFP2-152. es_ES
dc.language English es_ES
dc.publisher Institute of Electrical and Electronics Engineers (IEEE) es_ES
dc.relation.ispartof IEEE Transactions on Games es_ES
dc.rights All rights reserved es_ES
dc.subject Artificial intelligence es_ES
dc.subject Games es_ES
dc.subject Benchmark testing es_ES
dc.subject Task analysis es_ES
dc.subject Adaptation models es_ES
dc.subject Guidelines es_ES
dc.subject Indexes es_ES
dc.subject Artificial intelligence (AI) benchmarks es_ES
dc.subject AI evaluation es_ES
dc.subject Generality es_ES
dc.subject Item response theory (IRT) es_ES
dc.subject.classification LANGUAGES AND COMPUTER SYSTEMS es_ES
dc.title Dual Indicators to Analyse AI Benchmarks: Difficulty, Discrimination, Ability and Generality es_ES
dc.type Article es_ES
dc.identifier.doi 10.1109/TG.2018.2883773 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/INCIBE//INCIBEI-2015-27345/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/EC//CT-EX2018D335821-101/EU//HUMAINT/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/UPV//SP20180210/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MECD//PRX17%2F00467/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/GVA//BEST%2F2017%2F045/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/FLI//RFP2-152/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/UPV//PAID-06-18/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AFOSR//FA9550-17-1-0287/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//TIN2015-69175-C4-1-R/ES/SOLUCIONES EFECTIVAS BASADAS EN LA LOGICA/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/GVA//PROMETEOII%2F2015%2F013/ES/SmartLogic: Logic Technologies for Software Security and Performance/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Martínez-Plumed, F.; Hernández-Orallo, J. (2020). Dual Indicators to Analyse AI Benchmarks: Difficulty, Discrimination, Ability and Generality. IEEE Transactions on Games. 12(2):121-131. https://doi.org/10.1109/TG.2018.2883773 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1109/TG.2018.2883773 es_ES
dc.description.upvformatpinicio 121 es_ES
dc.description.upvformatpfin 131 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 12 es_ES
dc.description.issue 2 es_ES
dc.relation.pasarela S\386859 es_ES
dc.contributor.funder European Commission es_ES
dc.contributor.funder Generalitat Valenciana es_ES
dc.contributor.funder Future of Life Institute es_ES
dc.contributor.funder European Regional Development Fund es_ES
dc.contributor.funder Instituto Nacional de Ciberseguridad es_ES
dc.contributor.funder Universitat Politècnica de València es_ES
dc.contributor.funder Air Force Office of Scientific Research es_ES
dc.contributor.funder Ministerio de Economía y Competitividad es_ES
dc.contributor.funder Ministerio de Educación, Cultura y Deporte es_ES

