Dual Indicators to Analyse AI Benchmarks: Difficulty, Discrimination, Ability and Generality

Martínez-Plumed, Fernando; Hernández-Orallo, José

doi:10.1109/TG.2018.2883773


Files in this item

Name: Martínez-Plumed;H ...
Size: 1.143Mb
Format: PDF
Description: Author's version.

Name: IEEE Transactions ...
Size: 4.355Mb
Format: PDF
Description: Publisher's version.

dc.contributor.author	Martínez-Plumed, Fernando	es_ES
dc.contributor.author	Hernández-Orallo, José	es_ES
dc.date.accessioned	2021-07-09T03:31:40Z
dc.date.available	2021-07-09T03:31:40Z
dc.date.issued	2020-06	es_ES
dc.identifier.issn	2475-1502	es_ES
dc.identifier.uri	http://hdl.handle.net/10251/169021
dc.description.abstract	[EN] With the purpose of better analyzing the result of artificial intelligence (AI) benchmarks, we present two indicators on the side of the AI problems, difficulty and discrimination, and two indicators on the side of the AI systems, ability and generality. The first three are adapted from psychometric models in item response theory (IRT), whereas generality is defined as a new metric that evaluates whether an agent is consistently good at easy problems and bad at difficult ones. We illustrate how these key indicators give us more insight on the results of two popular benchmarks in AI, the Arcade Learning Environment (Atari 2600 games) and the General Video Game AI competition, and we include some guidelines to estimate and interpret these indicators for other AI benchmarks and competitions.	es_ES
dc.description.sponsorship	This work was supported by the U.S. Air Force Office of Scientific Research under Award FA9550-17-1-0287; in part by the EU (FEDER) and the Spanish MINECO under Grant TIN 2015-69175-C4-1-R; and in part by the Generalitat Valenciana PROMETEOII/2015/013. The work of F. Martínez-Plumed was supported by INCIBE (Ayudas para la excelencia de los equipos de investigación avanzada en ciberseguridad), the European Commission, JRC's Centre for Advanced Studies, HUMAINT project (Expert Contract CT-EX2018D335821-101), and UPV PAID-06-18 Ref. SP20180210. The work of J. Hernández-Orallo was supported in part by Salvador de Madariaga grant (PRX17/00467) from the Spanish MECD, in part by the BEST Grant (BEST/2017/045) from the GVA for research stays at the CFI, and in part by the FLI grant RFP2-152.	es_ES
dc.language	English	es_ES
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	es_ES
dc.relation.ispartof	IEEE Transactions on Games	es_ES
dc.rights	All rights reserved	es_ES
dc.subject	Artificial intelligence	es_ES
dc.subject	Games	es_ES
dc.subject	Benchmark testing	es_ES
dc.subject	Task analysis	es_ES
dc.subject	Adaptation models	es_ES
dc.subject	Guidelines	es_ES
dc.subject	Indexes	es_ES
dc.subject	Artificial intelligence (AI) benchmarks	es_ES
dc.subject	AI evaluation	es_ES
dc.subject	Generality	es_ES
dc.subject	Item response theory (IRT)	es_ES
dc.subject.classification	COMPUTER LANGUAGES AND SYSTEMS	es_ES
dc.title	Dual Indicators to Analyse AI Benchmarks: Difficulty, Discrimination, Ability and Generality	es_ES
dc.type	Article	es_ES
dc.identifier.doi	10.1109/TG.2018.2883773	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/INCIBE//INCIBEI-2015-27345/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/EC//CT-EX2018D335821-101/EU//HUMAINT/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/UPV//SP20180210/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/MECD//PRX17%2F00467/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/GVA//BEST%2F2017%2F045/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/FLI//RFP2-152/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/UPV//PAID-06-18/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/AFOSR//FA9550-17-1-0287/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/MINECO//TIN2015-69175-C4-1-R/ES/SOLUCIONES EFECTIVAS BASADAS EN LA LOGICA/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/GVA//PROMETEOII%2F2015%2F013/ES/SmartLogic: Logic Technologies for Software Security and Performance/	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació	es_ES
dc.description.bibliographicCitation	Martínez-Plumed, F.; Hernández-Orallo, J. (2020). Dual Indicators to Analyse AI Benchmarks: Difficulty, Discrimination, Ability and Generality. IEEE Transactions on Games. 12(2):121-131. https://doi.org/10.1109/TG.2018.2883773	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	https://doi.org/10.1109/TG.2018.2883773	es_ES
dc.description.upvformatpinicio	121	es_ES
dc.description.upvformatpfin	131	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	12	es_ES
dc.description.issue	2	es_ES
dc.relation.pasarela	S\386859	es_ES
dc.contributor.funder	European Commission	es_ES
dc.contributor.funder	Generalitat Valenciana	es_ES
dc.contributor.funder	Future of Life Institute	es_ES
dc.contributor.funder	European Regional Development Fund	es_ES
dc.contributor.funder	Instituto Nacional de Ciberseguridad	es_ES
dc.contributor.funder	Universitat Politècnica de València	es_ES
dc.contributor.funder	Air Force Office of Scientific Research	es_ES
dc.contributor.funder	Ministerio de Economía y Competitividad	es_ES
dc.contributor.funder	Ministerio de Educación, Cultura y Deporte	es_ES

This item appears in the following collection(s)

Articles, conferences, monographs [47480]


RiuNet: Institutional Repository of the Universidad Politécnica de Valencia
