Evaluation in artificial intelligence: From task-oriented to ability-oriented measurement

José Hernández-Orallo

doi:10.1007/s10462-016-9505-7

Files in this item

Name: AIRE-Evaluation-o ...

Size: 1.013 MB

Format: PDF

Description: Author's version

Name: Evaluation in ...

Size: 1.012 MB

Format: PDF

Description: Publisher's version

dc.contributor.author	José Hernández-Orallo	es_ES
dc.date.accessioned	2017-06-26T08:56:40Z
dc.date.available	2017-06-26T08:56:40Z
dc.date.issued	2016-08-19
dc.identifier.issn	0269-2821
dc.identifier.uri	http://hdl.handle.net/10251/83598
dc.description	The final publication is available at Springer via http://dx.doi.org/10.1007/s10462-016-9505-7.	es_ES
dc.description.abstract	The evaluation of artificial intelligence systems and components is crucial for the progress of the discipline. In this paper we describe and critically assess the different ways AI systems are evaluated, and the role of components and techniques in these systems. We first focus on the traditional task-oriented evaluation approach. We identify three kinds of evaluation: human discrimination, problem benchmarks and peer confrontation. We describe some of the limitations of the many evaluation schemes and competitions in these three categories, and follow the progression of some of these tests. We then focus on a less customary (and challenging) ability-oriented evaluation approach, where a system is characterised by its (cognitive) abilities, rather than by the tasks it is designed to solve. We discuss several possibilities: the adaptation of cognitive tests used for humans and animals, the development of tests derived from algorithmic information theory or more integrated approaches under the perspective of universal psychometrics. We analyse some evaluation tests from AI that are better positioned for an ability-oriented evaluation and discuss how their problems and limitations can possibly be addressed with some of the tools and ideas that appear within the paper. Finally, we enumerate a series of lessons learnt and generic guidelines to be used when an AI evaluation scheme is under consideration.	es_ES
dc.description.sponsorship	I thank the organisers of the AEPIA Summer School On Artificial Intelligence, held in September 2014, for giving me the opportunity to give a lecture on 'AI Evaluation'. This paper was born out of and evolved through that lecture. The information about many of the benchmarks and competitions discussed in this paper has been checked against information from, and discussions with, many people: M. Bedia, A. Cangelosi, C. Dimitrakakis, I. García-Varea, Katja Hofmann, W. Langdon, E. Messina, S. Mueller, M. Siebers and C. Soares. Figure 4 is courtesy of F. Martínez-Plumed. Finally, I thank the anonymous reviewers, whose comments have helped to significantly improve the balance and coverage of the paper. This work has been partially supported by the EU (FEDER) and the Spanish MINECO under Grants TIN 2013-45732-C4-1-P, TIN 2015-69175-C4-1-R and by Generalitat Valenciana PROMETEOII/2015/013.	en_EN
dc.language	English	es_ES
dc.publisher	Springer Verlag (Germany)	es_ES
dc.relation.ispartof	Artificial Intelligence Review	es_ES
dc.rights	All rights reserved	es_ES
dc.subject	AI evaluation	es_ES
dc.subject	AI competitions	es_ES
dc.subject	Machine intelligence	es_ES
dc.subject	Cognitive abilities	es_ES
dc.subject	Universal psychometrics	es_ES
dc.subject	Turing test	es_ES
dc.subject.classification	COMPUTER LANGUAGES AND SYSTEMS	es_ES
dc.title	Evaluation in artificial intelligence: From task-oriented to ability-oriented measurement	es_ES
dc.type	Article	es_ES
dc.identifier.doi	10.1007/s10462-016-9505-7
dc.relation.projectID	info:eu-repo/grantAgreement/MINECO//TIN2013-45732-C4-1-P/ES/UNA APROXIMACION DECLARATIVA AL MODELADO, ANALISIS Y RESOLUCION DE PROBLEMAS/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/GVA//PROMETEOII%2F2015%2F013/ES/SmartLogic: Logic Technologies for Software Security and Performance/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/MINECO//TIN2015-69175-C4-1-R/ES/SOLUCIONES EFECTIVAS BASADAS EN LA LOGICA/
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica	es_ES
dc.description.bibliographicCitation	José Hernández-Orallo (2016). Evaluation in artificial intelligence: From task-oriented to ability-oriented measurement. Artificial Intelligence Review. 1-51. https://doi.org/10.1007/s10462-016-9505-7	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	https://link.springer.com/article/10.1007/s10462-016-9505-7	es_ES
dc.description.upvformatpinicio	1	es_ES
dc.description.upvformatpfin	51	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.relation.senia	327775	es_ES
dc.contributor.funder	Generalitat Valenciana	es_ES
dc.contributor.funder	Ministerio de Economía y Competitividad	es_ES
dc.description.references	Abel D, Agarwal A, Diaz F, Krishnamurthy A, Schapire RE (2016) Exploratory gradient boosting for reinforcement learning in complex domains. arXiv preprint arXiv:1603.04119	es_ES
dc.description.references	Adams S, Arel I, Bach J, Coop R, Furlan R, Goertzel B, Hall JS, Samsonovich A, Scheutz M, Schlesinger M, Shapiro SC, Sowa J (2012) Mapping the landscape of human-level artificial general intelligence. AI Mag 33(1):25–42	es_ES
dc.description.references	Adams SS, Banavar G, Campbell M (2016) I-athlon: towards a multi-dimensional Turing test. AI Mag 37(1):78–84	es_ES
dc.description.references	Alcalá J, Fernández A, Luengo J, Derrac J, García S, Sánchez L, Herrera F (2010) Keel data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Mult Valued Logic Soft Comput 17:255–287	es_ES
dc.description.references	Alexander JRM, Smales S (1997) Intelligence, learning and long-term memory. Personal Individ Differ 23(5):815–825	es_ES
dc.description.references	Alpcan T, Everitt T, Hutter M (2014) Can we measure the difficulty of an optimization problem? In: IEEE information theory workshop (ITW)	es_ES
dc.description.references	Alur R, Bodik R, Juniwal G, Martin MMK, Raghothaman M, Seshia SA, Singh R, Solar-Lezama A, Torlak E, Udupa A (2013) Syntax-guided synthesis. In: Formal methods in computer-aided design (FMCAD), 2013, IEEE, pp 1–17	es_ES
dc.description.references	Alvarado N, Adams SS, Burbeck S, Latta C (2002) Beyond the Turing test: performance metrics for evaluating a computer simulation of the human mind. In: Proceedings of the 2nd international conference on development and learning, IEEE, pp 147–152	es_ES
dc.description.references	Amigoni F, Bastianelli E, Berghofer J, Bonarini A, Fontana G, Hochgeschwender N, Iocchi L, Kraetzschmar G, Lima P, Matteucci M, Miraldo P, Nardi D, Schiaffonati V (2015) Competitions for benchmarking: task and functionality scoring complete performance assessment. IEEE Robot Autom Mag 22(3):53–61	es_ES
dc.description.references	Anderson J, Lebiere C (2003) The Newell test for a theory of cognition. Behav Brain Sci 26(5):587–601	es_ES
dc.description.references	Anderson J, Baltes J, Cheng CT (2011) Robotics competitions as benchmarks for AI research. Knowl Eng Rev 26(01):11–17	es_ES
dc.description.references	Arel I, Rose DC, Karnowski TP (2010) Deep machine learning—a new frontier in artificial intelligence research. IEEE Comput Intell Mag 5(4):13–18	es_ES
dc.description.references	Asada M, Hosoda K, Kuniyoshi Y, Ishiguro H, Inui T, Yoshikawa Y, Ogino M, Yoshida C (2009) Cognitive developmental robotics: a survey. IEEE Trans Auton Ment Dev 1(1):12–34	es_ES
dc.description.references	Aziz H, Brill M, Fischer F, Harrenstein P, Lang J, Seedig HG (2015) Possible and necessary winners of partial tournaments. J Artif Intell Res 54:493–534	es_ES
dc.description.references	Bache K, Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml	es_ES
dc.description.references	Bagnall AJ, Zatuchna ZV (2005) On the classification of maze problems. In: Bull L, Kovacs T (eds) Foundations of learning classifier systems. Studies in fuzziness and soft computing, vol 183. Springer, pp 305–316. http://rd.springer.com/chapter/10.1007/11319122_12	es_ES
dc.description.references	Baldwin D, Yadav SB (1995) The process of research investigations in artificial intelligence - a unified view. IEEE Trans Syst Man Cybern 25(5):852–861	es_ES
dc.description.references	Bellemare MG, Naddaf Y, Veness J, Bowling M (2013) The arcade learning environment: an evaluation platform for general agents. J Artif Intell Res 47:253–279	es_ES
dc.description.references	Besold TR (2014) A note on chances and limitations of psychometric AI. In: KI 2014: advances in artificial intelligence. Springer, pp 49–54	es_ES
dc.description.references	Biever C (2011) Ultimate IQ: one test to rule them all. New Sci 211(2829, 10 September 2011):42–45	es_ES
dc.description.references	Borg M, Johansen SS, Thomsen DL, Kraus M (2012) Practical implementation of a graphics Turing test. In: Advances in visual computing. Springer, pp 305–313	es_ES
dc.description.references	Boring EG (1923) Intelligence as the tests test it. New Repub 35–37	es_ES
dc.description.references	Bostrom N (2014) Superintelligence: paths, dangers, strategies. Oxford University Press, Oxford	es_ES
dc.description.references	Brazdil P, Giraud-Carrier C, Soares C, Vilalta R (2008) Metalearning: applications to data mining. Springer, New York	es_ES
dc.description.references	Bringsjord S (2011) Psychometric artificial intelligence. J Exp Theor Artif Intell 23(3):271–277	es_ES
dc.description.references	Bringsjord S, Schimanski B (2003) What is artificial intelligence? Psychometric AI as an answer. In: International joint conference on artificial intelligence, pp 887–893	es_ES
dc.description.references	Brundage M (2016) Modeling progress in AI. AAAI 2016 Workshop on AI, Ethics, and Society	es_ES
dc.description.references	Buchanan BG (1988) Artificial intelligence as an experimental science. Springer, New York	es_ES
dc.description.references	Buhrmester M, Kwang T, Gosling SD (2011) Amazon’s Mechanical Turk: a new source of inexpensive, yet high-quality, data? Perspect Psychol Sci 6(1):3–5	es_ES
dc.description.references	Bursztein E, Aigrain J, Moscicki A, Mitchell JC (2014) The end is nigh: generic solving of text-based captchas. In: Proceedings of the 8th USENIX conference on Offensive Technologies, USENIX Association, p 3	es_ES
dc.description.references	Campbell M, Hoane AJ, Hsu F (2002) Deep Blue. Artif Intell 134(1–2):57–83	es_ES
dc.description.references	Cangelosi A, Schlesinger M, Smith LB (2015) Developmental robotics: from babies to robots. MIT Press, Cambridge	es_ES
dc.description.references	Caputo B, Müller H, Martinez-Gomez J, Villegas M, Acar B, Patricia N, Marvasti N, Üsküdarlı S, Paredes R, Cazorla M et al (2014) ImageCLEF 2014: overview and analysis of the results. In: Information access evaluation. Multilinguality, multimodality, and interaction. Springer, pp 192–211	es_ES
dc.description.references	Carlson A, Betteridge J, Kisiel B, Settles B, Hruschka ER Jr, Mitchell TM (2010) Toward an architecture for never-ending language learning. In: AAAI, vol 5, p 3	es_ES
dc.description.references	Carroll JB (1993) Human cognitive abilities: a survey of factor-analytic studies. Cambridge University Press, Cambridge	es_ES
dc.description.references	Caruana R (1997) Multitask learning. Mach Learn 28(1):41–75	es_ES
dc.description.references	Chaitin GJ (1982) Gödel’s theorem and information. Int J Theor Phys 21(12):941–954	es_ES
dc.description.references	Chandrasekaran B (1990) What kind of information processing is intelligence? In: The foundation of artificial intelligence—a sourcebook. Cambridge University Press, pp 14–46	es_ES
dc.description.references	Chater N (1999) The search for simplicity: a fundamental cognitive principle? Q J Exp Psychol Sect A 52(2):273–302	es_ES
dc.description.references	Chater N, Vitányi P (2003) Simplicity: a unifying principle in cognitive science? Trends Cogn Sci 7(1):19–22	es_ES
dc.description.references	Chu Z, Gianvecchio S, Wang H, Jajodia S (2010) Who is tweeting on Twitter: human, bot, or cyborg? In: Proceedings of the 26th annual computer security applications conference, ACM, pp 21–30	es_ES
dc.description.references	Cochran WG (2007) Sampling techniques. Wiley, New York	es_ES
dc.description.references	Cohen PR, Howe AE (1988) How evaluation guides AI research: the message still counts more than the medium. AI Mag 9(4):35	es_ES
dc.description.references	Cohen Y (2013) Testing and cognitive enhancement. Technical report, National Institute for Testing and Evaluation, Jerusalem, Israel	es_ES
dc.description.references	Conrad JG, Zeleznikow J (2013) The significance of evaluation in AI and law: a case study re-examining ICAIL proceedings. In: Proceedings of the 14th international conference on artificial intelligence and law, ACM, pp 186–191	es_ES
dc.description.references	Conrad JG, Zeleznikow J (2015) The role of evaluation in AI and law. In: Proceedings of the 15th international conference on artificial intelligence and law, pp 181–186	es_ES
dc.description.references	Deary IJ, Der G, Ford G (2001) Reaction times and intelligence differences: a population-based cohort study. Intelligence 29(5):389–399	es_ES
dc.description.references	Decker KS, Durfee EH, Lesser VR (1989) Evaluating research in cooperative distributed problem solving. Distrib Artif Intell 2:487–519	es_ES
dc.description.references	Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30	es_ES
dc.description.references	Detterman DK (2011) A challenge to Watson. Intelligence 39(2–3):77–78	es_ES
dc.description.references	Dimitrakakis C (2016) Personal communication	es_ES
dc.description.references	Dimitrakakis C, Li G, Tziortziotis N (2014) The reinforcement learning competition 2014. AI Mag 35(3):61–65	es_ES
dc.description.references	Dowe DL (2013) Introduction to Ray Solomonoff 85th memorial conference. In: Dowe DL (ed) Algorithmic probability and friends. Bayesian prediction and artificial intelligence, lecture notes in computer science, vol 7070. Springer, Berlin, pp 1–36	es_ES
dc.description.references	Dowe DL, Hajek AR (1997) A computational extension to the Turing Test. In: Proceedings of the 4th conference of the Australasian cognitive science society, University of Newcastle, NSW, Australia	es_ES
dc.description.references	Dowe DL, Hajek AR (1998) A non-behavioural, computational extension to the Turing test. In: International conference on computational intelligence and multimedia applications (ICCIMA’98), Gippsland, Australia, pp 101–106	es_ES
dc.description.references	Dowe DL, Hernández-Orallo J (2012) IQ tests are not for machines, yet. Intelligence 40(2):77–81	es_ES
dc.description.references	Dowe DL, Hernández-Orallo J (2014) How universal can an intelligence test be? Adapt Behav 22(1):51–69	es_ES
dc.description.references	Drummond C (2009) Replicability is not reproducibility: nor is it good science. In: Proceedings of the evaluation methods for machine learning workshop at the 26th ICML, Montreal, Canada	es_ES
dc.description.references	Drummond C, Japkowicz N (2010) Warning: statistical benchmarking is addictive. Kicking the habit in machine learning. J Exp Theor Artif Intell 22(1):67–80	es_ES
dc.description.references	Duan Y, Chen X, Houthooft R, Schulman J, Abbeel P (2016) Benchmarking deep reinforcement learning for continuous control. arXiv preprint arXiv:1604.06778	es_ES
dc.description.references	Eden AH, Moor JH, Soraker JH, Steinhart E (2013) Singularity hypotheses: a scientific and philosophical assessment. Springer, New York	es_ES
dc.description.references	Edmondson W (2012) The intelligence in ETI—what can we know? Acta Astronaut 78:37–42	es_ES
dc.description.references	Elo AE (1978) The rating of chessplayers, past and present, vol 3. Batsford, London	es_ES
dc.description.references	Embretson SE, Reise SP (2000) Item response theory for psychologists. L. Erlbaum, Hillsdale	es_ES
dc.description.references	Evans JM, Messina ER (2001) Performance metrics for intelligent systems. NIST Special Publication SP, pp 101–104	es_ES
dc.description.references	Everitt T, Lattimore T, Hutter M (2014) Free lunch for optimisation under the universal distribution. In: 2014 IEEE Congress on evolutionary computation (CEC), IEEE, pp 167–174	es_ES
dc.description.references	Falkenauer E (1998) On method overfitting. J Heuristics 4(3):281–287	es_ES
dc.description.references	Feldman J (2003) Simplicity and complexity in human concept learning. Gen Psychol 38(1):9–15	es_ES
dc.description.references	Ferrando PJ (2009) Difficulty, discrimination, and information indices in the linear factor analysis model for continuous item responses. Appl Psychol Meas 33(1):9–24	es_ES
dc.description.references	Ferrando PJ (2012) Assessing the discriminating power of item and test scores in the linear factor-analysis model. Psicológica 33:111–139	es_ES
dc.description.references	Ferri C, Hernández-Orallo J, Modroiu R (2009) An experimental comparison of performance measures for classification. Pattern Recogn Lett 30(1):27–38	es_ES
dc.description.references	Ferrucci D, Brown E, Chu-Carroll J, Fan J, Gondek D, Kalyanpur AA, Lally A, Murdock J, Nyberg E, Prager J et al (2010) Building Watson: an overview of the DeepQA project. AI Mag 31(3):59–79	es_ES
dc.description.references	Fogel DB (1991) The evolution of intelligent decision making in gaming. Cybern Syst 22(2):223–236	es_ES
dc.description.references	Gaschnig J, Klahr P, Pople H, Shortliffe E, Terry A (1983) Evaluation of expert systems: issues and case studies. Build Exp Syst 1:241–278	es_ES
dc.description.references	Geissman JR, Schultz RD (1988) Verification & validation. AI Exp 3(2):26–33	es_ES
dc.description.references	Genesereth M, Love N, Pell B (2005) General game playing: overview of the AAAI competition. AI Mag 26(2):62	es_ES
dc.description.references	Gerónimo D, López AM (2014) Datasets and benchmarking. In: Vision-based pedestrian protection systems for intelligent vehicles. Springer, pp 87–93	es_ES
dc.description.references	Goertzel B, Pennachin C (eds) (2007) Artificial general intelligence. Springer, New York	es_ES
dc.description.references	Goertzel B, Arel I, Scheutz M (2009) Toward a roadmap for human-level artificial general intelligence: embedding HLAI systems in broad, approachable, physical or virtual contexts. Artif Gen Intell Roadmap Initiat	es_ES
dc.description.references	Goldreich O, Vadhan S (2007) Special issue on worst-case versus average-case complexity: editors’ foreword. Comput Complex 16(4):325–330	es_ES
dc.description.references	Gordon BB (2007) Report on panel discussion on (re-)establishing or increasing collaborative links between artificial intelligence and intelligent systems. In: Messina ER, Madhavan R (eds) Proceedings of the 2007 workshop on performance metrics for intelligent systems, pp 302–303	es_ES
dc.description.references	Gulwani S, Hernández-Orallo J, Kitzelmann E, Muggleton SH, Schmid U, Zorn B (2015) Inductive programming meets the real world. Commun ACM 58(11):90–99	es_ES
dc.description.references	Hand DJ (2004) Measurement theory and practice. A Hodder Arnold Publication, London	es_ES
dc.description.references	Hernández-Orallo J (2000a) Beyond the Turing test. J Logic Lang Inf 9(4):447–466	es_ES
dc.description.references	Hernández-Orallo J (2000b) On the computational measurement of intelligence factors. In: Meystel A (ed) Performance metrics for intelligent systems workshop. National Institute of Standards and Technology, Gaithersburg, pp 1–8	es_ES
dc.description.references	Hernández-Orallo J (2000c) Thesis: computational measures of information gain and reinforcement in inference processes. AI Commun 13(1):49–50	es_ES
dc.description.references	Hernández-Orallo J (2010) A (hopefully) non-biased universal environment class for measuring intelligence of biological and artificial systems. In: Artificial general intelligence, 3rd international conference. Atlantis Press, pp 182–183. Extended report at http://users.dsic.upv.es/proy/anynt/unbiased.pdf	es_ES
dc.description.references	Hernández-Orallo J (2014) On environment difficulty and discriminating power. Auton Agents Multi-Agent Syst 29(3):402–454. doi:10.1007/s10458-014-9257-1	es_ES
dc.description.references	Hernández-Orallo J, Dowe DL (2010) Measuring universal intelligence: towards an anytime intelligence test. Artif Intell 174(18):1508–1539	es_ES
dc.description.references	Hernández-Orallo J, Dowe DL (2013) On potential cognitive abilities in the machine kingdom. Minds Mach 23:179–210	es_ES
dc.description.references	Hernández-Orallo J, Minaya-Collado N (1998) A formal definition of intelligence based on an intensional variant of Kolmogorov complexity. In: Proceedings of international symposium of engineering of intelligent systems (EIS’98), ICSC Press, pp 146–163	es_ES
dc.description.references	Hernández-Orallo J, Dowe DL, España-Cubillo S, Hernández-Lloreda MV, Insa-Cabrera J (2011) On more realistic environment distributions for defining, evaluating and developing intelligence. In: Schmidhuber J, Thórisson K, Looks M (eds) Artificial general intelligence, LNAI, vol 6830. Springer, New York, pp 82–91	es_ES
dc.description.references	Hernández-Orallo J, Flach P, Ferri C (2012a) A unified view of performance metrics: translating threshold choice into expected classification loss. J Mach Learn Res 13(1):2813–2869	es_ES
dc.description.references	Hernández-Orallo J, Insa-Cabrera J, Dowe DL, Hibbard B (2012b) Turing Tests with Turing machines. In: Voronkov A (ed) Turing-100, EPiC Series, vol 10, pp 140–156	es_ES
dc.description.references	Hernández-Orallo J, Dowe DL, Hernández-Lloreda MV (2014) Universal psychometrics: measuring cognitive abilities in the machine kingdom. Cogn Syst Res 27:50–74	es_ES
dc.description.references	Hernández-Orallo J, Martínez-Plumed F, Schmid U, Siebers M, Dowe DL (2016) Computer models solving intelligence test problems: progress and implications. Artif Intell 230:74–107	es_ES
dc.description.references	Herrmann E, Call J, Hernández-Lloreda MV, Hare B, Tomasello M (2007) Humans have evolved specialized skills of social cognition: the cultural intelligence hypothesis. Science 317(5843):1360–1366	es_ES
dc.description.references	Hibbard B (2009) Bias and no free lunch in formal measures of intelligence. J Artif Gen Intell 1(1):54–61	es_ES
dc.description.references	Hingston P (2010) A new design for a Turing Test for bots. In: 2010 IEEE symposium on computational intelligence and games (CIG), IEEE, pp 345–350	es_ES
dc.description.references	Hingston P (2012) Believable bots: can computers play like people? Springer, New York	es_ES
dc.description.references	Ho TK, Basu M (2002) Complexity measures of supervised classification problems. IEEE Trans Pattern Anal Mach Intell 24(3):289–300	es_ES
dc.description.references	Hutter M (2007) Universal algorithmic intelligence: a mathematical top→down approach. In: Goertzel B, Pennachin C (eds) Artificial general intelligence, cognitive technologies. Springer, Berlin, pp 227–290	es_ES
dc.description.references	Igel C, Toussaint M (2005) A no-free-lunch theorem for non-uniform distributions of target functions. J Math Model Algorithms 3(4):313–322	es_ES
dc.description.references	Insa-Cabrera J (2016) Towards a universal test of social intelligence. Ph.D. thesis, Departament de Sistemes Informàtics i Computació, UPV	es_ES
dc.description.references	Insa-Cabrera J, Dowe DL, España-Cubillo S, Hernández-Lloreda MV, Hernández-Orallo J (2011a) Comparing humans and AI agents. In: Schmidhuber J, Thórisson K, Looks M (eds) Artificial general intelligence, LNAI, vol 6830. Springer, New York, pp 122–132	es_ES
dc.description.references	Insa-Cabrera J, Dowe DL, Hernández-Orallo J (2011b) Evaluating a reinforcement learning algorithm with a general intelligence test. In: Lozano JA, Gamez JM (eds) Current topics in artificial intelligence. CAEPIA 2011, LNAI series 7023. Springer, New York	es_ES
dc.description.references	Insa-Cabrera J, Benacloch-Ayuso JL, Hernández-Orallo J (2012) On measuring social intelligence: experiments on competition and cooperation. In: Bach J, Goertzel B, Iklé M (eds) AGI, lecture notes in computer science, vol 7716. Springer, New York, pp 126–135	es_ES
dc.description.references	Jacoff A, Messina E, Weiss BA, Tadokoro S, Nakagawa Y (2003) Test arenas and performance metrics for urban search and rescue robots. In: Proceedings of 2003 IEEE/RSJ international conference on intelligent robots and systems, 2003 (IROS 2003), IEEE, vol 4, pp 3396–3403	es_ES
dc.description.references	Japkowicz N, Shah M (2011) Evaluating learning algorithms. Cambridge University Press, Cambridge	es_ES
dc.description.references	Jiang J (2008) A literature survey on domain adaptation of statistical classifiers. http://sifaka.cs.uiuc.edu/jiang4/domain_adaptation/survey	es_ES
dc.description.references	Johnson M, Hofmann K, Hutton T, Bignell D (2016) The Malmo platform for artificial intelligence experimentation. In: International joint conference on artificial intelligence (IJCAI)	es_ES
dc.description.references	Keith TZ, Reynolds MR (2010) Cattell–Horn–Carroll abilities and cognitive tests: what we’ve learned from 20 years of research. Psychol Schools 47(7):635–650	es_ES
dc.description.references	Ketter W, Symeonidis A (2012) Competitive benchmarking: lessons learned from the trading agent competition. AI Mag 33(2):103	es_ES
dc.description.references	Khreich W, Granger E, Miri A, Sabourin R (2012) A survey of techniques for incremental learning of HMM parameters. Inf Sci 197:105–130	es_ES
dc.description.references	Kim JH (2004) Soccer robotics, vol 11. Springer, New York	es_ES
dc.description.references	Kitano H, Asada M, Kuniyoshi Y, Noda I, Osawa E (1997) Robocup: the robot world cup initiative. In: Proceedings of the first international conference on autonomous agents, ACM, pp 340–347	es_ES
dc.description.references	Kleiner K (2011) Who are you calling bird-brained? An attempt is being made to devise a universal intelligence test. Economist 398(8723, 5 March 2011):82	es_ES
dc.description.references	Knuth DE (1973) Sorting and searching, volume 3 of the art of computer programming. Addison-Wesley, Reading	es_ES
dc.description.references	Koza JR (2010) Human-competitive results produced by genetic programming. Genet Program Evolvable Mach 11(3–4):251–284	es_ES
dc.description.references	Krueger J, Osherson D (1980) On the psychology of structural simplicity. In: Jusczyk PW, Klein RM (eds) The nature of thought: essays in honor of D. O. Hebb. Psychology Press, London, pp 187–205	es_ES
dc.description.references	Langford J (2005) Clever methods of overfitting. Machine Learning (Theory). http://hunch.net	es_ES
dc.description.references	Langley P (1987) Research papers in machine learning. Mach Learn 2(3):195–198	es_ES
dc.description.references	Langley P (2011) The changing science of machine learning. Mach Learn 82(3):275–279	es_ES
dc.description.references	Langley P (2012) The cognitive systems paradigm. Adv Cogn Syst 1:3–13	es_ES
dc.description.references	Lattimore T, Hutter M (2013) No free lunch versus Occam’s razor in supervised learning. In: Dowe DL (ed) Algorithmic probability and friends. Bayesian prediction and artificial intelligence, lecture notes in computer science, vol 7070. Springer, pp 223–235	es_ES
dc.description.references	Leeuwenberg ELJ, Van Der Helm PA (2012) Structural information theory: the simplicity of visual form. Cambridge University Press, Cambridge	es_ES
dc.description.references	Legg S, Hutter M (2007a) Tests of machine intelligence. In: Lungarella M, Iida F, Bongard J, Pfeifer R (eds) 50 Years of Artificial Intelligence, Lecture Notes in Computer Science, vol 4850, Springer Berlin Heidelberg, pp 232–242. doi: 10.1007/978-3-540-77296-5_22	es_ES
dc.description.references	Legg S, Hutter M (2007b) Universal intelligence: a definition of machine intelligence. Minds Mach 17(4):391–444	es_ES
dc.description.references	Legg S, Veness J (2013) An approximation of the universal intelligence measure. In: Dowe DL (ed) Algorithmic probability and friends. Bayesian prediction and artificial intelligence, lecture notes in computer science, vol 7070. Springer, pp 236–249	es_ES
dc.description.references	Levesque HJ (2014) On our best behaviour. Artif Intell 212:27–35	es_ES
dc.description.references	Levesque HJ, Davis E, Morgenstern L (2012) The Winograd schema challenge. In: Proceedings of the thirteenth international conference on the principles of knowledge representation and reasoning, pp 552–561	es_ES
dc.description.references	Levin LA (1973) Universal sequential search problems. Prob Inf Transm 9(3):265–266	es_ES
dc.description.references	Levin LA (1986) Average case complete problems. SIAM J Comput 15:285–286	es_ES
dc.description.references	Levin LA (2013) Universal heuristics: how do humans solve unsolvable problems? In: Dowe DL (ed) Algorithmic probability and friends. Bayesian prediction and artificial intelligence, lecture notes in computer science, vol 7070. Springer, New York, pp 53–54	es_ES
dc.description.references	Li M, Vitányi P (2008) An introduction to Kolmogorov complexity and its applications, 3rd edn. Springer, New York	es_ES
dc.description.references	Livingstone D (2006) Turing’s test and believable AI in games. Comput Entertain CIE 4(1):6	es_ES
dc.description.references	Llargues-Asensio JM, Peralta J, Arrabales R, González-Bedía M, Cortez P, López-Peña AL (2014) Artificial intelligence approaches for the generation and assessment of believable human-like behaviour in virtual characters. Expert Systems with Applications	es_ES
dc.description.references	Long D, Fox M (2003) The 3rd international planning competition: results and analysis. J Artif Intell Res 20:1–59	es_ES
dc.description.references	Lord FM (1980) Applications of item response theory to practical testing problems. Erlbaum, Mahwah	es_ES
dc.description.references	Macià N, Bernadó-Mansilla E (2014) Towards UCI+: a mindful repository design. Inf Sci 261:237–262	es_ES
dc.description.references	Madhavan R, Tunstel E, Messina E (2009) Performance evaluation and benchmarking of intelligent systems. Springer, New York	es_ES
dc.description.references	Mahoney MV (1999) Text compression as a test for artificial intelligence. In: Proceedings of the national conference on artificial intelligence, AAAI, p 970	es_ES
dc.description.references	Marché C, Zantema H (2007) The termination competition. In: Term rewriting and applications, Springer, pp 303–313	es_ES
dc.description.references	Marcus G, Rossi F, Veloso M (2016) Beyond the Turing test (special issue). AI Mag 37(1):3–101	es_ES
dc.description.references	Masum H, Christensen S (2003) The Turing ratio: a framework for open-ended task metrics. J Evol Technol	es_ES
dc.description.references	Masum H, Christensen S, Oppacher F (2002) The Turing ratio: metrics for open-ended tasks. In: GECCO, Citeseer, pp 973–980	es_ES
dc.description.references	McCarthy J (2007) What is artificial intelligence. Technical report, Stanford University. http://www-formal.stanford.edu/jmc/whatisai.html	es_ES
dc.description.references	McCorduck P (2004) Machines who think. A K Peters/CRC Press, Boca Raton	es_ES
dc.description.references	McDermott J, White DR, Luke S, Manzoni L, Castelli M, Vanneschi L, Jaśkowski W, Krawiec K, Harper R, Jong KD, O’Reilly UM (2012) Genetic programming needs better benchmarks. In: Proceedings of the 14th international conference on Genetic and evolutionary computation conference. ACM, Philadelphia, pp 791–798	es_ES
dc.description.references	McGuigan M (2006) Graphics Turing Test. arXiv preprint arXiv:cs/0603132	es_ES
dc.description.references	Melkikh AV (2014) The no free lunch theorem and hypothesis of instinctive animal behavior. Artif Intell Res 3(4):43	es_ES
dc.description.references	Mellenbergh GJ (1994) Generalized linear item response theory. Psychol Bull 115(2):300	es_ES
dc.description.references	Mesnil G, Dauphin Y, Glorot X, Rifai S, Bengio Y, Goodfellow IJ, Lavoie E, Muller X, Desjardins G, Warde-Farley D et al (2012) Unsupervised and transfer learning challenge: a deep learning approach. JMLR: Workshop and Conference Proceedings, 2012 ICML Workshop on Unsupervised and Transfer Learning vol 27, pp 97–110	es_ES
dc.description.references	Messina E, Meystel A, Reeker L (2001) PerMIS 2001, white paper. In: Meystel AM, Messina ER (eds) Measuring the performance and intelligence of systems: proceedings of the 2001 PerMIS Workshop, September 4, 2001, National Institute of Standards and Technology (NIST) Special Publication 982. Gaithersburg, pp 3–15	es_ES
dc.description.references	Meystel A (2000) PerMIS 2000 white paper: measuring performance and intelligence of systems with autonomy. In: Meystel AM, Messina ER (eds) Measuring the performance and intelligence of systems: proceedings of the 2000 PerMIS Workshop, August 14–16, 2000, National Institute of Standards and Technology (NIST) Special Publication 970. Gaithersburg, pp 1–34	es_ES
dc.description.references	Meystel A, Albus J, Messina E, Leedom D (2003a) Performance measures for intelligent systems: measures of technology readiness. Technical report, DTIC Document	es_ES
dc.description.references	Meystel A, Albus J, Messina E, Leedom D (2003b) PerMIS 2003 white paper: performance measures for intelligent systems—measures of technology readiness. In: Meystel AM, Messina ER (eds) Measuring the performance and intelligence of systems: proceedings of the 2003 PerMIS Workshop, National Institute of Standards and Technology (NIST) Special Publication 1014. Gaithersburg	es_ES
dc.description.references	Minsky ML (ed) (1968) Semantic information processing. MIT Press, Cambridge	es_ES
dc.description.references	Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533	es_ES
dc.description.references	Morgenstern L, Davis E, Ortiz CL Jr (2016) Planning, executing, and evaluating the Winograd schema challenge. AI Mag 37(1):50–54	es_ES
dc.description.references	Mueller S, Jones M, Minnery B, Hiland JM (2007) The BICA cognitive decathlon: a test suite for biologically-inspired cognitive agents. In: Proceedings of behavior representation in modeling and simulation conference, Norfolk	es_ES
dc.description.references	Mueller ST (2010) A partial implementation of the BICA cognitive decathlon using the psychology experiment building language (PEBL). Int J Mach Conscious 2(2):273–288	es_ES
dc.description.references	Mueller ST, Minnery BS (2008) Adapting the Turing Test for embodied neurocognitive evaluation of biologically-inspired cognitive agents. In: Proceedings of 2008 AAAI fall symposium on biologically inspired cognitive architectures	es_ES
dc.description.references	Newell A (1973) You can’t play 20 questions with nature and win: projective comments on the papers of this symposium. In: Chase WG (ed) Visual information processing. Academic Press, New York, pp 283–308	es_ES
dc.description.references	Newell A (1980) Physical symbol systems. Cogn Sci 4(2):135–183	es_ES
dc.description.references	Newell A (1990) Unified theories of cognition. Harvard University Press, Cambridge	es_ES
dc.description.references	Newell A, Simon HA (1976) Computer science as empirical inquiry: symbols and search. Commun ACM 19(3):113–126	es_ES
dc.description.references	Nizamani AR (2015) Reasoning with bounded cognitive resources. Ph.D. thesis, Department of Applied Information Technology, Chalmers University of Technology & University of Gothenburg, Sweden	es_ES
dc.description.references	Oppy G, Dowe DL (2011) The Turing Test. In: Zalta EN (ed) Stanford Encyclopedia of Philosophy, Stanford University. http://plato.stanford.edu/entries/turing-test/	es_ES
dc.description.references	Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359	es_ES
dc.description.references	Perez D, Samothrakis S, Togelius J, Schaul T, Lucas S, Couëtoux A, Lee J, Lim CU, Thompson T (2015) The 2014 general video game playing competition. IEEE Trans Comput Intell AI Games	es_ES
dc.description.references	Potthast M, Hagen M, Gollub T, Tippmann M, Kiesel J, Rosso P, Stamatatos E, Stein B (2013) Overview of the 5th international competition on plagiarism detection. In: CLEF 2013 evaluation labs and workshop working notes papers, 23–26 September, Valencia, Spain	es_ES
dc.description.references	Proudfoot D (2011) Anthropomorphism and AI: Turing’s much misunderstood imitation game. Artif Intell 175(5):950–957	es_ES
dc.description.references	Quinn AJ, Bederson BB (2011) Human computation: a survey and taxonomy of a growing field. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, pp 1403–1412	es_ES
dc.description.references	Rajani S (2011) Artificial intelligence—man or machine. Int J Inf Technol 4(1):173–176	es_ES
dc.description.references	Rao RB, Fung G, Rosales R (2008) On the dangers of cross-validation: an experimental evaluation. In: SDM, SIAM, pp 588–596	es_ES
dc.description.references	Rohrer B (2010) Accelerating progress in artificial general intelligence: choosing a benchmark for natural world interaction. J Artif Gen Intell 2(1):1–28	es_ES
dc.description.references	Rothenberg J, Paul J, Kameny I, Kipps JR, Swenson M (1987) Evaluating expert system tools: a framework and methodology-workshops. Technical report, DTIC Document	es_ES
dc.description.references	Russell S, Norvig P (2009) Artificial intelligence: a modern approach. Prentice Hall, Upper Saddle River	es_ES
dc.description.references	Sanghi P, Dowe DL (2003) A computer program capable of passing IQ tests. In: 4th international conference on cognitive science (ICCS’03), Sydney, pp 570–575	es_ES
dc.description.references	Schaeffer J, Burch N, Björnsson Y, Kishimoto A, Müller M, Lake R, Lu P, Sutphen S (2007) Checkers is solved. Science 317(5844):1518	es_ES
dc.description.references	Schaie KW (2010) Primary mental abilities. Corsini Encyclopedia of Psychology	es_ES
dc.description.references	Schaul T (2014) An extensible description language for video games. IEEE Trans Comput Intell AI Games PP(99):1–1. doi: 10.1109/TCIAIG.2014.2352795	es_ES
dc.description.references	Schenck C (2013) Intelligence tests for robots: Solving perceptual reasoning tasks with a humanoid robot. Master’s thesis, Iowa State University	es_ES
dc.description.references	Schlenoff C, Scott H, Balakirsky S (2011) Performance evaluation of intelligent systems at the National Institute of Standards and Technology (NIST). Technical report, DTIC Document	es_ES
dc.description.references	Schmid U, Ragni M (2015) Comparing computer models solving number series problems. In: Artificial general intelligence. Springer, pp 352–361	es_ES
dc.description.references	Schweizer P (1998) The truly total Turing test. Minds Mach 8(2):263–272	es_ES
dc.description.references	Searle JR (1980) Minds, brains, and programs. Behav Brain Sci 3:417–457	es_ES
dc.description.references	Seber GAF, Salehi MM (2013) Adaptive cluster sampling. In: Adaptive sampling designs. Springer, pp 11–26	es_ES
dc.description.references	Settles B (2012) Active learning. Synth Lect Artif Intell Mach Learn 6(1):1–114	es_ES
dc.description.references	Shettleworth SJ (2010) Cognition, evolution, and behavior. Oxford University Press, Oxford	es_ES
dc.description.references	Shettleworth SJ, Bloom P, Nadel L (2013) Fundamentals of comparative cognition. Oxford University Press, Oxford	es_ES
dc.description.references	Shieber SM (2016) Principles for designing an AI competition, or why the Turing test fails as an inducement prize. AI Mag 37(1):91–96	es_ES
dc.description.references	Silver D, Huang A, Maddison CJ, Guez A, Sifre L, van den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M et al (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489	es_ES
dc.description.references	Simmons R (2000) Survivability and competence as measures of intelligent systems. In: Meystel AM, Messina ER (eds) Measuring the performance and intelligence of systems: proceedings of the 2000 PerMIS Workshop, August 14–16, 2000, National Institute of Standards and Technology (NIST) Special Publication 970. Gaithersburg, pp 162–163	es_ES
dc.description.references	Simon HA (1995) Artificial intelligence: an empirical science. Artif Intell 77(1):95–127	es_ES
dc.description.references	Sloman A, Scheutz M (2002) A framework for comparing agent architectures. Proceedings of UKCI 2	es_ES
dc.description.references	Smith WD (2002) Rating systems for gameplayers, and learning. NEC, Princeton, NJ, Technical report, pp 93–104	es_ES
dc.description.references	Smith WD (2006) Mathematical definition of “intelligence” (and consequences). Unpublished report	es_ES
dc.description.references	Soares C (2009) UCI++: improved support for algorithm selection using datasetoids. In: Advances in knowledge discovery and data mining. Springer, pp 499–506	es_ES
dc.description.references	Solomonoff R (1996) Does algorithmic probability solve the problem of induction? Inf Stat Induction Sci 7–8	es_ES
dc.description.references	Solomonoff RJ (1964) A formal theory of inductive inference. Part I. Inf Control 7(1):1–22	es_ES
dc.description.references	Solomonoff RJ (1984) Optimum sequential search. Oxbridge Research, Cambridge. http://world.std.com/~rjs/optseq.pdf	es_ES
dc.description.references	Srinivasan R (2002) Importance sampling: applications in communications and detection. Springer, New York	es_ES
dc.description.references	Starkie B, van Zaanen M, Estival D (2006) The Tenjinno machine translation competition. In: Grammatical inference: algorithms and applications. Springer, pp 214–226	es_ES
dc.description.references	Sternberg RJ (ed) (2000) Handbook of intelligence. Cambridge University Press, Cambridge	es_ES
dc.description.references	Strannegård C, Amirghasemi M, Ulfsbücker S (2013a) An anthropomorphic method for number sequence problems. Cogn Syst Res 22–23:27–34	es_ES
dc.description.references	Strannegård C, Nizamani A, Sjöberg A, Engström F (2013b) Bounded Kolmogorov complexity based on cognitive models. In: Kühnberger KU, Rudolph S, Wang P (eds) Artificial general intelligence. Lecture notes in computer science, vol 7999. Springer, Berlin Heidelberg, pp 130–139	es_ES
dc.description.references	Strickler RE (1973) Change in selected characteristics of students between ninth and twelfth grade as related to high school curriculum	es_ES
dc.description.references	Sturtevant N (2012) Benchmarks for grid-based pathfinding. IEEE Trans Comput Intell AI Games 4(2):144–148. http://web.cs.du.edu/~sturtevant/papers/benchmarks.pdf	es_ES
dc.description.references	Sutcliffe G (2009) The TPTP problem library and associated infrastructure: the FOF and CNF Parts, v3.5.0. J Autom Reason 43(4):337–362	es_ES
dc.description.references	Sutcliffe G, Suttner C (2006) The state of CASC. AI Commun 19(1):35–48	es_ES
dc.description.references	Thrun S (1996) Is learning the n-th thing any easier than learning the first? In: Advances in neural information processing systems, pp 640–646	es_ES
dc.description.references	Thrun S, Pratt L (2012) Learning to learn. Springer, New York	es_ES
dc.description.references	Thurstone LL (1938a) Primary mental abilities. Psychometric monographs	es_ES
dc.description.references	Thurstone LL (1938b) Primary mental abilities. Psychometric monographs	es_ES
dc.description.references	Togelius J, Yannakakis GN, Karakovskiy S, Shaker N (2012) Assessing believability. In: Believable bots, Springer, pp 215–230	es_ES
dc.description.references	Torrey L, Shavlik J (2009) Transfer learning. Handb Res Mach Learn Appl 3:17–35	es_ES
dc.description.references	Turing AM (1950) Computing machinery and intelligence. Mind 59:433–460	es_ES
dc.description.references	Valiant LG (1984) A theory of the learnable. Commun ACM 27(11):1134–1142	es_ES
dc.description.references	Vallati M, Chrpa L, Grzes M, McCluskey TL, Roberts M, Sanner S (2015) The 2014 international planning competition: progress and trends. AI Mag 36(3):90–98	es_ES
dc.description.references	van Rijn JN, Bischl B, Torgo L, Gao B, Umaashankar V, Fischer S, Winter P, Wiswedel B, Berthold MR, Vanschoren J (2013) OpenML: a collaborative science platform. In: Machine learning and knowledge discovery in databases. Springer, pp 645–649	es_ES
dc.description.references	Vanschoren J, Blockeel H, Pfahringer B, Holmes G (2012) Experiment databases. Mach Learn 87(2):127–158	es_ES
dc.description.references	Vanschoren J, van Rijn JN, Bischl B, Torgo L (2014) OpenML: networked science in machine learning. ACM SIGKDD Explor Newsl 15(2):49–60	es_ES
dc.description.references	Vázquez D, López AM, Marín J, Ponsa D, Gerónimo D (2014) Virtual and real world adaptation for pedestrian detection. IEEE Trans Pattern Anal Mach Intell 36(4):797–809. doi: 10.1109/TPAMI.2013.163	es_ES
dc.description.references	Vere SA (1992) A cognitive process shell. Behav Brain Sci 15(3):460–461	es_ES
dc.description.references	von Ahn L (2009) Human computation. In: Design automation conference, 2009. DAC’09. 46th ACM/IEEE, IEEE, pp 418–419	es_ES
dc.description.references	von Ahn L, Blum M, Langford J (2004) Telling humans and computers apart automatically. Commun ACM 47(2):56–60	es_ES
dc.description.references	von Ahn L, Maurer B, McMillen C, Abraham D, Blum M (2008) reCAPTCHA: human-based character recognition via web security measures. Science 321(5895):1465	es_ES
dc.description.references	Wallace CS, Boulton DM (1968) An information measure for classification. Comput J 11(2):185–194	es_ES
dc.description.references	Wallace CS, Dowe DL (1999) Minimum message length and Kolmogorov complexity. Comput J 42(4):270–283 (special issue on Kolmogorov complexity)	es_ES
dc.description.references	Wang G, Mohanlal M, Wilson C, Wang X, Metzger M, Zheng H, Zhao BY (2012) Social Turing tests: crowdsourcing sybil detection. arXiv preprint arXiv:1205.3856	es_ES
dc.description.references	Wang P (2010) The evaluation of AGI systems. In: Proceedings of the third conference on artificial general intelligence, Citeseer, pp 164–169	es_ES
dc.description.references	Warwick K (2014) Turing Test success marks milestone in computing history. University of Reading press release	es_ES
dc.description.references	Wasserman EA, Zentall TR (2006) Comparative cognition: Experimental explorations of animal intelligence. Oxford University Press, Oxford	es_ES
dc.description.references	Watkins CJCH, Dayan P (1992) Q-learning. Mach Learn 8(3):279–292	es_ES
dc.description.references	Weiss DJ (2011) Better data from better measurements using computerized adaptive testing. J Methods Meas Soc Sci 2(1):1–27	es_ES
dc.description.references	Weizenbaum J (1966) ELIZA—a computer program for the study of natural language communication between man and machine. Commun ACM 9(1):36–45	es_ES
dc.description.references	Wellman M, Reeves D, Lochner K, Vorobeychik Y (2004) Price prediction in a trading agent competition. J Artif Intell Res JAIR 21:19–36	es_ES
dc.description.references	White DR, McDermott J, Castelli M, Manzoni L, Goldman BW, Kronberger G, Jaśkowski W, O’Reilly UM, Luke S (2013) Better GP benchmarks: community survey results and proposals. Genet Program Evolvable Mach 14:3–29. doi: 10.1007/s10710-012-9177-2	es_ES
dc.description.references	Whiteson S, Tanner B, White A (2010) The reinforcement learning competitions. AI Mag 31(2):81–94	es_ES
dc.description.references	Whiteson S, Tanner B, Taylor ME, Stone P (2011) Protecting against evaluation overfitting in empirical reinforcement learning. In: 2011 IEEE symposium on adaptive dynamic programming and reinforcement learning (ADPRL), IEEE, pp 120–127	es_ES
dc.description.references	Williams PL, Beer RD (2010) Information dynamics of evolved agents. In: From animals to animats 11, Springer, pp 38–49	es_ES
dc.description.references	Winikoff M, Cranefield S (2014) On the testability of BDI agent systems. J Artif Intell Res JAIR 51:71–131	es_ES
dc.description.references	Wolpert DH (1996) The lack of a priori distinctions between learning algorithms. Neural Comput 8(7):1341–1390	es_ES
dc.description.references	Wolpert DH (2012) What the no free lunch theorems really mean; how to improve search algorithms. Technical report, Santa Fe Institute Working Paper	es_ES
dc.description.references	Wolpert DH, Macready WG (1995) No free lunch theorems for search. Technical report SFI-TR-95-02-010 (Santa Fe Institute)	es_ES
dc.description.references	Wolpert DH, Macready WG (2005) Coevolutionary free lunches. IEEE Trans Evol Comput 9(6):721–735	es_ES
dc.description.references	Yampolskiy RV (2015) Artificial superintelligence: a futuristic approach. CRC Press, Boca Raton	es_ES
dc.description.references	Yonck R (2012) Toward a standard metric of machine intelligence. World Future Rev 4(2):61–70	es_ES
dc.description.references	You J (2015) Beyond the Turing test. Science 347(6218):116–116	es_ES
dc.description.references	Zatuchna Z, Bagnall A (2009) Learning mazes with aliasing states: an LCS algorithm with associative perception. Adapt Behav 17(1):28–57	es_ES
dc.description.references	Zhou ZH (2012) Ensemble methods: foundations and algorithms. CRC Press, Boca Raton	es_ES

This item appears in the following collection(s)

Articles, conferences, monographs [48360]

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia