- -

Twenty Years Beyond the Turing Test: Moving Beyond the Human Judges Too

RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia

Compartir/Enviar a

Citas

Estadísticas

  • Estadisticas de Uso

Twenty Years Beyond the Turing Test: Moving Beyond the Human Judges Too

Mostrar el registro sencillo del ítem

Ficheros en el ítem

dc.contributor.author Hernández-Orallo, José es_ES
dc.date.accessioned 2021-09-14T03:33:33Z
dc.date.available 2021-09-14T03:33:33Z
dc.date.issued 2020-12 es_ES
dc.identifier.issn 0924-6495 es_ES
dc.identifier.uri http://hdl.handle.net/10251/172312
dc.description.abstract [EN] In the last 20 years the Turing test has been left further behind by new developments in artificial intelligence. At the same time, however, these developments have revived some key elements of the Turing test: imitation and adversarialness. On the one hand, many generative models, such as generative adversarial networks (GAN), build imitators under an adversarial setting that strongly resembles the Turing test (with the judge being a learnt discriminative model). The term "Turing learning" has been used for this kind of setting. On the other hand, AI benchmarks are suffering an adversarial situation too, with a 'challenge-solve-and-replace' evaluation dynamics whenever human performance is 'imitated'. The particular AI community rushes to replace the old benchmark by a more challenging benchmark, one for which human performance would still be beyond AI. These two phenomena related to the Turing test are sufficiently distinctive, important and general for a detailed analysis. This is the main goal of this paper. After recognising the abyss that appears beyond superhuman performance, we build on Turing learning to identify two different evaluation schemas: Turing testing and adversarial testing. We revisit some of the key questions surrounding the Turing test, such as 'understanding', commonsense reasoning and extracting meaning from the world, and explore how the new testing paradigms should work to unmask the limitations of current and future AI. Finally, we discuss how behavioural similarity metrics could be used to create taxonomies for artificial and natural intelligence. Both testing schemas should complete a transition in which humans should give way to machines-not only as references to be imitated but also as judges-when pursuing and measuring machine intelligence. es_ES
dc.description.sponsorship I appreciate the reviewers' comments, leading to new Sect. 5, among other modifications and insights in the final version. This work was funded by the Future of Life Institute, FLI, under grant RFP2-152, and also supported by the EU (FEDER) and Spanish MINECO under RTI2018-094403B-C32, and Generalitat Valenciana under PROMETEO/2019/098. Figure 1 was kindly generated on purpose by Fernando Martinez-Plumed. es_ES
dc.language Inglés es_ES
dc.publisher Springer-Verlag es_ES
dc.relation.ispartof Minds and Machines es_ES
dc.rights Reserva de todos los derechos es_ES
dc.subject Turing test es_ES
dc.subject Turing learning es_ES
dc.subject Imitation es_ES
dc.subject Adversarial models es_ES
dc.subject Intelligence evaluation es_ES
dc.subject.classification LENGUAJES Y SISTEMAS INFORMATICOS es_ES
dc.title Twenty Years Beyond the Turing Test: Moving Beyond the Human Judges Too es_ES
dc.type Artículo es_ES
dc.identifier.doi 10.1007/s11023-020-09549-0 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/FLI//RFP2-152/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/GVA//PROMETEO%2F2019%2F098/ES/DeepTrust: Deep Logic Technology for Software Trustworthiness/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-094403-B-C32/ES/RAZONAMIENTO FORMAL PARA TECNOLOGIAS FACILITADORAS Y EMERGENTES/ es_ES
dc.rights.accessRights Abierto es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Hernández-Orallo, J. (2020). Twenty Years Beyond the Turing Test: Moving Beyond the Human Judges Too. Minds and Machines. 30(4):533-562. https://doi.org/10.1007/s11023-020-09549-0 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1007/s11023-020-09549-0 es_ES
dc.description.upvformatpinicio 533 es_ES
dc.description.upvformatpfin 562 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 30 es_ES
dc.description.issue 4 es_ES
dc.relation.pasarela S\431825 es_ES
dc.contributor.funder Generalitat Valenciana es_ES
dc.contributor.funder Future of Life Institute es_ES
dc.contributor.funder Agencia Estatal de Investigación es_ES
dc.contributor.funder European Regional Development Fund es_ES
dc.description.references Adiwardana, D., Luong, M. T., So, D. R., Hall, J., Fiedel, N., Thoppilan, R., Yang, Z., Kulshreshtha, A., Nemade, G., Lu, Y., et al. (2020) Towards a human-like open-domain chatbot. arXiv:200109977. es_ES
dc.description.references Alvarado, N., Adams, S. S., Burbeck, S., & Latta, C. (2002). Beyond the Turing test: Performance metrics for evaluating a computer simulation of the human mind. In The 2nd international conference on development and learning, 2002 (pp. 147–152). IEEE. es_ES
dc.description.references Arel, I., & Livingston, S. (2009). Beyond the Turing test. Computer, 42(3), 90–91. es_ES
dc.description.references Armstrong, S., & Sotala, K. (2015). How we’re predicting AI–or failing to. In Beyond artificial intelligence(pp. 11–29). New York: Springer. es_ES
dc.description.references Arora, S., Ge, R., Liang, Y., Ma, T., & Zhang, Y. (2017). Generalization and equilibrium in generative adversarial nets (GANS). In Proceedings of the 34th international conference on machine learning (Vol. 70, pp. 224–232). JMLR. org. es_ES
dc.description.references Bhatnagar, S., et al. (2017). Mapping intelligence: Requirements and possibilities. In PTAI (pp. 117–135). New York: Springer. es_ES
dc.description.references Bongard, M. M. (1970). Pattern Recognition. New York: Spartan Books. es_ES
dc.description.references Borg, M., Johansen, S. S., Thomsen, D. L., & Kraus, M. (2012). Practical implementation of a graphics Turing test. In Advances in visual computing (pp. 305–313). New York: Springer. es_ES
dc.description.references Bostrom, N. (2014). Superintelligence: Paths, dangers, strategies. Oxford: Oxford University Press. es_ES
dc.description.references Brock, A., Donahue, J., & Simonyan, K. (2018). Large scale GAN training for high fidelity natural image synthesis. arXiv:180911096. es_ES
dc.description.references Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. (2020). Language models are few-shot learners. arXiv:200514165. es_ES
dc.description.references Burkart, J. M., Schubiger, M. N., & van Schaik, C. P. (2017). The evolution of general intelligence. Behavioral and Brain Sciences, 40, e195. es_ES
dc.description.references Burr, C., & Cristianini, N. (2019). Can machines read our minds? Minds and Machines, 29(3), 461–494. es_ES
dc.description.references Campbell, M., Hoane, A. J., & Hsu, F. (2002). Deep Blue. Artificial Intelligence, 134(1–2), 57–83. es_ES
dc.description.references Chollet, F. (2019). The measure of intelligence. arXiv:191101547. es_ES
dc.description.references Cohen, P. R. (2005). If not Turing’s test, then what? AI Magazine, 26(4), 61. es_ES
dc.description.references Copeland, B. J. (2000). The Turing test. Minds and Machines, 10(4), 519–539. es_ES
dc.description.references Copeland, J., & Proudfoot, D. (2008). Turing’s test. A philosophical and historical guide. In R. Epstein, G. Roberts, G. Beber (Eds.), Parsing the Turing Test. Philosophical and Methodological Issues in the Quest for the Thinking Computer. New York: Springer. es_ES
dc.description.references Crosby, M., Beyret, B., Shanahan, M., Hernandez-Orallo, J., Cheke, L., & Halina, M. (2020). The animal-AI testbed and competition. Proceedings of Machine Learning Research, 123, 164–176. es_ES
dc.description.references Crosby, M., Beyret, B., Hernandez-Orallo, J., Cheke, L., Halina, M., & Shanahan, M. (2019). Translating from animal cognition to AI. NeurIPS workshop on biological and artificial reinforcement learning. es_ES
dc.description.references Davis, E., & Marcus, G. (2015). Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM, 58(9), 92–103. es_ES
dc.description.references Dennett, D. C. (1971). Intentional systems. The Journal of Philosophy, 68, 87–106. es_ES
dc.description.references Dodge, S., & Karam, L. (2017). A study and comparison of human and deep learning recognition performance under visual distortions. In ICCCN (pp. 1–7). IEEE. es_ES
dc.description.references Dowe, D. L., & Hernández-Orallo, J. (2012). IQ tests are not for machines, yet. Intelligence, 40(2), 77–81. es_ES
dc.description.references Dowe, D. L., & Hernández-Orallo, J. (2014). How universal can an intelligence test be? Adaptive Behavior, 22(1), 51–69. es_ES
dc.description.references Dowe, D. L., Hernández-Orallo, J., & Das, P. K. (2011). Compression and intelligence: Social environments and communication. In J. Schmidhuber, K. Thórisson, & M. Looks (Eds.), Artificial general intelligence (Vol. 6830, pp. 204–211)., LNAI series New York: Springer. es_ES
dc.description.references Dowe, D. L., Hajek, A. R. (1997). A computational extension to the Turing test. In Proceedings of the 4th Conference of the Australasian Cognitive Science Society, University of Newcastle, NSW, Australia. Also as Technical Report #97/322, Dept Computer Science, Monash University, Australia. es_ES
dc.description.references Dowe, D. L., Hajek, A. R. (1998). A non-behavioural, computational extension to the Turing Test. In Intl. conf. on computational intelligence & multimedia applications (ICCIMA’98) (pp. 101–106). Gippsland, Australia. es_ES
dc.description.references Fabra-Boluda, R., Ferri, C., Martínez-Plumed, F., Hernández-Orallo, J., & Ramírez-Quintana, M. J. (2020). Family and prejudice: A behavioural taxonomy of machine learning techniques. In ECAI 2020—24st European conference on artificial intelligence. es_ES
dc.description.references Flach, P. (2019). Performance evaluation in machine learning: The good, the bad, the ugly and the way forward. In AAAI. es_ES
dc.description.references Fostel, G. (1993). The Turing test is for the birds. ACM SIGART Bulletin, 4(1), 7–8. es_ES
dc.description.references French, R. M. (1990). Subcognition and the limits of the Turing test. Mind, 99(393), 53–65. es_ES
dc.description.references French, R. M. (2000). The Turing test: The first 50 years. Trends in Cognitive Sciences, 4(3), 115–122. es_ES
dc.description.references French, R. M. (2012). Moving beyond the Turing test. Communications of the ACM, 55(12), 74–77. https://doi.org/10.1145/2380656.2380674. es_ES
dc.description.references Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. Cambridge: MIT press. es_ES
dc.description.references Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014a). Generative adversarial nets. In Advances in neural information processing systems (pp 2672–2680). es_ES
dc.description.references Goodfellow, I. J., Shlens, J., & Szegedy, C. (2014b). Explaining and harnessing adversarial examples. arXiv:14126572. es_ES
dc.description.references Groß, R., Gu, Y., Li, W., & Gauci, M. (2017). Generalizing GANs: A Turing perspective. In Advances in neural information processing systems (pp. 6316–6326). es_ES
dc.description.references Gunning, D. (2018). Machine common sense concept paper. arXiv:181007528. es_ES
dc.description.references Harnad, S. (1992). The Turing test is not a trick: Turing indistinguishability is a scientific criterion. ACM SIGART Bulletin, 3(4), 9–10. es_ES
dc.description.references Hayes, P., & Ford, K. (1995). Turing test considered harmful. In International joint conference on artificial intelligence (IJCAI) (pp 972–977). es_ES
dc.description.references Hernandez-Orallo, J. (2015). Stochastic tasks: Difficulty and Levin search. In J. Bieger, B. Goertzel, & A. Potapov (Eds.), Artificial general intelligence—8th international conference, AGI 2015, Berlin, Germany, July 22–25, 2015 (pp. 90–100). New York: Springer. es_ES
dc.description.references Hernández-Orallo, J. (2000). Beyond the Turing test. Journal of Logic, Language & Information, 9(4), 447–466. es_ES
dc.description.references Hernández-Orallo, J. (2001). On the computational measurement of intelligence factors (pp. 72–79). Gaithersburg: NIST Special Publication. es_ES
dc.description.references Hernández-Orallo, J. (2015). On environment difficulty and discriminating power. Autonomous Agents and Multi-Agent Systems, 29, 402–454. es_ES
dc.description.references Hernández-Orallo, J. (2017a). Evaluation in artificial intelligence: From task-oriented to ability-oriented measurement. Artificial Intelligence Review, 48(3), 397–447. es_ES
dc.description.references Hernández-Orallo, J. (2017b). The measure of all minds: Evaluating natural and artificial intelligence. Cambridge: Cambridge University Press. es_ES
dc.description.references Hernández-Orallo, J. (2019a). Gazing into clever Hans machines. Nature Machine Intelligence, 1(4), 172–173. es_ES
dc.description.references Hernández-Orallo, J. (2019b). Unbridled mental power. Nature Physics, 15(1), 106. es_ES
dc.description.references Hernández-Orallo, J., & Dowe, D. L. (2010). Measuring universal intelligence: Towards an anytime intelligence test. Artificial Intelligence, 174(18), 1508–1539. es_ES
dc.description.references Hernández-Orallo, J., & Dowe, D. L. (2013). On potential cognitive abilities in the machine kingdom. Minds and Machines, 23(2), 179–210. es_ES
dc.description.references Hernández-Orallo, J., Dowe, D. L., España-Cubillo, S., Hernández-Lloreda, M. V., & Insa-Cabrera, J. (2011). On more realistic environment distributions for defining, evaluating and developing intelligence. In J. Schmidhuber, K. Thórisson, & M. Looks (Eds.), Artificial general intelligence (Vol. 6830, pp. 82–91)., LNAI New York: Springer. es_ES
dc.description.references Hernández-Orallo, J., Dowe, D. L., & Hernández-Lloreda, M. V. (2014). Universal psychometrics: Measuring cognitive abilities in the machine kingdom. Cognitive Systems Research, 27, 50–74. es_ES
dc.description.references Hernández-Orallo, J., Insa-Cabrera, J., Dowe, D. L., & Hibbard, B. (2012). Turing tests with Turing machines. Turing, 10, 140–156. es_ES
dc.description.references Hernández-Orallo, J., Martínez-Plumed, F., Schmid, U., Siebers, M., & Dowe, D. L. (2016). Computer models solving intelligence test problems: Progress and implications. Artificial Intelligence, 230, 74–107. es_ES
dc.description.references Hernández-Orallo, J. (2015). C-tests revisited: Back and forth with complexity. In J. Bieger, B. Goertzel, & A. Potapov (Eds.), Artificial general intelligence—8th international conference, AGI 2015, Berlin, Germany, July 22–25, 2015. New York: Springer (pp. 272–282). es_ES
dc.description.references Hernández-Orallo, J. (2020). AI evaluation: On broken yardsticks and measurement scales. Evaluating AI Evaluation @ AAAI. es_ES
dc.description.references Hernández-Orallo, J., & Minaya-Collado, N. (1998). A formal definition of intelligence based on an intensional variant of Kolmogorov complexity. In Proc. intl symposium of engineering of intelligent systems (EIS’98) (pp. 146–163). ICSC Press. es_ES
dc.description.references Hernández-Orallo, J., & Vold, K. (2019). Ai extenders: The ethical and societal implications of humans cognitively extended by ai. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (pp. 507–513). es_ES
dc.description.references Hernández-Orallo, J., Insa-Cabrera, J., Dowe, D.L., & Hibbard, B. (2012). Turing machines and recursive Turing Tests. In V. Muller, & A. Ayesh (Eds.), AISB/IACAP 2012 Symposium “Revisiting Turing and his Test”, The Society for the Study of Artificial Intelligence and the Simulation of Behaviour, pp 28–33. es_ES
dc.description.references Hernández-Orallo, J., Baroni, M., Bieger, J., Chmait, N., Dowe, D. L., Hofmann, K., et al. (2017). A new AI evaluation cosmos: Ready to play the game? AI Magazine, 38(3), Fall 2007. es_ES
dc.description.references Hibbard, B. (2008). Adversarial sequence prediction. Frontiers in Artificial Intelligence and Applications, 171, 399. es_ES
dc.description.references Hibbard, B. (2011). Measuring agent intelligence via hierarchies of environments. In Artificial general intelligence (pp. 303–308). New York: Springer. es_ES
dc.description.references Hingston, P. (2009). The 2k botprize. In IEEE symposium on computational intelligence and games (CIG 2009) (pp. 1–1). IEEE. es_ES
dc.description.references Hinton, G. E., & Zemel, R. S. (1994). Autoencoders, minimum description length and Helmholtz free energy. In Advances in neural information processing systems (pp. 3–10). es_ES
dc.description.references Hofstadter, D. R. (1980). Gödel, escher, bach. New York: Vintage Books. es_ES
dc.description.references Hofstadter, D. R., & Mitchell, M. (1994). The Copycat project: A model of mental fluidity and analogy-making. Norwood, NJ: Ablex Publishing. es_ES
dc.description.references Insa-Cabrera, J., Dowe, D. L., España-Cubillo, S., Hernández-Lloreda, M. V., & Hernández-Orallo, J. (2011a). Comparing humans and AI agents. In International conference on artificial general intelligence (pp. 122–132). New York: Springer. es_ES
dc.description.references Insa-Cabrera, J., Dowe, D. L., & Hernández-Orallo, J. (2011b). Evaluating a reinforcement learning algorithm with a general intelligence test. In J. Lozano, J. Gamez, & J. Moreno (Eds.), Current topics in artificial intelligence (CAEPIA 2011). LNAI Series 7023. New York: Springer. es_ES
dc.description.references Jiang, Z., Xu, F. F., Araki, J., & Neubig, G. (2020b). How can we know what language models know? Transactions of the Association for Computational Linguistics, 8, 423–438. es_ES
dc.description.references Jiang, M., Luketina, J., Nardelli, N., Minervini, P., Torr, P. H., Whiteson, S., & Rocktäschel, T. (2020a). Wordcraft: An environment for benchmarking commonsense agents. arXiv:200709185. es_ES
dc.description.references Krizhevsky, A. (2009). Learning multiple layers of features from tiny images. Master’s thesis, University of Toronto, https://www.cs.toronto.edu/~kriz/cifar.html. es_ES
dc.description.references Kynkäänniemi, T., Karras, T., Laine, S., Lehtinen, J., & Aila, T. (2019). Improved precision and recall metric for assessing generative models. arXiv:190406991. es_ES
dc.description.references Legg, S., & Hutter, M. (2007). Universal intelligence: A definition of machine intelligence. Minds and Machines, 17(4), 391–444. es_ES
dc.description.references Leibo, J. Z., et al. (2018). Psychlab: A psychology laboratory for deep reinforcement learning agents. arXiv:180108116. es_ES
dc.description.references Levesque, H. J. (2017). Common sense, the Turing test, and the quest for real AI. New York: MIT Press. es_ES
dc.description.references Levesque, H., Davis, E., & Morgenstern, L. (2012). The Winograd schema challenge. In Thirteenth international conference on the principles of knowledge representation and reasoning. es_ES
dc.description.references Li, W., Gauci, M., & Groß, R. (2016). Turing learning: A metric-free approach to inferring behavior and its application to swarms. Swarm Intelligence, 10(3), 211–243. es_ES
dc.description.references Li, W., Gauci, M., & Groß, R. (2013). A coevolutionary approach to learn animal behavior through controlled interaction. In Proceedings of the 15th annual conference on Genetic and evolutionary computation (pp. 223–230). es_ES
dc.description.references van der Linden, W. J. (2008). Using response times for item selection in adaptive testing. Journal of Educational and Behavioral Statistics, 33(1), 5–20. es_ES
dc.description.references Mahoney, M. V. (1999). Text compression as a test for artificial intelligence. In Proceedings of the national conference on artificial intelligence (pp 970–970). AAAI. es_ES
dc.description.references Marcus, G., Rossi, F., & Veloso, M. (2016). Beyond the Turing test (special issue). AI Magazine, 37(1), 3–101. es_ES
dc.description.references Marcus, G. (2020). The next decade in AI: Four steps towards robust artificial intelligence. arXiv:200206177. es_ES
dc.description.references Marcus, G., Ross, F., & Veloso, M. (2015). Beyond the Turing test. AAAI workshop, http://www.math.unipd.it/~frossi/BeyondTuring2015/. es_ES
dc.description.references Martinez-Plumed, F., & Hernandez-Orallo, J. (2018). Dual indicators to analyse AI benchmarks: Difficulty, discrimination, ability and generality. IEEE Transactions on Games, 12, 121–131. es_ES
dc.description.references Martínez-Plumed, F., Prudêncio, R. B., Martínez-Usó, A., & Hernández-Orallo, J. (2019). Item response theory in AI: Analysing machine learning classifiers at the instance level. Artificial Intelligence, 271, 18–42. es_ES
dc.description.references Martínez-Plumed, F., Gomez, E., & Hernández-Orallo, J. (2020). Tracking AI: The capability is (not) near. In European conference on artificial intelligence. es_ES
dc.description.references Masum, H., Christensen, S., & Oppacher, F. (2002). The Turing ratio: Metrics for open-ended tasks. In Conf. on genetic and evolutionary computation (pp. 973–980). Morgan Kaufmann. es_ES
dc.description.references McCarthy, J. (1983). Artificial intelligence needs more emphasis on basic research: President’s quarterly message. AI Magazine, 4(4), 5. es_ES
dc.description.references McDermott, D. (2007). Level-headed. Artificial Intelligence, 171(18), 1183–1186. es_ES
dc.description.references Mishra, A., Bhattacharyya, P., & Carl, M. (2013). Automatically predicting sentence translation difficulty. In ACL (pp 346–351). es_ES
dc.description.references Mitchell, M. (2019). Artificial intelligence: A guide for thinking humans. UK: Penguin. es_ES
dc.description.references Moor, J. (2003). The Turing test: the elusive standard of artificial intelligence (Vol. 30). New York: Springer Science & Business Media. es_ES
dc.description.references Nie, Y., Williams, A., Dinan, E., Bansal, M., Weston, J., & Kiela, D. (2019). Adversarial nli: A new benchmark for natural language understanding. arXiv:191014599. es_ES
dc.description.references Nilsson, N. J. (2006). Human-level artificial intelligence? Be serious!. AI Magazine, 26(4), 68. es_ES
dc.description.references Oppy, G., & Dowe, D. L. (2011). The turing test. In: Zalta, E. N. (Ed.), Stanford encyclopedia of philosophy, Stanford University. http://plato.stanford.edu/entries/turing-test/. es_ES
dc.description.references Preston, B. (1991). AI, anthropocentrism, and the evolution of ‘intelligence’. Minds and Machines, 1(3), 259–277. es_ES
dc.description.references Proudfoot, D. (2011). Anthropomorphism and AI: Turing’s much misunderstood imitation game. Artificial Intelligence, 175(5), 950–957. es_ES
dc.description.references Proudfoot, D. (2017). The Turing test-from every angle. In J. Bowen, M. Sprevak, R. Wilson, & B. J. Copeland (Eds.), The Turing Guide. Oxford: Oxford University Press. es_ES
dc.description.references Rahwan, I., Cebrian, M., Obradovich, N., Bongard, J., Bonnefon, J. F., Breazeal, C., et al. (2019). Machine behaviour. Nature, 568(7753), 477–486. es_ES
dc.description.references Rajalingham, R., Issa, E. B., Bashivan, P., Kar, K., Schmidt, K., & DiCarlo, J. J. (2018). Large-scale, high-resolution comparison of the core visual object recognition behavior of humans, monkeys, and state-of-the-art deep artificial neural networks. Journal of Neuroscience, 38(33), 7255–7269. es_ES
dc.description.references Rajpurkar, P., Jia, R., & Liang, P. (2018). Know what you don’t know: Unanswerable questions for squad. arXiv:180603822. es_ES
dc.description.references Rozen, O., Shwartz, V., Aharoni, R., & Dagan, I. (2019). Diversify your datasets: Analyzing generalization via controlled variance in adversarial datasets. In Proceedings of the 23rd conference on computational natural language learning (CoNLL), Association for Computational Linguistics, Hong Kong, China, pp. 196–205. es_ES
dc.description.references Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252. es_ES
dc.description.references Sakaguchi, K., Bras, R. L., Bhagavatula, C., & Choi, Y. (2019). Winogrande: An adversarial winograd schema challenge at scale. arXiv:190710641. es_ES
dc.description.references Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3), 210–229. es_ES
dc.description.references Saygin, A. P., Cicekli, I., & Akman, V. (2000). Turing test: 50 years later. Minds and Machines, 10(4), 463–518. es_ES
dc.description.references Schlangen, D. (2019). Language tasks and language games: On methodology in current natural language processing research. arXiv:190810747. es_ES
dc.description.references Schoenick, C., Clark, P., Tafjord, O., Turney, P., & Etzioni, O. (2017). Moving beyond the Turing test with the Allen AI science challenge. Communications of the ACM, 60(9), 60–64. es_ES
dc.description.references Schrimpf, M., Kubilius, J., Hong, H., Majaj, N. J., Rajalingham, R., Issa, E. B., Kar, K., Bashivan, P., Prescott-Roy, J., Schmidt, K., Yamins, D. L. K., & DiCarlo, J. J. (2018). Brain-score: Which artificial neural network for object recognition is most brain-like? bioRxiv preprint. es_ES
dc.description.references Schweizer, P. (1998). The truly total Turing test. Minds and Machines, 8(2), 263–272. es_ES
dc.description.references Sebeok, T. A., & Rosenthal, R. E. (1981). The clever Hans phenomenon: Communication with horses, whales, apes, and people. Annals of the NY Academy of Sciences, 364, 1–17. es_ES
dc.description.references Seber, G. A. F., & Salehi, M. M. (2013). Adaptive cluster sampling. In Adaptive sampling designs (pp 11–26). New York: Springer. es_ES
dc.description.references Settles, B. (2009). Active learning. Tech. rep., synthesis lectures on artificial intelligence and machine learning. Morgan & Claypool. es_ES
dc.description.references Shah, H., & Warwick, K. (2015). Human or machine? Communications of the ACM, 58(4), 8. es_ES
dc.description.references Shanahan, M. (2015). The technological singularity. New York: MIT Press. es_ES
dc.description.references Shoham, Y. (2017). Towards the AI index. AI Magazine, 38(4), 71–77. es_ES
dc.description.references Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., et al. (2017b). Mastering the game of Go without human knowledge. Nature, 550(7676), 354–359. es_ES
dc.description.references Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel T, et al. (2017a). Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv:171201815. es_ES
dc.description.references Sloman, A. (2014). Judging chatbots at Turing test. http://www.csbhamacuk/research/projects/cogaff/misc/turing-test-2014html. es_ES
dc.description.references Stern, R., Sturtevant, N., Felner, A., Koenig, S, et al. (2019). Multi-agent pathfinding: Definitions, variants, and benchmarks. arXiv:190608291. es_ES
dc.description.references Sturm, B. L. (2014). A simple method to determine if a music information retrieval system is a “horse”. IEEE Transactions on Multimedia, 16(6), 1636–1644. es_ES
dc.description.references Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59, 433–460. es_ES
dc.description.references Turing, A. (1952). Can automatic calculating machines be said to think? BBC. BBC Third Programme, 14 and 23 Jan. 1952, between M. H. A. Newman, A. M. T., Sir Geoffrey Jefferson and R. B. Braithwaite. Reprinted in Copeland, B. J. (ed.) The essential Turing(pp. 494–495). Oxford: Oxford University Press. http://www.turingarchive.org/browse.php/B/6. es_ES
dc.description.references Vale, C. D., & Weiss, D. J. (1975). A study of computer-administered stradaptive ability testing. Tech. rep., Minnesota Univ. Minneapolis Dept. of Psychology. es_ES
dc.description.references Van Seijen, H., Fatemi, M., Romoff, J., Laroche, R., Barnes, T., & Tsang, J. (2017). Hybrid reward architecture for reinforcement learning. In NIPS (pp. 5392–5402). es_ES
dc.description.references Vardi, M. Y. (2015). Human or machine? Response. Communications of the ACM, 58(4), 8–8. es_ES
dc.description.references Vinyals, O., Ewalds, T., Bartunov, S., Georgiev, P., Vezhnevets, A. S., Yeo, M., Makhzani, A., Küttler, H., Agapiou, J., Schrittwieser, J., et al. (2017). Starcraft ii: A new challenge for reinforcement learning. arXiv:170804782. es_ES
dc.description.references von Ahn, L., Blum, M., & Langford, J. (2004). Telling humans and computers apart automatically. Communications of the ACM, 47(2), 56–60. es_ES
dc.description.references von Ahn, L., Maurer, B., McMillen, C., Abraham, D., & Blum, M. (2008). RECAPTCHA: Human-based character recognition via web security measures. Science, 321(5895), 1465. es_ES
dc.description.references Wainer, H. (2000). Computerized adaptive testing: A primer (2nd ed.). Mahwah, NJ: Lawrence Erlabaum Associate Publishers. es_ES
dc.description.references Wang, A., Pruksachatkun, Y., Nangia, N., Singh, A., Michael, J., Hill, F., Levy, O., & Bowman, S. R. (2019). Superglue: A stickier benchmark for general-purpose language understanding systems. arXiv:190500537. es_ES
dc.description.references Watt, S. (1996). Naive psychology and the inverted Turing test. Psycoloquy, 7(14), 463–518. es_ES
dc.description.references Weiss, D. J. (2011). Better data from better measurements using computerized adaptive testing. Journal of Methods and Measurement in the Social Sciences, 2(1), 1–27. es_ES
dc.description.references You, J. (2015). Beyond the Turing test. Science, 347(6218), 116–116. es_ES
dc.description.references Youyou, W., Kosinski, M., & Stillwell, D. (2015). Computer-based personality judgments are more accurate than those made by humans. Proceedings of the National Academy of Sciences, 112(4), 1036–1040. es_ES
dc.description.references Zadeh, L. A. (2008). Toward human level machine intelligence-Is it achievable? The need for a paradigm shift. IEEE Computational Intelligence Magazine, 3(3), 11–22. es_ES
dc.description.references Zellers, R., Bisk, Y., Schwartz, R., & Choi, Y. (2018). Swag: A large-scale adversarial dataset for grounded commonsense inference. In Proceedings of the 2018 conference on empirical methods in natural language processing (EMNLP). es_ES
dc.description.references Zellers, R., Holtzman, A., Bisk, Y., Farhadi, A., & Choi, Y. (2019). Hellaswag: Can a machine really finish your sentence? arXiv:190507830. es_ES
dc.description.references Zhou, P., Khanna, R., Lin, B. Y., Ho, D., Ren, X., & Pujara, J. (2020). Can BERT reason? logically equivalent probes for evaluating the inference capabilities of language models. arXiv:200500782. es_ES
dc.description.references Zillich, M. (2012). My robot is smarter than your robot. on the need for a total Turing test for robots. In: V. Muller & A. Ayesh (Eds.), AISB/IACAP 2012 symposium “revisiting turing and his test”, The Society for the Study of Artificial Intelligence and the Simulation of Behaviour, pp. 12–15. es_ES


Este ítem aparece en la(s) siguiente(s) colección(ones)

Mostrar el registro sencillo del ítem