Research community dynamics behind popular AI benchmarks

Martínez-Plumed, Fernando; Barredo, Pablo; Ó hÉigeartaigh, Seán; Hernández-Orallo, José

doi:10.1038/s42256-021-00339-6


Files in this item

Name: Martinez-PlumedBa ...

Size: 1.420Mb

Format: PDF

Description: Publisher's version

dc.contributor.author	Martínez-Plumed, Fernando	es_ES
dc.contributor.author	Barredo, Pablo	es_ES
dc.contributor.author	Ó hÉigeartaigh, Seán	es_ES
dc.contributor.author	Hernández-Orallo, José	es_ES
dc.date.accessioned	2022-04-27T06:28:27Z
dc.date.available	2022-04-27T06:28:27Z
dc.date.issued	2021-07	es_ES
dc.identifier.uri	http://hdl.handle.net/10251/182155
dc.description.abstract	[EN] The widespread use of experimental benchmarks in AI research has created competition and collaboration dynamics that are still poorly understood. Here we provide an innovative methodology to explore these dynamics and analyse the way different entrants in these challenges, from academia to tech giants, behave and react depending on their own or others' achievements. We perform an analysis of 25 popular benchmarks in AI from Papers With Code, with around 2,000 result entries overall, connected with their underlying research papers. We identify links between researchers and institutions (that is, communities) beyond the standard co-authorship relations, and we explore a series of hypotheses about their behaviour as well as some aggregated results in terms of activity, performance jumps and efficiency. We characterize the dynamics of research communities at different levels of abstraction, including organization, affiliation, trajectories, results and activity. We find that hybrid, multi-institution and persevering communities are more likely to improve state-of-the-art performance, which becomes a watershed for many community members. Although the results cannot be extrapolated beyond our selection of popular machine learning benchmarks, the methodology can be extended to other areas of artificial intelligence or robotics, and combined with bibliometric studies.	es_ES
dc.description.sponsorship	F.M.-P. acknowledges funding from the AI-Watch project by DG CONNECT and DG JRC of the European Commission. J.H.-O. and S.O.h. were funded by the Future of Life Institute, FLI, under grant RFP2-152. J.H.-O. was supported by the EU (FEDER) and Spanish MINECO under RTI2018-094403-B-C32, Generalitat Valenciana under PROMETEO/2019/098 and European Union's Horizon 2020 grant no. 952215 (TAILOR).	es_ES
dc.language	English	es_ES
dc.publisher	Nature Publishing Group	es_ES
dc.relation.ispartof	Nature Machine Intelligence	es_ES
dc.rights	All rights reserved	es_ES
dc.subject.classification	COMPUTER LANGUAGES AND SYSTEMS	es_ES
dc.title	Research community dynamics behind popular AI benchmarks	es_ES
dc.type	Article	es_ES
dc.identifier.doi	10.1038/s42256-021-00339-6	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/EC/H2020/952215/EU	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/GVA//PROMETEO%2F2019%2F098//DEEPTRUST/	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació	es_ES
dc.description.bibliographicCitation	Martínez-Plumed, F.; Barredo, P.; Ó hÉigeartaigh, S.; Hernández-Orallo, J. (2021). Research community dynamics behind popular AI benchmarks. Nature Machine Intelligence. 3(7):581-589. https://doi.org/10.1038/s42256-021-00339-6	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	https://doi.org/10.1038/s42256-021-00339-6	es_ES
dc.description.upvformatpinicio	581	es_ES
dc.description.upvformatpfin	589	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	3	es_ES
dc.description.issue	7	es_ES
dc.identifier.eissn	2522-5839	es_ES
dc.relation.pasarela	S\458370	es_ES
dc.contributor.funder	European Commission	es_ES
dc.contributor.funder	Generalitat Valenciana	es_ES
dc.contributor.funder	COMISION DE LAS COMUNIDADES EUROPEA	es_ES
dc.description.references	Fortunato, S. et al. Science of science. Science 359, eaao0185 (2018).	es_ES
dc.description.references	Wu, L., Wang, D. & Evans, J. A. Large teams develop and small teams disrupt science and technology. Nature 566, 378–382 (2019).	es_ES
dc.description.references	Frank, M. R., Wang, D., Cebrian, M. & Rahwan, I. The evolution of citation graphs in artificial intelligence research. Nat. Mach. Intell. 1, 79–85 (2019).	es_ES
dc.description.references	Martínez-Plumed, F. et al. Accounting for the neglected dimensions of AI progress. Preprint at https://arxiv.org/abs/1806.00610 (2018).	es_ES
dc.description.references	Perrault, R. et al. The AI Index 2019 Annual Report (AI Index Steering Committee, Human-Centered AI Institute, Stanford Univ. 2019); https://hai.stanford.edu/ai-index-2019	es_ES
dc.description.references	Clauset, A., Newman, M. E. J. & Moore, C. Finding community structure in very large networks. Phys. Rev. E 70, 066111 (2004).	es_ES
dc.description.references	Van Raan, A. The influence of international collaboration on the impact of research results: some simple mathematical considerations concerning the role of self-citations. Scientometrics 42, 423–428 (1998).	es_ES
dc.description.references	Deng, J. et al. ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).	es_ES
dc.description.references	Rajpurkar, P., Zhang, J., Lopyrev, K. & Liang, P. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing 2383–2392 (Association for Computational Linguistics, 2016).	es_ES
dc.description.references	Bonferroni, C. Teoria statistica delle classi e calcolo delle probabilità. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze 8, 3–62 (1936).	es_ES
dc.description.references	Kwok, R. Junior AI researchers are in demand by universities and industry. Nature 568, 581–584 (2019).	es_ES
dc.description.references	Rhoades, S. A. The Herfindahl–Hirschman index. Fed. Res. Bull. 79, 188–189 (1993).	es_ES
dc.description.references	Cave, S. & Ó hÉigeartaigh, S. S. An AI race for strategic advantage: rhetoric and risks. In Proc. 2018 AAAI/ACM Conference on AI, Ethics, and Society 36–40 (Association for Computing Machinery, 2018).	es_ES
dc.description.references	Lee, K.-F. AI Superpowers: China, Silicon Valley, and the New World Order (Houghton Mifflin Harcourt, 2018).	es_ES
dc.description.references	Horowitz, M. C., Allen, G. C., Kania, E. B. & Scharre, P. Strategic Competition in an Era of Artificial Intelligence 8 (Center for New American Security, 2018).	es_ES
dc.description.references	Li, W. C., Nirei, M. & Yamana, K. Value of Data: There’s No Such Thing as a Free Lunch in the Digital Economy Working Paper (US Bureau of Economic Analysis, 2019).	es_ES
dc.description.references	Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images. https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf (2009).	es_ES
dc.description.references	Hernández-Orallo, J. et al. A new AI evaluation cosmos: Ready to play the game? AI Magazine 38, 66–69 (2017).	es_ES
dc.description.references	Shoham, Y. Towards the AI index. AI Magazine 38, 71–77 (2017).	es_ES
dc.description.references	Niu, J., Tang, W., Xu, F., Zhou, X. & Song, Y. Global research on AI from 1990–2014: spatially-explicit bibliometric analysis. ISPRS Int. J. Geoinf. 5, 66 (2016).	es_ES
dc.description.references	Juan Mateos-Garcia, K. S., Klinger, J. & Winch, R. A Semantic Analysis of the Recent Evolution of AI Research. https://www.nesta.org.uk/report/semantic-analysis-recent-evolution-ai-research/ (NESTA, 2019).	es_ES
dc.description.references	Gao, F. et al. Bibliometric analysis on tendency and topics of artificial intelligence over last decade. Microsyst. Technol. 1–13 (2019).	es_ES
dc.description.references	Tran, B. X. et al. Global evolution of research in artificial intelligence in health and medicine: a bibliometric study. J. Clin. Med. 8, 360 (2019).	es_ES
dc.description.references	Tang, X., Li, X., Ding, Y., Song, M. & Bu, Y. The pace of artificial intelligence innovations: speed, talent, and trial-and-error. J. Inf. 14, 101094 (2020).	es_ES
dc.description.references	Qian, Y., Liu, Y. & Sheng, Q. Z. Understanding hierarchical structural evolution in a scientific discipline: a case study of artificial intelligence. J. Inf. 14, 101047 (2020).	es_ES
dc.description.references	Serenko, A. The development of an AI journal ranking based on the revealed preference approach. J. Inf. 4, 447–459 (2010).	es_ES
dc.description.references	Campbell, M., Hoane Jr, A. J. & Hsu, F.-h. Deep Blue. Artif. Intell. 134, 57–83 (2002).	es_ES
dc.description.references	Ferrucci, D. A. Introduction to ‘This is Watson’. IBM J. Res. Dev. 56, 235–249 (2012).	es_ES
dc.description.references	Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).	es_ES
dc.description.references	Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).	es_ES
dc.description.references	Schlangen, D. Language tasks and language games: on methodology in current natural language processing research. Preprint at https://arxiv.org/abs/1908.10747 (2019).	es_ES
dc.description.references	Zellers, R., Holtzman, A., Bisk, Y., Farhadi, A. & Choi, Y. Hellaswag: can a machine really finish your sentence? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 4791–4800 (Association for Computational Linguistics, 2019).	es_ES
dc.description.references	Lei, Y. & Liu, Z. The development of artificial intelligence: a bibliometric analysis, 2007–2016. J. Phys. Conf. Ser. 1168, 022027 (2019).	es_ES
dc.description.references	Martínez-Plumed, F. et al. The facets of artificial intelligence: a framework to track the evolution of AI. In Proc. Twenty-Seventh International Joint Conference on Artificial Intelligence 5180–5187 (International Joint Conferences on Artificial Intelligence Organization, 2018).	es_ES
dc.description.references	Bhattacharya, J. & Packalen, M. Stagnation and Scientific Incentives Technical Report (National Bureau of Economic Research, 2020).	es_ES
dc.description.references	Houghton, B. et al. Guaranteeing reproducibility in deep learning competitions. Preprint at https://arxiv.org/abs/2005.06041 (2020).	es_ES
dc.description.references	Lucic, M., Kurach, K., Michalski, M., Gelly, S. & Bousquet, O. Are gans created equal? A large-scale study. Adv. Neural Inf. Process. Syst. 700–709 (2018).	es_ES
dc.description.references	Hernandez, D. & Brown, T. B. Measuring the algorithmic efficiency of neural networks. Preprint at https://arxiv.org/abs/2005.04305 (2020).	es_ES
dc.description.references	Mattson, P. et al. MLPerf training benchmark. Preprint at https://arxiv.org/abs/1910.01500 (2019).	es_ES
dc.description.references	Martínez-Plumed, F. & Hernández-Orallo, J. Dual indicators to analyse AI benchmarks: difficulty, discrimination, ability, and generality. IEEE Trans. Games 12, 121–131 (2020).	es_ES
dc.description.references	Martínez-Plumed, F., Barredo, P., hÉigeartaigh, S. Ó. & Hernández-Orallo, J. AI research dynamics. GitHub https://github.com/nandomp/AI_Research_Dynamics (2021).	es_ES
dc.description.references	Kuehne, H., Jhuang, H., Garrote, E., Poggio, T. & Serre, T. HMDB: a large video database for human motion recognition. In 2011 International Conference on Computer Vision 2556–2563 (IEEE, 2011).	es_ES
dc.description.references	Soomro, K., Zamir, A. R. & Shah, M. UCF101: a dataset of 101 human actions classes from videos in the wild. Preprint at https://arxiv.org/abs/1212.0402 (2012).	es_ES
dc.description.references	Bellemare, M. G., Naddaf, Y., Veness, J. & Bowling, M. The arcade learning environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013).	es_ES
dc.description.references	Timofte, R., De Smet, V. & Van Gool, L. Anchored neighborhood regression for fast example-based super-resolution. In Proc. IEEE International Conference on Computer Vision 1920–1927 (IEEE, 2013).	es_ES
dc.description.references	Hutter, M. Human knowledge compression contest. Hutter Prize http://prize.hutter1.net/ (2006).	es_ES
dc.description.references	Mikolov, T., Deoras, A., Kombrink, S., Burget, L. & Černocky, J. Empirical evaluation and combination of advanced language modeling techniques. In Twelfth Annual Conference of the International Speech Communication Association 605–608 (2011).	es_ES
dc.description.references	Dettmers, T., Minervini, P., Stenetorp, P. & Riedel, S. Convolutional 2D knowledge graph embeddings. In Proc. AAAI Conference on Artificial Intelligence Vol. 32 (2018).	es_ES
dc.description.references	Bojar, O. et al. Findings of the 2014 workshop on statistical machine translation. In Proc. Ninth Workshop on Statistical Machine Translation 12–58 (Association for Computational Linguistics, 2014); http://www.aclweb.org/anthology/W/W14/W14-3302	es_ES
dc.description.references	Sang, E. F. & De Meulder, F. Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003 142–147 (2003).	es_ES
dc.description.references	Weischedel, R. et al. Ontonotes Release 5.0 ldc2013t19 23 (Linguistic Data Consortium, 2013).	es_ES
dc.description.references	Lin, T.-Y. et al. Microsoft COCO: common objects in context. In European Conference on Computer Vision 740–755 (Springer, 2014).	es_ES
dc.description.references	Andriluka, M., Pishchulin, L., Gehler, P. & Schiele, B. 2D human pose estimation: new benchmark and state of the art analysis. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 3686–3693 (IEEE, 2014).	es_ES
dc.description.references	Yang, Y., Yih, W.-t. & Meek, C. Wikiqa: a challenge dataset for open-domain question answering. In Proc. 2015 Conference on Empirical Methods in Natural Language Processing 2013–2018 (Association for Computational Linguistics, 2015).	es_ES
dc.description.references	Cordts, M. et al. The cityscapes dataset for semantic urban scene understanding. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 3213–3223 (IEEE, 2016).	es_ES
dc.description.references	Everingham, M. et al. The Pascal visual object classes challenge: a retrospective. Int. J. Comput. Vis. 111, 98–136 (2015).	es_ES
dc.description.references	Maas, A. L. et al. Learning word vectors for sentiment analysis. In Proc. 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies Vol. 1, 142–150 (Association for Computational Linguistics, 2011).	es_ES
dc.description.references	Socher, R. et al. Recursive deep models for semantic compositionality over a sentiment treebank. In Proc. 2013 Conference on Empirical Methods in Natural Language Processing 1631–1642 (Association for Computational Linguistics, 2013).	es_ES
dc.description.references	Panayotov, V., Chen, G., Povey, D. & Khudanpur, S. Librispeech: an ASR corpus based on public domain audio books. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 5206–5210 (IEEE, 2015).	es_ES

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia
