Moros Daval, Yael (Universitat Politècnica de València, 2023-09-19)
[EN] Large language models can be used for a wide range of tasks. The performance on each task instance depends on the specific characteristics of the question (e.g., knowledge or reasoning required) but also on its ...
Martínez-Plumed, Fernando; Hernández-Orallo, José (Institute of Electrical and Electronics Engineers (IEEE), 2020-06)
[EN] With the purpose of better analyzing the result of artificial intelligence (AI) benchmarks, we present two indicators on the side of the AI problems, difficulty and discrimination, and two indicators on the side of ...
Hernández-Orallo, José (Springer Verlag (Germany), 2016-08-19)
The evaluation of artificial intelligence systems and components is crucial for the progress of the discipline. In this paper we describe and critically assess the different ways AI systems are evaluated, and the role ...
This supplementary material serves as the technical appendix to the paper "When AI Difficulty is Easy: The Explanatory Power of Predicting IRT Difficulty" (Martínez-Plumed et al. 2022), published in The Thirty-Sixth AAAI ...
Zhou, Lexin (Universitat Politècnica de València, 2023-06-20)
[EN] Pretrained artificial intelligence models are made more human-like and human-aligned by scaling them up in resources (e.g., by increasing compute, training data and parameter size) and shaping them up with human ...