Abstract:
|
Many performance metrics have been introduced in the literature for the evaluation of classification
performance, each of them with different origins and areas of application. These metrics include
accuracy, unweighted accuracy, the area under the ROC curve or the ROC convex hull, the mean
absolute error and the Brier score or mean squared error (with its decomposition into refinement and
calibration). One way of understanding the relations among these metrics is by means of variable
operating conditions (in the form of misclassification costs and/or class distributions). Thus, a
metric may correspond to some expected loss over different operating conditions. One dimension
for the analysis has been the distribution for this range of operating conditions, leading to some
important connections in the area of proper scoring rules. We demonstrate in this paper that there
is an equally important dimension which has so far received much less attention in the analysis of
performance metrics. This dimension is given by the decision rule, which is typically implemented
as a threshold choice method when using scoring models. In this paper, we explore many old and
new threshold choice methods: fixed, score-uniform, score-driven, rate-driven and optimal, among
others. By calculating the expected loss obtained with these threshold choice methods for a uniform
range of operating conditions, we give clear interpretations of the 0-1 loss, the absolute error, the
Brier score, the AUC and the refinement loss respectively. Our analysis provides a comprehensive
view of performance metrics as well as a systematic approach to loss minimisation, which can be
summarised as follows: given a model, apply the threshold choice methods that correspond to
the available information about the operating condition, and compare their expected losses. In
order to assist in this procedure we also derive several connections between the aforementioned
performance metrics, and we highlight the role of calibration in choosing the threshold choice
method.