dc.contributor.author | ORTIZ, V. | es_ES |
dc.contributor.author | Pérez-Benito, Francisco Javier | es_ES |
dc.contributor.author | del Tejo Catalá, Omar | es_ES |
dc.contributor.author | Salvador Igual, Ismael | es_ES |
dc.contributor.author | Llobet Azpitarte, Rafael | es_ES |
dc.contributor.author | Perez-Cortes, Juan-Carlos | es_ES |
dc.date.accessioned | 2023-05-31T18:01:51Z | |
dc.date.available | 2023-05-31T18:01:51Z | |
dc.date.issued | 2022-11-14 | es_ES |
dc.identifier.uri | http://hdl.handle.net/10251/193768 | |
dc.description.abstract | [EN] The a priori probability of a dataset is usually used as a baseline for comparing a particular algorithm's accuracy in a given binary classification task. ZeroR is the simplest algorithm for this, predicting the majority class for all examples. However, this is an extremely simple approach that has no predictive power and does not describe other dataset features that could lead to a more demanding baseline. In this paper, we present the Extended A Priori Probability (EAPP), a novel semi-supervised baseline metric for binary classification tasks that considers not only the a priori probability but also some possible bias present in the dataset as well as other features that could provide a relatively trivial separability of the target classes. The approach is based on the area under the ROC curve (AUC ROC), known to be quite insensitive to class imbalance. The procedure involves multiobjective feature extraction and a clustering stage in the input space with autoencoders and a subsequent combinatorial weighted assignment from clusters to classes depending on the distance to nearest clusters for each class. Class labels are then assigned to establish the combination that maximizes AUC ROC for each number of clusters considered. To avoid overfitting in the combined feature extraction and clustering method, a cross-validation scheme is performed in each case. EAPP is defined for different numbers of clusters, starting from the inverse of the minority class proportion, which is useful for a fair comparison among diversely imbalanced datasets. A high EAPP usually relates to an easy binary classification task, but it may also be due to a significant coarse-grained bias in the dataset when the task is known beforehand to be difficult. This metric represents a baseline beyond the a priori probability to assess the actual capabilities of binary classification models. | es_ES |
dc.description.sponsorship | This work was supported in part by the Generalitat Valenciana through the Valencian Institute of Business Competitiveness (IVACE) Distributed Nominatively to Valencian Technological Innovation Centers under Project IMAMCN/2021/1, in part by the Cervera Network of Excellence Project in Data-Based Enabling Technologies (AI4ES) Co-Funded by the Centre for Industrial and Technological Development, E.P.E. (CDTI), and in part by the European Union through the Next Generation EU Fund within the Cervera Aids Program for Technological Centers under Project CER-20211030. | es_ES |
dc.language | English | es_ES |
dc.publisher | Institute of Electrical and Electronics Engineers | es_ES |
dc.relation.ispartof | IEEE Access | es_ES |
dc.rights | Attribution (by) | es_ES |
dc.subject | A priori probability | es_ES |
dc.subject | EAPP | es_ES |
dc.subject | Clustering | es_ES |
dc.subject | Autoencoder | es_ES |
dc.subject | Semisupervised | es_ES |
dc.subject | Combinatory | es_ES |
dc.subject | Bias | es_ES |
dc.subject.classification | COMPUTER LANGUAGES AND SYSTEMS | es_ES |
dc.subject.classification | COMPUTER ARCHITECTURE AND TECHNOLOGY | es_ES |
dc.title | Extended a Priori Probability (EAPP): A Data-Driven Approach for Machine Learning Binary Classification Tasks | es_ES |
dc.type | Article | es_ES |
dc.identifier.doi | 10.1109/ACCESS.2022.3221936 | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/EC//CER-20211030//Next Generation EU Fund/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/IVACE//IMAMCN%2F2021%2F1/ | es_ES |
dc.rights.accessRights | Open | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Escuela Técnica Superior de Ingenieros de Telecomunicación - Escola Tècnica Superior d'Enginyers de Telecomunicació | es_ES |
dc.description.bibliographicCitation | Ortiz, V.; Pérez-Benito, FJ.; Del Tejo Catalá, O.; Salvador Igual, I.; Llobet Azpitarte, R.; Perez-Cortes, J. (2022). Extended a Priori Probability (EAPP): A Data-Driven Approach for Machine Learning Binary Classification Tasks. IEEE Access. 10:120074-120085. https://doi.org/10.1109/ACCESS.2022.3221936 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | https://doi.org/10.1109/ACCESS.2022.3221936 | es_ES |
dc.description.upvformatpinicio | 120074 | es_ES |
dc.description.upvformatpfin | 120085 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 10 | es_ES |
dc.identifier.eissn | 2169-3536 | es_ES |
dc.relation.pasarela | S\477905 | es_ES |
dc.contributor.funder | European Commission | es_ES |
dc.contributor.funder | Institut Valencià de Competitivitat Empresarial | es_ES |
dc.contributor.funder | Centro para el Desarrollo Tecnológico Industrial | es_ES |
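
The abstract contrasts the ZeroR baseline with a baseline built from clusters assigned to classes so as to maximize AUC ROC. The following is a minimal sketch of that idea only, not the authors' implementation: it assumes cluster memberships are already given, uses a hard 0/1 assignment per cluster instead of the distance-weighted one, and leaves out the autoencoder feature extraction and the cross-validation. All function names and the toy data are hypothetical.

```python
from itertools import product


def zero_r_accuracy(labels):
    """Accuracy of always predicting the majority class (the a priori baseline)."""
    positives = sum(labels)
    return max(positives, len(labels) - positives) / len(labels)


def auc_roc(labels, scores):
    """AUC ROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_cluster_assignment(labels, cluster_ids):
    """Try every hard cluster-to-class mapping and keep the one with the highest AUC ROC."""
    clusters = sorted(set(cluster_ids))
    best_auc, best_map = 0.0, None
    for bits in product([0, 1], repeat=len(clusters)):
        mapping = dict(zip(clusters, bits))
        # Each sample is scored with the class assigned to its cluster.
        scores = [mapping[c] for c in cluster_ids]
        auc = auc_roc(labels, scores)
        if auc > best_auc:
            best_auc, best_map = auc, mapping
    return best_auc, best_map


# Toy data (made up for illustration): 8 samples, 25% positives, 3 clusters.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
cluster_ids = [0, 0, 0, 1, 1, 2, 2, 2]

print(zero_r_accuracy(labels))                       # majority-class accuracy
print(best_cluster_assignment(labels, cluster_ids))  # (best AUC ROC, mapping)
```

The exhaustive search over mappings grows as 2^k in the number of clusters k, so this sketch is only practical for small k.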