Extended a Priori Probability (EAPP): A Data-Driven Approach for Machine Learning Binary Classification Tasks

RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia


dc.contributor.author Ortiz, V. es_ES
dc.contributor.author Pérez-Benito, Francisco Javier es_ES
dc.contributor.author del Tejo Catalá, Omar es_ES
dc.contributor.author Salvador Igual, Ismael es_ES
dc.contributor.author Llobet Azpitarte, Rafael es_ES
dc.contributor.author Perez-Cortes, Juan-Carlos es_ES
dc.date.accessioned 2023-05-31T18:01:51Z
dc.date.available 2023-05-31T18:01:51Z
dc.date.issued 2022-11-14 es_ES
dc.identifier.uri http://hdl.handle.net/10251/193768
dc.description.abstract [EN] The a priori probability of a dataset is usually used as a baseline for comparing a particular algorithm's accuracy in a given binary classification task. ZeroR is the simplest algorithm for this, predicting the majority class for all examples. However, this is an extremely simple approach that has no predictive power and does not describe other dataset features that could lead to a more demanding baseline. In this paper, we present the Extended A Priori Probability (EAPP), a novel semi-supervised baseline metric for binary classification tasks that considers not only the a priori probability but also possible bias present in the dataset, as well as other features that could provide a relatively trivial separability of the target classes. The approach is based on the area under the ROC curve (AUC ROC), known to be quite insensitive to class imbalance. The procedure involves multiobjective feature extraction and a clustering stage in the input space with autoencoders, followed by a combinatory weighted assignment of clusters to classes depending on the distance to the nearest clusters for each class. Class labels are then assigned to establish the combination that maximizes AUC ROC for each number of clusters considered. To avoid overfitting in the combined feature extraction and clustering method, a cross-validation scheme is applied in each case. EAPP is defined for different numbers of clusters, starting from the inverse of the minority class proportion, which is useful for a fair comparison among diversely imbalanced datasets. A high EAPP usually relates to an easy binary classification task, but it may also be due to a significant coarse-grained bias in the dataset when the task is previously known to be difficult. This metric represents a baseline beyond the a priori probability to assess the actual capabilities of binary classification models. es_ES
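The baseline quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' EAPP procedure (no autoencoder, clustering, or cross-validation is shown). It only computes the ZeroR accuracy, the AUC ROC of a score list, and a starting number of clusters taken as the inverse of the minority class proportion. The function names, the 0/1 label encoding, and the choice to round up are assumptions made for illustration.

```python
from collections import Counter
import math


def zero_r_accuracy(labels):
    """Accuracy obtained by always predicting the majority class (ZeroR)."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def auc_roc(labels, scores):
    """AUC ROC via pairwise comparison of positive and negative scores.

    Assumes labels are 0 (negative) or 1 (positive); ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def starting_cluster_count(labels):
    """Hypothetical helper: inverse of the minority class proportion.

    Rounding up to an integer is an assumption; the abstract does not
    state how a non-integer value is handled.
    """
    counts = Counter(labels)
    return math.ceil(len(labels) / min(counts.values()))


if __name__ == "__main__":
    # Toy imbalanced dataset: 90 negatives, 10 positives.
    labels = [0] * 90 + [1] * 10
    constant_scores = [0.0] * len(labels)  # ZeroR gives every example the same score

    print("ZeroR accuracy:", zero_r_accuracy(labels))
    print("AUC ROC of constant scores:", auc_roc(labels, constant_scores))
    print("Starting number of clusters:", starting_cluster_count(labels))
```

On this toy dataset ZeroR reaches an accuracy of 0.9 while its AUC ROC is 0.5, which shows why accuracy alone is a weak baseline on imbalanced data and why an AUC-based baseline is less sensitive to the class proportions.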
dc.description.sponsorship This work was supported in part by the Generalitat Valenciana through the Valencian Institute of Business Competitiveness (IVACE) Distributed Nominatively to Valencian Technological Innovation Centers under Project IMAMCN/2021/1, in part by the Cervera Network of Excellence Project in Data-Based Enabling Technologies (AI4ES) Co-Funded by the Centre for Industrial and Technological Development-E. P. E. (CDTI), and in part by the European Union through the Next Generation EU Fund within the Cervera Aids Program for Technological Centers under Project CER-20211030. es_ES
dc.language English es_ES
dc.publisher Institute of Electrical and Electronics Engineers es_ES
dc.relation.ispartof IEEE Access es_ES
dc.rights Attribution (by) es_ES
dc.subject A priori probability es_ES
dc.subject EAPP es_ES
dc.subject Clustering es_ES
dc.subject Autoencoder es_ES
dc.subject Semisupervised es_ES
dc.subject Combinatory es_ES
dc.subject Bias es_ES
dc.subject.classification COMPUTER LANGUAGES AND SYSTEMS es_ES
dc.subject.classification COMPUTER ARCHITECTURE AND TECHNOLOGY es_ES
dc.title Extended a Priori Probability (EAPP): A Data-Driven Approach for Machine Learning Binary Classification Tasks es_ES
dc.type Article es_ES
dc.identifier.doi 10.1109/ACCESS.2022.3221936 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/EC//CER-20211030//Next Generation EU Fund/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/IVACE//IMAMCN%2F2021%2F1/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica es_ES
dc.contributor.affiliation Universitat Politècnica de València. Escuela Técnica Superior de Ingenieros de Telecomunicación - Escola Tècnica Superior d'Enginyers de Telecomunicació es_ES
dc.description.bibliographicCitation Ortiz, V.; Pérez-Benito, FJ.; Del Tejo Catalá, O.; Salvador Igual, I.; Llobet Azpitarte, R.; Perez-Cortes, J. (2022). Extended a Priori Probability (EAPP): A Data-Driven Approach for Machine Learning Binary Classification Tasks. IEEE Access. 10:120074-120085. https://doi.org/10.1109/ACCESS.2022.3221936 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1109/ACCESS.2022.3221936 es_ES
dc.description.upvformatpinicio 120074 es_ES
dc.description.upvformatpfin 120085 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 10 es_ES
dc.identifier.eissn 2169-3536 es_ES
dc.relation.pasarela S\477905 es_ES
dc.contributor.funder European Commission es_ES
dc.contributor.funder Institut Valencià de Competitivitat Empresarial es_ES
dc.contributor.funder Centro para el Desarrollo Tecnológico Industrial es_ES

