
Binarised regression tasks: methods and evaluation metrics

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia


dc.contributor.author Hernández Orallo, José es_ES
dc.contributor.author Ferri Ramírez, César es_ES
dc.contributor.author Lachiche, Nicolas es_ES
dc.contributor.author Martínez Usó, Adolfo es_ES
dc.contributor.author Ramírez Quintana, María José es_ES
dc.date.accessioned 2016-05-04T11:45:23Z
dc.date.available 2016-05-04T11:45:23Z
dc.date.issued 2015-11
dc.identifier.issn 1384-5810
dc.identifier.uri http://hdl.handle.net/10251/63549
dc.description "The final publication is available at Springer via http://dx.doi.org/10.1007/s10618-015-0443-9" es_ES
dc.description.abstract Some supervised tasks are presented with a numerical output, but decisions have to be made in a discrete, binarised way, according to a particular cutoff. This binarised regression task is a very common situation that requires its own analysis, different from that of regression, classification and ordinal regression. We first investigate the application cases in terms of the information about the distribution and range of the cutoffs, and distinguish six possible scenarios, some of which are more common than others. Next, we study two basic approaches: the retraining approach, which discretises the training set whenever the cutoff is available and learns a new classifier from it, and the reframing approach, which learns a regression model and sets the cutoff when this is available during deployment. In order to assess the binarised regression task, we introduce context plots featuring error against cutoff. Two special cases are of interest: the UCE and OCE curves. The area under the former is the mean absolute error, while the latter leads to a new metric that lies between a ranking measure and a residual-based measure. A comprehensive evaluation of the retraining and reframing approaches is performed using a repository of binarised regression problems created for this purpose, concluding that neither method is clearly better than the other, except when the size of the training data is small. es_ES
dc.description.sponsorship We thank the anonymous reviewers for their comments, which have helped to improve this paper significantly. We thank Peter Flach and Meelis Kull for their insightful comments and very useful suggestions. This work was supported by the Spanish MINECO under Grant TIN 2013-45732-C4-1-P and by Generalitat Valenciana PROMETEOII2015/013. This research has been developed within the REFRAME project, granted by the European Coordinated Research on Long-term Challenges in Information and Communication Sciences & Technologies ERA-Net (CHIST-ERA), and funded by the Ministerio de Economia y Competitividad in Spain (PCIN-2013-037) and the Agence Nationale pour la Recherche in France (ANR-12-CHRI-0005-03). en_EN
dc.language English es_ES
dc.publisher Springer Verlag (Germany) es_ES
dc.relation.ispartof Data Mining and Knowledge Discovery es_ES
dc.rights All rights reserved es_ES
dc.subject Regression es_ES
dc.subject Classification es_ES
dc.subject Reframing es_ES
dc.subject Mean absolute error es_ES
dc.subject Cutoff es_ES
dc.subject Binarisation es_ES
dc.subject.classification LANGUAGES AND COMPUTER SYSTEMS es_ES
dc.title Binarised regression tasks: methods and evaluation metrics es_ES
dc.type Article es_ES
dc.identifier.doi 10.1007/s10618-015-0443-9
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//TIN2013-45732-C4-1-P/ES/UNA APROXIMACION DECLARATIVA AL MODELADO, ANALISIS Y RESOLUCION DE PROBLEMAS/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/CHIST-ERA//CHIST-ERA-2011/EU/Rethinking the Essence, Flexibility and Reusability of Advanced Model Exploitation/REFRAME/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/ANR//ANR-12-CHRI-0005/FR/Repenser l'essence, la flexibilité et la réutilisabilité de l'exploitation de modèles avancés)/REFRAME/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/GVA//PROMETEOII%2F2015%2F013/ES/SmartLogic: Logic Technologies for Software Security and Performance/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//PCIN-2013-037/ES/RETHINKING THE ESSENCE, FLEXIBILITY AND REUSABILITY OF ADVANCED MODEL EXPLOITATION/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Department of Computer Systems and Computation (Departamento de Sistemas Informáticos y Computación / Departament de Sistemes Informàtics i Computació) es_ES
dc.description.bibliographicCitation Hernández Orallo, J.; Ferri Ramírez, C.; Lachiche, N.; Martínez Usó, A.; Ramírez Quintana, MJ. (2015). Binarised regression tasks: methods and evaluation metrics. Data Mining and Knowledge Discovery. 1-43. https://doi.org/10.1007/s10618-015-0443-9 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion http://link.springer.com/article/10.1007/s10618-015-0443-9 es_ES
dc.description.upvformatpinicio 1 es_ES
dc.description.upvformatpfin 43 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.relation.senia 302244 es_ES
dc.contributor.funder Ministerio de Economía y Competitividad es_ES
dc.contributor.funder Agence Nationale de la Recherche, France es_ES
dc.contributor.funder Generalitat Valenciana es_ES
dc.description.references Bache K, Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml es_ES
dc.description.references Bella A, Ferri C, Hernández-Orallo J, Ramírez-Quintana MJ (2014) Aggregative quantification for regression. Data Min Knowl Discov 28(2):475–518 es_ES
dc.description.references Bi J, Bennett KP (2003) Regression error characteristic curves. In: Twentieth international conference on machine learning (ICML-2003). Washington, DC es_ES
dc.description.references Brooks AD (2007) knnflex: a more flexible KNN. R package version 1.1.1 es_ES
dc.description.references Cohen I, Goldszmidt M (2004) Properties and benefits of calibrated classifiers. Knowl Discov Database 2004:125–136 es_ES
dc.description.references Drummond C, Holte R (2000) Explicitly representing expected cost: an alternative to ROC representation. In: Knowledge discovery and data mining, pp 198–207 es_ES
dc.description.references Drummond C, Holte R (2006) Cost curves: an improved method for visualizing classifier performance. Mach Learn 65:95–130 es_ES
dc.description.references Fawcett T (2006) ROC graphs with instance-varying costs. Pattern Recognit Lett 27(8):882–891 es_ES
dc.description.references Fawcett T, Provost F (1997) Adaptive fraud detection. Data Min Knowl Discov 1(3):291–316 es_ES
dc.description.references Federal Financial Institutions Examination Council (2013) Home mortgage disclosure act (HMDA). http://www.ffiec.gov/hmda/ es_ES
dc.description.references Ferri C, Hernández-Orallo J (2004) Cautious classifiers. In: Proceedings of the 1st international workshop on ROC analysis in artificial intelligence (ROCAI-2004), pp 27–36 es_ES
dc.description.references Ferri C, Hernández-Orallo J, Modroiu R (2009) An experimental comparison of performance measures for classification. Pattern Recognit Lett 30(1):27–38 es_ES
dc.description.references Flach P (2003) The geometry of ROC space: understanding machine learning metrics through ROC isometrics. In: Machine learning, proceedings of the twentieth international conference (ICML 2003), pp 194–201 es_ES
dc.description.references Guo Y, Schuurmans D (2008) Discriminative batch mode active learning. In: Platt J, Koller D, Singer Y, Roweis S (eds) Advances in neural information processing systems, vol 20. Curran Associates, Inc, pp 593–600 es_ES
dc.description.references Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explor Newsl 11(1):10–18 es_ES
dc.description.references Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer series in statistics. Springer New York Inc., New York es_ES
dc.description.references Hernández-Orallo J (2013) ROC curves for regression. Pattern Recognit 46(12):3395–3411 es_ES
dc.description.references Hernández-Orallo J (2014) Probabilistic reframing for context-sensitive regression. ACM Trans Knowl Discov Data 8(3) es_ES
dc.description.references Hernández-Orallo J, Flach P, Ferri C (2012) A unified view of performance metrics: translating threshold choice into expected classification loss. J Mach Learn Res (JMLR) 13:2813–2869 es_ES
dc.description.references Hornik K, Buchta C, Zeileis A (2009) Open-source machine learning: R meets Weka. Comput Stat 24(2):225–232. doi: 10.1007/s00180-008-0119-7 es_ES
dc.description.references Hsu CN, Knoblock CA (1998) Discovering robust knowledge from databases that change. Data Min Knowl Discov 2(1):69–95 es_ES
dc.description.references Kocjan E, Kononenko I (2009) Regression as cost-sensitive classification. In: International multiconference on information society, pp 38–41 es_ES
dc.description.references Koenker R (2005) Quantile regression, vol 38. Cambridge University Press, Cambridge es_ES
dc.description.references Langford J, Oliveira R, Zadrozny B (2012) Predicting conditional quantiles via reduction to classification. arXiv:1206.6860 es_ES
dc.description.references Langford J, Zadrozny B (2005) Estimating class membership probabilities using classifier learners. In: Proceedings of the tenth international workshop on artificial intelligence and statistics (AISTAT05), pp 198–205 es_ES
dc.description.references Martin A, Doddington G, Kamm T, Ordowski M, Przybocki M (1997) The DET curve in assessment of detection task performance. In: Fifth european conference on speech communication and technology. Citeseer es_ES
dc.description.references Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359 es_ES
dc.description.references Piatetsky-Shapiro G, Masand B (1999) Estimating campaign benefits and modeling lift. In: Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, p 193 es_ES
dc.description.references Pietraszek T (2007) On the use of ROC analysis for the optimization of abstaining classifiers. Mach Learn 68(2):137–169 es_ES
dc.description.references Prati RC, Batista GE, Monard MC (2011) A survey on graphical methods for classification predictive performance evaluation. IEEE Trans Knowl Data Eng 23:1601–1618. doi: 10.1109/TKDE.2011.59 es_ES
dc.description.references Rosset S, Perlich C, Zadrozny B (2007) Ranking-based evaluation of regression models. Knowl Inf Syst 12(3):331–353 es_ES
dc.description.references Sammut C, Webb G (2011) Encyclopedia of machine learning. Springer, New York es_ES
dc.description.references Swets JA, Dawes RM, Monahan J (2000) Better decisions through science. Sci Am 283(4):82–87 es_ES
dc.description.references Torgo L (2005) Regression error characteristic surfaces. In: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining. ACM, pp 697–702 es_ES
dc.description.references Torgo L, Gama J (1996) Regression by classification. In: Advances in artificial intelligence. Springer, pp 51–60 es_ES
dc.description.references The keel-dataset repository (2002). http://www.keel.es/ es_ES
dc.description.references Yang Y, Wu X, Zhu X (2006) Mining in anticipation for concept change: proactive-reactive prediction in data streams. Data Min Knowl Discov 13(3):261–289 es_ES
dc.description.references Zillow (2013) Zillow API. http://www.zillow.com/howto/api/APIOverview.htm es_ES
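The abstract states that the area under the UCE curve is the mean absolute error. A short argument shows why, under one reading of that curve: for a single example with actual value y and prediction ŷ, binarising both at a cutoff t puts them in different classes exactly when t lies between them, an interval of length |y - ŷ|; averaging over examples and integrating over t therefore yields the mean absolute error. The Python sketch below checks this numerically. It assumes that the curve plots, for each cutoff, the unweighted fraction of examples misclassified when the same cutoff is applied to both actual and predicted values (a reframing-style deployment of a single regression model); the paper's exact definition and any normalisation may differ. The function names reframe, cutoff_error and area_under_error_curve are hypothetical, not taken from the paper.

```python
def reframe(values, cutoff):
    """Binarise numeric values at a cutoff (reframing: the cutoff is applied
    to the outputs of an already-trained regression model at deployment)."""
    return [v >= cutoff for v in values]


def cutoff_error(y_true, y_pred, cutoff):
    """Fraction of examples whose actual and predicted values end up in
    different classes when both are binarised at the same cutoff.
    Assumption: unweighted 0/1 error, one point of the error-vs-cutoff curve."""
    true_labels = reframe(y_true, cutoff)
    pred_labels = reframe(y_pred, cutoff)
    mismatches = sum(t != p for t, p in zip(true_labels, pred_labels))
    return mismatches / len(y_true)


def area_under_error_curve(y_true, y_pred):
    """Exact area under the error-vs-cutoff curve.

    The curve is a step function that can only change at observed values,
    so it is integrated interval by interval, evaluated at each midpoint.
    Below the smallest and above the largest value the error is zero.
    """
    breaks = sorted(set(y_true) | set(y_pred))
    area = 0.0
    for left, right in zip(breaks, breaks[1:]):
        midpoint = (left + right) / 2
        area += cutoff_error(y_true, y_pred, midpoint) * (right - left)
    return area


def mean_absolute_error(y_true, y_pred):
    """Plain mean absolute error, for comparison with the curve's area."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    y_true = [1.0, 2.0, 4.0]
    y_pred = [1.5, 2.0, 3.0]
    # The two numbers agree up to floating-point rounding.
    print(area_under_error_curve(y_true, y_pred), mean_absolute_error(y_true, y_pred))
```

Because the error curve only changes at observed values, integrating it interval by interval gives the exact area without choosing a grid of cutoffs; drawing the full context plot would instead evaluate cutoff_error over a range of cutoffs.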

