Bridging the Gap between Distance and Generalization

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia

dc.contributor.author Estruch Gregori, Vicente es_ES
dc.contributor.author Ferri Ramírez, César es_ES
dc.contributor.author Hernández-Orallo, José es_ES
dc.contributor.author Ramírez Quintana, María José es_ES
dc.date.accessioned 2014-01-16T15:43:02Z
dc.date.issued 2012-11-29
dc.identifier.issn 0824-7935
dc.identifier.uri http://hdl.handle.net/10251/34946
dc.description.abstract Distance-based and generalization-based methods are two families of artificial intelligence techniques that have been successfully used over a wide range of real-world problems. In the first case, general algorithms can be applied to any data representation by just changing the distance. The metric space sets the search and learning space, which is generally instance-oriented. In the second case, models can be obtained for a given pattern language, which can be comprehensible. The generality-ordered space sets the search and learning space, which is generally model-oriented. However, the concepts of distance and generalization clash in many different ways, especially when knowledge representation is complex (e.g., structured data). This work establishes a framework where these two fields can be integrated in a consistent way. We introduce the concept of distance-based generalization, which connects all the generalized examples in such a way that all of them are reachable inside the generalization by using straight paths in the metric space. This makes the metric space and the generality-ordered space coherent (or even dual). Additionally, we also introduce a definition of minimal distance-based generalization that can be seen as the first formulation of the Minimum Description Length (MDL)/Minimum Message Length (MML) principle in terms of a distance function. We instantiate and develop the framework for the most common data representations and distances, where we show that consistent instances can be found for numerical data, nominal data, sets, lists, tuples, graphs, first-order atoms, and clauses. As a result, general learning methods that integrate the best from distance-based and generalization-based methods can be defined and adapted to any specific problem by appropriately choosing the distance, the pattern language and the generalization operator. es_ES
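The abstract's central notion, a generalization that covers every element lying on a "straight path" between the generalized examples, can be made concrete for one of the listed representations, finite sets. The sketch below is a simplified illustration under stated assumptions, not the paper's formal definition or operator. It assumes (1) the distance is the size of the symmetric difference, (2) "straight path" means metric betweenness, i.e. z lies between a and b when d(a, z) + d(z, b) = d(a, b), (3) the examples are connected in a simple chain, and (4) a hypothetical intersection/union pattern that covers every set Z with ∩E ⊆ Z ⊆ ∪E. For this distance the sets between A and B are exactly those with A ∩ B ⊆ Z ⊆ A ∪ B, which is why that pattern passes the check while a purely extensional pattern (covering only the examples themselves) does not.

```python
from itertools import chain, combinations


def sym_diff_distance(a, b):
    """Metric on finite sets: the size of the symmetric difference."""
    return len(a ^ b)


def powerset(universe):
    """All subsets of a small finite universe, as frozensets."""
    items = sorted(universe)
    return [
        frozenset(c)
        for c in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    ]


def between(a, b, universe):
    """Sets z on a 'straight path' from a to b: d(a, z) + d(z, b) == d(a, b)."""
    d_ab = sym_diff_distance(a, b)
    return [
        z for z in powerset(universe)
        if sym_diff_distance(a, z) + sym_diff_distance(z, b) == d_ab
    ]


def set_interval_pattern(examples):
    """Hypothetical pattern: covers z iff intersection(examples) <= z <= union(examples)."""
    lower = frozenset.intersection(*examples)
    upper = frozenset.union(*examples)
    return lambda z: lower <= z <= upper


def is_distance_based(covers, examples, universe):
    """Simplified check assuming a chain graph e0 - e1 - ... - en over the examples:
    the pattern must cover every example and every set between adjacent examples."""
    if not all(covers(e) for e in examples):
        return False
    for a, b in zip(examples, examples[1:]):
        if not all(covers(z) for z in between(a, b, universe)):
            return False
    return True


if __name__ == "__main__":
    universe = {1, 2, 3}
    examples = [frozenset({1}), frozenset({2})]
    print(is_distance_based(set_interval_pattern(examples), examples, universe))  # True
    extensional = lambda z: z in set(examples)  # covers only the examples themselves
    print(is_distance_based(extensional, examples, universe))  # False
```

The extensional pattern fails because the empty set and {1, 2} both lie between {1} and {2} under this distance but are not covered; swapping in a different distance or pattern language changes which generalizations pass, which is the coupling the abstract describes.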
dc.description.sponsorship We would like to thank the anonymous reviewers for their insightful comments. This work has been partially supported by the EU (FEDER) and the Spanish MICINN, under grant TIN2010-21062-C02-02, the Spanish project "Agreement Technologies" (Consolider Ingenio CSD2007-00022) and the GVA project PROMETEO/2008/051. en_EN
dc.format.extent 41 es_ES
dc.language English es_ES
dc.publisher Wiley-Blackwell es_ES
dc.relation.ispartof Computational Intelligence es_ES
dc.rights All rights reserved es_ES
dc.subject Learning from structured data representations es_ES
dc.subject Comprehensible models es_ES
dc.subject Distance-based methods es_ES
dc.subject Generalization operators es_ES
dc.subject Minimal generalization es_ES
dc.subject.classification COMPUTER LANGUAGES AND SYSTEMS es_ES
dc.title Bridging the Gap between Distance and Generalization es_ES
dc.type Article es_ES
dc.embargo.lift 10000-01-01
dc.embargo.terms forever es_ES
dc.identifier.doi 10.1111/coin.12004
dc.relation.projectID info:eu-repo/grantAgreement/MICINN//TIN2010-21062-C02-02/ES/SWEETLOGICS-UPV/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/Generalitat Valenciana//PROMETEO08%2F2008%2F051/ES/Advances on Agreement Technologies for Computational Entities (atforce)/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MEC//CSD2007-00022/ES/Agreement Technologies/ es_ES
dc.rights.accessRights Closed es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Estruch Gregori, V.; Ferri Ramírez, C.; Hernández-Orallo, J.; Ramírez Quintana, MJ. (2012). Bridging the Gap between Distance and Generalization. Computational Intelligence. https://doi.org/10.1111/coin.12004 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion http://dx.doi.org/10.1111/coin.12004 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.relation.senia 238198
dc.identifier.eissn 1467-8640
dc.contributor.funder Generalitat Valenciana es_ES
dc.contributor.funder Ministerio de Educación y Ciencia es_ES
dc.contributor.funder Ministerio de Ciencia e Innovación es_ES
dc.description.references Armengol, E., Plaza, E., & Ontañón, S. (2004). Explaining similarity in CBR. In ECCBR 2004 Workshop Proceedings (pp. 155-164). es_ES
dc.description.references Bargiela, A., & Pedrycz, W. (2003). Granular Computing. doi:10.1007/978-1-4615-1033-8 es_ES
dc.description.references Bunke, H. (1997). On a relation between graph edit distance and maximum common subgraph. Pattern Recognition Letters, 18(8), 689-694. doi:10.1016/s0167-8655(97)00060-3 es_ES
dc.description.references Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21-27. doi:10.1109/tit.1967.1053964 es_ES
dc.description.references Develin, M. (2006). Dimensions of Tight Spans. Annals of Combinatorics, 10(1), 53-61. doi:10.1007/s00026-006-0273-y es_ES
dc.description.references Domingos, P. (1996). Unifying instance-based and rule-based induction. Machine Learning, 24(2), 141-168. doi:10.1007/bf00058656 es_ES
dc.description.references Driessens, K., & Džeroski, S. (2005). Combining model-based and instance-based learning for first order regression. Proceedings of the 22nd international conference on Machine learning - ICML ’05. doi:10.1145/1102351.1102376 es_ES
dc.description.references Eiter, T., & Mannila, H. (1997). Distance measures for point sets and their computation. Acta Informatica, 34(2), 109-133. doi:10.1007/s002360050075 es_ES
dc.description.references Estruch, V. (2008). Bridging the gap between distance and generalisation: Symbolic learning in metric spaces (Ph.D. thesis). http://www.dsic.upv.es/~flip/papers/thesisvestruch.pdf es_ES
dc.description.references Estruch, V., Ferri, C., Hernández-Orallo, J., & Ramírez-Quintana, M. (2010). Generalisation operators for lists embedded in a metric space. In Approaches and Applications of Inductive Programming, Third International Workshop, AAIP 2009 (Vol. 5812, pp. 117-139). es_ES
dc.description.references Estruch, V., Ferri, C., Hernández-Orallo, J., & Ramírez-Quintana, M. J. (2005). Distance based generalisation. In the 15th International Conference on Inductive Logic Programming (LNCS Vol. 3625, pp. 87-102). es_ES
dc.description.references Estruch, V., Ferri, C., Hernández-Orallo, J., & Ramírez-Quintana, M. J. (2006a). Minimal distance-based generalisation operators for first-order objects. In the 16th International Conference on Inductive Logic Programming (pp. 169-183). es_ES
dc.description.references Estruch, V., Ferri, C., Hernández-Orallo, J., & Ramírez-Quintana, M. J. (2006). Web Categorisation Using Distance-Based Decision Trees. Electronic Notes in Theoretical Computer Science, 157(2), 35-40. doi:10.1016/j.entcs.2005.12.043 es_ES
dc.description.references Finnie, G., & Sun, Z. (2002). Similarity and metrics in case-based reasoning. International Journal of Intelligent Systems, 17(3), 273-287. doi:10.1002/int.10021 es_ES
dc.description.references Frank, A., & Asuncion, A. (2010). UCI machine learning repository. http://archive.ics.uci.edu/ml es_ES
dc.description.references Funes, A., Ferri, C., Hernández-Orallo, J., & Ramírez-Quintana, M. J. (2009). An Instantiation of Hierarchical Distance-Based Conceptual Clustering for Propositional Learning. Lecture Notes in Computer Science, 637-646. doi:10.1007/978-3-642-01307-2_63 es_ES
dc.description.references Gärtner, T., Lloyd, J. W., & Flach, P. A. (2004). Kernels and Distances for Structured Data. Machine Learning, 57(3), 205-232. doi:10.1023/b:mach.0000039777.23772.30 es_ES
dc.description.references Gao, B. (2006). Hyper-rectangle-based discriminative data generalization and applications in data mining (Ph.D. thesis). Simon Fraser University. es_ES
dc.description.references Golding, A., & Rosenbloom, P. (1991). Improving rule-based systems through case-based reasoning. In National Conference on Artificial Intelligence (pp. 22-27). es_ES
dc.description.references Hahn, U., Chater, N., & Richardson, L. B. (2003). Similarity as transformation. Cognition, 87(1), 1-32. doi:10.1016/s0010-0277(02)00184-1 es_ES
dc.description.references Hu, C. (2008). Interval rule matrices for decision making. In Knowledge Processing with Interval and Soft Computing (Chapter 6, pp. 135-146). Springer. es_ES
dc.description.references Juszczak, P., Tax, D. M. J., Pękalska, E., & Duin, R. P. W. (2009). Minimum spanning tree based one-class classifier. Neurocomputing, 72(7-9), 1859-1869. doi:10.1016/j.neucom.2008.05.003 es_ES
dc.description.references Kearfott, R., & Hu, C. (2008). Fundamentals of interval computing. In Knowledge Processing with Interval and Soft Computing (Chapter 1, pp. 1-12). Springer. es_ES
dc.description.references Muggleton, S. (1999). Inductive Logic Programming: Issues, results and the challenge of Learning Language in Logic. Artificial Intelligence, 114(1-2), 283-296. doi:10.1016/s0004-3702(99)00067-3 es_ES
dc.description.references Piramuthu, S., & Sikora, R. T. (2009). Iterative feature construction for improving inductive learning algorithms. Expert Systems with Applications, 36(2), 3401-3406. doi:10.1016/j.eswa.2008.02.010 es_ES
dc.description.references De Raedt, L., & Ramon, J. (2009). Deriving distance metrics from generality relations. Pattern Recognition Letters, 30(3), 187-191. doi:10.1016/j.patrec.2008.09.007 es_ES
dc.description.references Ramon, J., & Bruynooghe, M. (1998). A framework for defining distances between first-order logic objects. In Proceedings of the International Conference on Inductive Logic Programming (LNCS Vol. 1446, pp. 271-280). es_ES
dc.description.references Rissanen, J. (1999). Hypothesis Selection and Testing by the MDL Principle. The Computer Journal, 42(4), 260-269. doi:10.1093/comjnl/42.4.260 es_ES
dc.description.references Salzberg, S. (1991). A nearest hyperrectangle learning method. Machine Learning, 6(3), 251-276. doi:10.1007/bf00114779 es_ES
dc.description.references Stanfill, C., & Waltz, D. (1986). Toward memory-based reasoning. Communications of the ACM, 29(12), 1213-1228. doi:10.1145/7902.7906 es_ES
dc.description.references Vapnik, V. N., & Chervonenkis, A. Y. (1971). On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities. Theory of Probability & Its Applications, 16(2), 264-280. doi:10.1137/1116025 es_ES
dc.description.references Wallace, C. S. (1999). Minimum Message Length and Kolmogorov Complexity. The Computer Journal, 42(4), 270-283. doi:10.1093/comjnl/42.4.270 es_ES

