On the influence of model fragment properties on a machine learning-based approach for feature location

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia

dc.contributor.author Ballarin, Manuel es_ES
dc.contributor.author Marcén, Ana C. es_ES
dc.contributor.author Pelechano Ferragud, Vicente es_ES
dc.contributor.author Cetina, Carlos es_ES
dc.date.accessioned 2021-09-02T03:31:25Z
dc.date.available 2021-09-02T03:31:25Z
dc.date.issued 2021-01 es_ES
dc.identifier.issn 0950-5849 es_ES
dc.identifier.uri http://hdl.handle.net/10251/171217
dc.description.abstract [EN] Context: Leveraging machine learning techniques to address feature location on models has been gaining attention. Machine learning techniques empower software product companies to take advantage of the knowledge and the experience to improve the performance of the feature location process. Most of the machine learning-based works for feature location on models report the machine learning techniques and the tuning parameters in detail. However, these works focus on the size and the distribution of the data sets, neglecting the properties of their contents. Objective: In this paper, we analyze the influence of three model fragment properties (density, multiplicity, and dispersion) on a machine learning-based approach for feature location. Method: The analysis of these properties is based on an industrial case provided by CAF, a worldwide provider of railway solutions. The test cases were evaluated through a machine learning technique that uses different subsets of a knowledge base to learn how to locate unknown features. Results: Results show that the density and dispersion properties have a direct impact on the results. In our case study, the model fragments with extra-small density values achieve results with up to 43% more precision, 41% more recall, 42% more F-measure, and 0.53 more Matthews Correlation Coefficient (MCC) than the model fragments with other density values. On the other hand, the model fragments with extra-small and small dispersion values achieve results with up to 53% more precision, 52% more recall, 52% more F-measure, and 0.57 more MCC than the model fragments with other dispersion values. Conclusions: The analysis of the results shows that both density and dispersion properties significantly influence the results. These results can serve not only to improve reporting by including the model fragment properties, but also to enable a fair comparison of machine learning-based feature location approaches, thereby improving feature location results. es_ES
dc.description.sponsorship This work has been partially supported by the Ministry of Economy and Competitiveness (MINECO), Spain, through the Spanish National R+D+i Plan and ERDF funds under the Project ALPS (RTI2018-096411-B-I00). We also thank the ITEA3 15010 REVaMP2 Project and ACIF/2018/171. es_ES
dc.language English es_ES
dc.publisher Elsevier es_ES
dc.relation.ispartof Information and Software Technology es_ES
dc.rights Attribution - NonCommercial - NoDerivatives (by-nc-nd) es_ES
dc.subject Model fragment location es_ES
dc.subject Feature location es_ES
dc.subject Machine learning es_ES
dc.subject Learning to rank es_ES
dc.subject.classification COMPUTER LANGUAGES AND SYSTEMS es_ES
dc.title On the influence of model fragment properties on a machine learning-based approach for feature location es_ES
dc.type Article es_ES
dc.identifier.doi 10.1016/j.infsof.2020.106430 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-096411-B-I00/ES/ASISTENTES EVOLUTIVOS INTELIGENTES PARA INICIAR LINEAS DE PRODUCTO SOFTWARE/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/GVA//ACIF%2F2018%2F171/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Ballarin, M.; Marcén, AC.; Pelechano Ferragud, V.; Cetina, C. (2021). On the influence of model fragment properties on a machine learning-based approach for feature location. Information and Software Technology. 129:1-19. https://doi.org/10.1016/j.infsof.2020.106430 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.1016/j.infsof.2020.106430 es_ES
dc.description.upvformatpinicio 1 es_ES
dc.description.upvformatpfin 19 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 129 es_ES
dc.relation.pasarela S\418275 es_ES
dc.contributor.funder Generalitat Valenciana es_ES
dc.contributor.funder Agencia Estatal de Investigación es_ES
dc.contributor.funder European Regional Development Fund es_ES

