
Cross-language Plagiarism Detection over Continuous-space- and Knowledge Graph-based Representations of Language

RiuNet: Institutional Repository of the Universitat Politècnica de València

dc.contributor.author Franco-Salvador, Marc es_ES
dc.contributor.author Gupta, Parth Alokkumar es_ES
dc.contributor.author Rosso, Paolo es_ES
dc.contributor.author Banchs, Rafael es_ES
dc.date.accessioned 2017-06-07T08:27:54Z
dc.date.available 2017-06-07T08:27:54Z
dc.date.issued 2016-11-01
dc.identifier.issn 0950-7051
dc.identifier.uri http://hdl.handle.net/10251/82493
dc.description This is the author’s version of a work that was accepted for publication in Knowledge-Based Systems. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Knowledge-Based Systems 111 (2016) 87–99. DOI 10.1016/j.knosys.2016.08.004. es_ES
dc.description.abstract Cross-language (CL) plagiarism detection aims at detecting plagiarised fragments of text among documents in different languages. The main research question of this work is whether knowledge graph representations and continuous space representations can complement each other and improve the state-of-the-art performance of CL plagiarism detection methods. To this end, we propose and evaluate hybrid models to assess the semantic similarity of two segments of text in different languages. The proposed hybrid models combine knowledge graph representations with continuous space representations, aiming at exploiting their complementarity in capturing different aspects of cross-lingual similarity. We also present the continuous word alignment-based similarity analysis, a new model to estimate similarity between text fragments. We compare the aforementioned approaches with several state-of-the-art models in the task of CL plagiarism detection and study their performance in detecting plagiarism cases of different lengths and obfuscation types. We conduct experiments over Spanish-English and German-English datasets. Experimental results show that continuous representations allow the continuous word alignment-based similarity analysis model to obtain competitive results and the knowledge-based document similarity model to outperform the state-of-the-art in CL plagiarism detection. © 2016 Elsevier B.V. All rights reserved. es_ES
dc.description.sponsorship This research has been carried out in the framework of the FPI-UPV pre-doctoral grant (registration no. 3505) awarded to Parth Gupta and in the framework of the national projects DIANA-APPLICATIONS - Finding Hidden Knowledge in Texts: Applications (TIN2012-38603-C02-01), and SomEMBED: SOcial Media language understanding - EMBEDing contexts (TIN2015-71147-C2-1-P). We would like to thank Martin Potthast, Daniel Ortiz-Martinez, and Luis A. Leiva for their support and comments during this research. en_EN
dc.language English es_ES
dc.publisher Elsevier es_ES
dc.relation.ispartof Knowledge-Based Systems es_ES
dc.rights All rights reserved es_ES
dc.subject Cross-language es_ES
dc.subject Plagiarism detection es_ES
dc.subject Continuous representations es_ES
dc.subject Knowledge graphs es_ES
dc.subject Multilingual semantic network es_ES
dc.subject.classification COMPUTER LANGUAGES AND SYSTEMS es_ES
dc.title Cross-language Plagiarism Detection over Continuous-space- and Knowledge Graph-based Representations of Language es_ES
dc.type Article es_ES
dc.identifier.doi 10.1016/j.knosys.2016.08.004
dc.relation.projectID info:eu-repo/grantAgreement/UPV//PRE-DOCTORAL GRANT%2F3505/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//TIN2012-38603-C02-01/ES/DIANA-APPLICATIONS: FINDING HIDDEN KNOWLEDGE IN TEXTS: APPLICATIONS/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//TIN2015-71147-C2-1-P/ES/COMPRENSION DEL LENGUAJE EN LOS MEDIOS DE COMUNICACION SOCIAL - REPRESENTANDO CONTEXTOS DE FORMA CONTINUA/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Escola Tècnica Superior d'Enginyeria Informàtica es_ES
dc.description.bibliographicCitation Franco-Salvador, M.; Gupta, PA.; Rosso, P.; Banchs, R. (2016). Cross-language Plagiarism Detection over Continuous-space- and Knowledge Graph-based Representations of Language. Knowledge-Based Systems. 111:87-99. https://doi.org/10.1016/j.knosys.2016.08.004 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion http://dx.doi.org/10.1016/j.knosys.2016.08.004 es_ES
dc.description.upvformatpinicio 87 es_ES
dc.description.upvformatpfin 99 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 111 es_ES
dc.relation.senia 326671 es_ES
dc.contributor.funder Ministerio de Economía y Competitividad es_ES
dc.contributor.funder Universitat Politècnica de València es_ES

