Semantically-informed distance and similarity measures for paraphrase plagiarism identification

RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia

dc.contributor.author Álvarez Carmona, M.A. es_ES
dc.contributor.author Franco-Salvador, Marc es_ES
dc.contributor.author Villatoro-Tello, Esaú es_ES
dc.contributor.author Montes Gomez, Manuel es_ES
dc.contributor.author Rosso, Paolo es_ES
dc.contributor.author Villaseñor Pineda, Luis es_ES
dc.date.accessioned 2020-06-13T03:32:38Z
dc.date.available 2020-06-13T03:32:38Z
dc.date.issued 2018-05-24 es_ES
dc.identifier.issn 1064-1246 es_ES
dc.identifier.uri http://hdl.handle.net/10251/146280
dc.description.abstract [EN] Paraphrase plagiarism identification is a very complex task, given that plagiarized texts are intentionally modified through several rewording techniques. Accordingly, this paper introduces two new measures for evaluating the relatedness of two given texts: a semantically-informed similarity measure and a semantically-informed edit distance. Both measures are able to extract semantic information from either an external resource or a distributed representation of words, resulting in informative features for training a supervised classifier for detecting paraphrase plagiarism. The results indicate that the proposed metrics perform consistently well at detecting different types of paraphrase plagiarism. In addition, the results are very competitive against state-of-the-art methods, with the advantage of being a much simpler but equally effective solution. es_ES
dc.description.sponsorship This work was partially supported by CONACYT under scholarship 401887, project grants 257383, 258588 and 2016-01-2410 and under the Thematic Networks program (Language Technologies Thematic Network project 281795). The work of the fourth author was partially supported by the SomEMBED TIN2015-71147-C2-1-P MINECO research project and by the Generalitat Valenciana under the grant ALMAMATER (Prometeo II/2014/030). es_ES
dc.language English es_ES
dc.publisher IOS Press es_ES
dc.relation.ispartof Journal of Intelligent & Fuzzy Systems es_ES
dc.rights All rights reserved es_ES
dc.subject Plagiarism identification es_ES
dc.subject Paraphrase plagiarism es_ES
dc.subject Semantic similarity es_ES
dc.subject Edit distance es_ES
dc.subject Word2vec representation es_ES
dc.subject.classification COMPUTER LANGUAGES AND SYSTEMS es_ES
dc.title Semantically-informed distance and similarity measures for paraphrase plagiarism identification es_ES
dc.type Article es_ES
dc.identifier.doi 10.3233/JIFS-169483 es_ES
dc.relation.projectID info:eu-repo/grantAgreement/CONACyT//401887/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/CONACyT//257383/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/CONACyT//258588/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/CONACyT//2016-01-2410/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/CONACyT//281795/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/GVA//PROMETEOII%2F2014%2F030/ES/ Adaptive learning and multimodality in machine translation and text transcription/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MINECO//TIN2015-71147-C2-1-P/ES/COMPRENSION DEL LENGUAJE EN LOS MEDIOS DE COMUNICACION SOCIAL - REPRESENTANDO CONTEXTOS DE FORMA CONTINUA/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Álvarez Carmona, M.; Franco-Salvador, M.; Villatoro-Tello, E.; Montes Gomez, M.; Rosso, P.; Villaseñor Pineda, L. (2018). Semantically-informed distance and similarity measures for paraphrase plagiarism identification. Journal of Intelligent & Fuzzy Systems. 34(5):2983-2990. https://doi.org/10.3233/JIFS-169483 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion https://doi.org/10.3233/JIFS-169483 es_ES
dc.description.upvformatpinicio 2983 es_ES
dc.description.upvformatpfin 2990 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 34 es_ES
dc.description.issue 5 es_ES
dc.relation.pasarela S\384155 es_ES
dc.contributor.funder Ministerio de Economía y Competitividad es_ES
dc.contributor.funder Generalitat Valenciana es_ES
dc.contributor.funder Consejo Nacional de Ciencia y Tecnología, México es_ES
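
The abstract mentions a semantically-informed edit distance but this record does not give its definition. The sketch below is one possible reading, not the authors' formulation: a word-level Levenshtein distance in which substituting one word for another costs 1 minus their semantic similarity, so near-synonyms are cheap to swap. The names `toy_similarity`, `TOY_SIMILARITY`, and `semantic_edit_distance` are hypothetical, and the tiny hand-written similarity table merely stands in for the external resource or word embeddings the abstract refers to.

```python
from typing import Callable, Sequence

# Hypothetical toy similarity scores in [0, 1]. In practice these would come
# from an external lexical resource or from word embeddings; the values here
# are made up purely for illustration.
TOY_SIMILARITY = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("big", "large")): 0.8,
}


def toy_similarity(a: str, b: str) -> float:
    """Return 1.0 for identical words, a table score if known, else 0.0."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


def semantic_edit_distance(
    source: Sequence[str],
    target: Sequence[str],
    similarity: Callable[[str, str], float] = toy_similarity,
) -> float:
    """Word-level edit distance with similarity-weighted substitutions.

    Assumption: insertions and deletions cost 1.0, and a substitution
    costs 1 - similarity(word_a, word_b).
    """
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = float(i)  # delete every remaining source word
    for j in range(1, cols):
        dist[0][j] = float(j)  # insert every remaining target word
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = 1.0 - similarity(source[i - 1], target[j - 1])
            dist[i][j] = min(
                dist[i - 1][j] + 1.0,              # deletion
                dist[i][j - 1] + 1.0,              # insertion
                dist[i - 1][j - 1] + substitution,  # (near-)synonym swap
            )
    return dist[-1][-1]


if __name__ == "__main__":
    original = "the big car".split()
    paraphrase = "the large automobile".split()
    print(semantic_edit_distance(original, paraphrase))
```

Under these assumptions, a paraphrase built from near-synonyms ends up much closer to its source than it would under a plain edit distance, which would charge a full unit for every changed word. A distance of this kind could then serve as one feature for a supervised classifier, as the abstract describes.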

