Paraphrase Plagiarism Identification with Character-level Features

Sánchez-Vega, Fernando; Villatoro-Tello, Esaú; Montes-y-Gómez, Manuel; Rosso, Paolo; Stamatatos, Efstathios; Villaseñor-Pineda, Luis

doi:10.1007/s10044-017-0674-z

Files in this item

Name: Sánchez-Vega;Vill ...

Size: 856.0Kb

Format: PDF

Description: Author's version.

Name: Sánchez-Vega ...

Size: 1.116Mb

Format: Unknown

Description: Publisher's version

dc.contributor.author	Sánchez-Vega, Fernando	es_ES
dc.contributor.author	Villatoro-Tello, Esaú	es_ES
dc.contributor.author	Montes-y-Gómez, Manuel	es_ES
dc.contributor.author	Rosso, Paolo	es_ES
dc.contributor.author	Stamatatos, Efstathios	es_ES
dc.contributor.author	Villaseñor-Pineda, Luis	es_ES
dc.date.accessioned	2021-01-27T04:32:44Z
dc.date.available	2021-01-27T04:32:44Z
dc.date.issued	2019-05	es_ES
dc.identifier.issn	1433-7541	es_ES
dc.identifier.uri	http://hdl.handle.net/10251/159992
dc.description.abstract	[EN] Several methods have been proposed for determining plagiarism between pairs of sentences, passages or even full documents. However, most of these methods fail to reliably detect paraphrase plagiarism because of the high complexity of the task, which is difficult even for human beings. Paraphrase plagiarism identification consists of automatically recognizing document fragments that contain reused text which has been intentionally hidden through rewording practices such as semantic equivalences, discursive changes, and morphological or lexical substitutions. Our main hypothesis is that the original author's writing-style fingerprint prevails in the plagiarized text even when paraphrases occur. Thus, in this paper we propose a novel text representation scheme that gathers both content and style characteristics of texts, represented by means of character-level features. As an additional contribution, we describe the methodology followed for the construction of an appropriate corpus for the task of paraphrase plagiarism identification, which represents a valuable new resource for the NLP community for future research in this field.	es_ES
dc.description.sponsorship	This work is the result of the collaboration in the framework of the CONACYT Thematic Networks program (RedTTL Language Technologies Network) and the WIQ-EI IRSES project (Grant No. 269180) within the FP7 Marie Curie action. The first author was supported by CONACYT (Scholarship 258345/224483). The second, third, and sixth authors were partially supported by CONACyT (Project Grants 258588 and 2410). The work of the fourth author was partially supported by the SomEMBED TIN2015-71147-C2-1-P MINECO research project and by the Generalitat Valenciana under the Grant ALMAMATER (PrometeoII/2014/030).	es_ES
dc.language	English	es_ES
dc.publisher	Springer-Verlag	es_ES
dc.relation.ispartof	Pattern Analysis and Applications	es_ES
dc.rights	All rights reserved	es_ES
dc.subject	Plagiarism identification	es_ES
dc.subject	Paraphrase plagiarism	es_ES
dc.subject	Text reuse	es_ES
dc.subject	Character n-grams	es_ES
dc.subject	Stylistic representation	es_ES
dc.subject.classification	LANGUAGES AND COMPUTER SYSTEMS	es_ES
dc.title	Paraphrase Plagiarism Identification with Character-level Features	es_ES
dc.type	Article	es_ES
dc.identifier.doi	10.1007/s10044-017-0674-z	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/CONACyT//FC 2016-2410/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/EC/FP7/269180/EU/Web Information Quality Evaluation Initiative/
dc.relation.projectID	info:eu-repo/grantAgreement/CONACyT//258588/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/GVA//PROMETEOII%2F2014%2F030/ES/ Adaptive learning and multimodality in machine translation and text transcription/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/CONACyT//258345%2F224483/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/MINECO//TIN2015-71147-C2-1-P/ES/COMPRENSION DEL LENGUAJE EN LOS MEDIOS DE COMUNICACION SOCIAL - REPRESENTANDO CONTEXTOS DE FORMA CONTINUA/	es_ES
dc.rights.accessRights	Open	es_ES
dc.contributor.affiliation	Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació	es_ES
dc.description.bibliographicCitation	Sánchez-Vega, F.; Villatoro-Tello, E.; Montes-y-Gómez, M.; Rosso, P.; Stamatatos, E.; Villaseñor-Pineda, L. (2019). Paraphrase Plagiarism Identification with Character-level Features. Pattern Analysis and Applications. 22(2):669-681. https://doi.org/10.1007/s10044-017-0674-z	es_ES
dc.description.accrualMethod	S	es_ES
dc.relation.publisherversion	https://doi.org/10.1007/s10044-017-0674-z	es_ES
dc.description.upvformatpinicio	669	es_ES
dc.description.upvformatpfin	681	es_ES
dc.type.version	info:eu-repo/semantics/publishedVersion	es_ES
dc.description.volume	22	es_ES
dc.description.issue	2	es_ES
dc.relation.pasarela	S\409334	es_ES
dc.contributor.funder	Generalitat Valenciana	es_ES
dc.contributor.funder	European Commission	es_ES
dc.contributor.funder	Consejo Nacional de Ciencia y Tecnología, México	es_ES
dc.contributor.funder	Ministerio de Economía y Competitividad	es_ES
dc.description.references	Barrón-Cedeño A, Rosso P (2009) On automatic plagiarism detection based on n-grams comparison. In: Proceedings of the 31st European conference on IR research on advances in information retrieval (ECIR), LNCS vol 5478, Springer, Berlin, pp 696–700	es_ES
dc.description.references	Barrón-Cedeño A, Vila M, Martí MA, Rosso P (2013) Plagiarism meets paraphrasing: insights for the next generation in automatic plagiarism detection. Comput Linguist 39(4):917–947	es_ES
dc.description.references	Basile C, Benedetto D, Caglioti E, Cristadoro G, Esposti M (2009) A plagiarism detection procedure in three steps: selection, matches and “squares”. In: Proceedings of the SEPLN 2009 workshop on uncovering plagiarism, authorship and social software misuse (PAN 2009), CEUR-WS vol 502. Donostia-San Sebastian, Spain	es_ES
dc.description.references	Biggins S, Mohammed S, Oakley S (2012) University of Sheffield: two approaches to semantic text similarity. In: First joint conference on lexical and computational semantics (*SEM at NAACL 2012), Montreal, Canada, pp 655–661	es_ES
dc.description.references	Burrows S, Potthast M, Stein B (2013) Paraphrase acquisition via crowdsourcing and machine learning. ACM Trans Intell Syst Technol 4(3):43:1–43:21. https://doi.org/10.1145/2483669.2483676	es_ES
dc.description.references	Calvo H, Segura-Olivares A, García A (2014) Dependency vs. constituent based syntactic n-grams in text similarity measures for paraphrase recognition. Computación y Sistemas 18(3):517–554	es_ES
dc.description.references	Chien-Ying C, Jen-Yuan Y, Hao-Ren K (2010) Plagiarism detection using ROUGE and WordNet. J Comput 2(3):34–44	es_ES
dc.description.references	Chong M, Specia L, Mitkov R (2010) Using natural language processing for automatic detection of plagiarism. In: Proceedings of the 4th international plagiarism conference. Newcastle-upon-Tyne, UK	es_ES
dc.description.references	Clough P (2003) Old and new challenges in automatic plagiarism detection. In: National plagiarism advisory service, pp 391–407	es_ES
dc.description.references	Clough P, Gaizauskas R, Piao SS, Wilks Y (2002) METER: measuring text reuse. In: Proceedings of the 40th annual meeting of the association for computational linguistics (ACL). Philadelphia	es_ES
dc.description.references	Corley C, Mihalcea R (2005) Measuring the semantic similarity of texts. In: Proceedings of the ACL workshop on empirical modeling of semantic equivalence and entailment (EMSEE at ACL 2005), pp 13–18	es_ES
dc.description.references	Daelemans W (2013) Explanation in computational stylometry. In: 14th International conference on intelligent text processing and computational linguistics (CICLing 2013), LNCS vol 7817, pp 451–462	es_ES
dc.description.references	Ehsan N, Shakery A (2016) Candidate document retrieval for cross-lingual plagiarism detection using two-level proximity information. Inf Process Manag. https://doi.org/10.1016/j.ipm.2016.04.006	es_ES
dc.description.references	Grieve J (2007) Quantitative authorship attribution: an evaluation of techniques. Lit Linguist Comput 22(3):251–270	es_ES
dc.description.references	Hartrumpf S, vor Der Brück T, Eichhorn C (2010) Semantic duplicate identification with parsing and machine learning. In: Eleventh international conference on text, speech and dialogue (TSD 2010) LNAI vol 6231, Springer, Berlin, pp 84–92. Brno, Czech Republic	es_ES
dc.description.references	Hoad TC, Zobel J (2003) Methods for identifying versioned and plagiarised documents. J Am Soc Inf Sci Technol 54:203–215	es_ES
dc.description.references	Koppel M, Schler J, Argamon S (2009) Computational methods in authorship attribution. J Am Soc Inf Sci Technol 60(1):9–26	es_ES
dc.description.references	Koppel M, Schler J, Argamon S (2011) Authorship attribution in the wild. Lang Resour Eval 45:83–94	es_ES
dc.description.references	de Man P (1983) Blindness and insight: essays in the rhetoric of contemporary criticism, 2nd edn. Chap. Literature and language: a commentary, pp 277–289. Routledge	es_ES
dc.description.references	McNamee P, Mayfield J (2004) Character n-gram tokenization for European language text retrieval. Inf Retr 7(1–2):73–97	es_ES
dc.description.references	Oberreuter G, L’Huillier G, Ríos SA, Velásquez JD (2011) Approaches for intrinsic and external plagiarism detection. In: Notebook for PAN at CLEF’11	es_ES
dc.description.references	Palkovskii Y, Belov A, Muzyka I (2011) Using wordnet-based semantic similarity measurement in external plagiarism detection. In: Notebook for PAN at CLEF’11	es_ES
dc.description.references	Potthast M, Hagen M, Gollub T, Tippmann M, Kiesel J, Rosso P, Stamatatos E, Stein B (2013) Overview of the 5th international competition on plagiarism detection. In: CLEF 2013 evaluation labs and workshop working notes papers	es_ES
dc.description.references	Ravi NR, Gupta D (2015) Efficient paragraph based chunking and download filtering for plagiarism source retrieval. In: Notebook for PAN at CLEF 2015 evaluation labs and workshop working notes papers, PAN ’15. http://www.uni-weimar.de/medien/webis/events/pan-15/pan15-papers-final/pan15-plagiarism-detection/ravi15-notebook.pdf	es_ES
dc.description.references	Sapkota U, Bethard S, Montes-y Gómez M, Solorio T (2015) Not all character n-grams are created equal: a study in authorship attribution. In: Conference of the North American chapter of the association for computational linguistics human language technologies (NAACL-HLT 2015), pp 93–102	es_ES
dc.description.references	Sapkota U, Solorio T, Montes M, Bethard S, Rosso P (2014) Cross-topic authorship attribution: will out-of-topic data help? In: Proceedings of COLING 2014, the 25th international conference on computational linguistics: technical papers, pp 1228–1237. Dublin City University and Association for Computational Linguistics. http://aclweb.org/anthology/C14-1116	es_ES
dc.description.references	Schleimer S, Wilkerson DS, Aiken A (2003) Winnowing: local algorithms for document fingerprinting. In: Proceedings of the 2003 ACM SIGMOD international conference on management of data, SIGMOD ’03, pp 76–85. ACM, New York. https://doi.org/10.1145/872757.872770	es_ES
dc.description.references	Sediyono A, Mahamud K (2008) Algorithm of the longest commonly consecutive word for plagiarism detection in text based document. In: Digital information management, ICDIM ’08, pp 253–259. IEEE. https://doi.org/10.1109/ICDIM.2008.4746827	es_ES
dc.description.references	Shivakumar N, Garcia-Molina H (1995) SCAM: a copy detection mechanism for digital documents. In: Proceedings of the second annual conference on the theory and practice of digital libraries	es_ES
dc.description.references	Si A, Leong HV, Lau RWH (1997) CHECK: a document plagiarism detection system. In: Proceedings of ACM symposium for applied computing, SAC ’97, pp 70–77. ACM, New York. https://doi.org/10.1145/331697.335176	es_ES
dc.description.references	Sánchez-Vega F, Villatoro-Tello E, Montes-y Gómez M, Villaseñor-Pineda L, Rosso P (2013) Determining and characterizing the reused text for plagiarism detection. Expert Syst Appl 40(5):1804–1813	es_ES
dc.description.references	Stamatatos E (2011) Plagiarism detection using stopword n-grams. J Am Soc Inf Sci Technol 62(12):2512–2527	es_ES
dc.description.references	Stamatatos E (2013) On the robustness of authorship attribution based on character n-gram features. J Law Policy 21(2):421–439	es_ES
dc.description.references	Stein B, Potthast M, Rosso P, Barrón-Cedeño A, Stamatatos E, Koppel M (2011) Fourth international workshop on uncovering plagiarism, authorship, and social software misuse. SIGIR Forum 45:45–48	es_ES
dc.description.references	Uzuner Ö, Katz B, Nahnsen T (2005) Using syntactic information to identify plagiarism. In: Proceedings of the 2nd workshop on building educational applications using NLP. Ann Arbor	es_ES
dc.description.references	Xu W, Ritter A, Dolan WB, Grishman R, Cherry C (2012) Paraphrasing for style. In: Proceedings of COLING 2012: Technical Papers, pp 2899–2914. Mumbai	es_ES
dc.description.references	Zechner M, Muhr M, Kern R, Granitzer M (2009) External and intrinsic plagiarism detection using vector space models. In: SEPLN 2009, workshop on uncovering plagiarism, authorship, and social software misuse (PAN 09), pp 45–55	es_ES
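
The abstract states that both content and style are captured through character-level features. As a rough illustration of that general idea only, the sketch below compares two text fragments by the cosine similarity of their overlapping character n-gram frequency profiles. It is a generic baseline, not the representation scheme proposed in the paper: the function names (char_ngrams, cosine, pair_similarity), the default n = 3 and the choice to keep case, spaces and punctuation are hypothetical choices made for illustration.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of `text`.

    Assumption (illustrative, not from the paper): case, spaces and
    punctuation are kept, since they can carry stylistic cues.
    """
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram frequency vectors."""
    # Counter returns 0 for missing keys, so only shared n-grams contribute.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def pair_similarity(source: str, suspicious: str, n: int = 3) -> float:
    """Hypothetical helper: similarity of a source/suspicious fragment pair."""
    return cosine(char_ngrams(source, n), char_ngrams(suspicious, n))


if __name__ == "__main__":
    original = "The committee approved the proposal after a long debate."
    reworded = "After a lengthy debate, the committee approved the proposal."
    unrelated = "Rainfall totals were below average across the region."
    print(round(pair_similarity(original, reworded), 3))
    print(round(pair_similarity(original, unrelated), 3))
```

A reworded pair that keeps much of the original surface form should score higher than an unrelated pair. How such scores would be thresholded or combined with other features is left open here and is not meant to reflect the evaluation reported in the paper.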

This item appears in the following collection(s)

Articles, conference papers, monographs [48344]

RiuNet: Institutional Repository of the Universidad Politécnica de Valencia