
Plagiarism meets paraphrasing: insights for the next generation in automatic plagiarism detection

RiuNet: Repositorio Institucional de la Universidad Politécnica de Valencia

dc.contributor.author Barrón Cedeño, Luis Alberto es_ES
dc.contributor.author Vila, Marta es_ES
dc.contributor.author Martí, M. Antònia es_ES
dc.contributor.author Rosso, Paolo es_ES
dc.date.accessioned 2015-01-23T13:24:08Z
dc.date.available 2015-01-23T13:24:08Z
dc.date.issued 2013-12
dc.identifier.issn 0891-2017
dc.identifier.uri http://hdl.handle.net/10251/46317
dc.description.abstract [EN] Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. Therefore, state-of-the-art plagiarism detectors find it difficult to detect cases of paraphrase plagiarism. In this article, we analyze the relationship between paraphrasing and plagiarism, paying special attention to which paraphrase phenomena underlie acts of plagiarism and which of them are detected by plagiarism detection systems. With this aim in mind, we created the P4P corpus, a new resource that uses a paraphrase typology to annotate a subset of the PAN-PC-10 corpus for automatic plagiarism detection. The results of the Second International Competition on Plagiarism Detection were analyzed in the light of this annotation. The presented experiments show that (i) more complex paraphrase phenomena and a high density of paraphrase mechanisms make plagiarism detection more difficult, (ii) lexical substitutions are the paraphrase mechanisms used the most when plagiarizing, and (iii) paraphrase mechanisms tend to shorten the plagiarized text. For the first time, the paraphrase mechanisms behind plagiarism have been analyzed, providing critical insights for the improvement of automatic plagiarism detection systems. es_ES
dc.description.sponsorship We would like to thank the people who participated in the annotation of the P4P corpus, Horacio Rodriguez for his helpful advice as an experienced researcher, and the reviewers of this contribution for their valuable comments to improve this article. This research work was partially carried out during the tenure of an ERCIM "Alain Bensoussan" Fellowship Programme. The research leading to these results received funding from the EU FP7 Programme 2007-2013 (grant no. 246016), the MICINN projects TEXT-ENTERPRISE 2.0 and TEXT-KNOWLEDGE 2.0 (TIN2009-13391), the EC WIQ-EI IRSES project (grant no. 269180), and the FP7 Marie Curie People Programme. The research work of A. Barrón-Cedeño and M. Vila was financed by the CONACyT-Mexico 192021 grant and the MECD-Spain FPU AP2008-02185 grant, respectively. The research work of A. Barrón-Cedeño was partially done in the framework of his Ph.D. at the Universitat Politècnica de València. en_EN
dc.language English es_ES
dc.publisher MIT Press es_ES
dc.relation.ispartof Computational Linguistics es_ES
dc.rights Attribution (by) es_ES
dc.subject.classification COMPUTER LANGUAGES AND SYSTEMS es_ES
dc.title Plagiarism meets paraphrasing: insights for the next generation in automatic plagiarism detection es_ES
dc.type Article es_ES
dc.identifier.doi 10.1162/COLI_a_00153
dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/246016/EU/Alain Bensoussan Career Development Enhancer/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MICINN//TIN2009-13391-C04-04/ES/Text-Knowledge 2.0: El Modelado Del Conocimiento Ante Los Nuevos Retos De La Comunicacion Digital/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/CONACyT//192021/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/269180/EU/Web Information Quality Evaluation Initiative/
dc.relation.projectID info:eu-repo/grantAgreement/MECD//AP2008-02185/ES/AP2008-02185/ es_ES
dc.relation.projectID info:eu-repo/grantAgreement/MICINN//TIN2009-13391-C04-03/ES/Text-Enterprise 2.0: Tecnicas De Comprension De Textos Aplicadas A Las Necesidades De La Empresa 2.0/ es_ES
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Barrón Cedeño, LA.; Vila, M.; Martí, MA.; Rosso, P. (2013). Plagiarism meets paraphrasing: insights for the next generation in automatic plagiarism detection. Computational Linguistics. 39(4):917-947. https://doi.org/10.1162/COLI_a_00153 es_ES
dc.description.accrualMethod S es_ES
dc.relation.publisherversion http://dx.doi.org/10.1162/COLI_a_00153 es_ES
dc.description.upvformatpinicio 917 es_ES
dc.description.upvformatpfin 947 es_ES
dc.type.version info:eu-repo/semantics/publishedVersion es_ES
dc.description.volume 39 es_ES
dc.description.issue 4 es_ES
dc.relation.senia 255757
dc.contributor.funder European Commission
dc.contributor.funder Ministerio de Ciencia e Innovación
dc.contributor.funder Ministerio de Educación, Cultura y Deporte
dc.contributor.funder Consejo Nacional de Ciencia y Tecnología, México

