dc.contributor.author | Barrón Cedeño, Luis Alberto | es_ES |
dc.contributor.author | Vila, Marta | es_ES |
dc.contributor.author | Martí, M. Antònia | es_ES |
dc.contributor.author | Rosso, Paolo | es_ES |
dc.date.accessioned | 2015-01-23T13:24:08Z | |
dc.date.available | 2015-01-23T13:24:08Z | |
dc.date.issued | 2013-12 | |
dc.identifier.issn | 0891-2017 | |
dc.identifier.uri | http://hdl.handle.net/10251/46317 | |
dc.description.abstract | [EN] Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. Therefore, state-of-the-art plagiarism detectors find it difficult to detect cases of paraphrase plagiarism. In this article, we analyze the relationship between paraphrasing and plagiarism, paying special attention to which paraphrase phenomena underlie acts of plagiarism and which of them are detected by plagiarism detection systems. With this aim in mind, we created the P4P corpus, a new resource that uses a paraphrase typology to annotate a subset of the PAN-PC-10 corpus for automatic plagiarism detection. The results of the Second International Competition on Plagiarism Detection were analyzed in the light of this annotation. The presented experiments show that (i) more complex paraphrase phenomena and a high density of paraphrase mechanisms make plagiarism detection more difficult, (ii) lexical substitutions are the paraphrase mechanisms used the most when plagiarizing, and (iii) paraphrase mechanisms tend to shorten the plagiarized text. For the first time, the paraphrase mechanisms behind plagiarism have been analyzed, providing critical insights for the improvement of automatic plagiarism detection systems. | es_ES |
dc.description.sponsorship | We would like to thank the people who participated in the annotation of the P4P corpus, Horacio Rodríguez for his helpful advice as an experienced researcher, and the reviewers of this contribution for their valuable comments to improve this article. This research work was partially carried out during the tenure of an ERCIM "Alain Bensoussan" Fellowship Programme. The research leading to these results received funding from the EU FP7 Programme 2007-2013 (grant no. 246016), the MICINN projects TEXT-ENTERPRISE 2.0 and TEXT-KNOWLEDGE 2.0 (TIN2009-13391), the EC WIQ-EI IRSES project (grant no. 269180), and the FP7 Marie Curie People Programme. The research work of A. Barrón-Cedeño and M. Vila was financed by the CONACyT-Mexico 192021 grant and the MECD-Spain FPU AP2008-02185 grant, respectively. The research work of A. Barrón-Cedeño was partially done in the framework of his Ph.D. at the Universitat Politècnica de València. | en_EN |
dc.language | English | es_ES |
dc.publisher | MIT Press | es_ES |
dc.relation.ispartof | Computational Linguistics | es_ES |
dc.rights | Attribution (by) | es_ES |
dc.subject.classification | COMPUTER LANGUAGES AND SYSTEMS | es_ES |
dc.title | Plagiarism meets paraphrasing: insights for the next generation in automatic plagiarism detection | es_ES |
dc.type | Article | es_ES |
dc.identifier.doi | 10.1162/COLI_a_00153 | |
dc.relation.projectID | info:eu-repo/grantAgreement/EC/FP7/246016/EU/Alain Bensoussan Career Development Enhancer/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/MICINN//TIN2009-13391-C04-04/ES/Text-Knowledge 2.0: El Modelado Del Conocimiento Ante Los Nuevos Retos De La Comunicacion Digital/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/CONACyT//192021/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/EC/FP7/269180/EU/Web Information Quality Evaluation Initiative/ | |
dc.relation.projectID | info:eu-repo/grantAgreement/MECD//AP2008-02185/ES/AP2008-02185/ | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/MICINN//TIN2009-13391-C04-03/ES/Text-Enterprise 2.0: Tecnicas De Comprension De Textos Aplicadas A Las Necesidades De La Empresa 2.0/ | es_ES |
dc.rights.accessRights | Open | es_ES |
dc.contributor.affiliation | Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació | es_ES |
dc.description.bibliographicCitation | Barrón Cedeño, LA.; Vila, M.; Martí, MA.; Rosso, P. (2013). Plagiarism meets paraphrasing: insights for the next generation in automatic plagiarism detection. Computational Linguistics. 39(4):917-947. https://doi.org/10.1162/COLI_a_00153 | es_ES |
dc.description.accrualMethod | S | es_ES |
dc.relation.publisherversion | http://dx.doi.org/10.1162/COLI_a_00153 | es_ES |
dc.description.upvformatpinicio | 917 | es_ES |
dc.description.upvformatpfin | 947 | es_ES |
dc.type.version | info:eu-repo/semantics/publishedVersion | es_ES |
dc.description.volume | 39 | es_ES |
dc.description.issue | 4 | es_ES |
dc.relation.senia | 255757 | |
dc.contributor.funder | European Commission | |
dc.contributor.funder | Ministerio de Ciencia e Innovación | |
dc.contributor.funder | Ministerio de Educación, Cultura y Deporte | |
dc.contributor.funder | Consejo Nacional de Ciencia y Tecnología, México | |
dc.description.references | Barzilay, Regina. 2003. Information Fusion for Multidocument Summarization: Paraphrasing and Generation. Ph.D. thesis, Columbia University, New York. | es_ES |
dc.description.references | Barzilay, R., & Lee, L. (2003). Learning to paraphrase. Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - NAACL ’03. doi:10.3115/1073445.1073448 | es_ES |
dc.description.references | Barzilay, Regina and Kathleen R. McKeown. 2001. Extracting paraphrases from a parallel corpus. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics (ACL 2001), pages 50–57, Toulouse. | es_ES |
dc.description.references | Barzilay, R., McKeown, K. R., & Elhadad, M. (1999). Information fusion in the context of multi-document summarization. Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics. doi:10.3115/1034678.1034760 | es_ES |
dc.description.references | Bhagat, Rahul. 2009. Learning Paraphrases from Text. Ph.D. thesis, University of Southern California, Los Angeles. | es_ES |
dc.description.references | Cheung, Mei Ling Lisa. 2009. Merging Corpus Linguistics and Collaborative Knowledge Construction. Ph.D. thesis, University of Birmingham, Birmingham. | es_ES |
dc.description.references | Cohn, T., Callison-Burch, C., & Lapata, M. (2008). Constructing Corpora for the Development and Evaluation of Paraphrase Systems. Computational Linguistics, 34(4), 597-614. doi:10.1162/coli.08-003-r1-07-044 | es_ES |
dc.description.references | Dras, Mark. 1999. Tree Adjoining Grammar and the Reluctant Paraphrasing of Text. Ph.D. thesis, Macquarie University, Sydney. | es_ES |
dc.description.references | Faigley, L., & Witte, S. (1981). Analyzing Revision. College Composition and Communication, 32(4), 400. doi:10.2307/356602 | es_ES |
dc.description.references | Fujita, Atsushi. 2005. Automatic Generation of Syntactically Well-formed and Semantically Appropriate Paraphrases. Ph.D. thesis, Nara Institute of Science and Technology, Nara. | es_ES |
dc.description.references | Grozea, C., & Popescu, M. (2010). Who’s the Thief? Automatic Detection of the Direction of Plagiarism. Lecture Notes in Computer Science, 700-710. doi:10.1007/978-3-642-12116-6_59 | es_ES |
dc.description.references | Gülich, E. (2003). Conversational Techniques Used in Transferring Knowledge between Medical Experts and Non-experts. Discourse Studies, 5(2), 235-263. doi:10.1177/1461445603005002005 | es_ES |
dc.description.references | Harris, Z. S. (1957). Co-Occurrence and Transformation in Linguistic Structure. Language, 33(3), 283. doi:10.2307/411155 | es_ES |
dc.description.references | Ketchen Jr., D. J., & Shook, C. L. (1996). The Application of Cluster Analysis in Strategic Management Research: An Analysis and Critique. Strategic Management Journal, 17(6), 441-458. doi:10.1002/(sici)1097-0266(199606)17:6<441::aid-smj819>3.0.co;2-g | es_ES |
dc.description.references | McCarthy, D., & Navigli, R. (2009). The English lexical substitution task. Language Resources and Evaluation, 43(2), 139-159. doi:10.1007/s10579-009-9084-1 | es_ES |
dc.description.references | Recasens, M., & Vila, M. (2010). On Paraphrase and Coreference. Computational Linguistics, 36(4), 639-647. doi:10.1162/coli_a_00014 | es_ES |
dc.description.references | Shimohata, Mitsuo. 2004. Acquiring Paraphrases from Corpora and Its Application to Machine Translation. Ph.D. thesis, Nara Institute of Science and Technology, Nara. | es_ES |
dc.description.references | Stein, B., Potthast, M., Rosso, P., Barrón-Cedeño, A., Stamatatos, E., & Koppel, M. (2011). Fourth international workshop on uncovering plagiarism, authorship, and social software misuse. ACM SIGIR Forum, 45(1), 45. doi:10.1145/1988852.1988860 | es_ES |