
Cross-Language Plagiarism Detection

RiuNet: Institutional repository of the Polytechnic University of Valencia

dc.contributor.author Potthast, Martin es_ES
dc.contributor.author Barrón Cedeño, Luis Alberto es_ES
dc.contributor.author Stein, Benno es_ES
dc.contributor.author Rosso, Paolo es_ES
dc.date.accessioned 2014-05-14T12:32:31Z
dc.date.issued 2011-03
dc.identifier.issn 1574-020X
dc.identifier.uri http://hdl.handle.net/10251/37479
dc.description.abstract Cross-language plagiarism detection deals with the automatic identification and extraction of plagiarism in a multilingual setting. In this setting, a suspicious document is given, and the task is to retrieve all sections from the document that originate from a large, multilingual document collection. Our contributions in this field are as follows: (1) a comprehensive retrieval process for cross-language plagiarism detection is introduced, highlighting the differences from monolingual plagiarism detection; (2) state-of-the-art solutions for two important subtasks are reviewed; (3) retrieval models for the assessment of cross-language similarity are surveyed; and (4) the three models CL-CNG, CL-ESA and CL-ASA are compared. Our evaluation is of realistic scale: it relies on 120,000 test documents selected from the corpora JRC-Acquis and Wikipedia, so that for each test document highly similar documents are available in all six languages: English, German, Spanish, French, Dutch, and Polish. The models are employed in a series of ranking tasks, and more than 100 million similarities are computed with each model. The results of our evaluation indicate that CL-CNG, despite its simple approach, is the best choice to rank and compare texts across languages if they are syntactically related. CL-ESA almost matches the performance of CL-CNG, but works on arbitrary pairs of languages. CL-ASA works best on "exact" translations but does not generalize well. es_ES
dc.description.sponsorship This work was partially supported by the TEXT-ENTERPRISE 2.0 TIN2009-13391-C04-03 project and the CONACyT-Mexico 192021 grant. en_EN
dc.format.extent 18 es_ES
dc.language English es_ES
dc.publisher Springer Verlag (Germany) es_ES
dc.relation TEXT-ENTERPRISE [2.0 TIN2009-13391-C04-03] es_ES
dc.relation CONACyT-Mexico [192021] es_ES
dc.relation.ispartof Language Resources and Evaluation es_ES
dc.rights All rights reserved es_ES
dc.subject Cross-language es_ES
dc.subject Plagiarism detection es_ES
dc.subject Similarity es_ES
dc.subject Retrieval model es_ES
dc.subject.classification LENGUAJES Y SISTEMAS INFORMATICOS (Computer Languages and Systems) es_ES
dc.title Cross-Language Plagiarism Detection es_ES
dc.type Article es_ES
dc.embargo.lift 10000-01-01
dc.embargo.terms forever es_ES
dc.identifier.doi 10.1007/s10579-009-9114-z
dc.rights.accessRights Open es_ES
dc.contributor.affiliation Universitat Politècnica de València. Departamento de Sistemas Informáticos y Computación - Departament de Sistemes Informàtics i Computació es_ES
dc.description.bibliographicCitation Potthast, M.; Barrón Cedeño, LA.; Stein, B.; Rosso, P. (2011). Cross-Language Plagiarism Detection. Language Resources and Evaluation. 45(1):45-62. doi:10.1007/s10579-009-9114-z es_ES
dc.description.accrualMethod Senia es_ES
dc.relation.publisherversion http://link.springer.com/article/10.1007/s10579-009-9114-z es_ES
dc.description.upvformatpinicio 45 es_ES
dc.description.upvformatpfin 62 es_ES
dc.type.version info:eu repo/semantics/publishedVersion es_ES
dc.description.volume 45 es_ES
dc.description.issue 1 es_ES
dc.relation.senia 215389
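
The abstract names CL-CNG as the simplest of the three compared models and the best choice for syntactically related languages. As commonly described, it compares texts in different languages directly through their character n-gram profiles, with no translation step, so it benefits from shared spellings such as "European"/"Europea". The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the choice of n = 3, the normalization (NFKD decomposition, dropping diacritics, lowercasing, keeping only ASCII letters and digits), raw n-gram counts instead of any collection-level weighting, and the function names `normalize`, `char_ngrams`, `cosine` and `cl_cng_similarity` are all illustrative assumptions.

```python
import math
import re
import unicodedata
from collections import Counter


def normalize(text: str) -> str:
    """Assumed normalization: decompose, drop diacritics, lowercase, and
    reduce the text to ASCII letters, digits and single spaces."""
    decomposed = unicodedata.normalize("NFKD", text).lower()
    # Drop combining marks so that "comisión" and "comision" share n-grams.
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Any other character (punctuation, or letters such as Polish "ł" that do
    # not decompose) becomes a space; runs of spaces collapse to one.
    return re.sub(r"[^a-z0-9]+", " ", without_marks).strip()


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of the normalized text,
    including n-grams that span word boundaries."""
    s = normalize(text)
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    # Indexing a Counter with a missing key yields 0 without inserting it.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cl_cng_similarity(doc_a: str, doc_b: str, n: int = 3) -> float:
    """Hypothetical CL-CNG-style score: cosine of character n-gram profiles."""
    return cosine(char_ngrams(doc_a, n), char_ngrams(doc_b, n))


if __name__ == "__main__":
    english = "The European Commission adopted the regulation."
    spanish = "La Comisión Europea adoptó el reglamento."
    print(round(cl_cng_similarity(english, spanish), 3))
```

Because the score depends only on surface character overlap, this sketch also illustrates why such a model should degrade for language pairs with little shared vocabulary or different scripts, which is consistent with the abstract's observation that CL-CNG is best suited to syntactically related languages. Weighting n-grams by their frequency across a document collection would be a natural refinement, but that is not shown here.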
