Wikipedia has been used as a source of comparable texts for a range of tasks, such as Statistical Machine Translation and Cross-Language Information Retrieval. Articles written in different languages on the same topic ...
This work addresses the issue of cross-language high-similarity and near-duplicate search, where, for a given document, a highly similar one is to be identified from a large cross-language collection of documents. We ...
Cross-language plagiarism detection deals with the automatic identification and extraction of plagiarism in a multilingual setting. In this setting, a suspicious document is given, and the task is to retrieve all sections ...
Flores Sáez, Enrique; Barrón-Cedeño, Luis Alberto; Moreno Boronat, Lidia Ana; Rosso, Paolo (Graz University of Technology, Institut für Informationssysteme und Computer Medien (IICM), 2015)
[EN] Nowadays, the Internet is the main source of information: blogs, encyclopedias, discussion forums, source code repositories, and many more resources are available just one click away. The temptation to re-use ...
Barrón Cedeño, Luis Alberto (Universitat Politècnica de València, 2011-10-18)
Text plagiarism means including in a document text written by another person without giving them credit. We have tested some existing methods and developed two new ones for plagiarism detection: one for the reduction ...
Silvestre Cerdà, Joan Albert; Garcia Martinez, Maria Mercedes; Barrón Cedeño, Luis Alberto; Civera Saiz, Jorge; Rosso, Paolo (CEUR Workshop Proceedings, 2011)
[EN] This paper presents a proposal for extracting parallel corpora from Wikipedia on the basis of statistical machine translation techniques. We have used word-level alignment models from IBM in order to obtain phrase-level ...
Stein, Benno; Rosso, Paolo; Stamatatos, Efstathios; Potthast, Martin; Barrón Cedeño, Luis Alberto; Koppel, Moshe (Association for Computing Machinery (ACM), 2011-06)
[EN] The Fourth International Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN 10) was held in conjunction with the 2010 Conference on Multilingual and Multimodal Information Access Evaluation ...
Three factors are driving the rise of cross-language plagiarism: (i) speakers of under-resourced languages often consult documentation in a foreign language, (ii) people immersed in a foreign country can still consult ...
[EN] This paper overviews eleven plagiarism detectors that have been developed and evaluated within PAN’11. We survey the detection approaches developed for the two sub-tasks “external plagiarism detection” and “intrinsic ...
Potthast, Martin; Gollub, Tim; Hagen, Mathias; Graßegger, Jan; Kiesel, Johannes; Michel, Maximiliano Luis; Oberländer, Arnd; Tippmann, Martin; Barrón Cedeño, Luis Alberto; Gupta, Parth; Rosso, Paolo; Stein, Benno Maria (CLEF Initiative (Conference and Labs of the Evaluation Forum), 2012)
[EN] This paper overviews 15 plagiarism detectors that have been evaluated within the fourth international competition on plagiarism detection at PAN 12. We report on their performances for two sub-tasks of external ...
The development of models for automatic detection of text re-use and plagiarism across languages has received increasing attention in recent years. However, the lack of an evaluation framework composed of annotated datasets ...
Barrón Cedeño, Luis Alberto; Vila, Marta; Martí, M. Antònia; Rosso, Paolo (MIT Press, 2013-12)
[EN] Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. Therefore, state-of-the-art ...
The Internet has made huge amounts of information available, including source code. Source code repositories and, more generally, programming-related websites facilitate its reuse. In this work, we propose a simple approach to the ...
The advent of the Internet has caused an increase in content reuse, including source code. The purpose of this research is to uncover potential cases of source code reuse in large-scale environments. A good example is ...