Abstract:
|
Wikipedia is an online encyclopedia that anyone can edit. The fact that
there are almost no restrictions to contributing content is at the core of its
success. However, it also attracts pranksters, lobbyists, spammers and other
people who degrade Wikipedia's content. One of the most frequent kinds
of damage is vandalism, which is defined as any bad-faith attempt to damage
Wikipedia's integrity.
For some years, the Wikipedia community has been fighting vandalism
using automatic detection systems. In this work, we develop one such
system, which won the 1st International Competition on Wikipedia Vandalism
Detection. This system is built on a feature set that exploits the textual
content of Wikipedia articles. We performed a study of different supervised
classification algorithms for this task, concluding that ensemble methods
such as Random Forest and LogitBoost are clearly superior.
We then combine this system with two other leading approaches
based on different kinds of features: metadata analysis and reputation. This
joint system obtains one of the best results reported in the literature. We
also conclude that our approach is largely language-independent, so it can
be adapted to languages other than English with minor changes.
|