Wikipedia Vandalism… continued!
I have previously written about vandalism on Wikipedia, back when I was building Wikipedia Vandalism Watch. After more time spent watching and reverting edits, I started to notice trends in the vandalism: most malicious edits include certain phrases and symbols. Some of these are hard to tell apart from genuine edits; others are not. The most common, and the easiest to detect, is the good old “!!!!!”. I toyed with the idea of adding some sort of Recent Changes RSS feed reader to the tool, along with some text checks, but Wikipedia Vandalism Watch is a read-only client (it just shows you edits from flagged users), so I didn’t think this would be much use. Instead I set about creating a new application: an unattended robot that scans for vandalism and removes it automatically.
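To give a rough idea of what such a text check could look like, here is a minimal Python sketch. It is not the bot's actual code: the function names and the idea of only inspecting newly added lines are my own assumptions for illustration, and the only pattern taken from the observations above is the run of exclamation marks.

```python
import re

# Only the "!!!!!" pattern comes from the post; the list is here so that
# further (hypothetical) patterns could be appended later.
SUSPICIOUS_PATTERNS = [
    re.compile(r"!{5,}"),  # five or more exclamation marks in a row
]


def added_text(old: str, new: str) -> str:
    """Return the lines that appear in the new revision but not in the old one."""
    old_lines = set(old.splitlines())
    return "\n".join(line for line in new.splitlines() if line not in old_lines)


def looks_like_vandalism(old: str, new: str) -> bool:
    """Flag an edit if any newly added line matches a suspicious pattern."""
    added = added_text(old, new)
    return any(pattern.search(added) for pattern in SUSPICIOUS_PATTERNS)
```

Checking only the added lines means text that was already in the article cannot trigger a revert on its own. A real bot would need far more than this to keep false positives down.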
Admittedly, this sounded much easier when I started than it actually turned out to be, with various hurdles getting in the way. One example is Wikipedia’s edit tokens, which mean you can’t just send a single POST request to the server to edit an article. Once problems like that had been solved, things slowly fell into place. About halfway through development I started the Request for Approval process. This is run by the Bot Approvals Group, and it lets them regulate the standard that bots must meet before they are allowed to run on Wikipedia. I can see you asking already: “It’s Wikipedia, a public website, how could they stop me?” The simple answer is that if a bot is spotted running without approval, both the bot and your account will be banned straight away. Unapproved bots are not welcome on Wikipedia in any shape or form, because a bot that hasn’t been approved could go off the rails at any moment and destroy many articles before someone could either turn it off or ban it. The rule for approved bots is that if your bot does go off the rails, you as the owner are responsible and should clean up the mess.
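For the curious, the edit-token hurdle mentioned above looks roughly like this with today's MediaWiki Action API. This is a sketch under assumptions, not how my bot does it: the helper names are made up, and it only builds the request parameters and reads the token out of a response, leaving the logged-in HTTP session to whatever client you prefer.

```python
def token_request_params() -> dict:
    """Parameters for the first request: ask the API for a CSRF (edit) token."""
    return {"action": "query", "meta": "tokens", "type": "csrf", "format": "json"}


def extract_csrf_token(response_json: dict) -> str:
    """Pull the token out of the JSON returned by the token request."""
    return response_json["query"]["tokens"]["csrftoken"]


def edit_request_params(title: str, text: str, summary: str, token: str) -> dict:
    """Parameters for the second request: the edit itself, sent as a POST body."""
    return {
        "action": "edit",
        "title": title,
        "text": text,        # full new wikitext of the page
        "summary": summary,  # edit summary shown in the page history
        "token": token,      # without a valid token the edit is rejected
        "format": "json",
    }
```

So an edit takes at least two round trips in the same session: one to fetch the token, and one to POST the edit with the token attached.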
After some discussion with members of the Bot Approvals Group about how the bot works and how it detects vandalism, I was granted a trial of 50 edits in the main namespace. The trial went quite slowly because of the restrictions placed on my bot: 2 edits per minute for the first 40 reverts, after which, for the last 10 reverts, I could let it run as fast as it could find vandalism. After a few false positives and some conflicts along the way, the trial was completed.
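The trial restriction described above can be sketched as a small throttle. Again, this is an illustration rather than the bot's real code: the class name is invented, and I am assuming that "2 edits per minute" is enforced as a gap of at least 30 seconds between edits. The clock and sleep functions are passed in so the logic can be exercised without actually waiting.

```python
import time


class TrialThrottle:
    """Space out the first `throttled_edits` edits, then run unrestricted."""

    def __init__(self, throttled_edits=40, min_interval=30.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.throttled_edits = throttled_edits
        self.min_interval = min_interval  # seconds between throttled edits
        self.clock = clock
        self.sleep = sleep
        self.edits_done = 0
        self._last_edit = None  # time of the previous edit, if any

    def wait_for_slot(self):
        """Block until the next edit is allowed, then record it."""
        still_throttled = self.edits_done < self.throttled_edits
        if still_throttled and self._last_edit is not None:
            remaining = self.min_interval - (self.clock() - self._last_edit)
            if remaining > 0:
                self.sleep(remaining)
        self._last_edit = self.clock()
        self.edits_done += 1
```

The bot would call `wait_for_slot()` just before each revert. With the defaults, reverts 1 to 40 are at least 30 seconds apart and reverts from 41 onwards go through immediately.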
At the time of writing, the bot has completed its trial and I am awaiting word on how to proceed. If you want more technical information, or just want to see what the bot is currently up to, check out its User page on Wikipedia.
