
UESPWiki:Administrator Noticeboard/Archives/Possible Solutions for Spam Attacks

This is an archive of past UESPWiki:Administrator Noticeboard discussions. Do not edit the contents of this page, except for maintenance such as updating links.

Possible Solutions for Spam Attacks

Okay, this being the worst single attack I've seen (I counted 53 spam edits, and added 23 notches to my BanHammer blocking them), I'm beginning to think some change needs to be made. I'm not sure what, but this is just getting ridiculous. Possible solutions I've been considering:

  • Change UESPWiki:Spam Blacklist from Full-Protection to Semi-Protection. This would allow non-Admins to edit it, but not new accounts or IP editors. In this case, it would have allowed Eshe or one of the other editors to block this spam attack long before it got as bad as it did. Downside - could be abused if a spam-bot was smart enough to create an account, hold it dormant for a while, and then edit the Blacklist to free itself to do more damage. However, I don't think this is a likely occurrence, and if it were to happen, damage to the Blacklist can easily be reverted. Worst case, we could just Full-Protect it again if this happened.
  • Promote more Admins. Right now, with only 3 active Admins - one of them on vacation and another asleep during these hours - the responsibility fell entirely on me, and I just can't be around all the time. (Even if I am unemployed.) Or we could create a Semi-Admin position, who could edit protected pages, but not block users or delete pages. Again, this would allow certain people we trusted with the position to be able to edit the Blacklist and prevent these attacks before they get serious. I could think of a few people we could entrust with that power - for example, most of our Patrollers.
  • CAPTCHA. It may be a pain, but requiring IP-editors to prove they're human would go a long way towards preventing these attacks. We could couple the CAPTCHA page with a suggestion that creating an account means you won't have to go through that every time, which would encourage more people to sign up. Admittedly, it might turn away some editors, which is a possible downside. Though honestly, with few exceptions, most of the edits from IP-editors have been somewhat sub-par anyhow.
  • Smarter wiki software. I mean, there have got to be ways of preventing this sort of thing. Somebody has to have written plug-ins or something that can recognize and squash these attacks before they happen. What are they using on Wikipedia? Sure, there's lots of vandalism on Wikipedia, but the one thing I've never seen is spam. Their vandals seem to be all human, and I'd really like to know how they manage that. We've seen some definite patterns in the way these spam-bots work. Plus-sign vandalism is easy to spot. Large 12K+ edits by anonymous IPs or accounts with names that follow some basic rules (6 random characters, 1st and 4th always capital letters) are obvious warning signs. Talk pages created for non-existent articles, or with "index.php" in the title or a / as the final character, all good clues. (See :Category:Spam-Blockers for examples of these.) Pages being suddenly filled with hundreds of links, etc. We can recognize these things instantly. Surely there must be a way to program the site to recognize some of these same signs and catch them before they happen. (A rough sketch of one such check appears below.)
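
One way to automate the account-name check described in the last point: MediaWiki's AbortNewAccount hook lets LocalSettings.php veto account creation. The hook name is real, but the function name, pattern, and message below are illustrative assumptions, not code the site actually ran:

 # Hypothetical sketch: reject the "6 random characters, 1st and 4th
 # capitalized" names described above, e.g. "AbcDef". Assumes the
 # remaining characters are lowercase letters; loosen the character
 # classes if they are not.
 $wgHooks['AbortNewAccount'][] = 'uespBlockSpamBotNames';

 function uespBlockSpamBotNames( $user, &$message ) {
     if ( preg_match( '/^[A-Z][a-z]{2}[A-Z][a-z]{2}$/', $user->getName() ) ) {
         $message = 'Account names of this form are blocked as suspected spam-bots.';
         return false; // abort the account creation
     }
     return true; // allow all other names
 }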

If anybody has any thoughts, suggestions, etc., I'd really like to hear them, because this has been getting steadily worse over the past several months, and we really need to do something to stop it. --TheRealLurlock Talk 23:34, 3 October 2007 (EDT)

I just did some research. This type of blog spam really does get annoying. I'm by no means a computer whiz (I haven't even taken my first computer class, and I won't until my Sophomore high school year), but couldn't we do a keyword block? I checked the histories of the spam we just got and they repeat a lot of words like windjammer, 500hats, and justforkeepers. They are words practically never used by contributors. Is that possible? Just a thought. --Vesna 23:45, 3 October 2007
That's pretty much exactly what the Spam Blacklist (linked in my first post above) does. Take a look at it. Problem is that every time we block a batch of these keywords, it stops them for a while, but they just come back a few days later with a dozen new sites to spam us with. There's also some worry that if the Blacklist gets too long, it may impact general performance of the site, though I'm not sure there's any basis to that. (The Blacklist is only checked when you post an external link to another website on an article. Since very few legitimate edits ever do this, and when they do, it's at most one or two links, not hundreds like the spam-posts, it would probably not affect the site too much.) The problem is that right now, the Blacklist can only be edited by Admins, and right now there's only 3 of us that are active on the site. (Or right now, only 1, since one is on vacation and the other hasn't woken up yet.) Hence the reasoning behind my first two suggestions, making it possible for more people to edit this page. --TheRealLurlock Talk 00:16, 4 October 2007 (EDT)
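
For reference, each line of a SpamBlacklist-style page is a regular-expression fragment that the extension matches against any external URL an edit tries to add, with # starting a comment. A plausible sketch using the keywords Vesna spotted - not the actual contents of UESP's list:

 # Blog-spam domains from the October 2007 attack (illustrative sketch)
 windjammer
 500hats
 justforkeepers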
Captcha. CS Wiki pretty much eliminated their spam problem by installing captcha. It kicks in on account creation and whenever adding a new link to an external site. It's really not too intrusive (relatively few valid edits add new links to external sites), so you don't see it that much. And when you do get it, it only adds a few seconds to editing time. --Wrye 00:13, 4 October 2007 (EDT)
We're going to need our own WikiSpam Project soon. I'll do some more research on this. I'll post back in a few hours. --Vesna
Hmm, that does sound good - though if it also has an option to make it only affect IP editors and new accounts, that would minimize the impact even further. Sure, spammers could create accounts and let them lie dormant until they get past the newbie-phase, but they'd still have to get past the Captcha in order to create the accounts in the first place, so that would require at least some human intervention to spam the site, and that might be all the protection that's needed. Normal editors would then feel no effects from the change whatsoever, other than the site being blissfully spam-free. Of course, any such change would require Daveh to do the work to install it, so we might try my earlier suggestions in the meantime. Semi-Protecting the Blacklist is something we can do right away if people agree with it. --TheRealLurlock Talk 00:27, 4 October 2007 (EDT)
I think a Captcha system is going to be the way to go in the long run. Of the other ideas, I suppose changing the spam blacklist to semi-protection would work - I just feel somewhat nervous that most people don't get regular expressions and might make mistakes. Only one way to find out there, though. More admins? Well I'm still peeved I didn't get to dirty my ban-stick this time! The problem is that the obvious candidates for admin-hood all share roughly the same active hours, and school / college means there will still be problems with non-coverage. It's not a reason not to do it but I doubt it will solve the issue. Smarter wiki software... well there's this but I don't know enough about the software to say much. It's just depressing that some people are so destructive. --RpehTalk 03:56, 4 October 2007 (EDT)
Okay, now why limit ourselves to one solution, guys? I'm all for promoting more admins, as per the "the more, the merrier" attitude that we have about patrollers. I mean, is it just me, or do we have more new users? So our proportion of admins to users has shrunk. I think a semi-admin position is a less-than-worthwhile pursuit. Most semi-admins could quickly become useful enough to be full admins. And if we trust someone to edit the spam blacklist, why don't we trust them to block bots, or delete pages that are created by spammers?
That said, I'm also for semi-protecting the spam blacklist. Heck, I tried to edit it once before!
As for Captcha, well, I can see the benefits. I just personally can't stand having to squint at things like that to get into a site. Somercy 10:36, 4 October 2007 (EDT)
See, but with the settings I suggested, you'd only have to do it once when you created your account. (Of course, since you already have an account, you wouldn't have to do it at all, but I'm talking about the general "you", not you specifically.) Between Wrye's suggestion and mine, I think the Captcha would be a pretty non-intrusive solution. The only way you'd ever see it is if you were an IP-editor trying to add an external link to a page, or if you were creating a new account. Everybody else would barely even notice it was there.
One other possibility that occurred to me was to create a new level of page-protection in between semi-protection and full-protection. The reason for this is that while we could trust most of our regular editors to treat the Spam Blacklist properly, there's a LOT of pages that are full-protected for a reason. Everything in the Mediawiki namespace, for example, needs to be protected, because a careless edit there could affect the whole site and screw a lot of things up. This is where my semi-admin idea comes in. Handing out the keys to more people and allowing the potential for them to cause site-wide damage could be a very bad idea. But at the same time, I'd like to be able to trust people to use tools such as the Blacklist without the need for an Admin to be on vigil 24/7. --TheRealLurlock Talk 11:24, 4 October 2007 (EDT)
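
For what it's worth, MediaWiki can support an in-between level like the one described above: each protection level corresponds to a user-right of the same name, so a new level only needs a new right and a group to hold it. A minimal LocalSettings.php sketch; the 'semiadmin' group and level names are invented for illustration:

 $wgGroupPermissions['semiadmin']['semiadmin'] = true; // the new group can edit pages at the new level
 $wgGroupPermissions['sysop']['semiadmin']     = true; // full Admins keep access too
 $wgRestrictionLevels[] = 'semiadmin';                 // offer the level when protecting a page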
Re captcha. I really think that captcha should trigger on both new accounts and new external links. If you only trigger on new accounts, then you encourage spammers to set up bogus accounts (which is a pain in the butt by itself). And yes, they will do this. Before anonymous editing was allowed here, plenty of bogus accounts were created early and used later. Same thing happened at CS wiki when they got spam nuked back in June. And the captcha really isn't bad. Around the time CS wiki got nuked, I was adding articles over there and I had an unusually high number of external links in the articles -- I was surprised at how non-invasive captcha was. As an active editor there, it really wasn't a problem at all. Again, since it only triggers when you add new external links, most edits won't trigger it. I would say configure it that way first. If after using it for a while it seems to be too much of a hassle, turn it down a little bit. --Wrye 15:32, 4 October 2007 (EDT)
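
In LocalSettings.php terms, the configuration Wrye describes would look roughly like this. $wgCaptchaTriggers and the skipcaptcha right are real ConfirmEdit settings, though exact defaults vary by version, so treat this as a sketch rather than the site's actual config:

 require_once( "$IP/extensions/ConfirmEdit/ConfirmEdit.php" );
 $wgCaptchaTriggers['addurl']        = true;  // challenge edits that add new external links
 $wgCaptchaTriggers['createaccount'] = true;  // challenge account creation
 $wgCaptchaTriggers['edit']          = false; // leave ordinary edits alone
 # Optionally exempt established accounts entirely, per Lurlock's suggestion:
 $wgGroupPermissions['autoconfirmed']['skipcaptcha'] = true;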
I'd agree with Wrye. Apart from anything else the number of external links (not including spam) is vanishingly small so most users won't notice the difference. Now. Is it actually possible? --RpehTalk 15:43, 4 October 2007 (EDT)
See brief prior discussion UESPWiki:Administrator_Noticeboard/Vandalism#Captcha for a few links. So, it's possible and (apparently) not too hard to do. It's an extension, so it has to be added by Dave. Other than that, I don't know the tech details. I believe that Nephele's up on the details and probably did more research into various alternatives, so if we want to do it, it's probably best to wait until she gets back from vacation. (A couple of weeks, I think.) --Wrye 16:32, 4 October 2007 (EDT)
One minor question - can the Captcha distinguish between an actual link to an external site and one which is a link to the site but using the external link formatting? E.g.: UESP. One thing we sometimes use external links for is to point to an old version of a page, e.g. Penultimate Edit. These look like external links, but are actually within the site - actually, while I think about it, any link to the Forums, the old site, the Oblivion Map, or any other non-wiki parts of the site would fall into this category as well. No big deal, just wondering how intrusive this will be... --TheRealLurlock Talk 17:14, 4 October 2007 (EDT)
I just tested that over at cs wiki. Yes, it will trigger on that. --Wrye 20:30, 4 October 2007 (EDT)
Although it's occurred to me that neither the Blacklist nor Captcha would have had any effect on the latest batch of spam. It was formatted with <a href=blahblah> instead of [blahblah], which technically doesn't create links in wiki-markup. This also makes it fairly ineffective as spam, since there are no actual links on the page. It's still annoying crap that has to be reverted, of course, but none of the automatic techniques we have discussed would do anything to stop it. I guess we just have to hope that most spam-bots will realize that this form of spamming doesn't benefit them at all, and just stick to the type we can easily block... --TheRealLurlock Talk 23:37, 4 October 2007 (EDT)
Always test before making such claims, Lurlock! :lol: I just did a quick test at cs wiki and captcha correctly triggers in response to href as well as regular linking. If you've got any further questions about what captcha does and doesn't do, you should set up an account at cs wiki, then experiment on your user page there to see when captcha does and does not trigger. --Wrye 15:30, 5 October 2007 (EDT)

Confirm Edit Installed

I've installed the ConfirmEdit extension as suggested and a quick test shows that it seems to be working fine. I'm currently just using the default configuration settings but just let me know if anything needs to be changed. -- Daveh 11:52, 6 October 2007 (EDT)

I like it! Completely invisible for registered users and reasonably non-intrusive for unregistered ones. Hopefully that'll keep the problem down to manageable levels. --RpehTalk 12:16, 6 October 2007 (EDT)
Thanks Dave. I was kind of expecting to see the cs wiki word recognition tests, and I am a little concerned about just encouraging bogus signups. But better a quick start, I think. --Wrye 14:11, 6 October 2007 (EDT)
Yah, I was assuming it was going to do that too. Looking more closely at the CS wiki it seems they use reCaptcha in addition to ConfirmEdit (or perhaps it is an extension/branch; it's not completely clear). -- Daveh 15:16, 6 October 2007 (EDT)
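
For anyone following along: in ConfirmEdit's design, the extension supplies the triggers and a pluggable captcha class supplies the actual test, selected via $wgCaptchaClass. A sketch of how the cs wiki setup might look with the reCAPTCHA module of that era - the file path and key values are assumptions:

 require_once( "$IP/extensions/ConfirmEdit/ConfirmEdit.php" );
 require_once( "$IP/extensions/recaptcha/ReCaptcha.php" ); // separately distributed module
 $wgCaptchaClass = 'ReCaptcha';  // use reCAPTCHA instead of the default questions
 $wgReCaptchaPublicKey  = '...'; // placeholder -- keys are issued by the reCAPTCHA service
 $wgReCaptchaPrivateKey = '...';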
I noticed that ever since the installation of ConfirmEdit, I have to confirm almost every edit I do. The reason being that many templates have "external" links to allow users to edit a page or part of it. E.g. the {{stub-mw}} template has a link like this:
You can help by <span class="plainlinks">
 [{{SERVER}}{{localurl:{{NAMESPACE}}:{{PAGENAME}}|action=edit}} expanding it]</span>.
which triggers the confirmation. These are not really external links, but links that point back to our wiki. Would it be possible to filter them out? --DrPhoton 08:34, 8 October 2007 (EDT)
I've found that the MediaWiki:Captcha-addurl-whitelist page can be edited with sites to omit from the captcha test. I've added the UESP and a couple of common ES sites to it and it seems to work fine. Let me know if it doesn't solve the template issue though. -- Daveh 09:50, 8 October 2007 (EDT)
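
Like the blacklist, the MediaWiki:Captcha-addurl-whitelist page takes one regular-expression fragment per line, matched against added URLs to exempt them from the addurl captcha, with # starting a comment. An illustrative sketch, not the page's actual contents:

 # Sites exempted from the addurl captcha (illustrative sketch)
 uesp\.net
 # ...plus the couple of common ES sites Daveh mentions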
That worked, thanks! By the way, shouldn't that page be protected? --DrPhoton 13:41, 9 October 2007 (EDT)
Good idea - I'll just give it semi-protection for now. I honestly don't think spam-bots will be smart enough to get around that, and we can always change it to full-protection if the need arises in the future. --TheRealLurlock Talk 13:59, 9 October 2007 (EDT)
Hmm - never mind, for some reason, I can't protect it. I don't get the protect tab on the top of the page like most pages, and when I tried manually, I just got an error. That's kind of weird... --TheRealLurlock Talk 14:01, 9 October 2007 (EDT)
All pages in the MediaWiki namespace are automatically protected given their nature (see MediaWiki_namespace). -- Daveh 14:15, 9 October 2007 (EDT)

So, great...

We finally come up with a perfect solution for how to prevent bots from posting spam on the site. And now the bots decide they don't want to post spam any more. They'd rather just fill pages up with nonsense instead. Any idea what to do about these kinds of attacks? Seems to be a similar M.O. to the Plus-Sign Vandal. It changes certain characters into other characters, and always adds some nonsense word to the top of the page as well. It's also created one page from scratch so far. (I deleted it.) At least we know it won't post any spam, but it still makes a big mess. I'm afraid any solution to this problem would have to get a bit more technical - find some way to recognize open proxies, or maybe some sort of restriction on rapid multiple posts in a short amount of time by a new IP editor. Like, say, a 60-second timer between posts until your post-count gets above 20 or something. (IMDB forums do something like that.) I don't know, I just get frustrated - after all we've done cleaning up in here, some damn bot comes in and messes the place up again. Sometimes, I just wish the world would stay saved for a bit, you know what I'm saying? --TheRealLurlock Talk 22:56, 9 October 2007 (EDT)
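
MediaWiki core actually ships a throttle along these lines: $wgRateLimits, which can rate-limit edits by anonymous and newly registered users. A sketch with illustrative numbers - though, as noted further down the thread, this particular bot uses a fresh IP for nearly every edit, so a throttle like this would not have caught it:

 $wgRateLimits['edit']['ip']     = array( 1, 60 ); // anons: at most 1 edit per 60 seconds
 $wgRateLimits['edit']['newbie'] = array( 1, 60 ); // newly created accounts: the same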

We may want to wait to see if it continues to be a problem or if it's just temporary. --Wrye 16:56, 10 October 2007 (EDT)
Well, we've waited, we've seen, and it doesn't seem to be going away. Also, my previous idea of an edit-timer doesn't seem like it would be all that effective now. Most of these IPs have done only one edit, so this wouldn't stop them or slow them down, as it just gets a new IP every time. Even this latest IP which did four edits before being stopped had a gap of over 2 hours between each of its edits, so a flood-timer would have no effect. I do have another idea, however. Could we implement the Captcha so that it automatically kicks in for the first, say, 5 posts for every new editor? Add a note at the top saying "This will only be required for your first five edits. We apologize for the inconvenience, but this is the only way we can prove that you are human and not a vandal-bot." or something like that - probably somebody should re-word that to be a little more polite and avoid scaring off legitimate editors. I very much doubt that any vandal bot user would go to the trouble of manually doing a Captcha for 5 edits prior to being set to do bot vandalism - especially since the spam-potential has basically been defeated. (It was one thing when they were doing it for profit - quite another if it's just to be annoying.) I think this minor inconvenience to newbie editors would be more than worth it if it basically means no more bots - ever. What do people think about this? --TheRealLurlock Talk 11:26, 17 October 2007 (EDT)
Well we could turn off anonymous editing. (But, see previous archived discussion.) First five edits is an interesting idea. You would need to maintain a database of all ips that attempt to edit the site and track their completed edits. --Wrye 16:31, 17 October 2007 (EDT)
I'm not sure that'd be necessary. Just require a Captcha for any IP editor with <5 edits. While it'd be great to track the null-edits that don't complete, I'm not sure how easy or practical that is. But the site already keeps track of how many edits each user has, so it shouldn't be too hard to use that count to determine if somebody was a new editor or not. And even though Nephele suggested at one point that these bots might be trojans on other people's machines (as opposed to open proxies), we've yet to see one of these nonsense edits come out of an IP which already has an established history of legit posts. Every one of the IPs has had no prior history, so a "prove you're human" thing for the first 5 posts would be 100% effective - at least for now (until the bot-user decides to waste their time doing 5 captchas in order to legitimize their IP, and then starts the bot-vandal - which is a lot of work for someone to do for not much purpose, considering they'd just get banned again almost immediately afterwards). Of course, any such change would need to be implemented by Daveh, but I think in light of this bot's M.O., it seems to be the most reliable way of quashing this thing for good with minimal collateral damage. --TheRealLurlock Talk 14:53, 18 October 2007 (EDT)
I'd like to repeat this request in light of the most recent attack. 74.208.11.169 hit 36 pages before being stopped, and it's even started to hit the Template namespace. Given the severe impact that a bot like this could have on the site as a whole if it hit some of the right pages in the Template space, I think it's possibly a matter of urgency that we do something to stop it before it gets the chance. --TheRealLurlock Talk 11:19, 9 November 2007 (EST)
Unfortunately, if you check the documentation on ConfirmEdit, that is not an option that is available with the extension. Unless you can locate a MediaWiki extension that provides this capability, it is going to require somebody writing a bunch of PHP code in order to implement any such change. And personally I feel like in the day or two that it would take me to work up any such code, there are dozens of other things that I can do that will help the site more. I'm not saying it wouldn't be useful, just that it's a lot of effort. --NepheleTalk 11:41, 9 November 2007 (EST)
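
To give a sense of what that PHP code might involve, here is a rough, untested sketch: a subclass of ConfirmEdit's SimpleCaptcha that forces the captcha for anonymous editors and accounts with fewer than five edits. It assumes a ConfirmEdit version whose SimpleCaptcha class exposes a shouldCheck() method with roughly this signature; the class name is invented:

 # Hypothetical sketch only -- place in LocalSettings.php after
 # including ConfirmEdit.
 class NewbieCaptcha extends SimpleCaptcha {
     function shouldCheck( &$editPage, $newtext, $section ) {
         global $wgUser;
         // Force the captcha until an editor has five completed edits.
         if ( $wgUser->isAnon() || $wgUser->getEditCount() < 5 ) {
             return true;
         }
         // Otherwise defer to the normal triggers (addurl etc.).
         return parent::shouldCheck( $editPage, $newtext, $section );
     }
 }
 $wgCaptchaClass = 'NewbieCaptcha';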
Well, I posted a suggestion on their talk page. Hopefully somebody will take a look at it. I'm sure they've got people who know more about PHP than any of us, so if somebody wants to create this feature, they'll be able to do it. Let's just keep an eye out for any new versions of ConfirmEdit that get released. --TheRealLurlock Talk 12:18, 9 November 2007 (EST)
We might semi-protect the templates. Be a lot of work though -- you'd have to semi-protect a bunch of pages. --Wrye 18:31, 9 November 2007 (EST)
Well we wouldn't have to semi-protect all of them, just the ones that are used a lot, like the NPC and quest templates, and a few others.--Ratwar 19:20, 9 November 2007 (EST)