+5 votes
in Q2A Core
Hi all,

We currently have 323,937 users on our Q2A site.

I suspect a large percentage of these have invalid e-mail addresses (which could possibly be checked or filtered with something like http://kickbox.io) and are basically spam robots.

I'd like to clean out as many of the fake user accounts as possible (whilst not accidentally deleting legitimate 'lurker' accounts that don't post much).

Any suggestions on how best to approach this?
Q2A version: 1.7
Is the email confirmation feature enabled on your site?
No, it's not. (We're using reCAPTCHA though - https://www.google.com/recaptcha/intro/index.html - which has helped a lot.)
Typically, the spam protection that reCAPTCHA provides is limited. I recommend reviewing the spam options of your site. At a minimum, email confirmation should be required on the public internet. Email confirmation is of course optional, but it should be turned off only in a safe environment (an intranet protected by a firewall, etc.).

1 Answer

+1 vote
Best answer
I guess this is a difficult question. However, you can identify users who are potential candidates for removal: the ones that meet all of the following criteria (combined with AND):

1) Do not have any upvotes (do spammers vote for each other?)

2) Do not have any best answers (do spammers choose other spammers' answers as best?)

3) Have asked questions but have never chosen a best answer (do they come back to choose the best answer?)

4) Do not have a confirmed email

5) Have a question or an answer identified as spam (I am not sure how to do that, since I do not know how clever these robots are). Do the questions they post make any sense or follow any kind of pattern? Is it possible to recognize the spam by the tags they use? Do they access your site just once to register, or do they use the same account frequently? There are also some programs (mainly in English) that were designed to recognize spam.
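As a rough illustration, the criteria above could be combined along these lines. This is only a sketch in Python, not Q2A code: the `UserActivity` record and all of its field names are hypothetical and would have to be filled in from your own database export. It also assumes that a user with no questions at all counts as "never chose a best answer", and it treats the spam-flag criterion as optional, since it is the hardest one to determine.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class UserActivity:
    # Hypothetical per-user summary; these names are NOT Q2A column names.
    userid: int
    upvotes_received: int
    best_answers_received: int
    questions_asked: int
    best_answers_selected: int
    email_confirmed: bool
    spam_flagged_posts: int


def is_removal_candidate(user: UserActivity, require_spam_flag: bool = False) -> bool:
    """Return True only if every signal holds (the criteria are ANDed)."""
    signals = [
        user.upvotes_received == 0,        # never upvoted by anyone
        user.best_answers_received == 0,   # never wrote a best answer
        user.best_answers_selected == 0,   # never came back to pick a best answer
        not user.email_confirmed,          # email never confirmed
    ]
    if require_spam_flag:
        # Optional: also require at least one post already identified as spam.
        signals.append(user.spam_flagged_posts > 0)
    return all(signals)


def removal_candidates(users: Iterable[UserActivity],
                       require_spam_flag: bool = False) -> List[int]:
    """Collect the ids of users worth reviewing before any deletion."""
    return [u.userid for u in users if is_removal_candidate(u, require_spam_flag)]
```

The result is meant as a list to review by hand, not as a list to delete automatically: a quiet lurker with an unconfirmed email would match the first four signals as well.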

So, my point is that before you can find an automatic way to recognize them, you have to be able to identify them yourself.
Agreed. The best idea I've come up with so far is to run all of their IP addresses through an IP geolocation database (probably GeoIP, e.g. https://github.com/jamesspittal/geoip-ip2country).

Since our website has a strong Australian focus, almost all legitimate users are in Australia, and most spammers tend to come from IP addresses that are not located in Australia.

The only problem with this approach is that we originally migrated to Q2A from OSQA, and when we ported the database over, I believe we lost all of the original IP address data from OSQA (or it was never in the DB). So it's likely that this mechanism will not work for the old user data. I'll do some further experimentation and report back if I make any major progress in this direction or have anything to share. Thanks danoc.
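
For anyone wanting to try the same idea, the IP check could be sketched roughly as follows in Python. This is an assumption-laden sketch, not a real GeoIP integration: `SAMPLE_RANGES` is a made-up stand-in for a geolocation database (the networks are reserved documentation ranges, and the country codes attached to them are invented), and the function names are hypothetical. Users with a missing or unparseable IP, such as accounts migrated without IP data, are put in a separate "unknown" bucket rather than being treated as foreign.

```python
import ipaddress
from typing import Dict, List, Optional

# Made-up lookup table standing in for a real geolocation database.
# The networks are reserved documentation ranges; the countries are invented.
SAMPLE_RANGES = [
    (ipaddress.ip_network("192.0.2.0/24"), "AU"),
    (ipaddress.ip_network("198.51.100.0/24"), "US"),
]


def country_for_ip(ip: Optional[str], ranges=SAMPLE_RANGES) -> Optional[str]:
    """Return a country code for the IP, or None if it is missing or unknown."""
    if not ip:
        return None
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return None  # not a valid IP string
    for network, country in ranges:
        if addr in network:
            return country
    return None


def classify_users(user_ips: Dict[int, Optional[str]],
                   home_country: str = "AU") -> Dict[str, List[int]]:
    """Split user ids into home, foreign, and unknown by registration IP."""
    result: Dict[str, List[int]] = {"home": [], "foreign": [], "unknown": []}
    for userid, ip in user_ips.items():
        country = country_for_ip(ip)
        if country is None:
            result["unknown"].append(userid)
        elif country == home_country:
            result["home"].append(userid)
        else:
            result["foreign"].append(userid)
    return result
```

Only the "foreign" bucket would be a candidate list for review; the "unknown" bucket is where the old migrated accounts would end up, so they would need a different signal.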