I just checked the table qa_eventlog and found dozens of search events triggered by IP 66.249.78.148, which appears to be Googlebot (hostname crawl-66-249-78-148.googlebot.com).

Since this is not a human visitor, I do not want these events in my database (and who knows how large the table will grow if bots visit my forum every day).

Is there any countermeasure other than blocking the IP, which is undesirable in this case?

Thanks,
Kai
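
One possible countermeasure, sketched in Python rather than in Q2A's PHP: skip logging of search events when the user agent looks like a crawler. The names `BOT_PATTERN`, `is_probable_bot` and `should_log_event` are hypothetical and not part of Q2A, and the list of crawler markers is only an assumption to be extended as needed.

```python
import re

# Hypothetical list of crawler markers (an assumption, not a Q2A setting).
BOT_PATTERN = re.compile(r"googlebot|bingbot|crawler|spider|slurp", re.IGNORECASE)


def is_probable_bot(user_agent):
    """Return True if the user agent string contains a known crawler marker."""
    return bool(user_agent) and bool(BOT_PATTERN.search(user_agent))


def should_log_event(event, user_agent):
    """Skip 'search' events from probable bots; log everything else."""
    return not (event == "search" and is_probable_bot(user_agent))


if __name__ == "__main__":
    googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(should_log_event("search", googlebot))
```

User agent strings can be faked, so this only filters crawlers that identify themselves honestly.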

by
By the way, it triggers the same search query every time, which is strange: "searched for" "was ist mit lim n ..?q=was ist mit lim..."
There is another ?q parameter nested inside the search string :?(
by
I checked the IP and the events associated with it:
21,611 queries in total (!)
by
For now, I have switched off logging of search events; see the how-to here: http://www.question2answer.org/qa/22480/any-way-to-exclude-search-events-from-event-logger

Removing all "search" events from my database freed up 5 MB (!)

1 Answer

by
Best answer
You can block Googlebot in your robots.txt.
by
Well, having the forum not indexed by Google might hurt a bit ;)
by
What it really sounds like is that Googlebot has hit a loop in your script. Read up on search bots and loops.
by
Wow, interesting, I did not know about this. For others to read: http://webmasters.stackexchange.com/questions/43342/fetch-as-google-goes-infinite-loop-again

The advice there was to "look at your server logs", which revealed:
66.249.76.59 - - [19/Mar/2013:00:02:27 +0100] "GET /6242/was-ist-mit-lim-n-..?q=was+ist+mit+lim+n+..%3Fq%3Dwas%2Bist%2Bmit%2Blim%2Bn%2B..%253fq%253dwas%252bist%252bmit%252blim%252bn%252b..%25253fq%25253dwas%25252bist%25252bmit%25252blim%25252bn%25252b..%2525253fq%2525253dwas%2525252bist%2525252bmit%2525252blim%2525252bn%2525252b..%252525253fq%252525253dwas%252525252bist%252525252bmit%252525252blim%252525252bn%252525252b..%25252525253fq%25252525253dwas%25252525252bist%25252525252bmit%25252525252blim%25252525252bn%25252525252b..%2525252525253fq%2525252525253dwas%2525252525252bist%2525252525252bmit%2525252525252blim%2525252525252bn%2525252525252b..%25252525252526start%2525252525253d40%252525252526start%25252525253d20%2525252526start%252525253d60%25252526start%2525253d40%252526start%25253d80%2526start%253d40%26start%3D60&start=0 HTTP/1.1" 200 14240 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
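
To see how deep such a nesting goes, the q value can be percent-decoded repeatedly until it stops changing. The following is a minimal sketch, assuming Python's standard `urllib.parse`; the function name `decode_layers` and the shortened sample string are made up for illustration and are not taken from the log above.

```python
from urllib.parse import unquote_plus


def decode_layers(value, max_rounds=20):
    """Percent-decode value repeatedly; return (rounds_needed, final_text)."""
    rounds = 0
    while rounds < max_rounds:
        decoded = unquote_plus(value)
        if decoded == value:
            break  # nothing left to decode
        value = decoded
        rounds += 1
    return rounds, value


if __name__ == "__main__":
    # Shortened, hypothetical sample with the same structure as the logged URL.
    sample = "was+ist%3Fq%3Dwas%2Bist%253fq%253dwas%252bist"
    rounds, text = decode_layers(sample)
    print(rounds, text)
```

Each extra round means the crawler followed a link that carried the previous URL inside its own q parameter. A query that legitimately contains "%" or "+" would be over-decoded, so this is for log inspection only.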

Any idea how I can stop Google's loop?
by
My gut feeling is that it has to do with the core hack of page-not-found.php suggested here: http://www.question2answer.org/qa/21876/trick-redirect-error-404-to-search-page

There, the words from the URL are put into the search field, which would explain the ?q parameter somehow...
by
You can block search bots from web pages you don't want them to look at. So I'm going to do it to you again: read up on using robots.txt to block specific pages from search bots. With this option you don't have to block the whole website.
by
Of course, thank you. I overlooked the option of blocking specific folders...
by
All right, I am now using this in robots.txt:

User-agent: *
Disallow: /search/
Disallow: /6242/

where /6242/ is the post that caused the Google loop!
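
Rules like these can be checked offline before relying on them. This is a minimal sketch using Python's standard `urllib.robotparser`; the helper name `is_allowed` and the sample paths other than /6242/ are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /6242/
"""


def is_allowed(path, user_agent="Googlebot"):
    """Return True if the given path may be fetched under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Hypothetical sample paths; only /6242/ comes from the thread.
    for path in ["/6242/was-ist-mit-lim-n-..", "/search/foo", "/1234/another-question"]:
        print(path, is_allowed(path))
```

Disallow rules match by path prefix, so it is worth testing the exact URL form the forum uses for its search page.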

---
And I just noticed that the URL in the Google loop contains a "%3F", which ties in with my recent redirect loop problem: http://www.question2answer.org/qa/22544/redirecting-problem-redirect-loop-when-instead-question-mark ♫ That was the reason!
...