0 votes
3.4k views
in Q2A Core by
If you search Google for "site:yourQ2Asite.com", you'll see a bunch of results that you don't necessarily want to show up: for example /questions?start=10, /questions?start=20, etc., and /tags?start=10, /tags?start=20, etc., along with the same for categories.

Can anyone recommend a good robots.txt file that will keep Google and other search engines from indexing redundant data?
Q2A version: 1.5
by
Here are some suggestions:

User-agent: *
Disallow: /questions?start=*
Disallow: /questions?sort=hot&start=*
Disallow: /questions?sort=votes&start=*
Disallow: /questions?sort=answers&start=*
Disallow: /tags?start=*
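
To see which URLs rules like these would actually block, here is a rough Python sketch. It assumes Google-style matching (a rule is a prefix match, and `*` stands for any run of characters); other crawlers may treat `*` differently. The helper names `rule_to_regex` and `is_blocked` are made up for this illustration.

```python
import re

# Rules copied from the suggestion above.
DISALLOW = [
    "/questions?start=*",
    "/questions?sort=hot&start=*",
    "/questions?sort=votes&start=*",
    "/questions?sort=answers&start=*",
    "/tags?start=*",
]


def rule_to_regex(rule: str) -> re.Pattern:
    # Assumed semantics: '*' matches any run of characters, a trailing '$'
    # anchors the end of the URL, everything else is a literal prefix.
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(path: str) -> bool:
    # re.match anchors at the start only, which gives prefix matching.
    return any(rule_to_regex(rule).match(path) for rule in DISALLOW)


if __name__ == "__main__":
    for path in ["/questions", "/questions?start=10",
                 "/questions?sort=hot", "/tags?start=20"]:
        print(path, "blocked" if is_blocked(path) else "allowed")
```

Under that assumption the trailing `*` adds nothing, since a rule already matches anything that starts with it, and every paginated listing page ends up blocked from crawling.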

2 Answers

+2 votes
by

Firstly, you definitely do want those pages with "start" parameters to be spidered by search engines! Otherwise how will they find all your questions?

I don't think there is any reason at all to prevent indexing of those pages, as search engines won't rank them very high anyway. But probably the best way is to add a meta tag like this: <meta name="robots" content="noindex,follow">

That will prevent indexing of the page but allow search engines to follow the links and find the questions. You'll need to make a plugin or a custom theme that checks what section you're on* and adds the tag only to the appropriate pages.

* I forget the exact code off-hand, I think something like $this->request shows you the current section.
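
The check itself is simple. Here is a minimal sketch of the decision in Python, just to show the logic; in Q2A it would have to be written in PHP inside the plugin or theme. The section list and the function name `robots_meta_for` are assumptions for illustration, not Q2A code.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical list of paginated listing sections; adjust to your site.
LISTING_SECTIONS = {"questions", "tags", "categories"}

NOINDEX_TAG = '<meta name="robots" content="noindex,follow">'


def robots_meta_for(url: str) -> str:
    """Return the noindex,follow tag for paginated listing pages, else ''."""
    parts = urlsplit(url)
    # First path segment, e.g. "questions" for /questions?start=10
    section = parts.path.strip("/").split("/")[0]
    # A page counts as paginated when it carries a non-empty "start" parameter
    is_paginated = "start" in parse_qs(parts.query)
    if section in LISTING_SECTIONS and is_paginated:
        return NOINDEX_TAG
    return ""


if __name__ == "__main__":
    for url in ["/questions", "/questions?sort=hot&start=20",
                "/tags?start=10", "/123/some-question-title"]:
        print(url, "->", robots_meta_for(url) or "(no tag)")
```

The first page of each listing and the question pages themselves get no tag, so they stay indexable, while the paginated pages are still crawled for their links.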

by
Thanks. I didn't think that through. Maybe it'd be ok to block the hot, most voted, and most answered, since the bot will crawl all the questions via the /questions?start=. I'll have to learn some more code skills before being able to make such a plugin. Perhaps there's a way to do it with htaccess?
+2 votes
by
You can't stop search engines from indexing your pages, or get them to de-index pages, by using robots.txt. You should use meta tags in the <head> section of your pages instead.
...