Hello guys,
I once worked as a quality rater for a well-known search engine, so I know that the most helpful search results are usually pages that contain the query keywords in the title, in the URL, and in the site name.
Likewise, on a Q2A site, when a user runs a search, the most relevant result pages are those that contain the query words in the title and in the tags. Pages that contain the query words only in the post content are less important.
For example, if I type "tuberculosis" in this question and you then search the Q2A site for "tuberculosis", this question may appear in the top position, because its content now contains the word "tuberculosis" three times. But my post is obviously not about tuberculosis. If I were asked to rate this question for the query "tuberculosis", I would rate it as OFF-TOPIC.
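To make the ranking idea concrete, here is a minimal Python sketch of field-weighted scoring. It is not Q2A's actual search algorithm (Q2A itself is written in PHP and keeps its own word index tables such as qa_contentwords); the names `TITLE_WEIGHT`, `TAG_WEIGHT`, `CONTENT_WEIGHT`, and `CONTENT_CAP` and their values are hypothetical, chosen only to show that title and tag matches can outweigh repeated words in the content.

```python
from dataclasses import dataclass

# Hypothetical weights for illustration only; these are NOT Q2A's real values.
TITLE_WEIGHT = 10
TAG_WEIGHT = 5
CONTENT_WEIGHT = 1
CONTENT_CAP = 1  # count at most this many content hits per query word


@dataclass
class Post:
    title: str
    tags: list
    content: str


def words(text: str) -> list:
    """Split text into lowercase words, stripping simple punctuation."""
    return [w.strip(".,?!\"'").lower() for w in text.split()]


def score(post: Post, query: str) -> int:
    """Score a post: title and tag hits weigh more than (capped) content hits."""
    title_words = set(words(post.title))
    tag_words = {t.lower() for t in post.tags}
    content_words = words(post.content)
    total = 0
    for q in set(words(query)):
        if q in title_words:
            total += TITLE_WEIGHT
        if q in tag_words:
            total += TAG_WEIGHT
        # Repeating a word in the content cannot add more than CONTENT_CAP hits.
        total += CONTENT_WEIGHT * min(content_words.count(q), CONTENT_CAP)
    return total


on_topic = Post("Early symptoms of tuberculosis", ["tuberculosis", "health"],
                "Persistent cough and fever.")
off_topic = Post("How can I stop Q2A from filling qa_contentwords?", ["search", "q2a"],
                 "tuberculosis tuberculosis tuberculosis")

print(score(on_topic, "tuberculosis"), score(off_topic, "tuberculosis"))
```

With these example weights, the on-topic post scores 15 and the off-topic post scores 1, so typing a word three times in the content is not enough to win. Setting `CONTENT_WEIGHT` to 0 in this sketch roughly models a search that ignores post content entirely, which is the behavior being asked about here.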
So, for a fairly new Q2A site, the qa_contentwords table may be of some help, because it produces many more result pages, even if they are only slightly relevant.
For a mature Q2A site with a rich pool of questions, the qa_contentwords table becomes less necessary, as users can quickly find what they need among questions that match the query more closely.
So, how can I safely stop the Q2A system from filling up qa_contentwords?
I could try some crude ways to stop it, but I would prefer a cleaner way to do so.