+7 votes
774 views
in Q2A Core
It seems that literally every word in Q2A gets indexed in the "qa_word" table, even simple ones like 'I', 'the' and 'where'.

This seems to play havoc with the duplicate question function, since many questions contain these simple words and it brings up completely unrelated questions.

So I'd like to ignore some common words, if possible?

2 Answers

+5 votes
Best answer
A good question. A few comments on this:

1. The duplicate question checker takes word frequency into account, so these common words will contribute much less to whether a question matches, compared to less frequent, identifying words.

2. You can also try adjusting the sensitivity of the duplicate question checker in the admin panel, to reduce false matches. It can also be switched off.

3. You can also adjust the QA_IGNORED_WORDS_FREQ constant towards the bottom of your qa-config.php file - any words which appear more times than this value will be ignored when searching or looking for duplicate questions.

4. If you *really* want to avoid indexing common words, as opposed to these other solutions, you can modify the qa_post_index(...) function in qa-app-post-create.php - after the first three lines, you can add some code which goes over those arrays and removes anything in a list of common words which you create.

5. As a general point, the scoring function for searching (and by extension question matching) could certainly use some improvement, and I plan to address this in a future version.
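
For point 4, the filtering step might look roughly like the sketch below. This is only an illustration under assumptions: filter_common_words and the $stopwords list are hypothetical names, not part of Q2A, and the $titlewords array here is a stand-in for whatever qa_post_index(...) builds in its first lines.

```php
<?php
// Hypothetical helper, not part of Q2A: drops any word found in $stopwords.
function filter_common_words(array $words, array $stopwords)
{
    // Flip the stop list into a lookup table so each check is a key lookup.
    $stopset = array_flip($stopwords);

    $kept = array();
    foreach ($words as $word) {
        if (!isset($stopset[$word])) {
            $kept[] = $word;
        }
    }
    return $kept;
}

// Illustrative stop list; build your own for your site's language.
$stopwords = array('i', 'the', 'where', 'a', 'is');

// Stand-in for one of the word arrays built at the top of qa_post_index(...).
$titlewords = array('where', 'is', 'the', 'config', 'file');
$titlewords = filter_common_words($titlewords, $stopwords);

print_r($titlewords); // only 'config' and 'file' remain
```

The same call would be repeated for each word array the function builds. Posts indexed before the change keep their old entries until the content is reindexed.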

There's a side effect of QA_IGNORED_WORDS_FREQ: if you search for a single word that is frequent, you get no results at all!
+1 vote

Thanks to gidgreen, I solved this problem for myself:

In the qa-search-basic.php file, at the start of the index_post method of the qa_search_basic class, change these lines to:

            $titlewords=removeCommonWords(array_unique(qa_string_to_words($title)));
            // Filter before counting: array_count_values() returns word => count,
            // so filtering its values would compare counts, not words.
            $contentcount=array_count_values(removeCommonWords(qa_string_to_words($text)));
            $tagwords=removeCommonWords(array_unique(qa_string_to_words($tagstring)));
            $wholetags=removeCommonWords(array_unique(qa_tagstring_to_tags($tagstring)));

Then add a removeCommonWords function like this (as a standalone function outside the class, since the lines above call it without $this->):

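            function removeCommonWords($words){
                // Comma-separated list of words to skip, read from the 'block_common_words' option
                $commonwordstring=qa_opt('block_common_words');

                // Trim each entry so a list written as "the, and" still matches
                $commonWords = array_map('trim', explode(',', $commonwordstring));

                foreach ($words as $key => $word){
                    // Strict comparison: words are matched as exact strings
                    if(in_array($word, $commonWords, true)) {
                        unset($words[$key]);
                    }
                }

                return $words;
            }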

Finally, go to the Admin > Stats tab and click the reindex content button.
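
One detail worth checking with this approach: array_count_values() returns an array keyed by word with the counts as values, so a filter that compares values (as removeCommonWords does) only works on the content words if it runs before counting, or if it filters by key instead. A small standalone sketch of both options, using a made-up comma-separated string in place of the option value and no Q2A functions:

```php
<?php
// Stand-in for a comma-separated option value; the real source is up to you.
$commonwordstring = 'i, the, where';

// Trim each entry so "the" matches even when the list is written as ", the".
$commonwords = array_map('trim', explode(',', $commonwordstring));

$words = array('where', 'is', 'the', 'log', 'the', 'log');

// Option A: filter the flat word list first, then count.
$filtered = array_values(array_diff($words, $commonwords));
$countsA = array_count_values($filtered);

// Option B: count first, then drop entries whose KEY is a common word.
$countsB = array_diff_key(array_count_values($words), array_flip($commonwords));

print_r($countsA); // is => 1, log => 2
print_r($countsB); // same result
```

Either way the counts for the remaining words are unchanged; only the common words disappear from the array.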

It works nicely for me.

Thanks again, gidgreen, for your great project.

...