Can not search in Chinese

Question


asked Dec 31, 2010 in Q2A Core by Tim Zhang

I ran into the same issue when I tried to search in Chinese. All of my Q&A content is written in Chinese. I found that if the keywords also appear in tags, the search returns some results. But if the keywords are not in any tag, I get nothing, even when the keywords appear in a question's title.

For example, if I search for a user's posts using the user's name (a Chinese name), I get nothing.

Before I used Question2Answer, I used another script. It had an option like the following:

Search within words.

This search setting is for ideographic languages such as Chinese, Japanese, Korean and others. Enabling this option searches for the keyword within words. For example, the search keyword "ant" would match "elephant", since it contains "ant". This option can also be enabled for non-ideographic languages if such functionality is desired.

Could anybody help me solve this problem?

For a Q&A website, search is one of the most important functions......

related to an answer for: I translate other languages, but I only can use Search for English.

commented Dec 31, 2010 by gidgreen

commented Dec 31, 2010 by Tim Zhang
edited Dec 31, 2010 by Tim Zhang

commented Jan 2, 2011 by edward

2 Answers

gidgreen · Answer 1 · 2011-01-05T08:39:33+0000

OK, I certainly understand the problem. Q2A converts text into words (for searching) by splitting it at word delimiters, such as spaces (obviously), commas, quote marks, etc.

You can see the full set used at the top of qa-util-string.php, in the constant QA_PREG_INDEX_WORD_SEPARATOR and also in the mapping $qa_utf8punctuation which converts UTF-8 punctuation characters.

In the case of Chinese and similar languages, there are no word delimiters per se, but rather each multibyte (UTF-8) character is essentially a separate word.

So the solution is to modify function qa_string_to_words() in qa-util-string.php to identify UTF-8 characters in Chinese and similar languages, and split them into separate words as well. One simple way would be to do something at the start of the function that inserts a space before and after each Chinese character, which will then be detected later as a word separator.
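As a rough illustration of the space-padding idea, here is a minimal sketch. The function name `pad_cjk_characters` is hypothetical and not part of Q2A, the Unicode ranges are an assumption about which characters to treat as single-character words, and where exactly this would be called inside qa_string_to_words() is left open.

```php
<?php
// Hypothetical helper, NOT part of Q2A: surrounds each CJK character with
// spaces so that a whitespace-based word splitter sees it as its own word.
function pad_cjk_characters($string)
{
    // Assumed ranges: CJK ideographs (incl. Extension A), Hiragana/Katakana,
    // and Hangul syllables. Requires PCRE built with UTF-8 support (/u).
    $pattern = '/[\x{3400}-\x{4DBF}\x{4E00}-\x{9FFF}\x{3040}-\x{30FF}\x{AC00}-\x{D7AF}]/u';

    return preg_replace($pattern, ' $0 ', $string);
}

// Splitting on whitespace after padding yields one word per CJK character,
// while Latin words are left intact.
$words = preg_split('/\s+/u', trim(pad_cjk_characters('Q2A中文搜索 test')));
// $words is ['Q2A', '中', '文', '搜', '索', 'test']
```

The same padding would need to be applied both when content is indexed and when a search query is split into words, otherwise the two sides would not produce matching words.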

Assuming you get this to work, it would be great if you would post your code.

Also, don't forget to click the 'Reindex' button in the 'Stats' section of the 'Admin' panel to make sure all the content is reindexed correctly.

Tim Zhang · Answer 2 · 2011-01-05T15:39:04+0000

My problem was solved by a coder from Korea, xguru, who is also a member of this community.

Here is what he did:

Open the file qa-include/qa-db-selects.php and go to the function qa_db_search_posts_selectspec().

Just before the line below (near the end of the function):

if ($selectparts==0)

add these lines:

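    // Join the search words back into one string for a substring match.
    if (!empty($handlewords)) {
        $aaa = implode($handlewords);
    } else {
        $aaa = $handlewords;
    }

    // Add a LIKE-based substring search over question titles and content.
    // Note: $aaa is placed into the SQL string as-is, without escaping.
    $selectspec['source'] .= ($selectparts++ ? " UNION ALL " : "") .
        "(SELECT postid AS questionid, 0 AS score, _utf8 'Q' AS matchposttype, postid AS matchpostid FROM ^posts JOIN ^users WHERE (^posts.title like _utf8 '%".$aaa."%' OR ^posts.content like _utf8 '%".$aaa."%' ) AND type='Q' )";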

It works with the latest version.

Thanks, gidgreen, you designed a great script.

Thanks, xuguru, you saved my life.
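The patch above places the joined search words directly inside a LIKE pattern, so a `%` or `_` typed by a user would act as a wildcard. The sketch below shows one way the pattern could be built with those characters escaped. The function name `like_pattern_for_words` is hypothetical and not part of Q2A, and it assumes the default backslash escape character for LIKE. It covers only the LIKE wildcards; the result would still need to go through the database layer's own string escaping or parameter binding before being placed in a query.

```php
<?php
// Hypothetical helper, NOT part of Q2A: builds a "contains" LIKE pattern
// from a list of search words, escaping LIKE's special characters.
function like_pattern_for_words(array $words)
{
    // Join the words without a separator, as the patch above does.
    $joined = implode('', $words);

    // Escape the backslash itself plus the LIKE wildcards % and _.
    // strtr() with an array never re-scans its own replacements.
    $escaped = strtr($joined, array(
        '\\' => '\\\\',
        '%'  => '\\%',
        '_'  => '\\_',
    ));

    // Wrap in % so the text matches anywhere in the column.
    return '%' . $escaped . '%';
}

$pattern = like_pattern_for_words(array('中', '文'));
// $pattern is '%中文%'
```

A substring search of this kind cannot use an ordinary index, so it may become slow on large sites; it is a trade-off against the word-based index that Q2A builds.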
