+1 vote
in Q2A Core by

I have the same issue when I search in Chinese. All of my Q&A content is written in Chinese. I found that if the keywords also appear in tags, the search returns some results. But if the keywords are not in any tag, I get nothing, even when the keywords appear in a question's title.

For example, if I search for a user's posts by the user's name (a Chinese name), I get nothing.

Before I used Question2Answer, I used another script. It had an option like the following:

 

 Search within words.

This search setting is for ideographic languages such as Chinese, Japanese, Korean and others. Enabling it makes the search look for the keyword within words. For example, the keyword "ant" would match "elephant", since that word contains "ant". The option can also be enabled for non-ideographic languages if such functionality is desired.
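To make the difference concrete, here is a minimal sketch of whole-word matching versus "search within words". Both functions are hypothetical helpers written for illustration; they are not taken from Q2A or from the other script, and real search code would normally work against an index rather than raw text.

```php
<?php
// Hypothetical helpers for illustration only; not part of Q2A.

// Whole-word matching: the keyword must equal one whitespace-separated word.
function matches_whole_word($keyword, $text)
{
    $words = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    return in_array($keyword, $words, true);
}

// "Search within words": the keyword may appear anywhere inside the text.
// A byte-wise strpos() is safe for valid UTF-8, because a complete UTF-8
// sequence can only match at a character boundary.
function matches_within_words($keyword, $text)
{
    return $keyword !== '' && strpos($text, $keyword) !== false;
}

// English: "ant" is inside "elephant" but is not a word on its own.
var_dump(matches_whole_word('ant', 'a big elephant'));
var_dump(matches_within_words('ant', 'a big elephant'));

// Chinese: the sample title means "how to immigrate to Canada" and has no
// spaces, so whole-word matching treats the entire title as one word.
var_dump(matches_whole_word('加拿大', '如何移民加拿大'));
var_dump(matches_within_words('加拿大', '如何移民加拿大'));
```

In both cases the whole-word check fails and the within-words check succeeds, which is the behaviour I am missing for Chinese titles.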

Could anybody help me solve this problem?

For a Q&A website, search is one of the most important functions.

by
Can you please post a link to your site so I can see this? For whole words the searching should work fine in any language, since Q2A uses UTF-8 throughout.
by
Thanks a million, gidgreen.

My URL is http://fmknow.com/weknowq2a. It is a website to help new immigrants settle down in Vancouver. You can try picking a keyword from the title of the first question: characters 2 to 4 are 加拿大, which means Canada in Chinese.

Note that the site is not open to the public yet.
by
Yes, I have the same problem: it cannot search Chinese characters.

2 Answers

+1 vote
by

OK, I understand the problem now. Q2A converts text into words (for searching) by splitting it at word delimiters such as spaces (obviously), commas, quote marks, etc.

You can see the full set used at the top of qa-util-string.php, in the constant QA_PREG_INDEX_WORD_SEPARATOR and also in the mapping $qa_utf8punctuation which converts UTF-8 punctuation characters.

In the case of Chinese and similar languages, there are no word delimiters per se, but rather each multibyte (UTF-8) character is essentially a separate word.

So the solution is to modify function qa_string_to_words() in qa-util-string.php to identify UTF-8 characters in Chinese and similar languages, and split them into separate words as well. One simple way would be to do something at the start of the function that inserts a space before and after each Chinese character, which will then be detected later as a word separator.

Assuming you get this to work, it would be great if you would post your code.

Also, don't forget to click the 'Reindex' button in the 'Stats' section of the 'Admin' panel to make sure all the content is reindexed correctly.
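As a starting point, the space-insertion idea could be sketched like this. This is an untested illustration under stated assumptions, not code from Q2A: pad_cjk_characters() and split_words() are hypothetical names, split_words() only stands in for the real splitting done later in qa_string_to_words(), and \p{Han} covers Chinese ideographs only (other scripts would need their own classes).

```php
<?php
// Hypothetical helper, not part of Q2A: pads each CJK ideograph with spaces
// so that a later whitespace-based splitter sees every character as a word.
function pad_cjk_characters($string)
{
    // \p{Han} matches Chinese ideographs; add \p{Hiragana}, \p{Katakana} or
    // \p{Hangul} if those scripts should be split per character as well.
    // The /u modifier makes PCRE treat the subject as UTF-8.
    $padded = preg_replace('/\p{Han}/u', ' $0 ', $string);

    // preg_replace() returns null on error (for example invalid UTF-8),
    // in which case the input is returned unchanged.
    return ($padded === null) ? $string : $padded;
}

// Stand-in for the later word-splitting step: split on runs of whitespace.
function split_words($string)
{
    return preg_split('/\s+/u', pad_cjk_characters($string), -1, PREG_SPLIT_NO_EMPTY);
}

// Each Chinese character becomes its own word; Latin words stay whole.
print_r(split_words('Q2A 加拿大 search'));
```

If something like this were called at the start of qa_string_to_words(), both the indexed content and the search query would be split the same way, so a query for 加拿大 would be looked up as three single-character words.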

+1 vote
by

My problem was solved by a coder from Korea, xguru. He is a member of this community as well.

This is what he did:

 

Open the qa-include/qa-db-selects.php file and go to the qa_db_search_posts_selectspec() function.

 
Just before the line below (almost at the end of the function)
 
if ($selectparts==0)
 
add these lines:
 
 
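// Join the search words into one phrase; empty() also covers a missing or empty array.
$aaa = empty($handlewords) ? '' : implode('', $handlewords);

// Only add the clause when there is something to match; LIKE '%%' would match every post.
if (strlen($aaa)) {
// Escape quotes for SQL, then escape the LIKE wildcards % and _ so they match literally.
// qa_db_escape_string() is assumed to exist in your version's qa-db.php; check before using.
$aaa = addcslashes(qa_db_escape_string($aaa), '%_');

$selectspec['source'].=($selectparts++ ? " UNION ALL " : "").
"(SELECT postid AS questionid, 0 AS score, _utf8 'Q' AS matchposttype, postid AS matchpostid FROM ^posts WHERE (^posts.title LIKE _utf8 '%".$aaa."%' OR ^posts.content LIKE _utf8 '%".$aaa."%') AND type='Q')";
}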
 
It works with the latest version.
 
Thank you, gidgreen, you designed a great script.
 
Thank you, xguru, you saved my life.
 
 
by
谢谢!
Thank you!
by
This will work fine, except if your database starts containing a large amount of content, in which case it could become quite slow. In that case it's worth looking into the other solution I posted above.
...