Welcome to the Question2Answer Q&A. There's also a demo if you just want to try it out.
+1 vote
682 views
in Q2A Core by
I have the same problem when searching for questions or answers in Japanese. Like Chinese, Japanese sentences have no spaces between words.

Gidgreen gave us a clue for dealing with this kind of problem: insert a space between characters so the text can be split into words. But I still have trouble with this. My site is not purely Japanese. It mixes Japanese and Vietnamese (which separates words with spaces, like English). For example, I sometimes get a question like this: [ Vì sao người Nhật không nói 愛している ?] or [にあずからなかった cấu trúc này là như nào vậy] . In this case the solution does not work well, because it may split a Vietnamese word into separate characters, like this: không ---> k h ô n g

Do you have any ideas for dealing with this?
related to an answer for: Can not search in Chinese

1 Answer

0 votes
by

I'll need to look into this in greater detail, but the key would be to modify the function qa_string_to_words(...) in qa-util-string.php as follows.

In the function, pre-process $string by looking for the UTF-8 byte codes that begin Japanese characters, and use PHP's string replacement function to insert a space before each character that starts with one of those byte codes. That space will then be picked up later in the function to separate the words. Vietnamese words contain none of those byte codes, so they would be left intact.
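As a rough illustration of that pre-processing step, here is a minimal sketch. It swaps the byte-code lookup for PCRE Unicode script classes (\p{Han}, \p{Hiragana}, \p{Katakana}) with preg_replace, which targets the same characters without a hand-built byte table. The helper name pad_japanese_chars is hypothetical and not part of Q2A, and where exactly it would be called inside qa_string_to_words(...) is left open.

```php
<?php
// Hypothetical helper (not part of Q2A): puts spaces around every Japanese
// character so that a whitespace-based splitter sees each one as its own word.
// Latin-script text, including Vietnamese with diacritics, is not touched.
function pad_japanese_chars($string)
{
    // \p{Han} covers kanji, \p{Hiragana} and \p{Katakana} cover the kana.
    // The /u modifier makes PCRE treat the pattern and subject as UTF-8.
    $padded = preg_replace('/[\p{Han}\p{Hiragana}\p{Katakana}]/u', ' $0 ', $string);

    // preg_replace returns null on failure (for example invalid UTF-8),
    // in which case the input is returned unchanged.
    if ($padded === null) {
        return $string;
    }

    // Collapse the runs of whitespace created by the padding.
    return trim(preg_replace('/\s+/u', ' ', $padded));
}

echo pad_japanese_chars('Vì sao người Nhật không nói 愛している ?'), "\n";
// prints: Vì sao người Nhật không nói 愛 し て い る ?
```

Because only characters in those three scripts are padded, a word such as không stays whole. Splitting every Japanese character into its own word is crude, so treat this as a starting point rather than a finished tokenizer.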

You can find a table of UTF-8 byte codes here.

FYI I hope to implement something like this in the next major release, for all languages that use ideograms.

...