I am working on a custom plugin that imports posts from an external website.

I use the following code to create a post in the Q2A platform with data retrieved from the external website.

qa_post_create('Q', null, $title, $content, $format = '', $categoryid, $tags, $userid, $notify = null, $email = null, $extravalue = null, $name = null);

I get $title, $content, and $tags successfully from the external website using Simple HTML DOM Parser, and I use the strip_tags() function to remove the HTML tags and keep only the text.

Everything works and the qa_post_create() function creates the post. However, the content of the newly created post contains non-UTF-8 characters such as:

&yacute; (ý)
&uuml;
&ldquo;
&mdash;
&rdquo;
&Ccedil; (Ç)

Actually, $title also contains some characters such as (ý), but there they appear normally, not as &yacute;. It happens only in the content of the question.

When I just retrieve the data and print it with echo or print_r(), the text appears normal, without any irregular symbols. The symbols appear only when I use the qa_post_create() function to create the question.

Also, when I change the format from '' to 'html' in the qa_post_create() call, the problem goes away. But I do not want to use format='html'. Is there any other way to fix it?

Q2A version: q2a 1.7.5 customized

1 Answer

Best answer

Everything would be clearer if you queried the database and showed exactly what you're storing. You are saying that what you see in the browser is X and assuming that you're storing X, but in practice that doesn't have to be the case. For example, bold text is actually stored in the database as <strong>text</strong>.
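
To make the difference between "stored" and "displayed" concrete, here is a minimal sketch in plain PHP (not Q2A code). It assumes that plain-text content is escaped on output roughly the way htmlspecialchars() does it, while 'html' content is sent to the browser unescaped. The $stored string is a made-up sample, not data from the actual site.

```php
<?php
// Made-up sample of what might be sitting in the database after strip_tags():
// the tags are gone, but the HTML entities are still encoded.
$stored = 'k&yacute;z &mdash; g&uuml;l';

// Assumption: plain-text content is escaped on output, so the browser
// receives "&amp;yacute;" and shows the entity text literally.
$asPlainText = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

// Assumption: HTML content is sent as-is, so the browser decodes the
// entities itself and the text looks normal.
$asHtml = $stored;

echo $asPlainText, "\n";
echo $asHtml, "\n";
```

If that assumption holds, it would also explain why the same stored value looks fine with one format and broken with the other.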

The DOM parser is most likely returning the content with its HTML entities still encoded (e.g. é becomes &eacute;). You just need to decode those entities. Check this function:

Just run it on the fields that are HTML encoded before inserting them into the database.
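
A minimal sketch of that clean-up step, assuming the scraped text is already UTF-8 and using PHP's built-in html_entity_decode(), which is the function reported further down in this thread as fixing the problem. The helper name scraped_html_to_text() and the sample string are hypothetical and not part of Q2A.

```php
<?php
// Hypothetical helper (not part of Q2A): turn scraped HTML into plain UTF-8 text.
function scraped_html_to_text(string $html): string
{
    // Step 1: drop the markup but keep the text between the tags.
    $text = strip_tags($html);

    // Step 2: decode entities such as &yacute; or &ldquo; into real UTF-8 characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    return trim($text);
}

// Made-up sample input resembling what a DOM parser might return.
$raw = '<p>&ldquo;&Ccedil;ay &amp; s&uuml;t&rdquo;</p>';
echo scraped_html_to_text($raw), "\n";
```

The result could then be passed as $content to qa_post_create() while keeping $format = ''. Stripping tags before decoding matters here: decoding first could turn an encoded &lt; into a real < that strip_tags() would then treat as markup.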

by
Yes, the DOM parser gets HTML content. However, I use the strip_tags() function to remove the HTML tags, and the problem above still occurs.

I have not tried an HTML decode function yet; let me see if it works.
by
I assumed you were removing the tags. However, you will also need to convert the HTML that's left into plain text. I think with those two steps you should be fine to store it as plain text. Also, don't miss the first sentence of the answer :)
by
@pupi1985 many thanks, the html_entity_decode() function fixed my issue.
Before the fix, the content in the database looked like this: http://i.hizliresim.com/DDg8Bm.png
...