+4 votes
1.0k views
in Q2A Core by
I would like to use Unicode emoji characters as smileys, but it seems Q2A is filtering them out. Is there any issue with doing this?

http://apps.timwhitlock.info/emoji/tables/unicode
Q2A version: 1.8

1 Answer

+3 votes
by
selected by
 
Best answer

The answer to your question is here: http://www.question2answer.org/qa/28769?show=28769#q28769

A recent commit added a function to the core that removes those characters.

If you are not using MySQL 5.5.3+, or you don't want to change your database structure, you might also consider converting those characters to HTML entities before storing them, e.g. `&#128513;` for 😁. A filter module should do the trick.
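
A minimal sketch of that conversion, assuming PHP 7.2+ with the mbstring extension (needed for `mb_ord`). The function name `emoji_to_html_entities` is hypothetical and not part of Q2A; a filter module would call something like it on a post's title and content before they are stored.

```php
<?php
// Hypothetical helper, not part of the Q2A core.
// Replaces every character outside the Basic Multilingual Plane (the ones that
// need 4 bytes in UTF-8, which includes emojis) with a decimal HTML entity.
function emoji_to_html_entities(string $text): string
{
    return preg_replace_callback(
        '/[\x{10000}-\x{10FFFF}]/u',
        function (array $match): string {
            // mb_ord() returns the Unicode code point of the matched character
            return '&#' . mb_ord($match[0], 'UTF-8') . ';';
        },
        $text
    );
}

echo emoji_to_html_entities("Hi \u{1F601}"), "\n";
```

Characters that fit in 3 bytes or fewer are left alone, so regular accented text is stored unchanged.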

by
Thank you very much. That should do for me :)
by
Hi pupi, can you please tell me which function I have to modify in the core to use utf8mb4? (I have already changed the Database)
by
Check qa_remove_utf8mb4(). Just replace it with:

function qa_remove_utf8mb4($string)
{
    // Core hack: return the string untouched so 4-byte UTF-8 characters (emojis) are kept
    return $string;
}

I think that should be it (going from memory!)
by
Thank you very much. That worked great (y)
by
+1

Switching emojis to HTML before storing them is a cool suggestion that works nicely in comments and answers; it also works in a question's content but not in its title, where emojis are unfortunately misrepresented (see picture below):


but users would actually expect something like this:


Another inconvenience is that it hinders the built-in search engine since it is highly improbable that users would enter HTML entities:


rather than emojis:

Also note that emojis are shown correctly in search result page headings but not in single question pages.

by
Using the same plugin that contains the filter module, you can use a layer to change the texts this way: https://stackoverflow.com/a/35046840/268273 So you would get the right title (after some processing)

I doubt anyone will actually search for emojis. Aside from that, extra attention needs to be paid to how Q2A splits the search term into words. I bet the emoji is displayed correctly there because it has not been converted to the HTML code.
by
edited by

Thanks, @pupi1985, for your thoughts.

https://stackoverflow.com/a/35046840/268273

This link is about turning Unicode characters into HTML entities, which is useful for applying the replacement when questions, answers, or comments are posted or edited. The layer, however, would have to turn escaped HTML entities (like `&amp;#128512;`) back into unescaped ones (like `&#128512;`) to get the right title (this is dangerous, by the way).

The downside is that users will be unable to enter literal strings with this format (like `&#128512;`) in a question's title by themselves.
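
One way to limit that danger is to restore only entities whose code point lies outside the Basic Multilingual Plane, so that escaped markup such as `&amp;#60;` (`<`) stays escaped. This is a rough sketch only: `unescape_emoji_entities` is a hypothetical name, and where exactly a layer would call it is left out.

```php
<?php
// Hypothetical helper, not part of the Q2A core.
// Turns "&amp;#NNNNNN;" back into "&#NNNNNN;" only when NNNNNN is a code point
// in the range U+10000..U+10FFFF. Everything else is left escaped.
function unescape_emoji_entities(string $html): string
{
    return preg_replace_callback(
        '/&amp;#(\d{5,7});/',
        function (array $match): string {
            $codePoint = (int) $match[1];
            if ($codePoint >= 0x10000 && $codePoint <= 0x10FFFF) {
                return '&#' . $codePoint . ';';
            }
            return $match[0]; // not an emoji-range entity: keep it escaped
        },
        $html
    );
}

echo unescape_emoji_entities('Hello &amp;#128512; and &amp;#60;b&amp;#62;'), "\n";
```

The downside mentioned above still applies: a user who types a literal `&#128512;` in a title would see it rendered as an emoji.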

Another (less desirable) solution would be an override for `qa_post_html_fields` that switches HTML entities (like `&#128512;`) out for emojis (like the Grinning face emoji) before it is processed by `qa_html` in qa-include/app/format.php:346

> after some processing

Yes, which is performed upon visiting single question pages and listings (once per list item), including search result listings; this might be a drawback for large Q2A sites.

> how Q2A splits in words the search term

HTML entities are split by the semicolon; emojis, on the other hand, are interpreted as regular characters, e.g., the Grinning face emoji in the picture above would be a one-character word, while the Bear face emoji + Sunflower emoji would be a two-character word.

Since the Grinning face emoji (a one-character word) is different from `&#128512` (an eight-character word), the search will never find the post listed in the picture above, unless the search page is overridden so that emojis are swapped out for HTML entities before processing the search.
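
The character counts above can be checked with a few lines of PHP. This sketch does not call Q2A's own word splitter; it only compares the lengths of the words described, and the variable names are made up for illustration (PHP 7+ with mbstring assumed).

```php
<?php
$grinning      = "\u{1F600}";            // Grinning face emoji
$bearSunflower = "\u{1F43B}\u{1F33B}";   // Bear face + Sunflower emojis
$entityWord    = '&#128512';             // what is left of the entity once it is split at ';'

$lengths = [
    'emoji'  => mb_strlen($grinning, 'UTF-8'),
    'pair'   => mb_strlen($bearSunflower, 'UTF-8'),
    'entity' => mb_strlen($entityWord, 'UTF-8'),
];

print_r($lengths);

// The searched word and the stored word are different strings
var_dump($grinning === $entityWord);
```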

The takeaways from all of this are:

  1. If the database uses utf8mb4, it will take up more bytes to represent characters and some queries might run slower
  2. If the database uses utf8 + HTML entities, then some side effects and performance issues are expected for large Q2A websites

Pick your poison, but if option 2 is chosen, then the following steps are needed:

  1. Keep the database as it is: it still uses utf8
  2. [Core hack] Modify `qa_remove_utf8mb4` according to this comment
  3. Create a filter plugin for replacing emojis with HTML entities in questions, answers, and comments
  4. Add a layer (or an override) for representing HTML entities correctly in question's title
  5. [Almost a core hack] Override qa-include/pages/search.php such that emojis are swapped out for HTML entities before invoking the search engine
  6. [Core hack] Make qa-include/qa-feed.php restore escaped HTML entities in questions' titles, like in step 4

While these are the steps for option 1:

  1. [Core hack] Update the database encoding as explained here and here
  2. [Core hack] Modify `qa_remove_utf8mb4` according to this comment
by
I believe I misunderstood the issue. You want to document all the steps that do NOT require a utf8mb4 schema that would allow emojis to work all over Q2A.

That might not be so useful, because at the end of the day, you would have to update custom themes and plugins as well. Assuming addons don't exist, making the Q2A core emoji-compatible is a huge amount of work. The reason is simple: you would need all fields to support HTML. If this is not the case, you'll have to adapt each of them.

And it is not only visual stuff. If you save HTML in any field, the HTML will eat up characters. For question titles, that would mean you could exceed the maximum characters allowed because of the conversion.
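
To illustrate how the conversion eats up characters, here is a small sketch. The 800-character limit is an illustrative assumption, not a value confirmed in this thread, and `emoji_to_html_entities` is a hypothetical helper rather than a Q2A function (PHP 7.2+ with mbstring assumed).

```php
<?php
// Hypothetical helper: replaces 4-byte UTF-8 characters with decimal HTML entities
function emoji_to_html_entities(string $text): string
{
    return preg_replace_callback(
        '/[\x{10000}-\x{10FFFF}]/u',
        function (array $match): string {
            return '&#' . mb_ord($match[0], 'UTF-8') . ';';
        },
        $text
    );
}

$maxTitleLength = 800;                       // assumed limit, for illustration only
$title  = str_repeat("\u{1F600}", 100);      // a title made of 100 emojis
$stored = emoji_to_html_entities($title);    // each emoji becomes a 9-character entity

$lengthBefore = mb_strlen($title, 'UTF-8');
$lengthAfter  = mb_strlen($stored, 'UTF-8');
$fits         = $lengthAfter <= $maxTitleLength;

printf("before: %d, after: %d, fits: %s\n", $lengthBefore, $lengthAfter, $fits ? 'yes' : 'no');
```

A title that is well within the limit as typed can end up over it once every emoji has grown ninefold.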

The filter plugin workaround only makes sense for specific cases, not the whole core. If you want emojis across the whole core, just modify the database structure and be aware that a future Q2A database upgrade might fail because of this (although it would be simple to fix).
by
These thoughtful comments will be really useful for anyone deciding how to add support for emojis.

Thank you @pupi1985 for your time and effort :-)