^ The lack of [https://stackoverflow.com/questions/30074492/what-is-the-difference-between-utf8mb4-and-utf8-charsets-in-mysql|utf8mb4] support in Tiki before ((Tiki19)) causes issues if you use characters that need four bytes in UTF-8 (most often emojis).
You can type them in, but they are not handled properly afterwards. Issues range from characters being ignored to blank pages when viewing.
utf8mb4 is supported starting in ((Tiki19)), including when upgrading from previous versions. If you upgrade
* from Tiki18 or older
* to Tiki19 or more recent (e.g. you can upgrade from Tiki15 to Tiki21 in one step)
the migration script will convert your database tables.
This is a transition other PHP/MySQL-MariaDB projects also had to go through:
* https://core.trac.wordpress.org/ticket/21212
* https://www.drupal.org/node/1314214
Please note that Tiki18 and older can be installed on a utf8mb4 database, but the Tiki code won't take advantage of it and thus can't properly deal with emojis and other special characters that are in utf8mb4 but not in utf8.
^
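To check whether a given piece of text contains characters that exist in utf8mb4 but not in MySQL's 3-byte utf8, one can look for code points above U+FFFF. The following is a minimal Python sketch for illustration only; it is not part of Tiki, and the function names are hypothetical.

```python
from typing import List


def needs_utf8mb4(text: str) -> bool:
    """Return True if any character needs 4 bytes in UTF-8 (code point above U+FFFF)."""
    return any(ord(ch) > 0xFFFF for ch in text)


def four_byte_chars(text: str) -> List[str]:
    """List the characters that MySQL's 3-byte utf8 charset cannot store."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


if __name__ == "__main__":
    sample = "Tiki rocks \U0001F60E"  # U+1F60E is an emoji outside the Basic Multilingual Plane
    print(needs_utf8mb4(sample))
    print(needs_utf8mb4("café 日本語"))  # accented and CJK characters fit in 3 bytes
    print([hex(ord(ch)) for ch in four_byte_chars(sample)])
```

Accented letters and most CJK text fit within three bytes, which is why the problem usually only shows up with emojis.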
!! Related wishes
* {wish id=4885}
* {wish id=6189}
* {wish id=6245}
* {wish id=6191}
!! Legacy discussions
http://tonyshowoff.com/articles/better-unicode-support-for-mysql-including-emoji/
We could simply convert the utf8 encoding to utf8mb4, as was done with utf8 in Tiki, but there is a slight performance penalty when using utf8mb4 over utf8.
One must also take into consideration that MySQL stores text in a variable-length multibyte format when varchar is used, but not when char is, so chars (currently taking 3 bytes per character) will need 4 under utf8mb4. Using ascii encoding requires only one byte. It might be a good idea to start choosing the encoding in Tiki based on need: basic text (ascii), multi-language (utf8) or emoji support (utf8mb4). That would ensure the smallest and fastest database footprint. -drsassafras
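The size difference described in the comment above can be sketched as simple arithmetic. This Python snippet assumes fixed-width storage for char columns, as the comment describes, and uses a hypothetical column length of 32; actual storage depends on the engine and row format.

```python
# Maximum bytes per character for the three MySQL character sets discussed.
MAX_BYTES_PER_CHAR = {"ascii": 1, "utf8": 3, "utf8mb4": 4}


def fixed_width_bytes(length: int, charset: str) -> int:
    """Bytes reserved for CHAR(length) when every character slot is padded to the charset maximum."""
    return length * MAX_BYTES_PER_CHAR[charset]


def actual_bytes(value: str) -> int:
    """Bytes the text itself occupies when encoded as UTF-8 (variable width, no length prefix)."""
    return len(value.encode("utf-8"))


if __name__ == "__main__":
    for charset in MAX_BYTES_PER_CHAR:
        # Hypothetical CHAR(32) column, for illustration only.
        print(charset, fixed_width_bytes(32, charset))
    print(actual_bytes("abc"), actual_bytes("\U0001F60E"))
```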
In preparation for using this encoding, and also as good general database practice, we should cut down on the long indexes that occasionally occur. These were likely introduced by mistake by coders who didn't understand the optimal size of MySQL indexes anyhow. Indexes are not stored in variable-length encoding, so the extra byte per character in converted utf8mb4 encoding could (and does in a few places) exceed the MySQL maximum index size. -drsassafras
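As a rough illustration of the index problem described above, the sketch below assumes the 767-byte key prefix limit that InnoDB applies with the older COMPACT and REDUNDANT row formats (other engines and row formats have different limits). The function names are hypothetical.

```python
# Assumed limit: 767 bytes per indexed column prefix (InnoDB, COMPACT/REDUNDANT row formats).
INDEX_PREFIX_LIMIT = 767


def index_bytes(chars: int, bytes_per_char: int) -> int:
    """Worst-case index entry size: indexes reserve the maximum width per character."""
    return chars * bytes_per_char


def fits_in_index(chars: int, bytes_per_char: int, limit: int = INDEX_PREFIX_LIMIT) -> bool:
    """True if a column of `chars` characters can be fully indexed under the limit."""
    return index_bytes(chars, bytes_per_char) <= limit


def max_indexable_chars(bytes_per_char: int, limit: int = INDEX_PREFIX_LIMIT) -> int:
    """Largest number of characters that still fits under the limit."""
    return limit // bytes_per_char


if __name__ == "__main__":
    print(fits_in_index(255, 3))   # VARCHAR(255) in utf8: 765 bytes
    print(fits_in_index(255, 4))   # VARCHAR(255) in utf8mb4: 1020 bytes
    print(max_indexable_chars(4))  # longest fully indexable utf8mb4 column
```

Under this assumption, an index on a 255-character column that fits in utf8 no longer fits after conversion, so either the column or the index prefix has to shrink to 191 characters.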
Another barrier to implementation is that utf8mb4 is only supported from MySQL 5.5.3 (early 2010). Right now our minimum MySQL version is 5.0. The easiest way to deal with this is likely to bump the minimum version to 5.5.3. I'm relatively sure almost all MySQL servers are at least this version at this point. We might need to change the MySQL version checks etc. -drsassafras
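A version check of the kind mentioned above could look like the following Python sketch. It is not Tiki's actual check (which is written in PHP); the function names are hypothetical, and it only compares the leading numeric part of the server version string, which is an assumption about how such strings are formatted.

```python
import re
from typing import Tuple

# First MySQL release with utf8mb4, as stated in the discussion above.
MIN_UTF8MB4_VERSION = (5, 5, 3)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from the start of a string such as '5.5.3-log'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: " + repr(version))
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def supports_utf8mb4(version: str) -> bool:
    """True if the server version is at least 5.5.3 (tuple comparison, not string comparison)."""
    return parse_version(version) >= MIN_UTF8MB4_VERSION


if __name__ == "__main__":
    for candidate in ("5.0.96", "5.5.2-log", "5.5.3", "5.7.30-0ubuntu0.18.04.1"):
        print(candidate, supports_utf8mb4(candidate))
```

Comparing tuples of integers avoids the pitfall of comparing version strings lexically, where "5.10.0" would sort before "5.5.3".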
@drsassafras: Post-LTS versions are a great time to bump requirements. Just after Tiki 18 LTS is released, we can do so, and everyone who can't yet follow can stay on Tiki18 LTS for 5 years. See ((tw:Versions)). I think we should move to PHP7.1 or 7.2 as well. {sign user="marclaporte" datetime="2017-04-25T05:51:06+00:00"}
https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-conversion.html