MySQL / MariaDB replace broken HTML entities

After an upgrade of Friendica, a PHP application using MySQL or MariaDB as a backend I found that while the columns containing the raw textual content had been changed successfully from utf8 to utf8mb4 character sets, some other colums that contained pre-rendered HTML were broken, as they now contained 2 byte sequences turned into HTML entities.

As an example, here is how the German a-umlaut looked like: „ä“ (instead of „ä“) – in the database table this was encoded as „ä“ (instead of „ä“). Things like this can be repaired using the REPLACE function. Here are examples for some common HTML entities and their 2-byte sequences (item-content is the table name, rendered-html is the column that contains the mangled characters):

UPDATE `item-content`
SET `rendered-html` = REPLACE(`rendered-html`, ‚ä‘, ‚ä‘)
WHERE `rendered-html` LIKE ‚%Ã%’\G

UPDATE `item-content`
SET `rendered-html` = REPLACE(`rendered-html`, ‚ü‘, ‚ü‘)
WHERE `rendered-html` LIKE ‚%Ã%’\G

UPDATE `item-content`
SET `rendered-html` = REPLACE(`rendered-html`, ‚ö‘, ‚ö‘)
WHERE `rendered-html` LIKE ‚%Ã%’\G

UPDATE `item-content`
SET `rendered-html` = REPLACE(`rendered-html`, ‚Ä‘, ‚Ä‘)
WHERE `rendered-html` LIKE ‚%Ã%’\G

UPDATE `item-content`
SET `rendered-html` = REPLACE(`rendered-html`, ‚Ãœ‘, ‚Ü‘)
WHERE `rendered-html` LIKE ‚%Ã%’\G

UPDATE `item-content`
SET `rendered-html` = REPLACE(`rendered-html`, ‚Ö‘, ‚Ö‘)
WHERE `rendered-html` LIKE ‚%Ã%’\G

UPDATE `item-content`
SET `rendered-html` = REPLACE(`rendered-html`, ‚ß‘, ‚ß‘)
WHERE `rendered-html` LIKE ‚%Ã%’\G

UPDATE `item-content`
SET `rendered-html` = REPLACE(`rendered-html`, ‚à‘, ‚à‘)
WHERE `rendered-html` LIKE ‚%Ã%’\G

UPDATE `item-content`
SET `rendered-html` = REPLACE(`rendered-html`, ‚è‘, ‚è‘)
WHERE `rendered-html` LIKE ‚%Ã%’\G

Discussion Area - Leave a Comment