Use utf8mb4 charset for DB

XMLWordPrintable

    • Critical
    • Hide
      - This update modifies the whole data of the database!
      Therefore a back up of the database directly before the update is mandatory!
      - On larger installations this update may need some time (for example for 2.5 million segments in 700 tasks the script needed about 15 minutes)
      - The database script requires user with ALTER privilege on the database - which should be the case if installed translate5 as described in the installation guide.
      - In the case of errors collect them and send them to us and restore your backup.
      - In the issue description is a manual guide for the migration
      Show
      - This update modifies the whole data of the database! Therefore a back up of the database directly before the update is mandatory! - On larger installations this update may need some time (for example for 2.5 million segments in 700 tasks the script needed about 15 minutes) - The database script requires user with ALTER privilege on the database - which should be the case if installed translate5 as described in the installation guide. - In the case of errors collect them and send them to us and restore your backup. - In the issue description is a manual guide for the migration
    • Change all utf8 fields to the mysql datatype utf8mb4.

      Reason: https://medium.com/@adamhooper/in-mysql-never-use-utf8-use-utf8mb4-11761243e434

      change all to utf8mb4:

      • in confluence documentation
      • in preconditioncheck
      • modify all columns / tables via alter script

      Manuel guide for utf8 to utf8mb4 migration.

      1. Change the database character set and collation

      ALTER DATABASE `<database name>` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

      2. Disables the foreign keys check

      SET FOREIGN_KEY_CHECKS=0;

      3. Change the field types.

      ALTER TABLE `LEK_term_proposal`
      CHANGE COLUMN `term` `term` TEXT CHARACTER SET 'utf8mb4' NOT NULL DEFAULT '' COMMENT 'the proposed term' ;
       
      ALTER TABLE `LEK_terms`
      CHANGE COLUMN `term` `term` TEXT CHARACTER SET 'utf8mb4' NOT NULL DEFAULT '' ;
       
      ALTER TABLE `LEK_plugin_segmentstatistic_terms`
      CHANGE COLUMN `term` `term` TEXT CHARACTER SET 'utf8mb4' NOT NULL DEFAULT '' ;
       
      ALTER TABLE `LEK_term_history`
      CHANGE COLUMN `term` `term` TEXT CHARACTER SET 'utf8mb4' NOT NULL DEFAULT '' ;

      4. This script will auto-generate the alter scripts for all tables for character set and collation migration from
      utf8 to utf8mb4 and utf8_general_ci to utf8mb4_unicode_ci. Execute all result scripts after step 5.

      SELECT CONCAT( 
       'ALTER TABLE ', table_name, ' CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ', 
       'ALTER TABLE ', table_name, ' CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ') as r 
       FROM information_schema.TABLES AS T, information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` AS C 
       WHERE C.collation_name = T.table_collation 
       AND T.table_schema = <database name> 
       AND (C.CHARACTER_SET_NAME != 'utf8mb4' OR C.COLLATION_NAME not like 'utf8mb4%

       

      5. This script will auto-generate the alter scripts for migration from utf8_bin to utf8mb4_bin for all table columns where the table collation is utf8_bin.
      Execute this script after the scripts from the step 4 are executed.

      SELECT CONCAT('ALTER TABLE ', 
       t1.table_name, 
       ' MODIFY ', 
       t1.column_name, 
       ' ', 
       if(lower(t1.data_type) = 'varchar',concat(t1.data_type,'(' , CHARACTER_MAXIMUM_LENGTH, ')'),t1.data_type),
       ' CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;'
       )as r
      FROM 
       information_schema.columns t1
      WHERE 
       t1.TABLE_SCHEMA like <database name> AND
       t1.COLLATION_NAME IS NOT NULL AND
       t1.COLLATION_NAME IN ('utf8_bin');

      6. Client character set

      SET NAMES utf8mb4;

      7. Generate repair and optimize queries for all database tables.

      SELECT CONCAT( 'REPAIR TABLE ', table_name, '; OPTIMIZE TABLE ',table_name,';') as r 
       FROM information_schema.TABLES AS T, information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` AS C 
       WHERE C.collation_name = T.table_collation 
       AS AND T.table_schema = <database name> ;

       

      Make sure to set the client and server character set as well.

      Add In my.cnf:

      [client]
      default-character-set = utf8mb4
      
      [mysql]
      default-character-set = utf8mb4
      
      [mysqld]
      character-set-client-handshake = FALSE
      character-set-server = utf8mb4
      collation-server = utf8mb4_unicode_ci

      Additional info:

      character-set-client-handshake=FALSE
      With this the server will ignore character set set from the client. With this we will assure that utf8mb4 will be used always

            Assignee:
            Aleksandar Mitrev
            Reporter:
            Thomas Lauria
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated:
              Resolved: