Uploaded image for project: 'translate5'
  1. translate5
  2. TRANSLATE-2045

Use utf8mb4 charset for DB

    XMLWordPrintable

Details

    • Critical
    • Hide
      - This update modifies the whole data of the database!
      Therefore a back up of the database directly before the update is mandatory!
      - On larger installations this update may need some time (for example for 2.5 million segments in 700 tasks the script needed about 15 minutes)
      - The database script requires user with ALTER privilege on the database - which should be the case if installed translate5 as described in the installation guide.
      - In the case of errors collect them and send them to us and restore your backup.
      - In the issue description is a manual guide for the migration
      Show
      - This update modifies the whole data of the database! Therefore a back up of the database directly before the update is mandatory! - On larger installations this update may need some time (for example for 2.5 million segments in 700 tasks the script needed about 15 minutes) - The database script requires user with ALTER privilege on the database - which should be the case if installed translate5 as described in the installation guide. - In the case of errors collect them and send them to us and restore your backup. - In the issue description is a manual guide for the migration
    • Change all utf8 fields to the mysql datatype utf8mb4.
    • -

    Description

      Reason: https://medium.com/@adamhooper/in-mysql-never-use-utf8-use-utf8mb4-11761243e434

      change all to utf8mb4:

      • in confluence documentation
      • in preconditioncheck
      • modify all columns / tables via alter script

      Manuel guide for utf8 to utf8mb4 migration.

      1. Change the database character set and collation

      ALTER DATABASE `<database name>` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

      2. Disables the foreign keys check

      SET FOREIGN_KEY_CHECKS=0;

      3. Change the field types.

      ALTER TABLE `LEK_term_proposal`
      CHANGE COLUMN `term` `term` TEXT CHARACTER SET 'utf8mb4' NOT NULL DEFAULT '' COMMENT 'the proposed term' ;
       
      ALTER TABLE `LEK_terms`
      CHANGE COLUMN `term` `term` TEXT CHARACTER SET 'utf8mb4' NOT NULL DEFAULT '' ;
       
      ALTER TABLE `LEK_plugin_segmentstatistic_terms`
      CHANGE COLUMN `term` `term` TEXT CHARACTER SET 'utf8mb4' NOT NULL DEFAULT '' ;
       
      ALTER TABLE `LEK_term_history`
      CHANGE COLUMN `term` `term` TEXT CHARACTER SET 'utf8mb4' NOT NULL DEFAULT '' ;

      4. This script will auto-generate the alter scripts for all tables for character set and collation migration from
      utf8 to utf8mb4 and utf8_general_ci to utf8mb4_unicode_ci. Execute all result scripts after step 5.

      SELECT CONCAT( 
       'ALTER TABLE ', table_name, ' CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ', 
       'ALTER TABLE ', table_name, ' CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ') as r 
       FROM information_schema.TABLES AS T, information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` AS C 
       WHERE C.collation_name = T.table_collation 
       AND T.table_schema = <database name> 
       AND (C.CHARACTER_SET_NAME != 'utf8mb4' OR C.COLLATION_NAME not like 'utf8mb4%

       

      5. This script will auto-generate the alter scripts for migration from utf8_bin to utf8mb4_bin for all table columns where the table collation is utf8_bin.
      Execute this script after the scripts from the step 4 are executed.

      SELECT CONCAT('ALTER TABLE ', 
       t1.table_name, 
       ' MODIFY ', 
       t1.column_name, 
       ' ', 
       if(lower(t1.data_type) = 'varchar',concat(t1.data_type,'(' , CHARACTER_MAXIMUM_LENGTH, ')'),t1.data_type),
       ' CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;'
       )as r
      FROM 
       information_schema.columns t1
      WHERE 
       t1.TABLE_SCHEMA like <database name> AND
       t1.COLLATION_NAME IS NOT NULL AND
       t1.COLLATION_NAME IN ('utf8_bin');

      6. Client character set

      SET NAMES utf8mb4;

      7. Generate repair and optimize queries for all database tables.

      SELECT CONCAT( 'REPAIR TABLE ', table_name, '; OPTIMIZE TABLE ',table_name,';') as r 
       FROM information_schema.TABLES AS T, information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` AS C 
       WHERE C.collation_name = T.table_collation 
       AS AND T.table_schema = <database name> ;

       

      Make sure to set the client and server character set as well.

      Add In my.cnf:

      [client]
      default-character-set = utf8mb4
      
      [mysql]
      default-character-set = utf8mb4
      
      [mysqld]
      character-set-client-handshake = FALSE
      character-set-server = utf8mb4
      collation-server = utf8mb4_unicode_ci

      Additional info:

      character-set-client-handshake=FALSE
      With this the server will ignore character set set from the client. With this we will assure that utf8mb4 will be used always

      Attachments

        Activity

          People

            aleksandar Aleksandar Mitrev
            tlauria Thomas Lauria
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: