Uploaded image for project: 'translate5'
  1. translate5
  2. TRANSLATE-1310

Improve import performance by SQL optimizing in metacache update

    XMLWordPrintable

Details

    Description

      problem

      With implementing the segment metaCache on each segment save the segments metaCache is updated. This update call is very slow for huge tasks. It seems that the whole data is considered instead only the affected segments of one transunit.

      This must be improved by changing the UPDATE SQL for single segments. Another improvement could be TRANSLATE-1322.

      The reason for the slow SQL query is that the DB systems (mariadb / mysql 5.5 / mysql 5.6) behave differences in their query optimization. The same query runs infinitely on one machine for a big task, on the other machine its finished in 15 seconds. The task has about 120.000 segments.

      problem 2

      A similar problem can occur in the single segment update. See at the end in section "background problem 2".

      solution

      • rebuild that bulk operations so that a temporary table is used instead a derived one
        • see below
      • for each segment save, optimize the query by adding the segments transunit to the derived table (a temporary table is not usefule here, and not needed with the additional where clause to shrink the resultset in the derived table)

      example with temporary table for bulk update

      Important is the adding of the INDEX to the siblings temp table!

      CREATE TEMPORARY TABLE siblings (INDEX (`transunitId`)) AS (SELECT transunitId, GROUP_CONCAT(CONCAT('"',segmentId,'": ',siblingData) SEPARATOR ",") siblingData             
          FROM LEK_segments_meta             
          WHERE taskGuid = '{b5637842-f876-48ff-8fa4-ac4034606045}'
          GROUP BY transunitId);
      
      explain SELECT count(*)
      FROM LEK_segment_data d, LEK_segments s 
      LEFT JOIN LEK_segments_meta m ON m.taskGuid = s.taskGuid AND m.segmentId = s.id  
      LEFT JOIN siblings ON siblings.transunitId = m.transunitId WHERE d.taskGuid = '{b5637842-f876-48ff-8fa4-ac4034606045}' and s.taskGuid = d.taskGuid and d.segmentId = s.id GROUP BY d.segmentId;
      

      background

      The update of the metacache makes use of a derived table via "LEFT JOIN (SELECT ...)". The use of that derived table is optimized on my machine with MariaDB by using keys implicitly:

      +------+-------------+-------------------+--------+--------------------+-----------+---------+--------------------------+--------+---------------------------------------------------------------------+
      | id   | select_type | table             | type   | possible_keys      | key       | key_len | ref                      | rows   | Extra                                                               |
      +------+-------------+-------------------+--------+--------------------+-----------+---------+--------------------------+--------+---------------------------------------------------------------------+
      |    1 | PRIMARY     | d                 | ref    | segmentId,taskGuid | taskGuid  | 116     | const                    | 503098 | Using index condition; Using where; Using temporary; Using filesort |
      |    1 | PRIMARY     | s                 | eq_ref | PRIMARY,taskGuid   | PRIMARY   | 4       | icorrectT5.d.segmentId   |      1 | Using where                                                         |
      |    1 | PRIMARY     | m                 | eq_ref | segmentId,taskGuid | segmentId | 4       | icorrectT5.d.segmentId   |      1 | Using where                                                         |
      |    1 | PRIMARY     | <derived2>        | ref    | key0               | key0      | 195     | icorrectT5.m.transunitId |     10 | Using where                                                         |
      |    2 | DERIVED     | LEK_segments_meta | ref    | taskGuid           | taskGuid  | 116     | const                    | 255812 | Using index condition; Using where; Using filesort                  |
      +------+-------------+-------------------+--------+--------------------+-----------+---------+--------------------------+--------+---------------------------------------------------------------------+
      

      On our server Mysql 5.5 is running instead, where the query is explained the following way:

       

       

      +----+-------------+-------------------+--------+--------------------+-----------+---------+-------------------------------+--------+----------------------------------------------+
      | id | select_type | table             | type   | possible_keys      | key       | key_len | ref                           | rows   | Extra                                        |
      +----+-------------+-------------------+--------+--------------------+-----------+---------+-------------------------------+--------+----------------------------------------------+
      |  1 | PRIMARY     | d                 | ref    | segmentId,taskGuid | taskGuid  | 116     | const                         | 241814 | Using where; Using temporary; Using filesort |
      |  1 | PRIMARY     | s                 | eq_ref | PRIMARY,taskGuid   | PRIMARY   | 4       | testTranslate5Net.d.segmentId |      1 | Using where                                  |
      |  1 | PRIMARY     | m                 | eq_ref | segmentId,taskGuid | segmentId | 4       | testTranslate5Net.d.segmentId |      1 |                                              |
      |  1 | PRIMARY     | <derived2>        | ALL    | NULL               | NULL      | NULL    | NULL                          | 119168 |                                              |
      |  2 | DERIVED     | LEK_segments_meta | ALL    | taskGuid           | taskGuid  | 116     |                               | 226640 | Using where; Using filesort                  |
      +----+-------------+-------------------+--------+--------------------+-----------+---------+-------------------------------+--------+----------------------------------------------+
      
      

      The problem is shown in the fourth line with the <derived2> table, on the server all 120.000 segments are considered, on my machine implicit key0 is used and only 10 segments are considered, this is the reason for the big performance difference on both machines.

       

      According to https://stackoverflow.com/questions/8916127/how-do-i-tell-the-mysql-optimizer-to-use-the-index-on-a-derived-table this should be changed in mysql 5.6, but the best workaround is to use a temporary table with keys instead.

      See therefore https://stackoverflow.com/questions/5859391/create-a-temporary-table-in-a-select-statement-without-a-separate-create-table

      background problem 2

       

      EXPLAIN UPDATE LEK_segment_view_b6a53c54c8a510af2ac4a06f4b9f8468 view, (
        SELECT m.segmentId, CONCAT('{"minWidth":', ifnull(m.minWidth, 'null'), ',"maxWidth":', ifnull(m.maxWidth, 'null'), ',"additionalUnitLength":', m.additionalUnitLength, ',"additionalMrkLength":', m.additionalMrkLength, ',"siblingData":{', ifnull(siblings.siblingData,''), '}}') metaCache
        FROM LEK_segment_data d, LEK_segments s
        LEFT JOIN LEK_segments_meta m ON m.taskGuid = s.taskGuid AND m.segmentId = s.id
        LEFT JOIN (
          SELECT m1.transunitId, GROUP_CONCAT(CONCAT('"',m1.segmentId,'": ',m1.siblingData) SEPARATOR ",") siblingData
          FROM LEK_segments_meta m1
          LEFT JOIN LEK_segments_meta m2 ON m2.segmentId = 731455
          WHERE m1.taskGuid = '{adf5fd69-5e04-48b9-a6b9-8880066e1727}' AND m1.transunitId = m2.transunitId
          GROUP BY transunitId
        ) siblings ON siblings.transunitId = m.transunitId
        WHERE s.taskGuid = '{adf5fd69-5e04-48b9-a6b9-8880066e1727}' and m.transunitId = '1661_39_80' and d.segmentId = s.id GROUP BY d.segmentId
      ) data SET view.metaCache = data.metaCache WHERE view.id = data.segmentId;
      +----+-------------+------------+------------+--------+--------------------------------+----------------------+---------+------------------------+---------+----------+----------------------------------------------------+
      | id | select_type | table      | partitions | type   | possible_keys                  | key                  | key_len | ref                    | rows    | filtered | Extra                                              |
      +----+-------------+------------+------------+--------+--------------------------------+----------------------+---------+------------------------+---------+----------+----------------------------------------------------+
      |  1 | PRIMARY     | <derived2> | NULL       | ALL    | NULL                           | NULL                 | NULL    | NULL                   |    2982 |   100.00 | Using where                                        |
      |  1 | UPDATE      | view       | NULL       | eq_ref | PRIMARY                        | PRIMARY              | 4       | data.segmentId         |       1 |   100.00 | NULL                                               |
      |  2 | DERIVED     | d          | NULL       | index  | segmentId                      | segmentId            | 4       | NULL                   | 1192870 |   100.00 | Using index; Using temporary; Using filesort       |
      |  2 | DERIVED     | s          | NULL       | eq_ref | PRIMARY,taskGuid               | PRIMARY              | 4       | translate5.d.segmentId |       1 |     5.00 | Using where                                        |
      |  2 | DERIVED     | m          | NULL       | eq_ref | segmentId,taskGuid_transunitId | segmentId            | 4       | translate5.d.segmentId |       1 |     5.00 | Using where                                        |
      |  2 | DERIVED     | <derived3> | NULL       | ALL    | NULL                           | NULL                 | NULL    | NULL                   |       1 |   100.00 | Using where; Using join buffer (Block Nested Loop) |
      |  3 | DERIVED     | m2         | NULL       | const  | segmentId                      | segmentId            | 4       | const                  |       1 |   100.00 | NULL                                               |
      |  3 | DERIVED     | m1         | NULL       | ref    | taskGuid_transunitId           | taskGuid_transunitId | 503     | const,const            |       1 |   100.00 | NULL                                               |
      +----+-------------+------------+------------+--------+--------------------------------+----------------------+---------+------------------------+---------+----------+----------------------------------------------------+
      8 rows in set (0.00 sec)
      

       

      The JOIN between the view and the complex inner SELECT is here badly optimized. If we EXPLAIN only the inner SELECT, the select returns results in 0.00 seconds with the following execution plan: 

      +----+-------------+------------+------------+--------+--------------------------------+----------------------+---------+------------------------+------+----------+----------------------------------------------------+
      | id | select_type | table      | partitions | type   | possible_keys                  | key                  | key_len | ref                    | rows | filtered | Extra                                              |
      +----+-------------+------------+------------+--------+--------------------------------+----------------------+---------+------------------------+------+----------+----------------------------------------------------+
      |  1 | PRIMARY     | m          | NULL       | ref    | segmentId,taskGuid_transunitId | taskGuid_transunitId | 503     | const,const            |    1 |   100.00 | Using where; Using temporary; Using filesort       |
      |  1 | PRIMARY     | s          | NULL       | eq_ref | PRIMARY,taskGuid               | PRIMARY              | 4       | translate5.m.segmentId |    1 |     5.00 | Using where                                        |
      |  1 | PRIMARY     | <derived2> | NULL       | ALL    | NULL                           | NULL                 | NULL    | NULL                   |    1 |   100.00 | Using where; Using join buffer (Block Nested Loop) |
      |  1 | PRIMARY     | d          | NULL       | ref    | segmentId                      | segmentId            | 4       | translate5.m.segmentId |    2 |   100.00 | Using index                                        |
      |  2 | DERIVED     | m2         | NULL       | const  | segmentId                      | segmentId            | 4       | const                  |    1 |   100.00 | NULL                                               |
      |  2 | DERIVED     | m1         | NULL       | ref    | taskGuid_transunitId           | taskGuid_transunitId | 503     | const,const            |    1 |   100.00 | NULL                                               |
      +----+-------------+------------+------------+--------+--------------------------------+----------------------+---------+------------------------+------+----------+----------------------------------------------------+
      6 rows in set, 1 warning (0.00 sec)
      
      

       

      solution

      I tried different approaches (mysql optimizer hints for MySQL > 5.7, index hints etc, query reconstruction) but the only solution was to pack the select in a subselect:

      EXPLAIN UPDATE LEK_segment_view_b6a53c54c8a510af2ac4a06f4b9f8468 view, (SELECT * FROM (
        SELECT {... see above ...}
      ) innerData ) data SET view.metaCache = data.metaCache WHERE view.id = data.segmentId;
      

      This additional sub select triggers the optimizer to run the inner SELECT first - which is very fine since that query results only the affected handful segments, after that it joins only the segments with the matching ID from the view.

      execution plan with additional sub select

      +----+-------------+------------+------------+--------+--------------------------------+----------------------+---------+------------------------+------+----------+----------------------------------------------------+
      | id | select_type | table      | partitions | type   | possible_keys                  | key                  | key_len | ref                    | rows | filtered | Extra                                              |
      +----+-------------+------------+------------+--------+--------------------------------+----------------------+---------+------------------------+------+----------+----------------------------------------------------+
      |  1 | PRIMARY     | <derived3> | NULL       | ALL    | NULL                           | NULL                 | NULL    | NULL                   |    2 |   100.00 | Using where                                        |
      |  1 | UPDATE      | view       | NULL       | eq_ref | PRIMARY                        | PRIMARY              | 4       | innerData.segmentId    |    1 |   100.00 | NULL                                               |
      |  3 | DERIVED     | m          | NULL       | ref    | segmentId,taskGuid_transunitId | taskGuid_transunitId | 503     | const,const            |    1 |   100.00 | Using where; Using temporary; Using filesort       |
      |  3 | DERIVED     | s          | NULL       | eq_ref | PRIMARY,taskGuid               | PRIMARY              | 4       | translate5.m.segmentId |    1 |     5.00 | Using where                                        |
      |  3 | DERIVED     | <derived4> | NULL       | ALL    | NULL                           | NULL                 | NULL    | NULL                   |    1 |   100.00 | Using where; Using join buffer (Block Nested Loop) |
      |  3 | DERIVED     | d          | NULL       | ref    | segmentId                      | segmentId            | 4       | translate5.m.segmentId |    2 |   100.00 | Using index                                        |
      |  4 | DERIVED     | m2         | NULL       | const  | segmentId                      | segmentId            | 4       | const                  |    1 |   100.00 | NULL                                               |
      |  4 | DERIVED     | m1         | NULL       | ref    | taskGuid_transunitId           | taskGuid_transunitId | 503     | const,const            |    1 |   100.00 | NULL                                               |
      +----+-------------+------------+------------+--------+--------------------------------+----------------------+---------+------------------------+------+----------+----------------------------------------------------+
      
      

       

      Attachments

        Issue Links

          Activity

            People

              tlauria Thomas Lauria
              tlauria Thomas Lauria
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: