Details
-
Task
-
Resolution: Fixed
-
None
-
Empty show more show less
Description
problem
With implementing the segment metaCache on each segment save the segments metaCache is updated. This update call is very slow for huge tasks. It seems that the whole data is considered instead only the affected segments of one transunit.
This must be improved by changing the UPDATE SQL for single segments. Another improvement could be TRANSLATE-1322.
The reason for the slow SQL query is that the DB systems (mariadb / mysql 5.5 / mysql 5.6) behave differences in their query optimization. The same query runs infinitely on one machine for a big task, on the other machine its finished in 15 seconds. The task has about 120.000 segments.
problem 2
A similar problem can occur in the single segment update. See at the end in section "background problem 2".
solution
- rebuild that bulk operations so that a temporary table is used instead a derived one
- see below
- for each segment save, optimize the query by adding the segments transunit to the derived table (a temporary table is not usefule here, and not needed with the additional where clause to shrink the resultset in the derived table)
example with temporary table for bulk update
Important is the adding of the INDEX to the siblings temp table!
CREATE TEMPORARY TABLE siblings (INDEX (`transunitId`)) AS (SELECT transunitId, GROUP_CONCAT(CONCAT('"',segmentId,'": ',siblingData) SEPARATOR ",") siblingData FROM LEK_segments_meta WHERE taskGuid = '{b5637842-f876-48ff-8fa4-ac4034606045}' GROUP BY transunitId); explain SELECT count(*) FROM LEK_segment_data d, LEK_segments s LEFT JOIN LEK_segments_meta m ON m.taskGuid = s.taskGuid AND m.segmentId = s.id LEFT JOIN siblings ON siblings.transunitId = m.transunitId WHERE d.taskGuid = '{b5637842-f876-48ff-8fa4-ac4034606045}' and s.taskGuid = d.taskGuid and d.segmentId = s.id GROUP BY d.segmentId;
background
The update of the metacache makes use of a derived table via "LEFT JOIN (SELECT ...)". The use of that derived table is optimized on my machine with MariaDB by using keys implicitly:
+------+-------------+-------------------+--------+--------------------+-----------+---------+--------------------------+--------+---------------------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------------------+--------+--------------------+-----------+---------+--------------------------+--------+---------------------------------------------------------------------+ | 1 | PRIMARY | d | ref | segmentId,taskGuid | taskGuid | 116 | const | 503098 | Using index condition; Using where; Using temporary; Using filesort | | 1 | PRIMARY | s | eq_ref | PRIMARY,taskGuid | PRIMARY | 4 | icorrectT5.d.segmentId | 1 | Using where | | 1 | PRIMARY | m | eq_ref | segmentId,taskGuid | segmentId | 4 | icorrectT5.d.segmentId | 1 | Using where | | 1 | PRIMARY | <derived2> | ref | key0 | key0 | 195 | icorrectT5.m.transunitId | 10 | Using where | | 2 | DERIVED | LEK_segments_meta | ref | taskGuid | taskGuid | 116 | const | 255812 | Using index condition; Using where; Using filesort | +------+-------------+-------------------+--------+--------------------+-----------+---------+--------------------------+--------+---------------------------------------------------------------------+
On our server Mysql 5.5 is running instead, where the query is explained the following way:
+----+-------------+-------------------+--------+--------------------+-----------+---------+-------------------------------+--------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------------+--------+--------------------+-----------+---------+-------------------------------+--------+----------------------------------------------+ | 1 | PRIMARY | d | ref | segmentId,taskGuid | taskGuid | 116 | const | 241814 | Using where; Using temporary; Using filesort | | 1 | PRIMARY | s | eq_ref | PRIMARY,taskGuid | PRIMARY | 4 | testTranslate5Net.d.segmentId | 1 | Using where | | 1 | PRIMARY | m | eq_ref | segmentId,taskGuid | segmentId | 4 | testTranslate5Net.d.segmentId | 1 | | | 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 119168 | | | 2 | DERIVED | LEK_segments_meta | ALL | taskGuid | taskGuid | 116 | | 226640 | Using where; Using filesort | +----+-------------+-------------------+--------+--------------------+-----------+---------+-------------------------------+--------+----------------------------------------------+
The problem is shown in the fourth line with the <derived2> table, on the server all 120.000 segments are considered, on my machine implicit key0 is used and only 10 segments are considered, this is the reason for the big performance difference on both machines.
According to https://stackoverflow.com/questions/8916127/how-do-i-tell-the-mysql-optimizer-to-use-the-index-on-a-derived-table this should be changed in mysql 5.6, but the best workaround is to use a temporary table with keys instead.
See therefore https://stackoverflow.com/questions/5859391/create-a-temporary-table-in-a-select-statement-without-a-separate-create-table
background problem 2
EXPLAIN UPDATE LEK_segment_view_b6a53c54c8a510af2ac4a06f4b9f8468 view, ( SELECT m.segmentId, CONCAT('{"minWidth":', ifnull(m.minWidth, 'null'), ',"maxWidth":', ifnull(m.maxWidth, 'null'), ',"additionalUnitLength":', m.additionalUnitLength, ',"additionalMrkLength":', m.additionalMrkLength, ',"siblingData":{', ifnull(siblings.siblingData,''), '}}') metaCache FROM LEK_segment_data d, LEK_segments s LEFT JOIN LEK_segments_meta m ON m.taskGuid = s.taskGuid AND m.segmentId = s.id LEFT JOIN ( SELECT m1.transunitId, GROUP_CONCAT(CONCAT('"',m1.segmentId,'": ',m1.siblingData) SEPARATOR ",") siblingData FROM LEK_segments_meta m1 LEFT JOIN LEK_segments_meta m2 ON m2.segmentId = 731455 WHERE m1.taskGuid = '{adf5fd69-5e04-48b9-a6b9-8880066e1727}' AND m1.transunitId = m2.transunitId GROUP BY transunitId ) siblings ON siblings.transunitId = m.transunitId WHERE s.taskGuid = '{adf5fd69-5e04-48b9-a6b9-8880066e1727}' and m.transunitId = '1661_39_80' and d.segmentId = s.id GROUP BY d.segmentId ) data SET view.metaCache = data.metaCache WHERE view.id = data.segmentId; +----+-------------+------------+------------+--------+--------------------------------+----------------------+---------+------------------------+---------+----------+----------------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------+------------+--------+--------------------------------+----------------------+---------+------------------------+---------+----------+----------------------------------------------------+ | 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 2982 | 100.00 | Using where | | 1 | UPDATE | view | NULL | eq_ref | PRIMARY | PRIMARY | 4 | data.segmentId | 1 | 100.00 | NULL | | 2 | DERIVED | d | NULL | index | segmentId | segmentId | 4 | NULL | 1192870 | 100.00 | Using index; Using temporary; Using filesort | | 2 | DERIVED | s | NULL | eq_ref | PRIMARY,taskGuid | PRIMARY | 4 | translate5.d.segmentId | 1 | 5.00 | Using where | | 2 | DERIVED | m | NULL | eq_ref | segmentId,taskGuid_transunitId | segmentId | 4 | translate5.d.segmentId | 1 | 5.00 | Using where | | 2 | DERIVED | <derived3> | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | Using where; Using join buffer (Block Nested Loop) | | 3 | DERIVED | m2 | NULL | const | segmentId | segmentId | 4 | const | 1 | 100.00 | NULL | | 3 | DERIVED | m1 | NULL | ref | taskGuid_transunitId | taskGuid_transunitId | 503 | const,const | 1 | 100.00 | NULL | +----+-------------+------------+------------+--------+--------------------------------+----------------------+---------+------------------------+---------+----------+----------------------------------------------------+ 8 rows in set (0.00 sec)
The JOIN between the view and the complex inner SELECT is here badly optimized. If we EXPLAIN only the inner SELECT, the select returns results in 0.00 seconds with the following execution plan:
+----+-------------+------------+------------+--------+--------------------------------+----------------------+---------+------------------------+------+----------+----------------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------+------------+--------+--------------------------------+----------------------+---------+------------------------+------+----------+----------------------------------------------------+ | 1 | PRIMARY | m | NULL | ref | segmentId,taskGuid_transunitId | taskGuid_transunitId | 503 | const,const | 1 | 100.00 | Using where; Using temporary; Using filesort | | 1 | PRIMARY | s | NULL | eq_ref | PRIMARY,taskGuid | PRIMARY | 4 | translate5.m.segmentId | 1 | 5.00 | Using where | | 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | Using where; Using join buffer (Block Nested Loop) | | 1 | PRIMARY | d | NULL | ref | segmentId | segmentId | 4 | translate5.m.segmentId | 2 | 100.00 | Using index | | 2 | DERIVED | m2 | NULL | const | segmentId | segmentId | 4 | const | 1 | 100.00 | NULL | | 2 | DERIVED | m1 | NULL | ref | taskGuid_transunitId | taskGuid_transunitId | 503 | const,const | 1 | 100.00 | NULL | +----+-------------+------------+------------+--------+--------------------------------+----------------------+---------+------------------------+------+----------+----------------------------------------------------+ 6 rows in set, 1 warning (0.00 sec)
solution
I tried different approaches (mysql optimizer hints for MySQL > 5.7, index hints etc, query reconstruction) but the only solution was to pack the select in a subselect:
EXPLAIN UPDATE LEK_segment_view_b6a53c54c8a510af2ac4a06f4b9f8468 view, (SELECT * FROM ( SELECT {... see above ...} ) innerData ) data SET view.metaCache = data.metaCache WHERE view.id = data.segmentId;
This additional sub select triggers the optimizer to run the inner SELECT first - which is very fine since that query results only the affected handful segments, after that it joins only the segments with the matching ID from the view.
execution plan with additional sub select
+----+-------------+------------+------------+--------+--------------------------------+----------------------+---------+------------------------+------+----------+----------------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------+------------+--------+--------------------------------+----------------------+---------+------------------------+------+----------+----------------------------------------------------+ | 1 | PRIMARY | <derived3> | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | Using where | | 1 | UPDATE | view | NULL | eq_ref | PRIMARY | PRIMARY | 4 | innerData.segmentId | 1 | 100.00 | NULL | | 3 | DERIVED | m | NULL | ref | segmentId,taskGuid_transunitId | taskGuid_transunitId | 503 | const,const | 1 | 100.00 | Using where; Using temporary; Using filesort | | 3 | DERIVED | s | NULL | eq_ref | PRIMARY,taskGuid | PRIMARY | 4 | translate5.m.segmentId | 1 | 5.00 | Using where | | 3 | DERIVED | <derived4> | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | Using where; Using join buffer (Block Nested Loop) | | 3 | DERIVED | d | NULL | ref | segmentId | segmentId | 4 | translate5.m.segmentId | 2 | 100.00 | Using index | | 4 | DERIVED | m2 | NULL | const | segmentId | segmentId | 4 | const | 1 | 100.00 | NULL | | 4 | DERIVED | m1 | NULL | ref | taskGuid_transunitId | taskGuid_transunitId | 503 | const,const | 1 | 100.00 | NULL | +----+-------------+------------+------------+--------+--------------------------------+----------------------+---------+------------------------+------+----------+----------------------------------------------------+
Attachments
Issue Links
- causes
-
TRANSLATE-1293 MySQL: Slow performance in rare system configurations
- Done
- relates to
-
TRANSLATE-1322 Improve import performance by disabling metacache update for import termtagging
- Selected for dev