Uploaded image for project: 'translate5'
  1. translate5
  2. TRANSLATE-5009

Optimize term image handling for huge term sets

XMLWordPrintable

    • Critical
    • The query that removes unused image records has been optimized. It now uses better indexing and avoids unnecessary full table scans, resulting in a faster cleanup of old data.

      Problem:

      The current DELETE query on terms_images is performing poorly in MariaDB.

      explain DELETE i FROM `terms_images` i
          ->                 LEFT JOIN (
          ->                     SELECT id, target FROM `terms_attributes` WHERE collectionId = '118'
          ->                 ) a ON i.targetId = a.target   
          ->                 WHERE a.id IS NULL AND i.collectionId = '118';
      +------+-------------+------------------+------+-----------------------------------+------+---------+-------------------+---------+-------------+
      | id   | select_type | table            | type | possible_keys                     | key  | key_len | ref               | rows    | Extra       |
      +------+-------------+------------------+------+-----------------------------------+------+---------+-------------------+---------+-------------+
      |    1 | PRIMARY     | i                | ALL  | fk_terms_images_languageresources | NULL | NULL    | NULL              | 1641    | Using where |
      |    1 | PRIMARY     | <derived2>       | ref  | key0                              | key0 | 403     | DBXXXX.i.targetId | 1314    | Using where |
      |    2 | DERIVED     | terms_attributes | ALL  | collectionId_idx                  | NULL | NULL    | NULL              | 3512342 | Using where |
      +------+-------------+------------------+------+-----------------------------------+------+---------+-------------------+---------+-------------+ 
      • The derived table (SELECT ... FROM terms_attributes) forces MariaDB to scan all 3.5M rows from terms_attributes.
      • The index on collectionId (collectionId_idx) is not used in this context.
      • As a result, the optimizer performs a full table scan instead of leveraging the index.

      solution

      1. Avoid derived table by rewriting the query:
      2. Add a composite index on (collectionId, target) in terms_attributes:
      DELETE i
      FROM terms_images i
      LEFT JOIN terms_attributes a
        ON i.targetId = a.target
       AND a.collectionId = 118
      WHERE a.id IS NULL
        AND i.collectionId = 118;
      
      
       CREATE INDEX idx_terms_attributes_collection_target
        ON terms_attributes (collectionId, target);
      

      Expected Result:

      Optimizer should switch from ALL table scan to a range or ref scan. Query execution time should improve significantly on large datasets.

            pavelperminov Pavel Perminov
            tlauria Thomas Lauria
            Thomas Lauria
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated:
              Resolved: