Uploaded image for project: 'translate5'
  1. translate5
  2. TRANSLATE-4673

No DB index used when searching for plain terms

XMLWordPrintable

    • Critical
    • ON PREMISE: MariaDB version must be updated to be at least 11.4.5: docker compose pull; docker compose up -d
    • No proper DB index is used when using terminology for pre translations leading to slow performance there.

      problem

      Term Collections are used also for pre-translations.

      This is leading to a lot of queries on the term table scanning the full table:

      explain SELECT `terms_term`.* FROM `terms_term` WHERE (lower(term) = lower('data will contain file: line.') COLLATE utf8mb4_bin) AND (collectionId = 13) AND (languageId IN('5', '367', '368', '369', '370', '371', '372', '373', '374', '375', '376', '377', '251', '252')) GROUP BY `termEntryTbxId`;
      +------+-------------+------------+------+------------------------------------------------------------------------------+------+---------+------+--------+----------------------------------------------+
      | id   | select_type | table      | type | possible_keys                                                                | key  | key_len | ref  | rows   | Extra                                        |
      +------+-------------+------------+------+------------------------------------------------------------------------------+------+---------+------+--------+----------------------------------------------+
      |    1 | SIMPLE      | terms_term | ALL  | collectionId_idx,languageId_collectionId_idx,collectionId_termEntryTbxId_idx | NULL | NULL    | NULL | 671542 | Using where; Using temporary; Using filesort |
      +------+-------------+------------+------+------------------------------------------------------------------------------+------+---------+------+--------+----------------------------------------------+ 

      Even with an index on columns term, collectionId and languageId this is not used since different collations. 

      Since by default we have utf8mb4_general_ci collation on term column of terms_term table.

      That means when searching there something, this is treated case insenstive and a == ä.

      There fore we use collation utf8mb4_bin with lower texts on plain term search to simulate a search where a != ä but with case insensitive search. This needs a long time since no proper index can be used.

      Today there is a collation utf8mb4_0900_as_ci which treats a != ä but still case insensitive. Exactly what we want in the term search, allowing indexes to be used - reducing in my example the query from a full table scan to scan over 14 Terms only.

      solution

      change collation / add index

       

      alter table terms_term drop index fulltext; 
      
      ALTER TABLE terms_term
      MODIFY term VARCHAR(255)
      CHARACTER SET utf8mb4
      COLLATE utf8mb4_0900_as_ci
      NOT NULL;
      
      ALTER TABLE terms_term
      MODIFY proposal VARCHAR(255)
      CHARACTER SET utf8mb4
      COLLATE utf8mb4_0900_as_ci
      NOT NULL;
        
      alter table terms_term add FULLTEXT KEY `fulltext` (`term`,`proposal`);  

      Open question: Must the index re-recreated with the t5 recreate command?

       

      code change

      Throw away the lower and utf8mb4_bin cast in the findEntries method.

      testing

      Test if the change of the collation has other problems.

      Possible code places:

      • Method: findHomonym
      • Class TermPortletData
      • Sorting of Terms.
      • For sure \MittagQI\Translate5\Terminology\SearchCollection::findEntries

            pavelperminov Pavel Perminov
            tlauria Thomas Lauria
            Thomas Lauria
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated:
              Resolved: