-
Bug
-
Resolution: Fixed
-
None
-
Critical
-
ON PREMISE: MariaDB version must be updated to be at least 11.4.5: docker compose pull; docker compose up -d
-
No proper DB index is used when using terminology for pre translations leading to slow performance there.
-
Emptyshow more show less
problem
Term Collections are used also for pre-translations.
This is leading to a lot of queries on the term table scanning the full table:
explain SELECT `terms_term`.* FROM `terms_term` WHERE (lower(term) = lower('data will contain file: line.') COLLATE utf8mb4_bin) AND (collectionId = 13) AND (languageId IN('5', '367', '368', '369', '370', '371', '372', '373', '374', '375', '376', '377', '251', '252')) GROUP BY `termEntryTbxId`; +------+-------------+------------+------+------------------------------------------------------------------------------+------+---------+------+--------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+------------+------+------------------------------------------------------------------------------+------+---------+------+--------+----------------------------------------------+ | 1 | SIMPLE | terms_term | ALL | collectionId_idx,languageId_collectionId_idx,collectionId_termEntryTbxId_idx | NULL | NULL | NULL | 671542 | Using where; Using temporary; Using filesort | +------+-------------+------------+------+------------------------------------------------------------------------------+------+---------+------+--------+----------------------------------------------+
Even with an index on columns term, collectionId and languageId this is not used since different collations.
Since by default we have utf8mb4_general_ci collation on term column of terms_term table.
That means when searching there something, this is treated case insenstive and a == ä.
There fore we use collation utf8mb4_bin with lower texts on plain term search to simulate a search where a != ä but with case insensitive search. This needs a long time since no proper index can be used.
Today there is a collation utf8mb4_0900_as_ci which treats a != ä but still case insensitive. Exactly what we want in the term search, allowing indexes to be used - reducing in my example the query from a full table scan to scan over 14 Terms only.
solution
change collation / add index
alter table terms_term drop index fulltext; ALTER TABLE terms_term MODIFY term VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_ci NOT NULL; ALTER TABLE terms_term MODIFY proposal VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_ci NOT NULL; alter table terms_term add FULLTEXT KEY `fulltext` (`term`,`proposal`);
Open question: Must the index re-recreated with the t5 recreate command?
code change
Throw away the lower and utf8mb4_bin cast in the findEntries method.
testing
Test if the change of the collation has other problems.
Possible code places:
- Method: findHomonym
- Class TermPortletData
- Sorting of Terms.
- For sure \MittagQI\Translate5\Terminology\SearchCollection::findEntries