XML

Word

Printable

Type: Bug
Resolution: Fixed
Fix Version/s: translate5 - 7.30.3
Affects Version/s: None
Component/s: TermPortal

Urgency:
Critical
Important release notes:
ON PREMISE: MariaDB version must be updated to be at least 11.4.5: docker compose pull; docker compose up -d
ChangeLog Description:
No proper DB index is used when using terminology for pre translations leading to slow performance there.
Checklist:

Empty

show more show less

problem

Term Collections are used also for pre-translations.

This is leading to a lot of queries on the term table scanning the full table:

explain SELECT `terms_term`.* FROM `terms_term` WHERE (lower(term) = lower('data will contain file: line.') COLLATE utf8mb4_bin) AND (collectionId = 13) AND (languageId IN('5', '367', '368', '369', '370', '371', '372', '373', '374', '375', '376', '377', '251', '252')) GROUP BY `termEntryTbxId`;
+------+-------------+------------+------+------------------------------------------------------------------------------+------+---------+------+--------+----------------------------------------------+
| id   | select_type | table      | type | possible_keys                                                                | key  | key_len | ref  | rows   | Extra                                        |
+------+-------------+------------+------+------------------------------------------------------------------------------+------+---------+------+--------+----------------------------------------------+
|    1 | SIMPLE      | terms_term | ALL  | collectionId_idx,languageId_collectionId_idx,collectionId_termEntryTbxId_idx | NULL | NULL    | NULL | 671542 | Using where; Using temporary; Using filesort |
+------+-------------+------------+------+------------------------------------------------------------------------------+------+---------+------+--------+----------------------------------------------+

Even with an index on columns term, collectionId and languageId this is not used since different collations.

Since by default we have utf8mb4_general_ci collation on term column of terms_term table.

That means when searching there something, this is treated case insenstive and a == ä.

There fore we use collation utf8mb4_bin with lower texts on plain term search to simulate a search where a != ä but with case insensitive search. This needs a long time since no proper index can be used.

Today there is a collation utf8mb4_0900_as_ci which treats a != ä but still case insensitive. Exactly what we want in the term search, allowing indexes to be used - reducing in my example the query from a full table scan to scan over 14 Terms only.

solution

change collation / add index

alter table terms_term drop index fulltext; 

ALTER TABLE terms_term
MODIFY term VARCHAR(255)
CHARACTER SET utf8mb4
COLLATE utf8mb4_0900_as_ci
NOT NULL;

ALTER TABLE terms_term
MODIFY proposal VARCHAR(255)
CHARACTER SET utf8mb4
COLLATE utf8mb4_0900_as_ci
NOT NULL;
  
alter table terms_term add FULLTEXT KEY `fulltext` (`term`,`proposal`);

Open question: Must the index re-recreated with the t5 recreate command?

code change

Throw away the lower and utf8mb4_bin cast in the findEntries method.

testing

Test if the change of the collation has other problems.

Possible code places:

Method: findHomonym
Class TermPortletData
Sorting of Terms.
For sure \MittagQI\Translate5\Terminology\SearchCollection::findEntries

Assignee:: Pavel Perminov
Reporter:: Thomas Lauria
Peer developer:: Thomas Lauria
Votes:: 0 Vote for this issue
Watchers:: 1 Start watching this issue

Created:: 22/May/2025 03:44
Updated:: 17/Oct/2025 11:31
Resolved:: 17/Oct/2025 11:31

Details

Description

problem

solution

change collation / add index

code change

testing

Attachments

Activity

People

Dates