Problem

t5memory performs poorly on concordance searches in large memories.

Also, if a memory grows too large, t5memory can no longer handle it efficiently.

Therefore we implemented automatic TM splitting in translate5.

So far, however, translate5 queries those memories one after the other when sending fuzzy or concordance search requests. This costs performance, because t5memory is now able to handle multiple requests to different TMs at the same time.

Solution

Fuzzy search: Enable translate5 to send requests to all parts of a split TM at the same time and combine the results.
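A minimal sketch of the fuzzy search idea in Python with asyncio, not the translate5 implementation. The `Match` type, the `query_tm_part` function, the TM part names and the sample data are all hypothetical stand-ins; a real version would send an HTTP request to t5memory for each part.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Match:
    tm_part: str      # name of the TM part that produced the match
    source: str
    target: str
    match_rate: int   # fuzzy match rate in percent


async def query_tm_part(tm_part: str, segment: str) -> list[Match]:
    """Hypothetical stand-in for one fuzzy request to a single TM part."""
    await asyncio.sleep(0.01)  # simulates network and lookup latency
    fake_index = {
        "tm_part_1": [Match("tm_part_1", "Open the file", "Ouvrir le fichier", 85)],
        "tm_part_2": [Match("tm_part_2", "Open the file.", "Ouvrir le fichier.", 100)],
        "tm_part_3": [],
    }
    return fake_index.get(tm_part, [])


async def fuzzy_search(tm_parts: list[str], segment: str) -> list[Match]:
    # Send one request per TM part at the same time.
    per_part = await asyncio.gather(*(query_tm_part(p, segment) for p in tm_parts))
    # Combine the per-part results and sort them, best match first.
    combined = [m for matches in per_part for m in matches]
    return sorted(combined, key=lambda m: m.match_rate, reverse=True)


results = asyncio.run(
    fuzzy_search(["tm_part_1", "tm_part_2", "tm_part_3"], "Open the file.")
)
```

With requests running concurrently, the total wait is roughly that of the slowest part rather than the sum over all parts.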

Concordance search in the editor and in TM maintenance: Enable translate5 to send requests to all parts of a split TM at the same time. Send the first results received back to the UI and immediately query the same TM part again, even if the other parts have not answered yet. Do this for all parts.
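The concordance flow could be sketched as follows, again in Python with asyncio. This assumes each TM part returns its hits page by page together with an offset for the next request; `query_page`, `FAKE_PARTS` and the `send_to_ui` callback are hypothetical names used only for illustration.

```python
import asyncio

# Hypothetical data: each TM part returns its concordance hits page by page.
FAKE_PARTS = {
    "tm_part_1": {"pages": [["hit 1a", "hit 1b"], ["hit 1c"]], "delay": 0.03},
    "tm_part_2": {"pages": [["hit 2a"]], "delay": 0.01},
}


async def query_page(tm_part, term, offset):
    """Hypothetical stand-in for one concordance request to one TM part.

    Returns (hits, next_offset); next_offset is None once the part is exhausted.
    """
    part = FAKE_PARTS[tm_part]
    await asyncio.sleep(part["delay"])  # simulates request latency
    hits = part["pages"][offset]
    next_offset = offset + 1 if offset + 1 < len(part["pages"]) else None
    return hits, next_offset


async def concordance_search(tm_parts, term, send_to_ui):
    async def tagged(tm_part, offset):
        hits, next_offset = await query_page(tm_part, term, offset)
        return tm_part, hits, next_offset

    # Start one request per TM part at the same time.
    pending = {asyncio.create_task(tagged(p, 0)) for p in tm_parts}
    while pending:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            tm_part, hits, next_offset = task.result()
            send_to_ui(tm_part, hits)  # forward results as soon as they arrive
            if next_offset is not None:
                # Query the same part again without waiting for the others.
                pending.add(asyncio.create_task(tagged(tm_part, next_offset)))


received = []
asyncio.run(
    concordance_search(
        list(FAKE_PARTS), "file", lambda part, hits: received.append((part, hits))
    )
)
```

Each part advances at its own pace, so a slow part never delays the results of a fast one.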

Connected todo

Test parallel t5memory TM queries in depth to verify that they do not produce errors and always deliver the correct results.

We can pre-process TMX files on the translate5 side:

filter duplicates by hashing the important data of each TU, using a temp table to store the TUs to proceed with
split the TMX into smaller parts (up to 300-400 MB) and import them into different TMs asynchronously
for import into an existing LR, we can first export it, attach the new TMX, delete the currently bound TMs and create new ones with the import
for update, we can query each TM of the LR for an exact match and then update the TM that has it, not simply the current writable one
for export, we can then fetch the data from the TMs in parallel and join it afterwards
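The duplicate filter from the list above might look like the following Python sketch. Which TU fields count as "important" is not specified here, so the choice of language pair plus source and target text is an assumption, as are the dictionary keys. An in-memory set stands in for the temp table.

```python
import hashlib


def tu_hash(tu: dict) -> str:
    """Hash the fields assumed to identify a duplicate TU (an assumption)."""
    # The unit separator keeps field boundaries unambiguous in the hashed key.
    key = "\x1f".join(
        [tu["source_lang"], tu["target_lang"], tu["source"], tu["target"]]
    )
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def filter_duplicates(tus):
    seen = set()   # stands in for the temp table of already stored hashes
    unique = []
    for tu in tus:
        h = tu_hash(tu)
        if h not in seen:
            seen.add(h)
            unique.append(tu)  # keep only the first occurrence
    return unique


tus = [
    {"source_lang": "en", "target_lang": "de", "source": "Save", "target": "Speichern", "author": "a"},
    {"source_lang": "en", "target_lang": "de", "source": "Save", "target": "Speichern", "author": "b"},
    {"source_lang": "en", "target_lang": "de", "source": "Save", "target": "Sichern", "author": "a"},
]
unique_tus = filter_duplicates(tus)
```

In this sketch the second TU is dropped because it differs from the first only in `author`, which is not part of the hash.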

Splitting the TMX on import will also help to speed up the Content protection process.
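As a rough illustration of size-based splitting, the Python sketch below greedily packs already serialized TUs into parts that stay under a byte limit. It ignores the TMX header and footer, and the function name and the tiny limit in the example are hypothetical; a real import would use a limit in the range given above.

```python
def split_into_parts(tu_xml_list, max_bytes):
    """Greedily group serialized TUs into parts of at most max_bytes each.

    A single TU larger than max_bytes still gets a part of its own.
    """
    parts, current, current_size = [], [], 0
    for tu_xml in tu_xml_list:
        size = len(tu_xml.encode("utf-8"))
        # Close the current part if adding this TU would exceed the limit.
        if current and current_size + size > max_bytes:
            parts.append(current)
            current, current_size = [], 0
        current.append(tu_xml)
        current_size += size
    if current:
        parts.append(current)
    return parts


# Five TUs of 10 bytes each with a limit of 25 bytes per part.
parts = split_into_parts(["<tu>a</tu>"] * 5, max_bytes=25)
part_sizes = [len(p) for p in parts]
```

Each resulting part could then be wrapped in its own TMX envelope and imported into a separate TM.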

Issue Links

blocks

TRANSLATE-3893 Create a test for amended tags parsing code

Selected for dev

is blocked by

TRANSLATE-3941 Handle t5memory TM splitting in connection with size

Done

People

Assignee: Leon Kiz

Reporter: Marc Mittag [Administrator]

Peer developer: Sanya Mikhliaiev

Dates

Created: 15/May/2024 06:37

Updated: 18/Oct/2024 10:08