Details
- Improvement
- Resolution: Unresolved
- None
- None
- High
Description
Problem
t5memory performs poorly on concordance searches in large memories.
In addition, if a memory grows too large, t5memory can no longer handle it efficiently.
Therefore we implemented automatic TM splitting in translate5.
So far, however, we query these memories only one after the other when sending fuzzy or concordance search requests. This is slower than necessary, since t5memory is now able to handle multiple requests to different TMs at the same time.
Solution
Fuzzy search: Enable translate5 to send requests to all parts of a split TM at the same time and combine the results.
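A minimal sketch of the fuzzy search fan-out, written in Python for illustration only (translate5 itself is not implemented this way). `query_part`, the `Match` fields and the part names are hypothetical stand-ins; the stub returns canned data instead of calling t5memory. Skipping a failing part rather than failing the whole search is an assumption, not a decision taken in this issue.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Match:
    source: str
    target: str
    match_rate: int  # fuzzy match rate in percent
    tm_part: str     # name of the TM part that returned the match


async def query_part(tm_part: str, segment: str) -> list[Match]:
    """Hypothetical stand-in for one fuzzy request to one TM part.

    A real implementation would send a request to t5memory here; this
    stub returns canned data so the sketch runs on its own.
    """
    await asyncio.sleep(0.01)  # simulate network latency
    canned = {
        "tm_part_1": [Match("Hello world", "Hallo Welt", 100, tm_part)],
        "tm_part_2": [Match("Hello word", "Hallo Wort", 85, tm_part)],
        "tm_part_3": [],
    }
    return canned.get(tm_part, [])


async def fuzzy_search(tm_parts: list[str], segment: str, limit: int = 5) -> list[Match]:
    # Send one request per part concurrently instead of one after the other.
    per_part = await asyncio.gather(
        *(query_part(part, segment) for part in tm_parts),
        return_exceptions=True,
    )
    combined: list[Match] = []
    for result in per_part:
        if isinstance(result, Exception):
            continue  # assumption: one failing part should not break the search
        combined.extend(result)
    # Combine: best match rate first, then cut to the requested number of hits.
    combined.sort(key=lambda m: m.match_rate, reverse=True)
    return combined[:limit]


def run_fuzzy_demo() -> list[Match]:
    parts = ["tm_part_1", "tm_part_2", "tm_part_3"]
    return asyncio.run(fuzzy_search(parts, "Hello world"))
```

With this shape the total latency is roughly that of the slowest part rather than the sum over all parts.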
Concordance search in editor and TM-maintenance: Enable translate5 to send requests to all parts of a split TM at the same time. Send the first results received back to the UI and immediately query the same TM part again, even if the other parts have not answered yet. Do so for all parts.
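The concordance flow could be sketched as follows, again in Python for illustration only. `query_page`, the integer cursor and the canned pages are hypothetical; how t5memory actually signals "continue from here" is not specified in this issue. The point shown is that each part is re-queried as soon as it answers, independently of the others.

```python
import asyncio
from typing import AsyncIterator, Optional

# Canned result pages per TM part; a hypothetical stand-in for t5memory responses.
FAKE_PAGES = {
    "tm_part_1": [["hit 1a", "hit 1b"], ["hit 1c"]],
    "tm_part_2": [["hit 2a"]],
}


async def query_page(tm_part: str, cursor: int) -> tuple[list[str], Optional[int]]:
    """Hypothetical request for one page of concordance hits from one part.

    Returns the hits plus the cursor for the next page, or None as the
    cursor when the part has nothing more to deliver.
    """
    await asyncio.sleep(0.01)  # simulate network latency
    pages = FAKE_PAGES[tm_part]
    hits = pages[cursor] if cursor < len(pages) else []
    next_cursor = cursor + 1 if cursor + 1 < len(pages) else None
    return hits, next_cursor


async def concordance_search(tm_parts: list[str]) -> AsyncIterator[tuple[str, list[str]]]:
    # One in-flight request per part; each task is mapped back to its part.
    pending = {asyncio.create_task(query_page(part, 0)): part for part in tm_parts}
    while pending:
        done, _ = await asyncio.wait(set(pending), return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            part = pending.pop(task)
            hits, next_cursor = task.result()
            if next_cursor is not None:
                # Re-query the same part right away, without waiting for the others.
                pending[asyncio.create_task(query_page(part, next_cursor))] = part
            if hits:
                yield part, hits  # forward to the UI as soon as hits are available


async def _collect(tm_parts: list[str]) -> list[tuple[str, list[str]]]:
    return [batch async for batch in concordance_search(tm_parts)]


def run_concordance_demo() -> list[tuple[str, list[str]]]:
    return asyncio.run(_collect(["tm_part_1", "tm_part_2"]))
```

The order of batches across parts depends on which part answers first, so a consumer should not rely on it; within one part the pages arrive in order.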
Connected todo
Test parallel queries against t5memory TMs in depth to verify that they do not produce errors and always deliver the correct results.
We can pre-process TMX files on the translate5 side:
- filter duplicates by hashing the important data of each TU, using a temp table to store the TUs that are to be processed further
- split the TMX into smaller parts (up to 300-400 MB) and import them into different TMs asynchronously
- for import into an existing LR, we can first export it, append the new TMX, delete the currently bound TMs and create new ones via import
- for update, we can query each TM of the LR for an exact match and then update the TM that contains it, instead of simply updating the currently writable one
- for export, we can then fetch the data from the TMs in parallel and join it afterwards
Splitting TMX files on import will also help to speed up the content protection process.
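The first two pre-processing steps (duplicate filtering by hash, then splitting by size) could look roughly like this. It is a Python sketch for illustration only: which TU fields count as "important data" is an assumption, an in-memory set stands in for the temp table, and TU size is approximated by the byte length of source and target text rather than the real TMX markup size.

```python
import hashlib
from typing import Iterable, Iterator


def tu_hash(tu: dict) -> str:
    """Hash the fields assumed to identify a TU (languages, source, target)."""
    key = "\x1f".join([tu["source_lang"], tu["target_lang"], tu["source"], tu["target"]])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def dedupe(tus: Iterable[dict]) -> Iterator[dict]:
    seen: set[str] = set()  # in-memory stand-in for the temp table
    for tu in tus:
        digest = tu_hash(tu)
        if digest not in seen:
            seen.add(digest)
            yield tu  # only the first occurrence is kept


def split_by_size(tus: Iterable[dict], max_bytes: int) -> Iterator[list[dict]]:
    """Group TUs into chunks whose approximate size stays within max_bytes."""
    chunk: list[dict] = []
    size = 0
    for tu in tus:
        tu_size = len(tu["source"].encode("utf-8")) + len(tu["target"].encode("utf-8"))
        if chunk and size + tu_size > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(tu)
        size += tu_size
    if chunk:
        yield chunk


def run_split_demo() -> list[list[dict]]:
    tus = [
        {"source_lang": "en", "target_lang": "de", "source": "Hello", "target": "Hallo"},
        {"source_lang": "en", "target_lang": "de", "source": "Hello", "target": "Hallo"},  # duplicate
        {"source_lang": "en", "target_lang": "de", "source": "World", "target": "Welt"},
        {"source_lang": "en", "target_lang": "de", "source": "Bye", "target": "Tschüss"},
    ]
    # A tiny limit so the demo produces more than one chunk.
    return list(split_by_size(dedupe(tus), max_bytes=20))
```

Each resulting chunk would then be written out as its own TMX file and imported into a separate TM; a single TU larger than `max_bytes` still ends up in a chunk of its own.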
Attachments
Issue Links
- blocks
  - TRANSLATE-3893 Create a test for amended tags parsing code (Selected for dev)
- is blocked by
  - TRANSLATE-3941 Handle t5memory TM splitting in connection with size (Done)