Type: Improvement
Resolution: Fixed
Fix Version/s: translate5 - 7.35.0, translate5 - 7.35.2
Affects Version/s: None
Component/s: t5memory

Urgency:
High
Important release notes:

7.35.x: IMPORTANT FOR INSTALLATIONS WHERE CONTENT PROTECTION IS USED!
Because of this feature, all TMs that use Content Protection rules are listed as "not converted" after the update, and their conversion has to be started afterwards.
On Premise clients: please handle this manually after the update.
Hosting clients: this is done and ensured by MittagQI.

ChangeLog Description:

7.35.2: The conversion script failed for empty TMs
7.35.0: Apply filters to TMX at import time on the translate5 side
All translate5 TMs in the instance will be marked as "not converted", and matches may have lower match rates.
Language resources affected by this should be converted manually.

Checklist:

Empty


Source duplicates (variations) exist in t5memory as part of its feature set.

This is dictated partly by the TMX standard and partly by the convention of how a TM is expected to behave.

We now want to add a more robust way to filter out translation units that are of little use to the end user.

To do so, we will introduce 3 config parameters that allow duplicates to be filtered regardless of author, document or context.

In the current logic, the uniqueness of a segment is determined by the combination of source text, author, document and context.
A segment is replaced only if all of those fields are the same for the incoming segment and its timestamp is newer than that of the segment already in t5memory.

Once the improvement is done, author, document and context can each be included in or left out of that combination via config. Only the source text always plays a role; in theory, all additional fields may be omitted from the combination.
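
The uniqueness rule described above can be sketched as follows. This is only a minimal illustration in Python, not the actual t5memory or translate5 implementation: the `TransUnit` class and the `dedup_key` and `filter_duplicates` functions are hypothetical names, and the keyword flags merely mirror the skipAuthor, skipDocument and skipContext settings.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TransUnit:
    # Hypothetical in-memory representation of one translation unit.
    source: str
    target: str
    author: str
    document: str
    context: str
    timestamp: datetime


def dedup_key(tu, skip_author=False, skip_document=False, skip_context=False):
    """Build the uniqueness key; the source text is always part of it."""
    key = [("source", tu.source)]
    if not skip_author:
        key.append(("author", tu.author))
    if not skip_document:
        key.append(("document", tu.document))
    if not skip_context:
        # Segments without context get the placeholder "-".
        key.append(("context", tu.context or "-"))
    return tuple(key)


def filter_duplicates(units, **flags):
    """Keep only the freshest unit for every uniqueness key."""
    freshest = {}
    for tu in units:
        key = dedup_key(tu, **flags)
        kept = freshest.get(key)
        if kept is None or tu.timestamp > kept.timestamp:
            freshest[key] = tu
    return list(freshest.values())


units = [
    TransUnit("Hello", "Hallo", "anna", "a.docx", "", datetime(2025, 1, 1)),
    TransUnit("Hello", "Hallo!", "bob", "a.docx", "", datetime(2025, 2, 1)),
    TransUnit("Hello", "Guten Tag", "anna", "a.docx", "", datetime(2025, 3, 1)),
]

# All flags off: one freshest unit per author survives.
print(len(filter_duplicates(units)))  # 2
# Author ignored: only the overall freshest unit survives.
print(len(filter_duplicates(units, skip_author=True)))  # 1
```

With every flag switched on, the key shrinks to the source text alone, which corresponds to the case where all additional fields are omitted from the combination.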

How do I even test it?

You need a TMX file with duplicates first.

In application/config/installation.ini, add the following settings:

runtimeOptions.LanguageResources.t5memory.skipAuthor = 0
runtimeOptions.LanguageResources.t5memory.skipDocument = 0
runtimeOptions.LanguageResources.t5memory.skipContext = 0

runtimeOptions.LanguageResources.t5memory.useTmxUtilsTrim = 0
runtimeOptions.LanguageResources.t5memory.useTmxUtilsFilter = 0

With everything set to 0, the resulting memory after import should contain only the freshest duplicates but still keep the variants for different authors, documents and contexts.

Segments without context receive a placeholder context, the "-" symbol.

If skipAuthor is set to 1: all duplicates where only the author differs are skipped and only the freshest one is preserved.

If skipDocument is set to 1: all duplicates where only the document differs are skipped and only the freshest one is preserved.

If skipContext is set to 1: all duplicates where only the context differs are skipped and only the freshest one is preserved.

The configs above may be combined in any way.

If useTmxUtilsFilter is set to 1: all of the logic above should stay exactly the same. The only difference is processing speed.

To test useTmxUtilsTrim you need a big TMX file, one that will certainly not fit into a single memory on the t5memory side.

The test is to import that file into test-lr-1 with useTmxUtilsTrim = 0 and then export a TMX file.
Then import the same file into test-lr-2 with useTmxUtilsTrim = 1 and export a TMX file.

Compare the files. They should be the same. Again, the only difference is import speed.
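
For large exports, comparing by eye is impractical, so the comparison can be scripted. The sketch below is one possible approach, not part of translate5: `load_units` and `diff_exports` are hypothetical helper names. It relies only on standard TMX 1.4 elements and attributes (`tu`, `tuv`, `seg`, `creationid`, `changedate`, `xml:lang`), compares units regardless of their order in the file, and looks at segment text only, so differences in inline tag attributes or in `prop` elements are not detected.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def load_units(tmx_bytes):
    """Return a multiset of (creationid, changedate, variants) per <tu>."""
    root = ET.fromstring(tmx_bytes)
    units = Counter()
    for tu in root.iter("tu"):
        variants = []
        for tuv in tu.iter("tuv"):
            seg = tuv.find("seg")
            # itertext() keeps the text but drops inline tag markup.
            text = "".join(seg.itertext()) if seg is not None else ""
            lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
            variants.append((lang.lower(), text))
        key = (tu.get("creationid"), tu.get("changedate"), tuple(sorted(variants)))
        units[key] += 1
    return units


def diff_exports(first_bytes, second_bytes):
    """Return the units found only in the first and only in the second export."""
    first = load_units(first_bytes)
    second = load_units(second_bytes)
    return first - second, second - first
```

A possible usage, assuming the two exports were saved under the hypothetical names test-lr-1.tmx and test-lr-2.tmx: read both with `pathlib.Path(...).read_bytes()` and pass them to `diff_exports`. Two empty counters mean that both imports produced the same set of translation units.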

blocks

TRANSLATE-5088 Improve segment filtering on TMX import

Done

relates to

TRANSLATE-5050 Redo TMX export in parallel

Done

TRANSLATE-5088 Improve segment filtering on TMX import

Done

TRANSLATE-5103 Improve deletion of Segments in Maintenance

Done

TRANSLATE-5140 Use dash as context instead of segment number in task

Done

TRANSLATE-4898 Add retry if t5memory rejects request due to restart

Done

TRANSLATE-5087 Check for duplicates on segment updates

Done

TRANSLATE-5090 Introduce settings for TMX filter and tmx-utils

Done


Assignee: Sanya Mikhliaiev
Reporter: Sanya Mikhliaiev
Peer developer: Leon Kiz
Tester: Stephan Bergmann, Sylvia Schumacher
Votes: 0
Watchers: 5

Created: 07/Oct/2025 07:35
Updated: 03/Mar/2026 05:15
Resolved: 25/Feb/2026 04:45
