Bug
Resolution: Unresolved
None
High
Enable tag handling configuration for each resource. Introduce a new XML tag handler with tag repair functionality.
Problem
Currently, the system interacts with multiple translation resources (DeepL, OpenAI, Google, Microsoft, etc.) for translating content. Each resource can process tags, but they have different methods and options for handling tag processing. Additionally, there is no uniform way to configure how tags are processed or repaired across resources.
Tasks:
To create a configurable system for tag processing and repair that allows:
- Defining the type of tags (HTML or XLIFF) sent to each translation resource.
- Configuring whether the tag repair functionality is applied post-translation on the backend.
- Aligning the tag repair functionality with the type of tags sent to resources.
- Documenting in Confluence how each MT/LLM resource currently handles tags with translate5.
Implementation ideas:
- Tag Repair Functionality:
- Introduce a tag repair mechanism for XLIFF tags.
- Evaluate whether to:
- Develop a single, unified tag repair class to handle both HTML and XLIFF tags.
- Create separate tag repair classes for HTML and XLIFF tags.
- Evaluate how the current tag repair for DeepL works. In Marc's understanding, it makes sure that:
- no tag is missing
- tags are syntactically correct
- if a tag has to be inserted or moved, it is placed in a position similar to the one it had in the source segment (e.g. after the same number of blocks of word characters and non-word characters). If that logic already exists, keep it.
- Resource-Specific Configuration:
- Allow configuration for each resource to specify:
- The type of tags it processes (HTML or XLIFF).
- Whether tag repair should be enabled or disabled.
- Ensure tag repair type aligns with the tag type sent to the resource (e.g., if XLIFF tags are sent, only XLIFF tag repair should be applied).
- Validation Logic:
- Implement validation to prevent mismatches between tag type and tag repair functionality. For example:
- If XLIFF tags are sent, ensure HTML repair is not attempted.
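The resource-specific configuration and the validation rule above could be sketched roughly as follows. This is only an illustration in Python, not translate5 code: `TagType`, `ResourceTagConfig` and all field names are hypothetical, and the sketch assumes that a disabled repair never counts as a mismatch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TagType(Enum):
    HTML = "html"
    XLIFF = "xliff"


@dataclass(frozen=True)
class ResourceTagConfig:
    """Hypothetical per-resource tag settings (names are illustrative only)."""
    resource: str
    tag_type: TagType                       # type of tags sent to the resource
    repair_enabled: bool = True             # run tag repair after translation?
    repair_type: Optional[TagType] = None   # None means: follow tag_type

    def effective_repair_type(self) -> Optional[TagType]:
        """Return the repair type that would actually run, or None if disabled."""
        if not self.repair_enabled:
            return None
        if self.repair_type is not None:
            return self.repair_type
        return self.tag_type

    def validate(self) -> None:
        """Reject configs where the repair type differs from the sent tag type."""
        repair = self.effective_repair_type()
        if repair is not None and repair is not self.tag_type:
            raise ValueError(
                f"{self.resource}: {self.tag_type.value} tags are sent, "
                f"but {repair.value} repair is configured"
            )
```

Defaulting `repair_type` to the sent tag type means a mismatch can only arise from an explicit override, which keeps the validation down to a single comparison.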
Suggested presets
In general, it is always a good idea to set sendWhitespaceAsTag to active.
For more details see Test-Details.txt
DeepL:
- runtimeOptions.LanguageResources.deepl.sendWhitespaceAsTag: "active" (if disabled, whitespace can get lost)
- runtimeOptions.plugins.DeepL.api.parametars.tagHandling: "none" (if that means that the parameter is not sent at all)
- runtimeOptions.LanguageResources.deepl.tagHandler: "xliff_paired_tags"
Sample: <t5x_123>abc</t5x_123> or <t5x_456 />
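With paired tags in the shape of the sample above, the first two repair checks (no tag is missing, tags are syntactically correct) can be sketched as below. This is an assumption-laden illustration in Python, not the existing repair logic: the regular expression only covers the `t5x_N` tag shapes shown in the sample, and the function names are made up.

```python
import re
from collections import Counter
from typing import List

# Assumed tag shapes, taken from the sample: <t5x_1>, </t5x_1>, <t5x_2 />
TAG_RE = re.compile(r"<(/?)(t5x_\d+)\s*(/?)>")


def normalized_tags(segment: str) -> List[str]:
    """List all tags in a segment, ignoring whitespace differences inside them."""
    return [
        f"<{closing}{name}{'/' if self_closing else ''}>"
        for closing, name, self_closing in TAG_RE.findall(segment)
    ]


def missing_tags(source: str, target: str) -> List[str]:
    """Tags present in the source but absent from the translated target."""
    diff = Counter(normalized_tags(source)) - Counter(normalized_tags(target))
    return sorted(diff.elements())


def is_well_formed(segment: str) -> bool:
    """Check that every opening tag is closed in the correct nesting order."""
    stack: List[str] = []
    for closing, name, self_closing in TAG_RE.findall(segment):
        if self_closing:
            continue  # <t5x_N /> needs no partner
        if closing:
            if not stack or stack.pop() != name:
                return False
        else:
            stack.append(name)
    return not stack
```

The position-based re-insertion described in the implementation ideas is deliberately left out here; it depends on how the existing DeepL repair counts word and non-word blocks.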
additional:
- "split_sentences" => "nonewlines" should be removed from the request. DeepL then sets this automatically in the right way.
- "preserve_formatting" => false is the default and can be removed.
Or better: set it to true; otherwise \n is lost and zeile1\nzeile2 ends up as "line1line2" instead of "line1 line2".
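The DeepL preset above could translate into request parameters roughly like this. A minimal sketch in Python, assuming a JSON request body with the parameter names of the public DeepL API (`text`, `target_lang`, `tag_handling`, `split_sentences`, `preserve_formatting`); the helper `build_deepl_params` is hypothetical, and treating "none" as "do not send the parameter" is the same assumption the preset makes.

```python
from typing import Any, Dict, Optional


def build_deepl_params(
    text: str, target_lang: str, tag_handling: Optional[str] = None
) -> Dict[str, Any]:
    """Build a request body following the suggested DeepL preset."""
    params: Dict[str, Any] = {
        "text": [text],
        "target_lang": target_lang,
        # Keep line breaks so that "zeile1\nzeile2" does not collapse.
        "preserve_formatting": True,
    }
    # split_sentences is intentionally not set, so DeepL picks its own default.
    # Assumption: the config value "none" means tag_handling is not sent at all.
    if tag_handling and tag_handling != "none":
        params["tag_handling"] = tag_handling
    return params
```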
OpenAI
runtimeOptions.LanguageResources.openai.sendWhitespaceAsTag activated
runtimeOptions.LanguageResources.openai.tagHandler xliff_paired_tags
additional
Recommended by ChatGPT "for professional translations":
'model' => 'gpt-4',
or
'model' => 'gpt-4-turbo',
The model can be selected when the actual language resource is created.
Warning: the list of offered models contains at least one model that is not able to translate at all.
Selecting it ends in the error
"This is not a chat model and thus not supported in the v1/chat/completions endpoint. Did you mean to use v1/completions?"
So maybe the offered list can be filtered by some attribute that indicates whether a model is able to translate. Otherwise this is really painful for the user.
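One way to filter the offered model list is a simple name-based heuristic, sketched below in Python. This is purely an assumption: no capability attribute is known here, so the prefixes and markers are guesses that would need to be maintained by hand, and `chat_capable` is a hypothetical helper.

```python
from typing import Iterable, List

# Assumption: chat-capable models can be recognized by their id prefix.
CHAT_PREFIXES = ("gpt-4", "gpt-3.5-turbo")
# Assumption: ids containing these markers are completion-only models.
NON_CHAT_MARKERS = ("instruct",)


def chat_capable(model_ids: Iterable[str]) -> List[str]:
    """Keep only model ids that look usable with the chat completions endpoint."""
    return [
        model_id
        for model_id in model_ids
        if model_id.startswith(CHAT_PREFIXES)
        and not any(marker in model_id for marker in NON_CHAT_MARKERS)
    ]
```

For example, `chat_capable(["gpt-4", "gpt-3.5-turbo-instruct", "davinci-002"])` keeps only `gpt-4`.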
Google
Really hard to decide; none of them is perfect. The "best" results are with:
runtimeOptions.LanguageResources.google.sendWhitespaceAsTag activated
runtimeOptions.LanguageResources.google.tagHandler xliff_paired_tags
runtimeOptions.LanguageResources.google.format text
Microsoft
Hard to decide; formally all options are OK. It must/can be decided based on the preferred results.
Maybe html_image is not as good as the other two.
runtimeOptions.LanguageResources.microsoft.sendWhitespaceAsTag activated
runtimeOptions.LanguageResources.microsoft.tagHandler xlf_repair
Relates to:
- TRANSLATE-4203 DeepL: Switch tag-handling to be able to send tags as xliff tags (Done)
- TRANSLATE-4490 Check GPT tag handling (Selected for dev)