[TRANSLATE-2077] Offer export of Trados-Style analysis xml

Type: New Feature
Resolution: Fixed
Fix Version/s: translate5 - 5.2.0
Affects Version/s: None
Component/s: MatchAnalysis & Pretranslation

Urgency:
High
ChangeLog Description:
The match analysis report can now be exported in a widely usable XML format.
Checklist:

Empty


Next to the Excel export button a button to export "Trados style XML" is implemented.

The format of the file should look structurally like the attached one, except:

- no file elements, since translate5 does not support a file-specific analysis right now
- for the settings element the following values should be set: <settings reportInternalFuzzyLeverage="translate5Value" reportLockedSegmentsSeparately="no" reportCrossFileRepetitions="yes" minimumMatchScore="lowestFuzzyValueThatIsConfiguredToBeShownInTranslate5" searchMode="bestWins"/>
- the following Trados-specific attributes of the settings element are fully omitted:
  missingFormattingPenalty="1" differentFormattingPenalty="1" multipleTranslationsPenalty="1" autoLocalizationPenalty="0" textReplacementPenalty="0" fullRecallMatchedWords="2" partialRecallMatchedWords="n/a" fullRecallSignificantWords="2" partialRecallSignificantWords="n/a"

In the batchTotal analysis section the following applies:
- For the following elements all numeric attributes are always set to "0", because they currently have no equivalent in translate5:
  - locked
  - perfect
  - repeated (we only have crossFileRepeated)
  - newBaseline (this is specific to SDL MT)
  - newLearnings (this is specific to SDL MT)
- the number and definitions of fuzzy elements will reflect the fuzzy ranges as definable with ~~TRANSLATE-2076~~
- all MT matches will always be counted within "new"
- crossFileRepeated are translate5's repetitions (which are represented by 102% matches)
- exact are 100%, 101% and 104% matches from translate5, since Trados does not know our 101% and 104% matches
- inContextExact are 103% matches from translate5
- The following attributes will always have the value "0", since translate5 does not support them right now:
  - characters="0" placeables="0" tags="0" repairWords="0" fullRecallWords="0" partialRecallWords="0" edits="0" adaptiveWords="0" baselineWords="0"
- The following attributes will be omitted, because translate5 does not support them so far:
  - segments

The above definitions (what is counted where) should always be explained in an XML comment in each exported Trados-like XML analysis file.
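The counting rules above can be sketched roughly as follows. This is only an illustrative Python sketch, not translate5's implementation: the function names, the `fuzzy:low-high` key format, and the default fuzzy ranges are assumptions made for the example. The real ranges are whatever is configured via TRANSLATE-2076.

```python
from collections import Counter

# Elements that have no equivalent in translate5 and therefore stay at 0.
ALWAYS_ZERO = ("locked", "perfect", "repeated", "newBaseline", "newLearnings")

# Hypothetical default ranges; the real ones are configurable (TRANSLATE-2076).
DEFAULT_FUZZY_RANGES = ((50, 74), (75, 84), (85, 94), (95, 99))


def trados_category(match_rate, is_mt=False, fuzzy_ranges=DEFAULT_FUZZY_RANGES):
    """Return the Trados-style element name a segment is counted in."""
    if is_mt:
        return "new"  # all MT matches are counted within "new"
    if match_rate == 103:
        return "inContextExact"
    if match_rate == 102:
        return "crossFileRepeated"  # translate5 repetitions
    if match_rate in (100, 101, 104):
        return "exact"  # Trados does not know 101% and 104% matches
    for low, high in fuzzy_ranges:
        if low <= match_rate <= high:
            return f"fuzzy:{low}-{high}"
    return "new"  # below the lowest configured fuzzy range


def batch_totals(segments, fuzzy_ranges=DEFAULT_FUZZY_RANGES):
    """Sum word counts per category.

    `segments` is an iterable of (match_rate, is_mt, word_count) tuples.
    """
    totals = Counter({name: 0 for name in ALWAYS_ZERO})
    for match_rate, is_mt, words in segments:
        totals[trados_category(match_rate, is_mt, fuzzy_ranges)] += words
    return dict(totals)
```

A call such as `batch_totals([(104, False, 5), (90, True, 6)])` counts the 104% match under `exact` and the MT match under `new`, while the unsupported elements remain at 0.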


analyse_internalfuzzy.xml
304 kB
16/Apr/2021 04:30
Log_sample.xml
9 kB
22/May/2020 06:30

relates to

TRANSLATE-2076 Define analysis fuzzy match ranges

Done

Thomas Lauria added a comment - 15/Jul/2021 01:32

Further fixes regarding the order of the XML nodes.

Thomas Lauria added a comment - 06/Jul/2021 01:56

Probably they use regular expressions to parse the content

Marc Mittag [Administrator] added a comment - 05/Jul/2021 09:28

Currently Plunet is not able to import the Trados-like analysis that we generate.

According to Plunet, the reason is that segments and words are flipped in our analysis.

Meaning: an XML tag with count information does not look like this:

<total segments="6855" words="2328" characters="0" placeables="0" tags="0" repairsegments="0" fullRecallsegments="0" partialRecallsegments="0" edits="0" adaptivesegments="0" baselinesegments="0" />

but like this in translate5:

<total words="2328" segments="6855" characters="0" placeables="0" tags="0" repairsegments="0" fullRecallsegments="0" partialRecallsegments="0" edits="0" adaptivesegments="0" baselinesegments="0" />

Yet the words attribute must be listed after the segments attribute and NOT before it.

I know that according to XML this should not matter at all, and thus it is a Plunet bug. But it is what it is, and it should be easy for us to change, shouldn't it, tlauria?
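One way to guarantee a fixed attribute order, for consumers that parse the file positionally, is to serialize the attributes explicitly instead of relying on a generic XML writer. The following Python sketch illustrates the idea only. The function name `render_count_element` and the `ATTRIBUTE_ORDER` tuple are hypothetical and are not taken from translate5's code.

```python
from xml.sax.saxutils import quoteattr

# Attributes that must appear first, in exactly this order.
ATTRIBUTE_ORDER = ("segments", "words", "characters", "placeables", "tags")


def render_count_element(name, values):
    """Render an empty element with attributes in a deterministic order.

    Attributes listed in ATTRIBUTE_ORDER come first; any others follow in
    the order in which they appear in `values`.
    """
    ordered = [key for key in ATTRIBUTE_ORDER if key in values]
    ordered += [key for key in values if key not in ATTRIBUTE_ORDER]
    attrs = " ".join(f"{key}={quoteattr(str(values[key]))}" for key in ordered)
    return f"<{name} {attrs} />"
```

Regardless of the order in which the counts are collected, `segments` is then always written before `words`.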

Marc Mittag [Administrator] added a comment - 14/Apr/2021 05:45 - edited

Open questions

XML Structure
- taskInfo tag: We assume that the task of taskInfo here is not the task in translate5, but the task of doing the analysis. Should we add a uuid to each analysis so that it can be identified by that, or shall we just reuse the taskGuid here for uniqueness?
  → Garik Khachanyan Thomas Lauria a uuid for the analysis would be good
- taskInfo runTime attribute: this is currently not tracked. Shall it be tracked, or filled with dummy content?
  → Garik Khachanyan Thomas Lauria should be easy to track, right? Then we should do it. Otherwise set it to 0
- project tag: evaluates to the translate5 task guid and name, no questions here
- customer tag: shall this be filled with the task's customer name?
  → Garik Khachanyan Thomas Lauria yes
- tm tag name attribute: since one analysis can contain multiple language resources, shall we combine the names here? If we had a match analysis name, that could be the better equivalent here.
  → Garik Khachanyan Thomas Lauria use the language resource names, comma-separated. If there is a comma in the resource name, replace it with _
- An alternative implementation would be one XML file per TM (to keep the above TM tag name handling consistent). Does that make sense? Note that the MT content is already mixed in differently via the "new" fields.
  → Garik Khachanyan Thomas Lauria not for now. Maybe the clients will want that, yet I think it makes more sense to have one file per analysis. So maybe in the future we will get the option to download either a summary of all language resources or one file per resource

file structure: The issue says "no file elements, since translate5 does not support a file specific analysis right now". That is not true: the data is collected per segment, so via the segment we can get the corresponding file. Should this be implemented then?
→ Garik Khachanyan Thomas Lauria if this is easy: Yes. Easy means it takes 3-4 hours at most.

internal fuzzies: as far as Thomas understands the issue and the XML, internal fuzzies are not taken into account, correct?
→ Garik Khachanyan Thomas Lauria good question. I see that I did not specify the issue well enough. In the example file it says: reportInternalFuzzyLeverage="no". So I assume we should put in "no" here if Internal Fuzzy is "off" in translate5, and "yes" if it is on. And in case it is on, we have to include internal fuzzies in the XML. Yet so far we have no information about how the tags for internal fuzzies should look in the XML. So I propose that we use exactly the same structure as for the <fuzzy> tags, and I will ask the clients for example data. We might then have to rename tags later, which should be easy.

missing in the issue: "crossFileRepeated are translate5s repetitions" → this can be done via our 102% matches, correct?
→ Garik Khachanyan Thomas Lauria yes, this is what I wanted to say
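Two of the answers above are simple enough to sketch in a few lines: the value of the tm name attribute, and the value of reportInternalFuzzyLeverage. The Python below is a minimal illustration under the assumptions stated in the comments. The helper names are hypothetical and do not come from translate5.

```python
def tm_name_attribute(resource_names):
    """Join language resource names with commas.

    A comma inside a resource name is replaced with "_" so that the
    separator stays unambiguous.
    """
    return ",".join(name.replace(",", "_") for name in resource_names)


def internal_fuzzy_attribute(internal_fuzzy_enabled):
    """Map the translate5 Internal Fuzzy setting to "yes" or "no"."""
    return "yes" if internal_fuzzy_enabled else "no"
```

For example, the resource names `"TM, legal"` and `"Glossary"` would be combined into `TM_ legal,Glossary`.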

