[TRANSLATE-2077] Offer export of Trados-style analysis XML

    • High
    • The match analysis report can be exported now in a widely usable XML format.

      Next to the Excel export button a button to export "Trados style XML" is implemented.

      The format of the file should look structurally like the attached one, except:

      • no file elements, since translate5 does not support a file specific analysis right now
      • for the settings element the following values should be set: <settings reportInternalFuzzyLeverage="translate5Value" reportLockedSegmentsSeparately="no" reportCrossFileRepetitions="yes" minimumMatchScore="lowestFuzzyValueThatIsConfiguredToBeShownInTranslate5" searchMode="bestWins"/>
      • the following Trados-specific attributes of the settings element are fully omitted:
        missingFormattingPenalty="1" differentFormattingPenalty="1" multipleTranslationsPenalty="1" autoLocalizationPenalty="0" textReplacementPenalty="0" fullRecallMatchedWords="2" partialRecallMatchedWords="n/a" fullRecallSignificantWords="2" partialRecallSignificantWords="n/a"
      • In the batchTotal analysis section the following applies
        • For the following elements, all numeric attributes are always set to "0", because they currently have no equivalent in translate5:
          • locked
          • perfect
          • repeated (we only have crossFileRepeated)
          • newBaseline (this is specific to SDL MT)
          • newLearnings (this is specific to SDL MT)
        • the number and definitions of the fuzzy elements will reflect the fuzzy ranges as definable with TRANSLATE-2076
        • all MT matches will always be counted within "new"
        • crossFileRepeated are translate5's repetitions (which are represented by 102% matches)
        • exact are the 100%, 101% and 104% matches from translate5, since Trados does not know our 101% and 104% matches
        • inContextExact are the 103% matches from translate5
        • The following attributes will always have the value "0", since translate5 does not support them right now:
          • characters="0" placeables="0" tags="0" repairWords="0" fullRecallWords="0" partialRecallWords="0" edits="0" adaptiveWords="0" baselineWords="0"
        • The following attributes will be omitted, because translate5 does not support them so far:
          • segments

      The above definitions (what is counted where) should always be explained in an XML comment within each exported Trados-like XML analysis file.
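      The counting rules above can be sketched as a small lookup. This is only an illustration of the mapping described in this issue, not translate5's actual implementation: the function name `trados_category`, the default fuzzy ranges and the fuzzy label format are hypothetical, and treating match rates below the lowest configured range as "new" is an assumption.

```python
from typing import Sequence, Tuple

# translate5 match rates that Trados would only know as plain "exact".
EXACT_RATES = {100, 101, 104}

# Hypothetical default ranges; the real ones are configurable (TRANSLATE-2076).
DEFAULT_FUZZY_RANGES = ((50, 59), (60, 69), (70, 79), (80, 89), (90, 99))


def trados_category(
    match_rate: int,
    is_mt: bool = False,
    fuzzy_ranges: Sequence[Tuple[int, int]] = DEFAULT_FUZZY_RANGES,
) -> str:
    """Return the Trados-style analysis element a segment is counted in."""
    if is_mt:
        # All MT matches are always counted within "new".
        return "new"
    if match_rate == 103:
        return "inContextExact"
    if match_rate == 102:
        # translate5 repetitions are represented by 102% matches.
        return "crossFileRepeated"
    if match_rate in EXACT_RATES:
        return "exact"
    for low, high in fuzzy_ranges:
        if low <= match_rate <= high:
            # Hypothetical label; the export uses one fuzzy element per range.
            return f"fuzzy {low}-{high}"
    # Assumption: anything below the lowest configured range counts as new.
    return "new"
```

      The elements locked, perfect, repeated, newBaseline and newLearnings never appear as a result here, which matches the rule that their numeric attributes are always "0".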


          Thomas Lauria added a comment - Further fixes regarding the order of the XML nodes.

          Thomas Lauria added a comment - Probably they use regular expressions to parse the content.

          Marc Mittag [Administrator] added a comment -

          Currently Plunet is not able to import the Trados-like analysis that we generate.

          According to Plunet, the reason is that segments and words are flipped in our analysis.

          Meaning: an XML tag with count information is expected to look like this:

          <total segments="6855" words="2328" characters="0" placeables="0" tags="0" repairsegments="0" fullRecallsegments="0" partialRecallsegments="0" edits="0" adaptivesegments="0" baselinesegments="0" />

          but looks like this in translate5:

          <total words="2328" segments="6855" characters="0" placeables="0" tags="0" repairsegments="0" fullRecallsegments="0" partialRecallsegments="0" edits="0" adaptivesegments="0" baselinesegments="0" />

          The words attribute must be listed after the segments attribute, NOT before it.

          I know that according to XML this should not matter at all, and thus it is a Plunet bug. But it is what it is, and it should be easy for us to change, shouldn't it, tlauria?
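          Since attribute order is not significant in XML, a consumer that depends on it needs the producer to serialize attributes in a fixed sequence. A minimal sketch of that idea follows; the function name `render_count_element` is hypothetical, the unsupported attribute names are taken from the issue description (repairWords, fullRecallWords and so on), and this is not the code that translate5 actually uses.

```python
from xml.sax.saxutils import quoteattr

# Attributes translate5 does not support; the description says they are always "0".
UNSUPPORTED = (
    "characters", "placeables", "tags", "repairWords", "fullRecallWords",
    "partialRecallWords", "edits", "adaptiveWords", "baselineWords",
)


def render_count_element(name: str, segments: int, words: int) -> str:
    """Serialize a count element with segments first, then words."""
    # The list keeps the attribute order explicit: segments before words.
    attrs = [("segments", segments), ("words", words)]
    attrs += [(key, 0) for key in UNSUPPORTED]
    rendered = " ".join(f"{key}={quoteattr(str(value))}" for key, value in attrs)
    return f"<{name} {rendered} />"


print(render_count_element("total", 6855, 2328))
```

          Building the string from an ordered list rather than relying on a serializer's default ordering keeps the output stable for parsers that match on attribute position.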

          Marc Mittag [Administrator] added a comment - edited

          Open questions

          • XML Structure
            • taskInfo tag: We assume that the task of taskInfo here is not the task in translate5, but the task of running the analysis. Should we add a UUID to each analysis so that it can be identified by it, or shall we simply reuse the taskGuid here for uniqueness?
              → Garik Khachanyan, Thomas Lauria: a UUID for the analysis would be good.
            • taskInfo runTime attribute: this is currently not tracked. Shall it be tracked, or filled with dummy content?
              → Garik Khachanyan, Thomas Lauria: this should be easy to track, right? Then we should do it. Otherwise set it to 0.
            • project tag: evaluates to the translate5 task guid and name, no questions here.
            • customer tag: shall this be filled with the task's customer name?
              → Garik Khachanyan, Thomas Lauria: yes.
            • tm tag name attribute: since one analysis can contain multiple language resources, shall we combine the names here? If we had a match analysis name, this could be the better equivalent.
              → Garik Khachanyan, Thomas Lauria: use the language resource names, comma separated. If there is a comma in a resource name, replace it with _.
            • An alternative implementation would be one XML file per TM (to keep the above tm tag name handling consistent). Does that make sense? Note that the MT content is already mixed in differently via the "new" fields.
              → Garik Khachanyan, Thomas Lauria: not for now. Maybe the clients will want that, yet I think it makes more sense to have one file per analysis. So maybe in the future we will get the option to download either a summary of all language resources or one file per resource.
          • file structure: The issue says "no file elements, since translate5 does not support a file specific analysis right now". That is not true, since the data is collected per segment, so via the segment we can get the corresponding file. Should this be implemented then?
            → Garik Khachanyan, Thomas Lauria: if this is easy, yes. Easy means it takes 3-4 hours at most.
          • internal fuzzies: as far as Thomas understands the issue and the XML, internal fuzzies are not respected, correct?
            → Garik Khachanyan, Thomas Lauria: good question. I see that I did not specify the issue well enough. In the example file it says: reportInternalFuzzyLeverage="no". So I assume we should put "no" here if internal fuzzy is off in translate5, and "yes" if it is on. And if it is on, we have to include the internal fuzzies in the XML. Yet so far we have no information about how the tags for internal fuzzies should look in the XML. So I propose that we use exactly the same structure as for the fuzzy tags, and I will ask the clients for example data. We might then have to rename tags later, which should be easy.
          • missing in the issue: "crossFileRepeated are translate5s repetitions" → this can be done via our 102% matches, correct?
            → Garik Khachanyan, Thomas Lauria: yes, this is what I wanted to say.
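          The answer on the tm tag name attribute (language resource names comma separated, commas inside a name replaced with _) can be sketched as below. The function name `tm_name_attribute` is hypothetical, and using a bare comma without a following space as the separator is an assumption, since the comment only says "comma separated".

```python
from typing import Iterable


def tm_name_attribute(resource_names: Iterable[str]) -> str:
    """Join language resource names for the name attribute of the tm tag."""
    # Replace commas inside a name first, so that every comma left in the
    # result is unambiguously a separator between two resources.
    return ",".join(name.replace(",", "_") for name in resource_names)


print(tm_name_attribute(["Main TM", "Glossary, legal"]))
```

          The resulting value still has to be XML-escaped when it is written into the attribute.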

            tlauria Thomas Lauria
            marcmittag Marc Mittag [Administrator]