  1. translate5
  2. TRANSLATE-1376

Segment length calculation does not include length of content outside of mrk tags

Details

    Description

      problem

      The minimum and maximum segment length is defined in the trans-unit. Currently translate5 uses only the content inside the mrk tags for the length calculation, so the length of content outside of and between the mrk tags (mostly whitespace) is missing.

      This must be changed so that the length of these additional characters is also counted.

      solution

      The length of other content (outside of and between the mrk mtype="seg" tags) is also saved for the length calculation.

      Assume the following <target>, where "bef", "betweenX" and "aft" stand for whitespace (since content other than whitespace outside of mrk tags gives an error).

      <target>bef<mrk>text 1</mrk>between1<mrk>text 2</mrk>between2<mrk>text 3</mrk>aft</target>

      The length of "bef" is saved as "additionalUnitLength" on each segment. The length of the whitespace following each closed mrk is saved on that mrk as "additionalMrkLength". That means the lengths of each "betweenX" and of the final "aft" are saved on the preceding segment defined by the mrk.
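
      The assignment described above can be sketched as follows. This is only an illustration in Python, not translate5 code: the function name split_other_content, the regular expression and the returned dictionary layout are hypothetical, and the sketch assumes non-nested mrk tags with real whitespace in place of "bef", "betweenX" and "aft".

```python
import re

# Hypothetical pattern: matches one non-nested <mrk ...>...</mrk> element.
MRK = re.compile(r"<mrk\b[^>]*>(.*?)</mrk>", re.DOTALL)


def split_other_content(target_body):
    """Return (additionalUnitLength, segments) for the inner content of a <target>.

    Each segment carries its text and the length of the other content that
    follows its closing </mrk> (the "additionalMrkLength").
    """
    matches = list(MRK.finditer(target_body))
    if not matches:
        # Limitation of this sketch: targets without mrk tags are not handled.
        raise ValueError("no mrk tags found")

    # Content before the first mrk ("bef") belongs to the unit as a whole.
    additional_unit_length = matches[0].start()

    segments = []
    for i, match in enumerate(matches):
        # Other content runs up to the next mrk, or to the end for the last one ("aft").
        if i + 1 < len(matches):
            next_start = matches[i + 1].start()
        else:
            next_start = len(target_body)
        segments.append({
            "text": match.group(1),
            "additionalMrkLength": next_start - match.end(),
        })
    return additional_unit_length, segments


# 1 space before, 2 and 3 spaces between, 4 spaces after the mrk tags.
body = " <mrk>text 1</mrk>  <mrk>text 2</mrk>   <mrk>text 3</mrk>    "
unit_len, segs = split_other_content(body)
```

      Here unit_len holds the length of the leading whitespace, and each entry in segs holds the length of the whitespace following its mrk.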

      Each additionalMrkLength is added automatically to the segment's content length in siblingData. The additionalUnitLength, in contrast, must be added only once per length calculation (wherever siblingData is used), because the additionalUnitLength is independent of the MRKs. In other words: the length stored in siblingData is the segment text length plus the additionalMrkLength.
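
      As a worked illustration of this rule, the following Python sketch sums the lengths of all sibling segments and adds the additionalUnitLength exactly once. The function name total_length and the plain list of integers are assumptions made for this example; the real siblingData structure in translate5 is not shown in this issue.

```python
def total_length(sibling_lengths, additional_unit_length):
    """Length of the whole trans-unit content.

    sibling_lengths: one entry per mrk segment; each entry already contains
    the segment text length plus its additionalMrkLength.
    additional_unit_length: added once, since it is independent of the mrks.
    """
    return additional_unit_length + sum(sibling_lengths)


# Example: 1 space before the first mrk, then 2, 3 and 4 spaces after the mrks.
sibling_lengths = [
    len("text 1") + 2,
    len("text 2") + 3,
    len("text 3") + 4,
]
total = total_length(sibling_lengths, additional_unit_length=1)

# The same content written out without the mrk tags, for comparison.
plain = " " + "text 1" + "  " + "text 2" + "   " + "text 3" + "    "
```

      In this example total equals len(plain), which is the point of the change: the whitespace outside the mrk tags is no longer lost from the count.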

      preserveWhitespace influences the otherContent:

      if preserveWhitespace is true (whether in the trans-unit or in the application), the length of otherContent is always the real length, since the otherContent is taken over completely.
      if preserveWhitespace is false in the config, the default behaviour applies: whitespace in otherContent is removed and ignored in length counting, except for the content between two mrk tags (<mrk>content</mrk> HERE <mrk>next content</mrk>). There, the otherContent between the mrks is condensed to one whitespace (or, if there is another tag in between, to one whitespace before that tag and one after).
      Before: <mrk>content</mrk>         <mrk>next content</mrk> After: <mrk>content</mrk> <mrk>next content</mrk>
      Before: <mrk>content</mrk>    <x>     <mrk>next content</mrk> After: <mrk>content</mrk> <x> <mrk>next content</mrk>
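
      The two Before/After cases can be reproduced with a small Python sketch. It is not the translate5 implementation: the name normalize_other_content is hypothetical, mrk tags are assumed to be non-nested, and "one whitespace" is assumed to mean a single space character.

```python
import re

# Hypothetical pattern: matches one non-nested <mrk ...>...</mrk> element.
MRK = re.compile(r"<mrk\b[^>]*>.*?</mrk>", re.DOTALL)


def normalize_other_content(target_body, preserve_whitespace):
    """Apply the whitespace rule to the content outside of the mrk tags."""
    if preserve_whitespace:
        # Other content is taken over completely, so its real length counts.
        return target_body

    matches = list(MRK.finditer(target_body))
    if not matches:
        return target_body

    # Whitespace before the first mrk is removed.
    parts = [re.sub(r"\s+", "", target_body[:matches[0].start()])]
    for i, match in enumerate(matches):
        parts.append(match.group(0))
        if i + 1 < len(matches):
            # Between two mrks: each whitespace run is condensed to one space,
            # which leaves one space before and one after a tag in between.
            between = target_body[match.end():matches[i + 1].start()]
            parts.append(re.sub(r"\s+", " ", between))
        else:
            # Whitespace after the last mrk is removed.
            parts.append(re.sub(r"\s+", "", target_body[match.end():]))
    return "".join(parts)


plain_case = normalize_other_content(
    "<mrk>content</mrk>         <mrk>next content</mrk>", False)
tag_case = normalize_other_content(
    "<mrk>content</mrk>    <x>     <mrk>next content</mrk>", False)
edge_case = normalize_other_content("  <mrk>a</mrk>   <mrk>b</mrk>  ", False)
kept_case = normalize_other_content("  <mrk>a</mrk>   <mrk>b</mrk>  ", True)
```

      With preserve_whitespace set to True the input is returned unchanged, which matches the first rule in the list above.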

      source and target MRK padding if the MRKs differ between source and target:
      if there is no target content (translation task), MRK padding is no problem, since there is no target to compare against and to add missing MRKs to
      if there is target content: just use the target otherContent, since padded target MRKs cannot be edited and are not added as new MRKs in the target. So no otherContent needs to be considered for the padded MRKs. This will change with implementing merging and splitting!

          People

            tlauria Thomas Lauria