translate5 / TRANSLATE-4899

Repetition hash calculated incorrectly


    • High
    • [🐞 Fix] Changed the logic for calculating segment hashes. This affects newly imported tasks only.

      Problem

      Currently, hashes are calculated by replacing all internal tags with placeholders and appending some meta info fields to the string before hashing.

      As a result, content protection (CP) tags are treated as simple internal tags (since they are an extension of internal tags), and the resname is not included at the time the hash is calculated.

      This breaks repetition handling: segments with the same total number of internal tags but a different number of CP tags are considered repetitions, e.g. 2 placeholder tags + 1 CP tag == 1 placeholder tag + 2 CP tags.

       

      Current segment tag handling logic:

      All internal tags except whitespace tags are replaced with the string "<internal-tag>".
      It does not matter whether a tag is an opening/closing tag (bpt/ept-like) or a standalone tag (ph|x|i).
      So bpt|ept|ph|x|i|number|placable|etc -> <internal-tag>
      Whitespace tags -> <internal-ws-tag>
      At the end of the resulting segment, "#" plus the count of non-whitespace tags is appended.

      The resulting string is then hashed with MD5.
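      A minimal Python sketch of the current scheme, to make the collision from the Problem section concrete. The segment model (a list of text strings and `Tag` objects identified only by a kind name such as "ph" or "cp") and the function name `old_repetition_hash` are hypothetical; the real implementation works on translate5's internal tag markup, which this issue does not show.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Union


# Hypothetical tag model: the real segment markup is not shown in the issue,
# so each tag is reduced to a kind name ("bpt", "ept", "ph", "cp", "ws", ...).
@dataclass(frozen=True)
class Tag:
    kind: str


Segment = List[Union[str, Tag]]


def old_repetition_hash(segment: Segment) -> str:
    """Sketch of the current scheme: every non-whitespace tag -> <internal-tag>."""
    parts = []
    tag_count = 0
    for item in segment:
        if isinstance(item, str):
            parts.append(item)  # plain text is kept as-is
        elif item.kind == "ws":
            parts.append("<internal-ws-tag>")  # whitespace tags are not counted
        else:
            # Open, close, standalone, CP and placable tags all collapse together.
            parts.append("<internal-tag>")
            tag_count += 1
    parts.append("#" + str(tag_count))
    return hashlib.md5("".join(parts).encode("utf-8")).hexdigest()
```

With this sketch, `["Price ", Tag("ph"), " ", Tag("ph"), " ", Tag("cp")]` and `["Price ", Tag("ph"), " ", Tag("cp"), " ", Tag("cp")]` both reduce to `Price <internal-tag> <internal-tag> <internal-tag>#3`, so they get the same MD5 hash and are wrongly treated as repetitions.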

      Solution

      Distinguish CP tags from simple internal tags as separate entities during hash calculation.

      Include the resname in the hash calculation itself.

      Remove the resname addition from \editor_Models_Segment::getRepetitionHash.

      New logic:

      Open tags like bpt, bx -> <internal-open-tag>
      Closing tags like ept, ex -> <internal-close-tag>
      Singular like ph, i, x -> <internal-tag>
      Whitespace tags -> <internal-ws-tag>
      Content protection -> <internal-cp-tag>
      Placable -> <internal-placable-tag>

      At the end of the resulting segment, "#" plus the count of non-whitespace tags is appended.

      If the segment has an import-time resname (descriptor), it is appended to the end of the string.

      The resulting string is then hashed with MD5.
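      A minimal Python sketch of the new logic, using a hypothetical segment model (text strings plus `Tag` objects carrying only a kind name). The kind-to-placeholder table follows the list above; the function name `new_repetition_hash`, mapping unlisted kinds (for example "number") to `<internal-tag>`, and appending the resname directly after the tag count with no separator are assumptions, since the issue does not specify them.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Union


# Hypothetical tag model: each tag is reduced to a kind name.
@dataclass(frozen=True)
class Tag:
    kind: str


Segment = List[Union[str, Tag]]

# Placeholder per tag kind, following the "New logic" list in the issue.
PLACEHOLDERS = {
    "bpt": "<internal-open-tag>",
    "bx": "<internal-open-tag>",
    "ept": "<internal-close-tag>",
    "ex": "<internal-close-tag>",
    "ph": "<internal-tag>",
    "i": "<internal-tag>",
    "x": "<internal-tag>",
    "ws": "<internal-ws-tag>",  # whitespace placeholder taken as <internal-ws-tag>
    "cp": "<internal-cp-tag>",
    "placable": "<internal-placable-tag>",
}


def new_repetition_hash(segment: Segment, resname: Optional[str] = None) -> str:
    """Sketch of the new scheme: tag types and CP tags are kept distinct."""
    parts = []
    tag_count = 0
    for item in segment:
        if isinstance(item, str):
            parts.append(item)
            continue
        # Assumption: kinds not listed in the issue fall back to <internal-tag>.
        parts.append(PLACEHOLDERS.get(item.kind, "<internal-tag>"))
        if item.kind != "ws":
            tag_count += 1  # only non-whitespace tags are counted
    parts.append("#" + str(tag_count))
    if resname:
        # Assumption: the import-time resname is appended without a separator.
        parts.append(resname)
    return hashlib.md5("".join(parts).encode("utf-8")).hexdigest()
```

Here two segments such as `["Price ", Tag("ph"), " ", Tag("ph"), " ", Tag("cp")]` and `["Price ", Tag("ph"), " ", Tag("cp"), " ", Tag("cp")]` reduce to `Price <internal-tag> <internal-tag> <internal-cp-tag>#3` and `Price <internal-tag> <internal-cp-tag> <internal-cp-tag>#3`, so they are no longer repetitions; identical text with different resnames also hashes differently.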

            sanya@mittagqi.com Sanya Mikhliaiev
            sanya@mittagqi.com Sanya Mikhliaiev
            Aleksandar Mitrev