translate5 / TRANSLATE-4784

Uplifting fuzzy matches by looking at which content protection tags are missing from the fuzzy match


    • High
    • Improve fuzzy matches fetching for segments with content protection

      Problem

      In some segments, numbers such as 1, 2, etc. that are protected in the source may be translated as words, e.g. one, two, single, dual, etc.

      This results in an unprotected integer in t5memory (when the segment is saved or imported from TMX), because we save only those content protection tags in the source that have an equivalent in the target.
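A minimal sketch of the saving rule described above. It assumes protection tags look like the `<number>1</number>` example used later in this issue and that "equivalent" means a target tag with the same protected value; `source_for_tm` is a hypothetical helper, not translate5 code.

```python
import re

# Hypothetical tag notation, modeled on the issue's example <number>1</number>
TAG_RE = re.compile(r"<number>(.*?)</number>")


def source_for_tm(source: str, target: str) -> str:
    """Return the source as it would be stored in the TM.

    A protection tag is kept only if the target has an equivalent tag
    (assumed here: same protected value); otherwise the plain value is stored.
    """
    unmatched = TAG_RE.findall(target)  # protected values still available in the target

    def keep_or_unprotect(m: re.Match) -> str:
        value = m.group(1)
        if value in unmatched:
            unmatched.remove(value)  # pair each target tag with at most one source tag
            return m.group(0)        # equivalent exists: keep the tag
        return value                 # no equivalent: store the bare integer

    return TAG_RE.sub(keep_or_unprotect, source)
```

For example, with the source `Page <number>1</number> of <number>2</number>` and a target that spells out the first number, only the second tag survives: `Page 1 of <number>2</number>`.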

      However, the same integer may be protected in the task: if the task is not a review task, there is no target at import time, so nothing prevents the integer from being protected.

      So when we look up matches in t5memory, our request string contains a protection tag, but the corresponding t5memory segment does not, and the resulting match rate is lower.

      Solution

      To deal with this:

      • If
        • the task segment source contains content protection tags of type integer,
        • there is no 100% match,
        • and there is a fuzzy match of 50% or higher,
      • then we
        • look for the protection tag difference between the task segment source and the memory segment source,
        • and make an additional call to t5memory for matches with those content protection tags removed that are present in the task segment source but not in the match segment source. If there are multiple possible variations, we try them all until we receive a 100% match. If none of them returns a 100% match, we take the best fuzzy match from all calls as the result.
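The decision and guessing steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not translate5 or t5memory code: integer protection tags use the `<number>1</number>` notation from the example below, each fuzzy match yields one variation (its missing tags are replaced by the plain integer, keeping the first occurrences when a value repeats), and `query_tm` is a hypothetical stand-in for the t5memory lookup. `should_uplift`, `build_uplift_query` and `guess_best_match` are illustrative names.

```python
import re
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Hypothetical notation for integer protection tags, modeled on the issue's example
INT_TAG_RE = re.compile(r"<number>(\d+)</number>")

# (memory segment source, match rate): hypothetical shape of a t5memory result
Match = Tuple[str, int]


def should_uplift(task_source: str, match_rates: List[int]) -> bool:
    """Trigger: integer tags in the task source, no 100% match, best fuzzy >= 50%."""
    if not INT_TAG_RE.search(task_source) or not match_rates:
        return False
    return 50 <= max(match_rates) < 100


def build_uplift_query(task_source: str, match_source: str) -> str:
    """Unprotect integer tags present in the task source but missing from the match."""
    # Count tags in the match source so repeated values pair up one-to-one
    available = Counter(INT_TAG_RE.findall(match_source))

    def keep_or_unprotect(m: re.Match) -> str:
        value = m.group(1)
        if available[value] > 0:
            available[value] -= 1
            return m.group(0)  # the match has this tag too: keep it protected
        return value           # the match lacks this tag: send the plain integer

    return INT_TAG_RE.sub(keep_or_unprotect, task_source)


def guess_best_match(
    task_source: str,
    fuzzy_matches: List[Match],
    query_tm: Callable[[str], List[Match]],
) -> Optional[Match]:
    """One guess call per fuzzy match; stop at the first 100% match, otherwise
    return the best match seen across the original lookup and all guess calls."""
    best = max(fuzzy_matches, key=lambda m: m[1], default=None)
    if not should_uplift(task_source, [rate for _, rate in fuzzy_matches]):
        return best
    for match_source, _rate in fuzzy_matches:
        for result in query_tm(build_uplift_query(task_source, match_source)):
            if result[1] >= 100:
                return result  # a 100% match ends the guessing
            if best is None or result[1] > best[1]:
                best = result
    return best
```

The original lookup's best match is seeded into `best`, so a guess call only replaces it when it scores strictly higher.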

      In this guessing step, we move through the fuzzy matches one by one until a call to t5memory with the modified query returns a 100% match.

      Example for the statement above:

      Let's say the task contains a segment like:

      Our nice <number>1</number> segment

      and the TM contains two segments like:

      Our nice 1 segment.   - rate 97
      Our nice 1 segment 2<x />.    - rate 97

      If we modify the task segment according to how the first TM segment looks, our guess call returns a 100% match.
      If we instead modify the task segment according to the second TM segment, we get exactly the same query as in the first guessing attempt.

      So that second call adds no value.
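Because two fuzzy matches that lack the same protection tags lead to the same modified query, duplicate guess calls can be skipped. A minimal sketch, assuming the `<number>…</number>` notation from the example and that tags are unprotected deterministically, so equal sets of missing tags yield equal queries; `missing_int_tags` and `distinct_guesses` are hypothetical names. As an extra assumption not stated in the issue, it also skips matches that lack no tags at all, since such a query would just repeat the original request.

```python
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical notation for integer protection tags, modeled on the issue's example
INT_TAG_RE = re.compile(r"<number>(\d+)</number>")


def missing_int_tags(task_source: str, match_source: str) -> Tuple[str, ...]:
    """Integer tag values in the task source that the match source does not have."""
    missing = Counter(INT_TAG_RE.findall(task_source)) - Counter(INT_TAG_RE.findall(match_source))
    return tuple(sorted(missing.elements()))


def distinct_guesses(task_source: str, match_sources: List[str]) -> List[Tuple[str, ...]]:
    """Keep one guess per distinct set of tags to unprotect, in match order."""
    seen = set()
    guesses = []
    for match_source in match_sources:
        key = missing_int_tags(task_source, match_source)
        # An empty key would repeat the original request; a seen key repeats a guess
        if key and key not in seen:
            seen.add(key)
            guesses.append(key)
    return guesses
```

With the two TM segments from the example, both lack the tag around `1`, so only one guess call remains.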

            sanya@mittagqi.com Sanya Mikhliaiev
            Thomas Lauria