Terms that contain xml special chars are not tagged

      If a term contains one of the XML special characters > < & ' ", TermTagger does not tag the term.

      For example, the term "Checks & Balances" is not tagged.

      Please check whether we can solve this without modifying TermTagger: inspect what is sent to TermTagger, both the terms and the segment content, and whether either can be modified so that tagging succeeds.

      Then please check whether the same applies to terms that contain a + character.

      Research result:

      This is how the term 'Cat & Dog' actually looks in the TBX that is exported and fed to TermTagger:

      Cat&#xA0;&amp;&#xA0;Dog

      The spaces in the original TBX are non-breaking spaces (character code 160, written as &#xA0;), not ordinary spaces.
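The observation above can be reproduced with a small check. This is only an illustrative sketch in Python, not translate5 code: the `<term>` wrapper is added so that the quoted sample parses as XML, and the helper names `decode_term` and `whitespace_other_than_space` are made up for this example.

```python
import xml.etree.ElementTree as ET

# The sample quoted above, wrapped in a <term> element so it parses as XML.
RAW_TERM_XML = "<term>Cat&#xA0;&amp;&#xA0;Dog</term>"


def decode_term(xml_fragment: str) -> str:
    """Return the element text with entities and character references resolved."""
    return ET.fromstring(xml_fragment).text or ""


def whitespace_other_than_space(text: str) -> list:
    """List (index, code point) for each whitespace character that is not U+0020."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if ch.isspace() and ch != " "]


decoded = decode_term(RAW_TERM_XML)
print(repr(decoded))
print(whitespace_other_than_space(decoded))
```

Both separators are reported with code point 160, while the ampersand itself decodes to a plain `&`.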

      Conclusion: When we create the TBX for TermTagger, we should replace all whitespace in each term with a single ordinary space.
      Please implement this as part of this issue.
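A minimal sketch of the proposed normalization, written in Python for illustration only. It is not the actual translate5 implementation, and the function names `normalize_term_whitespace` and `term_for_tbx` are hypothetical. It assumes that a run of consecutive whitespace characters should collapse into one space, which is one possible reading of the conclusion.

```python
import re
from xml.sax.saxutils import escape

# In a str pattern, \s matches Unicode whitespace, including U+00A0 (non-breaking space).
_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_term_whitespace(term: str) -> str:
    """Replace each run of whitespace in a term with one ordinary space."""
    return _WHITESPACE_RUN.sub(" ", term)


def term_for_tbx(term: str) -> str:
    """Normalize whitespace first, then escape &, < and > for use as XML text."""
    return escape(normalize_term_whitespace(term))


print(term_for_tbx("Cat\u00a0&\u00a0Dog"))
```

Normalizing before escaping keeps the two concerns separate: the whitespace step only touches separators, and the escaping step still turns `&` into `&amp;` as XML requires.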

          [TRANSLATE-3300] Terms that contain xml special chars are not tagged

          Aleksandar Mitrev made changes -
          Status Original: Releasable [ 10901 ] New: Done [ 10000 ]
          Aleksandar Mitrev made changes -
          Fix Version/s New: translate5 - 6.7.0 [ 13102 ]
          Aleksandar Mitrev made changes -
          Status Original: Front-end testing [ 11001 ] New: Releasable [ 10901 ]
          Axel Becher made changes -
          Status Original: Final pull request [ 10005 ] New: Front-end testing [ 11001 ]
          Pavel Perminov made changes -
          Resolution New: Fixed [ 1 ]
          Status Original: In Progress [ 3 ] New: Final pull request [ 10005 ]
          Pavel Perminov made changes -
          ChangeLog Description New: Replaced non-breaking spaces with ordinary spaces before feeding tbx-data to TermTagger
          Pavel Perminov made changes -
          Description: wrapped the TBX sample under "Research result" in a code block; the text is otherwise unchanged.
          Marc Mittag [Administrator] made changes -
          Description: added the "Research result" section and its conclusion; the earlier text is unchanged.
          Pavel Perminov made changes -
          Attachment New: image-2023-09-20-12-44-57-753.png [ 24500 ]
          Pavel Perminov made changes -
          Rank New: Ranked higher

            pavelperminov Pavel Perminov
            marcmittag Marc Mittag [Administrator]
            Axel Becher