translate5 / TRANSLATE-3300

Terms that contain xml special chars are not tagged

Details

    Description

      If a term contains one of the XML special characters > < & ' ", the term is not tagged by TermTagger.

      For example, a term such as "Checks & Balances" would not be tagged.

      Please check whether we can solve this without modifying TermTagger: inspect what is sent to TermTagger in terms of both the terms and the segment content, and whether both can be modified in a way that leads to success.

      Please then check whether the same is true for terms that contain a + character.

      Research result:

      This is how 'Cat & Dog' actually looks inside the TBX that is exported and fed to TermTagger:

      Cat&#xA0;&amp;&#xA0;Dog

      The spaces in the original TBX are non-breaking spaces, as their character code is 160 (U+00A0, written &#xA0; in the TBX).
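
      The finding above can be reproduced with a short check. This is a minimal sketch in Python, not translate5 code: the helper names decode_term and whitespace_code_points are hypothetical, and html.unescape is used here only as a convenient stand-in for an XML parser, since it resolves the two references that occur in this term the same way.

```python
import html
from typing import List

# Term exactly as quoted in the research result above.
raw_term = "Cat&#xA0;&amp;&#xA0;Dog"


def decode_term(xml_text: str) -> str:
    """Resolve character and entity references into plain characters.

    Hypothetical helper: html.unescape stands in for a real XML parser.
    """
    return html.unescape(xml_text)


def whitespace_code_points(text: str) -> List[int]:
    """Return the code point of every whitespace character in text."""
    return [ord(ch) for ch in text if ch.isspace()]


decoded = decode_term(raw_term)
# Both separators turn out to be code 160 (non-breaking space), not 32.
print(whitespace_code_points(decoded))
```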

      Conclusion: When we create the TBX for TermTagger, we should replace all whitespace in terms with a single plain space.
      Please implement this within the scope of this issue.
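
      The normalization proposed in the conclusion could look roughly like the following. It is a sketch in Python under stated assumptions, not the translate5 implementation: the function names are hypothetical, and collapsing a run of several whitespace characters into one space is one interpretation of "a single plain space". In Python 3, \s in a str pattern also matches Unicode whitespace such as U+00A0.

```python
import re
from xml.sax.saxutils import escape


def normalize_term_whitespace(term: str) -> str:
    """Replace every run of whitespace (including U+00A0) with one plain space.

    Hypothetical helper; collapsing runs is an assumption of this sketch.
    """
    return re.sub(r"\s+", " ", term)


def term_for_tbx(term: str) -> str:
    """Normalize whitespace, then escape & < > for use as XML text content."""
    return escape(normalize_term_whitespace(term))


# The term from the research result, with non-breaking spaces around "&".
print(term_for_tbx("Cat\u00a0&\u00a0Dog"))
```

      Normalizing before escaping keeps the two concerns separate: the whitespace fix operates on the plain term text, and escaping happens only when the term is written into the TBX.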

          People

            pavelperminov Pavel Perminov
            marcmittag Marc Mittag [Administrator]
            Axel Becher