Details
-
Bug
-
Resolution: Fixed
-
None
-
Medium
-
Replaced non-breaking spaces with ordinary spaces before feeding tbx-data to TermTagger
-
Description
If a term contains an XML special character (> < & ' "), the term is not tagged by TermTagger.
For example, a term such as "Checks & Balances" is not tagged.
Please check whether we can solve this without modifying TermTagger: inspect what is sent to TermTagger in terms of terms and segment content, and whether both can be modified in a way that leads to success.
Then please check whether the same is true for terms that contain a + character.
Research result:
This is how 'Cat & Dog' actually looks inside the TBX that is exported and fed to TermTagger:
Cat & Dog
The spaces in the original TBX are non-breaking spaces: their character code is 160 (U+00A0).
Conclusion: When we create the TBX for TermTagger, we should replace every run of whitespace inside a term with a single ordinary space.
Please implement this as part of this issue.
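The normalization proposed in the conclusion could look roughly like the following Python sketch. It is only an illustration of the idea, not the actual code of the fix: the function name `normalize_term_whitespace` is hypothetical, and it assumes the term text is available as a Unicode string before it is written into the TBX.

```python
import re

# Matches any run of Unicode whitespace. For str patterns in Python 3,
# \s also matches the non-breaking space U+00A0 (character code 160).
_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_term_whitespace(term: str) -> str:
    """Hypothetical helper: collapse each whitespace run into one ordinary space."""
    return _WHITESPACE_RUN.sub(" ", term)


# A term as found in the original TBX: the spaces are non-breaking spaces.
term = "Cat\u00a0&\u00a0Dog"

# Inspect the character codes of the whitespace characters in the term.
print([ord(ch) for ch in term if ch.isspace()])

# After normalization the term contains only ordinary spaces (code 32).
print([ord(ch) for ch in normalize_term_whitespace(term) if ch.isspace()])
```

The sketch replaces whitespace only and leaves characters such as & untouched, since the research result points to the non-breaking spaces, not the XML special characters, as the cause.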