-
Bug
-
Resolution: Fixed
-
None
-
Medium
-
Replaced non-breaking spaces with ordinary spaces before feeding tbx-data to TermTagger
-
If a term contains an XML special character (> < & ' "), the term is not tagged by TermTagger.
For example, a term such as "Checks & Balances" would not be tagged.
Please check whether we can solve this without modifying TermTagger: inspect what is sent to TermTagger for both the terms and the segment content, and whether both can be modified in a way that leads to success.
Then please check whether the same is true for terms that contain a + character.
Research result:
This is how 'Cat & Dog' actually looks inside the TBX that is exported and fed to TermTagger:
Cat & Dog
The spaces in the original TBX are non-breaking spaces: their character code is 160 (U+00A0).
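A check like the one described above can be reproduced by listing the code point of every character in a term. The following is only a minimal Python sketch for illustration, not code from translate5 or TermTagger; the function name `describe_chars` and the sample term are assumptions.

```python
import unicodedata


def describe_chars(term: str) -> list[tuple[str, int, str]]:
    """Return (character, code point, Unicode name) for each character of a term."""
    return [(ch, ord(ch), unicodedata.name(ch, "UNKNOWN")) for ch in term]


# Hypothetical term whose "spaces" are non-breaking spaces (U+00A0, decimal 160).
sample_term = "Cat\u00a0&\u00a0Dog"

for ch, code, name in describe_chars(sample_term):
    # repr() makes otherwise invisible characters such as '\xa0' visible.
    print(repr(ch), code, name)
```

An ordinary space reports code 32, so any whitespace reporting 160 is a non-breaking space.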
Conclusion: when we create the TBX for TermTagger, we should replace all whitespace in terms with a single ordinary space.
Please implement this as part of this issue.
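The normalization proposed in the conclusion could look roughly like the sketch below. It is written in Python purely for illustration and is not the actual fix shipped in translate5; the function name `normalize_term_whitespace` is hypothetical. It collapses every run of whitespace, including non-breaking spaces, into one ordinary space.

```python
import re

# For str patterns in Python 3, \s matches Unicode whitespace,
# which includes the non-breaking space U+00A0 (decimal 160).
_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_term_whitespace(term: str) -> str:
    """Replace each run of whitespace in a term with a single ordinary space."""
    return _WHITESPACE_RUN.sub(" ", term)


# Hypothetical term as it might appear in the original TBX.
raw_term = "Cat\u00a0&\u00a0Dog"
clean_term = normalize_term_whitespace(raw_term)
print([ord(ch) for ch in clean_term])
```

Applying such a step only when the TBX for TermTagger is generated would leave the stored terms unchanged, which matches the scope described in the conclusion.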
[TRANSLATE-3300] Terms that contain xml special chars are not tagged
Status | Original: Releasable [ 10901 ] | New: Done [ 10000 ] |
Fix Version/s | New: translate5 - 6.7.0 [ 13102 ] |
Status | Original: Front-end testing [ 11001 ] | New: Releasable [ 10901 ] |
Status | Original: Final pull request [ 10005 ] | New: Front-end testing [ 11001 ] |
Resolution | New: Fixed [ 1 ] | |
Status | Original: In Progress [ 3 ] | New: Final pull request [ 10005 ] |
ChangeLog Description | New: Replaced non-breaking spaces with ordinary spaces before feeding tbx-data to TermTagger |
Attachment | New: image-2023-09-20-12-44-57-753.png [ 24500 ] |
Rank | New: Ranked higher |