• Medium
      Previously, in certain cases a segment break was inserted after a full stop (.) even when no whitespace followed. This behavior has been removed by deleting the following rule from the SRX file:
      <rule break="yes">
      <beforebreak>[\.!?…]['"\u00BB\u2019\u201D\u203A\p{Pe}\u0002]*</beforebreak>
      <afterbreak>\p{Lu}[^\p{Lu}]</afterbreak>
      </rule>

      Problem

      The translation source (xml) is like this:

      <p id="666">H-840.G2x[HP]: bürstenloser Gleichstrommotor mit Getriebe<

      The segment boundary is placed after the full stop in "H-840.G2x", even though no whitespace follows it.

       

      Solution

       

      No segment break should occur when there is no whitespace after the full stop and the following capital letter is part of an alphanumeric string. The general SRX file should be adjusted accordingly.

      The solution is implemented in the attached file languages-5.srx.
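
      To illustrate why the removed rule split the example above, here is a minimal Python sketch that approximates the rule outside of any SRX engine. It is not the segmenter's actual implementation: the function name removed_rule_breaks is hypothetical, \p{Pe} is approximated by the ASCII closing brackets, \u0002 is omitted, and \p{Lu} is emulated with unicodedata because Python's re module has no \p{...} classes.

```python
import re
import unicodedata

# Simplified stand-in for the removed <beforebreak> pattern (assumption:
# \p{Pe} reduced to ASCII closing brackets, \u0002 left out).
BEFORE = re.compile(r"[.!?…]['\"\u00BB\u2019\u201D\u203A)\]}]*\Z")


def is_lu(ch):
    """Emulate \\p{Lu}: True for uppercase letters only."""
    return unicodedata.category(ch) == "Lu"


def removed_rule_breaks(text):
    """Return the offsets at which the removed rule would insert a break."""
    breaks = []
    for i in range(1, len(text) - 1):
        # <afterbreak>: an uppercase letter followed by a non-uppercase character
        after_ok = is_lu(text[i]) and not is_lu(text[i + 1])
        # <beforebreak>: sentence-final punctuation directly before offset i
        if after_ok and BEFORE.search(text[:i]):
            breaks.append(i)
    return breaks


source = "H-840.G2x[HP]: bürstenloser Gleichstrommotor mit Getriebe"
for offset in removed_rule_breaks(source):
    # Show the two pieces the removed rule would have produced
    print(repr(source[:offset]), "|", repr(source[offset:]))
```

      Under these assumptions the only match in the example is between "H-840." and "G2x", because "G" is an uppercase letter followed by a digit. With the rule deleted, this break is no longer produced.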

            aleksandar Aleksandar Mitrev
            sylviaschumacher Sylvia Schumacher
            Thomas Lauria
