translate5 / TRANSLATE-3987

wrong segmentation


Details

    • Medium
      Previously, in certain cases a segment break was inserted after a full stop (.) even when no whitespace followed. This behavior has been removed: the rule
      <rule break="yes">
      <beforebreak>[\.!?…]['"\u00BB\u2019\u201D\u203A\p{Pe}\u0002]*</beforebreak>
      <afterbreak>\p{Lu}[^\p{Lu}]</afterbreak>
      </rule>
      was removed from the SRX file.

    Description

      Problem

      The translation source (XML) looks like this:

      <p id="666">H-840.G2x[HP]: bürstenloser Gleichstrommotor mit Getriebe<

      The segment boundary is wrongly placed after the full stop in "H-840.G2x", although no whitespace follows it.
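
      To illustrate why the break occurred, here is a minimal sketch that emulates the removed rule by hand. It is not the SRX engine used by translate5; the function name `removed_rule_breaks` and the helper constants are assumptions made for illustration. Python's built-in `re` module has no `\p{Lu}` or `\p{Pe}` classes, so the Unicode categories are checked with `unicodedata` instead.

```python
import unicodedata

# beforebreak, first class: [\.!?…]
TERMINATORS = set(".!?…")
# beforebreak, second class: ['"\u00BB\u2019\u201D\u203A\p{Pe}\u0002]
# (\p{Pe}, closing punctuation, is checked separately below)
CLOSERS = set("'\"\u00BB\u2019\u201D\u203A\u0002")


def is_upper(ch):
    """True if ch is in Unicode category Lu (uppercase letter)."""
    return unicodedata.category(ch) == "Lu"


def is_closer(ch):
    """True if ch may follow the terminator before the break position."""
    return ch in CLOSERS or unicodedata.category(ch) == "Pe"


def removed_rule_breaks(text):
    """Return the offsets at which the removed rule would split text.

    Hypothetical helper: a hand-written emulation of the rule,
    not part of translate5 or of any SRX library.
    """
    breaks = []
    for i, ch in enumerate(text):
        if ch not in TERMINATORS:
            continue
        j = i + 1
        # Skip closing quotes / brackets allowed by beforebreak.
        while j < len(text) and is_closer(text[j]):
            j += 1
        # afterbreak: \p{Lu}[^\p{Lu}] (an uppercase letter followed by
        # any character that is not an uppercase letter).
        if j + 1 < len(text) and is_upper(text[j]) and not is_upper(text[j + 1]):
            breaks.append(j)
    return breaks


sample = "H-840.G2x[HP]: bürstenloser Gleichstrommotor mit Getriebe"
for offset in removed_rule_breaks(sample):
    print(repr(sample[:offset]), "|", repr(sample[offset:]))
```

      For the sample text, the only match is at the "G" of "G2x": the full stop satisfies `beforebreak`, and "G" followed by "2" satisfies `afterbreak`, because the rule never requires whitespace between the two.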

       

      Solution

       

      No segment break should occur when no whitespace follows the full stop AND the following capital letter is part of an alphanumeric string. The general SRX file should be adjusted accordingly.

      The solution is implemented in the attached file languages-5.srx.
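
      The condition stated in the solution can be sketched as a small predicate. This is only an illustration under assumptions: the function name `break_allowed_after` is hypothetical, and "alphanumeric string" is interpreted here as a run of letters and/or digits that begins directly after the full stop. The attached languages-5.srx is the authoritative implementation and may differ in detail.

```python
def break_allowed_after(text, stop_index):
    """Decide whether a break right after the full stop at stop_index is allowed.

    Hypothetical helper illustrating the ticket's condition; it is not
    taken from translate5 or from languages-5.srx.
    """
    nxt = stop_index + 1
    if nxt >= len(text):
        return True  # nothing follows the full stop
    if text[nxt].isspace():
        return True  # whitespace follows, so the condition does not apply
    # No whitespace: collect the run of letters/digits after the full stop.
    end = nxt
    while end < len(text) and text[end].isalnum():
        end += 1
    token = text[nxt:end]
    # Assumption: the break is suppressed when that run starts with a capital.
    starts_with_capital = bool(token) and token[0].isupper()
    return not starts_with_capital


print(break_allowed_after("H-840.G2x[HP]", 5))
print(break_allowed_after("Ende. Anfang", 4))
```

      The first call concerns the full stop inside "H-840.G2x", where the break is suppressed; the second concerns a full stop followed by a space, where the decision is left to the remaining rules.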

      Attachments

        1. image-2024-06-04-17-25-51-664.png
          33 kB
          Sylvia Schumacher
        2. languages-5.srx
          664 kB
          Marc Mittag [Administrator]

        Activity

          People

            aleksandar Aleksandar Mitrev
            sylviaschumacher Sylvia Schumacher
            Thomas Lauria
