Improve File Format Segmentation Rules after colons, fix OKAPI quirk

    • Type: Improvement
    • Resolution: Unresolved
    • None
    • Affects Version/s: None
    • Component/s: file format settings
    • Critical
    • File Format Settings: Improve Segmentation after Colons: Take quotes into account

      Problem

      Solution

      a)

      Wherever our default SRX files contain the following break rule,

      <rule break="yes">
        <beforebreak>:</beforebreak>
        <afterbreak>\s+\p{Lu}</afterbreak>
      </rule>

      replace it with the following one:

      <rule break="yes">
          <beforebreak>:</beforebreak>
          <afterbreak>\s+[„»‚›\"']? ?\p{Lu}</afterbreak>
      </rule>
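
      The effect of the two <afterbreak> patterns can be checked outside Okapi with plain java.util.regex. The following is only a minimal sketch: the class ColonBreakDemo and the method breaksAfterColon are hypothetical helpers, not Okapi API, and the sample sentences are made up for illustration. The non-ASCII quote characters are written as Unicode escapes so that the source compiles regardless of file encoding.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Okapi: it only checks whether the text
// right after the first colon satisfies a given <afterbreak> pattern.
final class ColonBreakDemo {
    // Original <afterbreak>: whitespace, then an uppercase letter.
    static final String OLD_AFTER = "\\s+\\p{Lu}";
    // Proposed <afterbreak>: optional opening quote (U+201E, U+00BB, U+201A,
    // U+203A, " or ') and an optional space before the uppercase letter.
    static final String NEW_AFTER =
        "\\s+[\u201E\u00BB\u201A\u203A\"']? ?\\p{Lu}";

    static boolean breaksAfterColon(String afterBreak, String text) {
        int colon = text.indexOf(':');
        if (colon < 0) {
            return false; // no colon, so the rule cannot apply
        }
        // Match the pattern starting exactly at the position after the colon.
        return Pattern.compile(afterBreak)
                .matcher(text)
                .region(colon + 1, text.length())
                .lookingAt();
    }

    public static void main(String[] args) {
        String[] samples = {
            "Note: This is new.",               // uppercase letter after colon
            "Er sagte: \u201EHallo Welt\u201C", // quoted sentence after colon
            "Zeit: 12 Uhr"                      // digit after colon
        };
        for (String s : samples) {
            System.out.println(s
                    + " | old=" + breaksAfterColon(OLD_AFTER, s)
                    + " new=" + breaksAfterColon(NEW_AFTER, s));
        }
    }
}
```

      In this sketch only the proposed pattern accepts the quoted sentence, while both patterns still reject a colon followed by a digit.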

      b)

      For us, the solution is to replace the following <beforebreak> of the break="yes" rule

      <beforebreak>[\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}]\s*[\p{Pe}\p{Pf}\p{Po}"'"''’""]\s[\.?!]\s*[\p{Pe}\p{Pf}\p{Po}"'"''’""]*</beforebreak>

      with this one

      <beforebreak>[\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}]\s*[\p{Pe}\p{Pf}\p{Po}"'"''’""]\s[\.?!]\s+[\p{Pe}\p{Pf}\p{Po}"'"''’""]*</beforebreak>

      The only change is that the last \s* in the regex is replaced with \s+.
      This solves our problem (for details, see the Okapi Groups link above) and, I think, many similar problems.
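
      The mechanical effect of changing \s* to \s+ can be sketched with java.util.regex. This issue does not spell out the failing text, so the sample string below is contrived, and the sketch assumes the usual SRX reading that a <beforebreak> pattern must match text ending exactly at the candidate break position. BeforeBreakDemo and canEndAt are hypothetical names, and the closing-punctuation class is written in a de-duplicated form.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not Okapi code: reports whether a <beforebreak>
// pattern can match text that ends exactly at a candidate break position.
final class BeforeBreakDemo {
    static final String LETTER = "[\\p{Ll}\\p{Lu}\\p{Lt}\\p{Lo}\\p{Nd}]";
    // De-duplicated form of the closing-punctuation class (assumption):
    // Pe, Pf, Po plus ", ' and U+2019.
    static final String CLOSER = "[\\p{Pe}\\p{Pf}\\p{Po}\"'\u2019]";
    // Old rule: zero or more whitespace after the sentence-final mark.
    static final String OLD_BEFORE =
        LETTER + "\\s*" + CLOSER + "\\s[\\.?!]\\s*" + CLOSER + "*";
    // New rule: at least one whitespace after the sentence-final mark.
    static final String NEW_BEFORE =
        LETTER + "\\s*" + CLOSER + "\\s[\\.?!]\\s+" + CLOSER + "*";

    static boolean canEndAt(String beforeBreak, String text, int pos) {
        // Anchor the pattern to the end of the text preceding the position.
        return Pattern.compile("(?:" + beforeBreak + ")\\z")
                .matcher(text.substring(0, pos))
                .find();
    }

    public static void main(String[] args) {
        String text = "ok\" . Next"; // index 5 is right after '.', 6 after the space
        for (int pos = 5; pos <= 6; pos++) {
            System.out.println("pos " + pos
                    + " | old=" + canEndAt(OLD_BEFORE, text, pos)
                    + " new=" + canEndAt(NEW_BEFORE, text, pos));
        }
    }
}
```

      Under these assumptions the old pattern can end directly after the punctuation mark as well as after the following space, whereas the new pattern can only end once at least one whitespace character has been consumed.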

            Assignee:
            Axel Becher
            Reporter:
            Axel Becher
            Thomas Lauria
            Stephan Bergmann, Sylvia Schumacher
