  1. translate5
  2. TRANSLATE-3408

Implement proper segmentation for InstantTranslate


Details

    • High
    • Improved segmentation for InstantTranslate to work like the target segmentation of Okapi

    Description

      InstantTranslate currently implements only a rudimentary segmentation. As part of this feature, a reusable segmentation engine shall be implemented (as part of the Okapi plugin) that uses the segmentation rules of the translate5 default BCONF to perform the segmentation. This segmentation should then be used for InstantTranslate.
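
      To make the idea of rule-based segmentation concrete, here is a minimal sketch of how SRX-style rules could be applied to a text. It is written in Python purely for illustration; the actual engine would live in the PHP Okapi plugin. The names `Rule` and `segment` are hypothetical, the two example rules are invented and not taken from the translate5 default BCONF, and the "first matching rule decides a position" logic is a simplified reading of SRX, not a full implementation.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """One SRX-like rule (hypothetical structure, for illustration only)."""
    is_break: bool  # corresponds to break="yes" / break="no"
    before: str     # pattern that must match right before the position
    after: str      # pattern that must match right after the position


def segment(text, rules):
    """Split text at positions where the first matching rule is a break rule."""
    decided = {}  # position -> break decision; the first rule to match wins
    for rule in rules:
        # "before" must end exactly where "after" starts.
        pattern = re.compile(f"(?:{rule.before})(?=(?:{rule.after}))")
        for match in pattern.finditer(text):
            decided.setdefault(match.end(), rule.is_break)

    segments, start = [], 0
    for pos in sorted(p for p, is_break in decided.items() if is_break):
        segments.append(text[start:pos])
        start = pos
    if start < len(text):
        segments.append(text[start:])
    return segments


# Assumed example rules: an exception for "Dr." followed by a generic
# sentence-end rule. Exceptions come first so that they take precedence.
EXAMPLE_RULES = [
    Rule(is_break=False, before=r"\bDr\.", after=r"\s"),
    Rule(is_break=True, before=r"[.?!]+", after=r"\s"),
]
```

      With these assumed rules, `segment("Dr. Smith arrived. He sat down.", EXAMPLE_RULES)` splits only after "arrived." and leaves the whitespace at the start of the second segment; trimming would be a separate step.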

      Useful Links:

       

      https://okapiframework.org/wiki/index.php/SRX

      https://unicode-org.github.io/icu/userguide/strings/regexp.html#regular-expressions
      https://en.wikipedia.org/wiki/Unicode_character_property

      It is essential for the implementation that we make use of the existing regular expressions in the Okapi SRX file.

      Okapi is a Java app and SRX uses ICU regex, which is not completely compatible with Java regex.

      https://okapiframework.org/wiki/index.php/SRX_and_Java shows how this is solved in Okapi.

      We must check what this means for the regexes used in the SRX files. Are they ICU regexes that Okapi simply maps to something equivalent that works in Java? Or do the files actually contain Java regexes? And what does that mean for PHP if we use the same regexes? We should keep this as simple as possible. One way would be to check whether each regex works in PHP without an error: if it does, all is fine; if not, we skip the rule. The reason for this strategy is that everything else would become very complex. Also, at first sight, all the rules currently present in the Okapi SRX file should work in PHP just as you would expect. So in practical use this strategy will reach the goal (pragmatic approach).
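
      The "use it if it compiles, otherwise skip the rule" strategy could look roughly like the sketch below. It uses Python's `re` module as a stand-in for PHP's regex engine, so which patterns are accepted will differ from PHP; in PHP the analogous check would rely on `preg_match()` returning `false` when a pattern fails to compile. The function name `compile_usable_patterns` and the sample patterns are hypothetical.

```python
import re


def compile_usable_patterns(patterns):
    """Compile each pattern; collect the ones that fail instead of aborting."""
    usable, skipped = [], []
    for pattern in patterns:
        try:
            usable.append(re.compile(pattern))
        except re.error as exc:
            # The rule is skipped, but the reason is kept so it can be logged.
            skipped.append((pattern, str(exc)))
    return usable, skipped


# Assumed sample input: one valid pattern and one with an unclosed group.
SAMPLE_PATTERNS = [r"[.?!]+", r"([.?!]"]
```

      Keeping the skipped patterns together with their error messages makes it possible to log which SRX rules were ignored, so that a silently skipped rule does not go unnoticed.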


            People

              pavelperminov Pavel Perminov
              axelbecher Axel Becher
