translate5 / TRANSLATE-4680

Re-Segment TMX on TMX import


    • High
    • Add new Re-segment TMX on import feature

      Problem

      Some TM systems do not segment after every sentence the way translate5 (and most similar systems) does. Instead, they store a whole paragraph as one segment, which often contains multiple sentences.

      As a result, many segments are not found, or get only very poor matches, in translation processes.

      Solution

      Segment the source and target text of each translation unit (tu tag) in the TMX.

      If the number of segments found in source and target is the same, import one tu tag for each resulting segment pair.

      If the number of segments found in source and target differs, import the original tu tag unsegmented.

      Make this configurable in the TMX import process in the UI (both when creating a new language resource and when uploading a TMX to an existing TM). The option should be disabled by default, so that no further segmentation takes place.

      Implementation hint: use the same segmentation mechanism as the InstantTranslate text field, i.e. not Okapi.
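
The import rule described under Solution can be sketched roughly as follows. This is only an illustration, not translate5 code: `split_sentences` is a hypothetical, naive regex splitter standing in for the InstantTranslate segmentation mechanism, and `resegment_tu` and its `enabled` flag are assumed names for the per-tu logic and the UI option.

```python
import re
from typing import List, Tuple

# Hypothetical stand-in for the real segmenter (assumption): split on
# whitespace that follows a sentence-ending ".", "!" or "?".
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences with the naive rule above."""
    return [part for part in _SENTENCE_END.split(text.strip()) if part]


def resegment_tu(source: str, target: str, enabled: bool = False) -> List[Tuple[str, str]]:
    """Return the (source, target) pairs to import for one translation unit.

    enabled=False mirrors the proposed default: no further segmentation.
    """
    if not enabled:
        return [(source, target)]

    src_segments = split_sentences(source)
    tgt_segments = split_sentences(target)

    # Differing (or empty) segment counts: keep the original tu unsegmented.
    if not src_segments or len(src_segments) != len(tgt_segments):
        return [(source, target)]

    # Equal counts: one tu per segment pair, aligned by position.
    return list(zip(src_segments, tgt_segments))


# Usage: a paragraph-level tu whose source and target both hold two sentences.
pairs = resegment_tu("First one. Second one.", "Erster Satz. Zweiter Satz.", enabled=True)
```

Pairing by position assumes the sentences appear in the same order in source and target. That is why a count mismatch falls back to the unsegmented tu rather than guessing an alignment.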

            sanya@mittagqi.com Sanya Mikhliaiev
            marcmittag Marc Mittag [Administrator]
            Leon Kiz