  1. translate5
  2. TRANSLATE-2835

Repair invalid OpenTM2 TMX export

Details

    • Critical
    • Depending on the content in the TM, the exported TMX may be invalid XML. The export now repairs such content as far as possible so that the result is valid XML.

    Description

      In some cases, OpenTM2 creates invalid XML in the TMX export when segments contain XML special characters.

      Strangely, this does not happen every time such characters appear in a segment, only sometimes.

      We have to repair this in order to be able to migrate existing TMs to the new translate5 memory service.

      Therefore we need to parse the content of every <seg> tag on export, check whether it is valid and, if not, repair the TMX.
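
The check-and-repair step could look roughly like the following Python sketch. It is not the translate5 implementation: the function names `is_well_formed` and `repair_seg` are hypothetical, and it assumes the only defect is an unescaped `&`, which is just one of the ways a segment can be invalid.

```python
import re
import xml.etree.ElementTree as ET

# Matches "&" that does not start a predefined entity or a numeric
# character reference. Assumption: nothing else needs repairing here.
BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9A-Fa-f]+);)")


def is_well_formed(seg_content: str) -> bool:
    """Return True if the content parses as XML when wrapped in <seg>."""
    try:
        ET.fromstring(f"<seg>{seg_content}</seg>")
        return True
    except ET.ParseError:
        return False


def repair_seg(seg_content: str) -> str:
    """Leave valid content untouched; otherwise escape bare ampersands."""
    if is_well_formed(seg_content):
        return seg_content
    return BARE_AMP.sub("&amp;", seg_content)
```

Valid segments pass through unchanged, so content that OpenTM2 exported correctly is never double-encoded.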

      We are fairly sure that only the internal tags bpt, ept and ph are used in OpenTM2 TMX. <sub> is definitely not used.

      Added 22nd of March: We also need to replace ph-tags of type="lb" with real linebreaks in the TMX.
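
As an illustration of the line break replacement, here is a minimal Python sketch. The function name `replace_linebreak_placeholders` is hypothetical, and the sketch assumes the tag appears either self-closed or as a `<ph>...</ph>` pair with `type="lb"` written as a plain attribute.

```python
import re

# Matches <ph ... type="lb" .../> as well as <ph ... type="lb" ...>...</ph>.
# Assumption: attribute values do not contain ">" characters.
PH_LINEBREAK = re.compile(
    r"""<ph\b[^>]*\stype\s*=\s*(["'])lb\1[^>]*?(?:/>|>.*?</ph>)""",
    re.DOTALL,
)


def replace_linebreak_placeholders(seg_content: str) -> str:
    """Replace every ph tag of type "lb" with a real newline character."""
    return PH_LINEBREAK.sub("\n", seg_content)
```

Other ph tags, for example `<ph type="fmt"/>`, are left as they are.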

      To support invalid XML coming from OpenTM2 (mostly due to additional tags in the content), the parser gets a list of valid TMX tags plus the list of XLF tags used in the content. All tags not in that list are encoded, and this is logged in the PHP log (not the system log!).
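
The allow-list approach can be sketched in Python as follows. This is only an illustration under assumptions: the tag names in `ALLOWED_TAGS`, the logger name `tmx_repair` and the function `encode_unknown_tags` are hypothetical, and Python's `logging` module stands in for the PHP log mentioned above.

```python
import html
import logging
import re

logger = logging.getLogger("tmx_repair")  # hypothetical logger name

# Assumed allow-list: TMX inline tags plus a few XLIFF inline tags.
# The real list used by the parser is not given in this ticket.
ALLOWED_TAGS = frozenset({"bpt", "ept", "ph", "it", "hi", "ut", "g", "x", "bx", "ex"})

# Matches an opening, closing or self-closing tag and captures its name.
TAG = re.compile(r"</?([A-Za-z_][\w.\-]*)[^<>]*>")


def encode_unknown_tags(seg_content: str, allowed=ALLOWED_TAGS) -> str:
    """Escape every tag whose name is not in the allow-list and log it."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in allowed:
            return match.group(0)
        logger.warning("Encoding unknown tag <%s> in segment", name)
        # quote=False escapes only &, < and >, turning the tag into text.
        return html.escape(match.group(0), quote=False)

    return TAG.sub(replace, seg_content)
```

For example, `encode_unknown_tags('Click <b>here</b> <ph x="1"/>')` keeps the ph tag and turns the `<b>` pair into escaped text.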

            People

              tlauria Thomas Lauria
              marcmittag Marc Mittag [Administrator]