  1. translate5
  2. TRANSLATE-1405 TermPortal as terminology management solution
  3. TRANSLATE-1274

Extend TBX-import and -export to widely match LISA-type TBX-Basic and extend translate5 term-DB structure


Details

    • High
    • Tbx import/export ability added

    Description

      General hint

      Keep in mind that an editing history, analogous to the segment history, may follow in a later step. Where relevant, design the architecture with that in mind.

      Solving this issue should also solve TRANSLATE-1410.

      General import behaviour

      In this version of translate5 TermPortal we will support TBX version 2 for import and export as promoted by LISA in 2008 and as documented in the 2008 download package, that can be found here:

      https://github.com/byutrg/TBX-Basic-Package/archive/e4a4289461553df091070373887ef5e3d516c571.zip
      https://www.tbxinfo.net/tbx-downloads-2/

      In addition Across-style TBX will be supported for import.

      The code should be structured so that support for "TBXv3 Basic" import can be added in the future. See https://www.tbxinfo.net/tbx-dialects/?id=2

      If the imported TBX contains an unsupported element or element attribute, a warning-level error is issued to the error system and the element or element attribute is ignored. (Do not confuse element attributes such as "type" with term attributes such as 'type="administrativeStatus"'.)
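The ignore-and-warn rule could be sketched as follows. This is only an illustration in Python, not translate5 code: the whitelist `SUPPORTED`, the function `prune_unsupported` and the logger name are assumptions, and the whitelist is deliberately incomplete.

```python
import logging
import xml.etree.ElementTree as ET

logger = logging.getLogger("tbx_import")  # hypothetical logger name

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Illustrative, incomplete whitelist: element name -> allowed element attributes.
SUPPORTED = {
    "termEntry": {"id"},
    "langSet": {XML_LANG},
    "tig": {"id"},
    "term": {"id"},
    "termNote": {"type"},
    "descrip": {"type"},
}


def prune_unsupported(element, warnings=None):
    """Remove unsupported attributes and child elements in place.

    Returns the collected warning messages; the import itself continues.
    Assumes that `element` itself is a supported element.
    """
    if warnings is None:
        warnings = []
    for name in list(element.attrib):
        if name not in SUPPORTED[element.tag]:
            warnings.append(f"unsupported attribute '{name}' on <{element.tag}> ignored")
            del element.attrib[name]
    for child in list(element):
        if child.tag not in SUPPORTED:
            warnings.append(f"unsupported element <{child.tag}> ignored")
            element.remove(child)
        else:
            prune_unsupported(child, warnings)
    return warnings


entry = ET.fromstring(
    '<termEntry id="e1" foo="x"><langSet xml:lang="de">'
    "<tig><term>Haus</term><blink>x</blink></tig></langSet></termEntry>"
)
for message in prune_unsupported(entry):
    logger.warning(message)  # warning level only, nothing is raised
```

Term attributes such as `type="administrativeStatus"` are values of a supported element attribute and are therefore not touched by this check.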

      A lot of what is written below is already supported for the import, but needs to be extended to be fit for editing and export purposes.

      Extending merge behaviour

      The current merge behaviour is documented here:

      https://confluence.translate5.net/display/TAD/Term+Collection

      This is extended:

      If a term already exists in the TermCollection and should be merged with the imported term, the import checks whether the term also exists in other termEntries of the same TermCollection:

      • If yes: The term is not merged and not imported, but an import error is recorded and the import continues with the next termEntry (the entire termEntry of this term is skipped). After the import is finished, all import errors are shown in the GUI / passed back via API.
      • If no: The term is merged as described in https://confluence.translate5.net/display/TAD/Term+Collection
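The extended check can be pictured as a decision per imported termEntry. The sketch below is an assumption-laden illustration: `plan_import` and its in-memory `index` are hypothetical, and the actual merge (documented in Confluence) is reduced to the label "merge".

```python
def plan_import(index, incoming):
    """Decide per imported termEntry whether it is added, merged or skipped.

    index:    dict mapping (language, term text) to the set of termEntry ids
              in the TermCollection that already contain this term
    incoming: list of (termEntry id, [(language, term text), ...])
    """
    plan, errors = [], []
    for entry_id, terms in incoming:
        # A term found in more than one existing termEntry cannot be merged safely.
        duplicated = [term for term in terms if len(index.get(term, ())) > 1]
        if duplicated:
            lang, text = duplicated[0]
            errors.append(
                f"termEntry {entry_id} skipped: term '{text}' ({lang}) "
                "exists in several termEntries"
            )
            plan.append((entry_id, "skip"))  # the whole termEntry is skipped
        elif any(index.get(term) for term in terms):
            plan.append((entry_id, "merge"))
        else:
            plan.append((entry_id, "add"))
    return plan, errors


index = {("de", "Haus"): {"e1"}, ("de", "Bank"): {"e2", "e3"}}
incoming = [
    ("n1", [("de", "Haus")]),
    ("n2", [("de", "Bank"), ("en", "bank")]),
    ("n3", [("de", "Baum")]),
]
plan, errors = plan_import(index, incoming)
```

The collected `errors` list corresponds to what is shown in the GUI or passed back via the API after the import has finished.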

      Martif-header

      On import, ignore the martifHeader (everything inside the tag <martifHeader>)

      On export, write a martifHeader that looks as follows:

      <martifHeader>
      	<fileDesc>
      		<titleStmt>
      			<title>Export of translate5 termCollection TERMCOLLECTIONNAME</title>
      			<note>Contains the languages: list of rfc5646 codes of contained languages</note>
      		</titleStmt>
      		<sourceDesc>
      			<p>File is exported from translate5 instance at https://domainOfTranslate5 by the user USERNAME WHO EXPORTED</p>
      		</sourceDesc>
      	</fileDesc>
      	<encodingDesc>
      		<p type="XCSURI">http://www.lisa.org/fileadmin/standards/tbx_basic/TBXBasicXCSV02.xcs</p>
      	</encodingDesc>
      </martifHeader>
       

      Body of the TBX

      All tags in chapter 5 of TBX_Basic_datacategoriesV23.pdf (contained in the attached zip file) are supported for import and export. Exception: the <back> tag, which is described in chapter 5 but is not part of the <body> tag.

      If the body contains elements that are not part of TBX_Basic_datacategoriesV23.pdf, they are ignored and a warning-level error is issued. The import continues.

      Markup inside TBX elements is not supported in this version of TermPortal. This means that if elements of the type basicText or noteText contain markup elements (such as bold), these are removed at import time.

      Ref-Element

      The ref tag on termEntry level references the ID of another termEntry. The import should map that reference, so that a click on it in the GUI opens the other entry (see TBX_Basic_datacategoriesV23.pdf page 11).

      The same applies to a ref element on term level (see page 12).

      <descripGrp>, <descrip> and <admin type="source">

      The relation between these three elements is analogous to that between <transacGrp>, <transac> and <transacNote>. If it is contained within a <descripGrp>, the <admin type="source"> gives the source of a description of type="context" or type="definition". This must be considered on import and must be shown in the GUI.

      termEntry- or term-Attributes of the body

      All termEntry- or term-attributes in chapter 3 of TBX_Basic_datacategoriesV23.pdf are supported (known) by default by translate5's DB structure. Thus they are always selectable, when adding or editing a term through the GUI as described in TRANSLATE-1275.

      If a TBX is imported, that contains other termEntry- or term-attributes, they are simply added to translate5 and in the future are selectable in the termCollection they belong to as termEntry- or term-attributes.

      Text field or picklist

      For each termEntry- or term-attribute a system configuration (not based in Zf_configuration but in an extra table) defines, if the attribute holds a plaintext field or a picklist. Analogous to Zf_configuration the default can be overwritten on system basis (at this stage only on database level).

      For termEntry- or term-attributes defined in TBX-Basic, the TBX_Basic_datacategoriesV23.pdf spec states whether the attribute default is picklist or plaintext. For other termEntry- or term-attributes that enter through TBX import, the default is always plaintext. The database "knows" whether a termEntry- or term-attribute is part of TBX-Basic or not (used for re-export).
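The default and override rules could be modelled roughly like this. It is a sketch under assumptions: `AttributeConfig`, `resolve_attribute` and the `overrides` dict are hypothetical names, and `TBX_BASIC_DEFAULTS` is only an illustrative excerpt; the authoritative list is TBX_Basic_datacategoriesV23.pdf.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeConfig:
    name: str
    field_type: str      # "picklist" or "plaintext"
    is_tbx_basic: bool   # needed later to decide what a TBX-Basic export may contain


# Illustrative excerpt only, not the complete TBX-Basic list.
TBX_BASIC_DEFAULTS = {
    "administrativeStatus": "picklist",
    "partOfSpeech": "picklist",
    "definition": "plaintext",
}


def resolve_attribute(name, overrides=None):
    """Return the effective configuration of a termEntry- or term-attribute.

    `overrides` stands in for the system-level overrides stored in the
    extra configuration table (attribute name -> field type).
    """
    overrides = overrides or {}
    is_basic = name in TBX_BASIC_DEFAULTS
    # Attributes that only entered through a TBX import default to plaintext.
    default = TBX_BASIC_DEFAULTS.get(name, "plaintext")
    return AttributeConfig(name, overrides.get(name, default), is_basic)


print(resolve_attribute("partOfSpeech"))
print(resolve_attribute("across_ISO_picklist_Usage"))
print(resolve_attribute("across_ISO_picklist_Usage",
                        {"across_ISO_picklist_Usage": "picklist"}))
```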

      Picklist values

      For picklists the default picklist values are defined in the database of translate5.

      For picklists as defined in TBX-Basic, the default values are defined in a DB-table and taken from TBX-Basic.

      On TBX import, TBX picklist values are simply imported, even if they are not valid according to TBX-Basic. After the import, the database "knows" whether a termEntry- or term-attribute value is part of TBX-Basic or not (so far used for re-export). This means that a picklist value can be invalid TBX-Basic even though the picklist itself is valid.

      The German GUI text for a value is also defined in the DB-table. The GUI-translations are drawn from translate5 GUI-translation mechanism.

      Mapping of administrativeStatus

      A system configuration maps potential values of <termNote type="administrativeStatus"> to those of TBX-Basic for displaying them in the translate5 editor (please see page 8 of TBX_Basic_datacategoriesV23.pdf; notRecommended and obsolete are both mapped to notRecommended). This ensures that the display of term attribute icons (recommended, forbidden, etc.) in the term portlet of the translate5 editor still works as it should.

      Please also ensure, that what has been implemented with https://jira.translate5.net/browse/TRANSLATE-1375 still works.

      In addition it must be possible to also map other attributes to the administrativeStatus, like the proprietary Across across_ISO_picklist_Usage:

      <termNote type="across_ISO_picklist_Usage">do not use</termNote>
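A mapping table keyed by attribute type and value would cover both cases. The following is a minimal sketch, assuming a hypothetical `STATUS_MAP` and `editor_status`; the target chosen for the Across value "do not use" is an assumption and not taken from the specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: (termNote type, imported value) -> status shown in the editor.
STATUS_MAP = {
    ("administrativeStatus", "notRecommended"): "notRecommended",
    ("administrativeStatus", "obsolete"): "notRecommended",
    # Assumed target value for the proprietary Across attribute:
    ("across_ISO_picklist_Usage", "do not use"): "notRecommended",
}


def editor_status(term_note):
    """Return the mapped status for a <termNote> element, or None if unmapped."""
    key = (term_note.get("type"), (term_note.text or "").strip())
    return STATUS_MAP.get(key)


note = ET.fromstring('<termNote type="across_ISO_picklist_Usage">do not use</termNote>')
print(editor_status(note))
```

Returning `None` for unmapped combinations leaves it to the caller to decide whether to fall back to a default or to issue a warning.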

      Text termEntry- or term-attributes

      For termEntry- or term-attributes with type "noteText" (according to TBX_Basic_datacategoriesV23.pdf) translate5 will NOT support inline tags in this development phase. Therefore all contained internal tags are stripped on import, so that noteText is converted to PCDATA. If tags are stripped, issue a warning-level error that does not block the import.
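Stripping inline tags while keeping their text can be sketched with Python's standard XML library. The function name `flatten_note` and the logger name are assumptions made for this illustration.

```python
import logging
import xml.etree.ElementTree as ET

logger = logging.getLogger("tbx_import")  # hypothetical logger name


def flatten_note(element):
    """Replace inline markup inside `element` by its plain text.

    Returns True if markup was stripped, so the caller can count warnings.
    """
    if len(element) == 0:  # no child elements, nothing to strip
        return False
    text = "".join(element.itertext())  # text of the element and all descendants
    for child in list(element):
        element.remove(child)
    element.text = text
    logger.warning("inline markup stripped from <%s>", element.tag)
    return True


note = ET.fromstring("<note>very <hi>important</hi> hint</note>")
flatten_note(note)
print(ET.tostring(note, encoding="unicode"))
```

The warning is logged but no exception is raised, so the import is not blocked.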

      processStatus

      processStatus is not part of the TBXv2 Basic definition, but of the much wider specification "Term Base eXchange (TBX) (identical to ISO 30042:2008)" as linked at https://www.gala-global.org/sites/default/files/migrated-pages/docs/tbx_oscar_0.pdf

      Yet it is widely used and important for most terminology usage scenarios. Therefore it must be supported in translate5's standard TBX import and export and in the GUI.

      Images

      Images are not part of TBXv2 Basic, but of the much wider specification "Term Base eXchange (TBX) (identical to ISO 30042:2008)" as linked at https://www.gala-global.org/sites/default/files/migrated-pages/docs/tbx_oscar_0.pdf

      Import and export of images is supported as described there. Please see page 33 "11.2 Referencing a file that is embedded in the back matter of a TBX file" and "11.3 Referencing a file from the back matter" on the same page.

      Back-Element

      Not everything that TBX-Basic describes for the back element is supported, but only the referencing of users from the termEntry part of the TBX and the referencing of images, both for import and export.

      The goal is to be able to match users in the imported TBX with users that are already present in translate5.

      Please see the code below. The target attribute of <transacNote type="responsibility" target="user_516"> references the user in the back area.

      The following should happen:

      • If a user with a GUID equal to the ID exists in translate5, the current termEntry is linked to that user in Zf_users, so that translate5 knows about the relation. This implies that when a term or an attribute is edited or proposed, the relation to the user who does so is also saved in the DB.
      • Else, if a user with the mail address exists in translate5, the termEntry is linked to that user, and INSTEAD of the user ID in the TBX the userGUID from translate5 is used for that user inside the term administration of translate5, so that the userGUID is exported in case of a TBX export. (If multiple users with the same mail address exist, use the first one.)
      • Else, if a user is specified for the term in the TBX but neither its ID nor its email matches a user in translate5, the user is saved for the term in the translate5 term-DB structure (most likely in a table term_users) and is NOT linked in any way to translate5's Zf_users.
      • Else the PM is used as the user and connected to the term (this should only be the case if no user is referenced in the back element for a term).
      <termEntry id="termEntry_400">
        <langSet xml:lang="de-DE">
          <tig id="tig_400_de-DE_1">
            <term id="term_7af1bb1c-a9ae-c05f-3918-e00ed7f6c4a6">anonymisiert</term>
            ...
            <transacGrp>
              <transac type="transactionType">modification</transac>
              <transacNote type="responsibility" target="user_516">Oliver Müller</transacNote>
              <date>19-11-2016</date>
            </transacGrp>
            ...
          </tig>
        </langSet>
      </termEntry>
      ...
      <back>
        <refObjectList type="respPerson">
          ...
          <refObject id="user_516">
            <item type="fn">Oliver Müller</item>
            <item type="email">oliver.mueller@company.de</item>
          </refObject>
          ...
        </refObjectList>
        ...
      </back>
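The four-step user matching above amounts to a lookup cascade. The sketch below only illustrates the order of the checks; `Translate5User`, `resolve_user` and the returned labels are hypothetical, and the exact string comparison of mail addresses is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Translate5User:
    guid: str
    email: str


def resolve_user(ref_id, ref_email, users, pm_guid):
    """Return (kind, identifier) for the user referenced by a term.

    ref_id / ref_email: id and email of the <refObject> in the back matter,
                        or None if the term references no user
    users:              users known to translate5, in a stable order
    pm_guid:            GUID of the PM, used as the last fallback
    """
    if ref_id is None:
        return ("pm", pm_guid)
    for user in users:                      # 1. match by GUID
        if user.guid == ref_id:
            return ("linked", user.guid)
    if ref_email:
        for user in users:                  # 2. match by email, first hit wins
            if user.email == ref_email:
                return ("linked", user.guid)  # translate5 GUID replaces the TBX id
    return ("term_user", ref_id)            # 3. stored with the term, not linked


users = [
    Translate5User("guid-1", "jane_doe@mymail.com"),
    Translate5User("guid-2", "oliver.mueller@company.de"),
]
print(resolve_user("user_516", "oliver.mueller@company.de", users, "guid-pm"))
```

For the sample above, `user_516` matches no GUID but its email matches an existing user, so that user's translate5 GUID would be used and later exported.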
       

      General export behaviour

      The export must be implemented so that large amounts of data (TBX files bigger than 2 GB) can be exported without first building the entire file in memory, because this would use too much memory. The file should therefore first be written temporarily to the file system, then offered for download from there, and automatically deleted once the download has finished. This is better than direct streaming, because the file size is then known in advance and can be passed to the browser.
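The temp-file approach can be illustrated in a few lines. This is a sketch only, written in Python rather than in translate5's own code base; `spool_export` and `serve_and_delete` are hypothetical names, and the chunks stand in for serialized TBX fragments produced one at a time.

```python
import os
import tempfile


def spool_export(chunks):
    """Write the export chunk by chunk to a temporary file.

    Only one chunk is held in memory at a time. Returns (path, size in bytes),
    so the size can be sent to the browser before the download starts.
    """
    fd, path = tempfile.mkstemp(suffix=".tbx")
    with os.fdopen(fd, "w", encoding="utf-8") as out:
        for chunk in chunks:
            out.write(chunk)
    return path, os.path.getsize(path)


def serve_and_delete(path, block_size=64 * 1024):
    """Yield the file in blocks and delete it once the download is done."""
    try:
        with open(path, "rb") as source:
            while block := source.read(block_size):
                yield block
    finally:
        os.remove(path)


entries = (f'<termEntry id="e{i}"/>\n' for i in range(3))  # generator, not a list
path, size = spool_export(entries)
downloaded = b"".join(serve_and_delete(path))
```

Because `size` is available before the first block is sent, it can be used for a Content-Length header, which is the advantage over direct streaming named above.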

      The existing export that currently is only used for TermTagger is extended to be able to generate an export for each termCollection, that contains all information in the termCollection as TBX. Please ensure, that the TermTagger still does not receive more information in the TBX than it currently does to avoid an increase of memory usage or triggering of bugs in TermTagger.

      Two export options are added to the languageResources interface of the GUI:

      • TBX-Basic:
        • It only exports attributes and contents, that are part of TBX-Basic version 2
        • The header is set as so: <martif type="TBX-Basic" 
        • It exports referenced backmatter about the persons that created or modified terms:
      <refObjectList type="respPerson">
           <refObject id="userGUID">
                <item type="fn">Jane Doe</item> 
                <item type="email">jane_doe@mymail.com</item>           
           </refObject>
           ...
      </refObjectList>
      • TBX:
        • It exports all other attributes too
        • The header is set as so: <martif type="TBX" 
        • It also adds a list of all subjectFields, that exist in the TermCollection like so:
      <refObjectList type="subjectField">
                  <refObject>
                    <item>Fahrwerk und Reifen</item>
                  </refObject>
                  ...
      </refObjectList>
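As an illustration of the back matter written on export, the respPerson list could be built like this. The function `build_resp_person_list` and its input tuples are assumptions; only the element and attribute names are taken from the samples above.

```python
import xml.etree.ElementTree as ET


def build_resp_person_list(users):
    """Build <refObjectList type="respPerson"> from (guid, full name, email) tuples."""
    ref_list = ET.Element("refObjectList", {"type": "respPerson"})
    for guid, full_name, email in users:
        ref_object = ET.SubElement(ref_list, "refObject", {"id": guid})
        ET.SubElement(ref_object, "item", {"type": "fn"}).text = full_name
        ET.SubElement(ref_object, "item", {"type": "email"}).text = email
    return ref_list


back_list = build_resp_person_list([("userGUID", "Jane Doe", "jane_doe@mymail.com")])
print(ET.tostring(back_list, encoding="unicode"))
```

For very large exports the serialized fragment would be written to the export file directly instead of being kept in memory.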
      

      Store all TBX-Basic and non-TBX-Basic attributes in a separate table

      We need a place where we can store all available term and termEntry attributes of a term collection (only the attribute name, value, type, collection and termEntryId/termId). This data currently exists (lek_term_attributes), but a query that finds out which term attributes the searched term collection contains would be too expensive. The data is required to provide a frontend attribute filter in the TermPortal.

      Please also check how the process status "rejected" must be handled in the TBX export. See TRANSLATE-1409

      Attachments

        1. TBX-basic-sample.tbx
          10 kB
          Marc Mittag [Administrator]

            People

              pavelperminov Pavel Perminov
              marcmittag Marc Mittag [Administrator]