  1. translate5
  2. TRANSLATE-2022

Prevent huge segments from being sent to the termTagger


Details

    Description

       

      problem

      The time the termTagger needs to process a segment grows exponentially with the length of that segment.

      Very long segments can therefore bring the termTagger down by blocking the whole termTagger instance.

      solution

      Segments longer than a configurable value are not sent to the termTagger.

      On import, this can be done with the segment status already used by the termtag import. For tagging from the GUI, we check the length, and if the segment is too long we send a corresponding message to the GUI saying that the segment cannot be tagged.
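
      A minimal sketch of such a length check, in Python for illustration only. The names `check_segment`, `TagDecision` and `DEFAULT_MAX_WORD_COUNT` are hypothetical and not part of translate5, and counting whitespace-separated tokens is a simplifying assumption (real segments may contain markup):

```python
from dataclasses import dataclass

# Hypothetical default; this issue proposes a word count of 150.
DEFAULT_MAX_WORD_COUNT = 150


@dataclass
class TagDecision:
    """Result of the length check: whether to tag, and a message if not."""
    taggable: bool
    message: str = ""


def word_count(segment_text: str) -> int:
    # Simplification: count whitespace-separated tokens.
    return len(segment_text.split())


def check_segment(segment_text: str,
                  max_words: int = DEFAULT_MAX_WORD_COUNT) -> TagDecision:
    """Decide whether a segment is short enough to be sent for tagging."""
    count = word_count(segment_text)
    if count > max_words:
        # On import, the caller would set the "not tagged" segment status;
        # in the GUI, the message would be shown to the user.
        return TagDecision(
            False,
            f"Segment has {count} words (limit {max_words}) and cannot be tagged.",
        )
    return TagDecision(True)
```

      The same check could serve both paths: the import sets the segment status when `taggable` is false, and the GUI displays `message`.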

      Segment length analysis

      For some of our customers, the distribution of word counts shows that segments normally do not have more than 150 words. See the attached txt file.

      Everything above that can be considered abnormal, so a word count of 150 is assumed as the default threshold above which segments are not tagged.
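
      How such a threshold could be checked against a list of per-segment word counts can be sketched as follows. The helper names `share_above` and `histogram` are hypothetical, and the input is assumed to be a plain list of integers; the format of the attached txt file is not known here:

```python
from collections import Counter


def share_above(word_counts, threshold=150):
    """Return the fraction of segments whose word count exceeds the threshold."""
    if not word_counts:
        return 0.0
    over = sum(1 for count in word_counts if count > threshold)
    return over / len(word_counts)


def histogram(word_counts, bucket_size=50):
    """Group word counts into buckets keyed by the bucket's lower bound."""
    buckets = Counter((count // bucket_size) * bucket_size for count in word_counts)
    return dict(sorted(buckets.items()))
```

      A low value of `share_above` for real data would support the chosen default, since only few segments would be excluded from tagging.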


          People

            tlauria Thomas Lauria