Details
Bug
Resolution: Fixed
None
Critical
Description
Problem
The time the termTagger needs to process a segment grows exponentially with the segment's length.
As a result, a single very long segment can bring the termTagger down by blocking the whole termTagger instance.
Solution
Segments longer than a configurable limit are not tagged.
During import, this can be handled via the segment status already used by the termTag import. For tagging from the GUI, we simply check the segment length; if the segment exceeds the limit, we send a corresponding message to the GUI stating that the segment cannot be tagged.
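The two paths above (import and GUI) can share one length check. The following is a minimal sketch under stated assumptions: the names `Segment`, `TermTagStatus`, `import_status`, `tag_for_gui` and the `tagger` callable are hypothetical, the status values stand in for whatever segment status the termTag import actually uses, and words are counted by naive whitespace splitting, which may differ from the real word count.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

# Hypothetical default, mirroring the 150-word limit proposed in this issue.
DEFAULT_MAX_WORDS = 150


class TermTagStatus(Enum):
    # Hypothetical status values; the real import uses its own segment status.
    TO_TAG = "to_tag"
    TOO_LONG = "too_long"


@dataclass
class Segment:
    id: int
    text: str


def word_count(text: str) -> int:
    # Naive whitespace tokenization (assumption, not the product's word count).
    return len(text.split())


def is_taggable(segment: Segment, max_words: int = DEFAULT_MAX_WORDS) -> bool:
    # Only segments longer than the limit are excluded; equal length is still tagged.
    return word_count(segment.text) <= max_words


def import_status(segment: Segment, max_words: int = DEFAULT_MAX_WORDS) -> TermTagStatus:
    # Import path: mark over-long segments so the termTagger never receives them.
    if is_taggable(segment, max_words):
        return TermTagStatus.TO_TAG
    return TermTagStatus.TOO_LONG


def tag_for_gui(
    segment: Segment,
    tagger: Callable[[str], str],
    max_words: int = DEFAULT_MAX_WORDS,
) -> dict:
    # GUI path: skip the termTagger call and return a message instead.
    if not is_taggable(segment, max_words):
        return {
            "tagged": False,
            "message": f"Segment {segment.id} is longer than {max_words} words "
                       f"and cannot be tagged.",
        }
    return {"tagged": True, "text": tagger(segment.text)}
```

Keeping the check in one function (`is_taggable`) ensures import and GUI apply the same configurable limit, so a segment skipped on import is also rejected in the GUI rather than blocking the termTagger there.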
Segment length analysis
For several of our customers, the distribution of word counts shows that segments normally do not contain more than 150 words. See the attached txt file.
Anything above that can be considered abnormal, so a default limit of 150 words is assumed; segments exceeding it are not tagged.
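A distribution like the one in the attached file can be checked against a candidate limit with two numbers: the share of segments that would be skipped and a high percentile of the word counts. This is a minimal sketch; `share_above` and `percentile` are hypothetical helpers, the input is assumed to be a plain list of per-segment word counts, and the percentile uses the nearest-rank method.

```python
import math


def share_above(word_counts: list[int], threshold: int = 150) -> float:
    # Fraction of segments that a given limit would exclude from tagging.
    if not word_counts:
        return 0.0
    above = sum(1 for count in word_counts if count > threshold)
    return above / len(word_counts)


def percentile(word_counts: list[int], p: int) -> int:
    # Nearest-rank percentile: smallest value with at least p% of counts <= it.
    if not word_counts:
        raise ValueError("word_counts must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(word_counts)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]
```

If, for example, the 99th percentile lies well below 150 and `share_above` is close to zero, the default limit only affects the abnormal outliers described above.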