[TRANSLATE-3535] Evaluate post-editing time and Levenshtein distance

    • High
    • This feature is deactivated by default until we have gained production experience with a few selected clients. Please contact us if you would like to try it together with us.
      In our hosting we will enable it successively over the next days, up to one or two weeks.

      If post-editing time and Levenshtein distance KPIs are needed for legacy data, run the following commands:
      t5 statistics:levenshtein (calculates missing Levenshtein values in the segment history)
      t5 statistics:aggregate (aggregates segment history data into the statistics DB)
    • translate5 - 7.21.0: Added aggregation of segment editing history data to calculate and display KPIs related to Levenshtein distances and post-editing time
      translate5 - 7.21.2: Automated test fixes

      • Calculate Levenshtein distances on segment level, i.e. the distance between the first version of the segment in the current workflow step and the current version
      • Post-editing time has already been calculated in translate5 for 10 years

      Save those 2 distances in the LEK_segments and LEK_segment_history tables, and save an aggregated distance between the first version and the current one on task level.
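
For reference, the segment-level distance described above can be sketched as follows. This is a minimal, character-based Levenshtein implementation; the ticket does not state how translate5 itself computes the value (for example, how internal tags are handled), so the function and the sample segment texts are illustrative assumptions only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


# Hypothetical segment versions, for illustration only.
first_version_in_step = "The cat sat on the mat"
current_version = "The cat sits on the mat"
distance_in_step = levenshtein(first_version_in_step, current_version)
```

The same function would be applied twice per segment under this reading: once against the first version within the current workflow step, and once against the first version overall.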

      Provide statistical data as follows:

      • Calculate the averages for the current filtering of the task management grid and make them available in the same way as "Show KPIs" and "Export meta data".
      • If "advanced filters" are applied (a third dimension of criteria that relate to the task user assignment, i.e. the job), the statistics are calculated only for the jobs that match the filtering.
      • For further details on how to calculate the averages, please see the sheet "Logic" in the attached "Calculation and Tooltip Matrix.ods".

      Add new advanced filters that allow filtering the tasks by

        • match rate range min/max (for MT resources, the quality estimation of ModelFront, and probably of LLMs in the future, is used in the same way in the same match rate column)
        • used language resource(s) (multi-select tag field)
        • language resource type (like TermCollection, TM, MT)
      • Please note: If any filter relates to the language resource used for pre-translation, the first language resource used for pre-translation is taken into account, even if the user decides to manually take over something else from the fuzzy match panel.

      • Show the resulting Levenshtein distance and post-editing time averages for the current filtering in the existing "Show KPIs" window, and also make them available in the xlsx file that can be downloaded by clicking on "Download meta-data".

      UI text for KPI window:
      EN
      Ø Post-editing time within 1 workflow step

      Ø Levenshtein distance within 1 workflow step
      Ø Post-editing time from the start of workflow
      Ø Levenshtein distance from the start of workflow

      DE
      Ø Nachbearbeitungszeit innerhalb eines Workflowschritts
      Ø Levenshtein-Distanz innerhalb eines Workflowschritts
      Ø Nachbearbeitungszeit ab Beginn des Workflows
      Ø Levenshtein-Distanz ab Beginn des Workflows

      The Levenshtein average in the UI should show at least 5 decimal places.
      Each of those UI texts should be followed by an info icon, and a tooltip should appear when hovering over the text or the info icon. For the tooltip texts, please see the attached file "Calculation and Tooltip Matrix.ods".

      Additional note:
      1. If the ClickHouse connection cannot be established, the KPI window displays "Daten nicht verfügbar" ("Data unavailable") next to the post-editing time label.
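
The two display rules above (at least 5 decimal places for the Levenshtein average, and a fallback text when no data can be fetched) could be combined in a small formatting helper. This is only a sketch: the function name, the `locale` parameter and the use of `None` to signal a failed connection are assumptions, not part of translate5.

```python
from typing import Optional

# Fallback texts taken from the ticket.
UNAVAILABLE = {"en": "Data unavailable", "de": "Daten nicht verfügbar"}


def format_kpi(value: Optional[float], locale: str = "en", decimals: int = 5) -> str:
    """Render a KPI average for the KPI window.

    value is None when the statistics backend could not be reached
    (hypothetical convention for this sketch).
    """
    if value is None:
        return UNAVAILABLE[locale]
    return f"{value:.{decimals}f}"


print(format_kpi(2.5))         # formatted with 5 decimal places
print(format_kpi(None, "de"))  # fallback text
```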

      Current ToDos:

      Information needed for the calculation

      1. (overall) Levenshtein distance (= number of changes)
      2. number of segments
      3. number of workflow steps

      It would be good to have this information available somewhere, to get a better understanding of why a calculated result differs from the expected result.

      When segments are not touched by a user, they are not counted at all (they have no entry in the segment history table). This does not feel right to me, see example "Test 2025-01-17: #1".
      The calculation is correct once a given workflow step is finished. But within the workflow step, "untouched segments" are not included and are therefore "missing" (the segment count does not cover all segments, only the ones edited/touched so far).
      => The figures for the current workflow step might not be calculated correctly, because the "0 segments" are not present there.
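
The effect of the missing "0 segments" can be illustrated with a small example. It assumes the average is a plain arithmetic mean over segments; the real logic is defined in the "Logic" sheet of the attached matrix and may differ. All names and numbers are hypothetical.

```python
from typing import Dict, List


def average_distance(history: Dict[int, int],
                     all_segment_ids: List[int],
                     include_untouched: bool) -> float:
    """Mean Levenshtein distance per segment.

    history maps segment id -> distance and only contains touched segments.
    With include_untouched=True, missing segments count as distance 0.
    """
    if include_untouched:
        values = [history.get(seg_id, 0) for seg_id in all_segment_ids]
    else:
        values = list(history.values())
    return sum(values) / len(values) if values else 0.0


# Hypothetical task: 4 segments, only segments 1 and 3 were edited so far.
history = {1: 6, 3: 2}
all_ids = [1, 2, 3, 4]

touched_only = average_distance(history, all_ids, include_untouched=False)
with_zeros = average_distance(history, all_ids, include_untouched=True)
print(touched_only, with_zeros)  # 4.0 2.0
```

Counting only touched segments doubles the average in this example, which is the kind of deviation described above for a workflow step that is still in progress.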

      What should happen if a user edits a segment "outside" any workflow step? This can be reproduced by creating a task and NOT assigning any user.
      => We decided to have one workflow step before the actual workflow, called "no workflow", and one AFTER the workflow.
      The "0 segments" are added for "no workflow" on the "start workflow" event, and those for "after workflow" are added when the task ends.
      Those two workflow steps are always calculated independently, which means they are not part of the "distance per workflow step" or "distance from the start" calculation.
      -> Labels for the 4 new KPIs for "Before" (no workflow) and "After" (after workflow):
      EN
      Ø Levenshtein distance before the start of workflow
      Ø Post-editing time before the start of workflow
      Ø Levenshtein distance after the end of workflow
      Ø Post-editing time after the end of workflow
      DE
      Ø Levenshtein-Distanz vor Beginn des Workflows
      Ø Nachbearbeitungszeit vor Beginn des Workflows
      Ø Levenshtein-Distanz nach Ende des Workflows
      Ø Nachbearbeitungszeit nach Ende des Workflows
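
The separation of the "no workflow" and "after workflow" buckets from the regular workflow steps could look roughly like this. The sketch assumes per-segment rows of (step name, distance) and a plain mean per bucket; the row format and the step names "translation" and "review" are hypothetical.

```python
from collections import defaultdict
from statistics import mean

BEFORE = "no workflow"
AFTER = "after workflow"


def split_buckets(rows):
    """rows: iterable of (step_name, distance) tuples.

    The BEFORE and AFTER buckets are averaged on their own and never
    mixed into the per-step averages of the regular workflow.
    """
    before, after = [], []
    in_workflow = defaultdict(list)
    for step, distance in rows:
        if step == BEFORE:
            before.append(distance)
        elif step == AFTER:
            after.append(distance)
        else:
            in_workflow[step].append(distance)
    return {
        "before": mean(before) if before else None,
        "after": mean(after) if after else None,
        "per_step": {step: mean(v) for step, v in in_workflow.items()},
    }


rows = [
    ("no workflow", 4), ("no workflow", 0),
    ("translation", 3), ("translation", 1),
    ("review", 2),
    ("after workflow", 5),
]
result = split_buckets(rows)
```

Here `result["before"]` and `result["after"]` would feed the four new "Before"/"After" labels, while `result["per_step"]` corresponds to the "within 1 workflow step" KPIs.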

      The tooltip in the task list KPI window must only be shown when hovering over the "i" icon.

      In the CLI, add an optional taskId parameter to "t5 statistics:levenshtein" and "t5 statistics:aggregate".
      Example: t5 statistics:levenshtein -t 123 => calculates Levenshtein values only for the task with ID 123

        1. Calculation and Tooltip Matrix.ods (44 kB)
        2. image-2024-09-12-17-12-40-878.png (184 kB)
        3. image-2024-09-12-17-15-13-080.png (141 kB)
        4. image-2024-12-01-17-41-51-707.png (55 kB)
        5. image-2024-12-02-20-29-14-891.png (113 kB)
        6. new.txt (8 kB)
        7. newest.txt (11 kB)
        8. Screenshot 2024-10-17 at 17.56.43.png (200 kB)

          Marc Mittag [Administrator] added a comment: To calculate the Levenshtein distance online for comparison, this can be used: https://planetcalc.com/1721/

          Marc Mittag [Administrator] added a comment: I just edited the texts for the KPI window again, and also its tooltips. Please find attached, for your convenience, the old version (new.txt) and the newest version (newest.txt).

            volodymyr@mittagqi.com Volodymyr Kyianenko
            marcmittag Marc Mittag [Administrator]
            Thomas Lauria