  1. translate5
  2. TRANSLATE-3916

Make concept for Post-editing and levenshtein statistics


Details

    Description

      Visualisation:

      Designed in related task TRANSLATE-3535

      Technical concept:

      1. In ClickHouse, create a new table segment_modifications_aggregation with these columns:

      • taskGuid -> references LEK_tasks.taskGuid
      • userGuid -> references Zf_users.userGuid
      • workflowName (string)
      • workflowStepName (string)
      • segmentId -> foreign for LEK_segments.id
      • segmentName (string) - source | target | ???
      • duration (int) - sum of segment edit duration
      • levenshteinOriginal (int) - distance between original segment text and current state
      • levenshteinPrevious (int) - distance between resulting segment text in previous step and current state
      • matchRate (int) - copy from segment matchRate
      • matchRateType (varchar(60)) - copy from segment matchRateType
      • ???
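
      To illustrate the two distance columns above, here is a minimal sketch of how levenshteinOriginal and levenshteinPrevious could be computed for one segment. It assumes a plain character-level edit distance. The ticket does not say whether tags are stripped or whether the comparison is per character or per word, so the function names and that choice are assumptions, not the planned implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute), row by row."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def modification_distances(original: str, previous_step: str, current: str) -> dict:
    """Hypothetical helper: both distance columns for one aggregation row."""
    return {
        "levenshteinOriginal": levenshtein(original, current),
        "levenshteinPrevious": levenshtein(previous_step, current),
    }
```

      In the first workflow step there is no previous step, so previous_step would presumably equal the original text and both values coincide.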

      2. Create a migration script to hydrate the aggregation table with data from the LEK_segment_history* tables

      3. New records should be written to both LEK_segment_history and segment_modifications_aggregation
      4. In KpiWindow.js, add new rows to show the average duration and Levenshtein distance:
          - levenshteinOriginal always appears in the report
          - levenshteinPrevious appears only when a filter by job is applied
      5. In admin/task/filter/FilterWindow.js, add filters for match rate and language resource
      6. Create a service that fetches the data for editor_TaskController::kpiAction
      7. Provide the data for KpiWindow.js in editor_TaskController::kpiAction
      In editor_TaskController we filter in two stages: first we apply the existing filters with the existing procedures, then we use the resulting task GUIDs to fetch the matching task GUIDs from ClickHouse using the match-related filters.
      For the index action, we then filter the tasks once more by those GUIDs to show the filtered table.
      For the KPI action, we use the same procedure to get the average values of the desired fields.
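
      The two-stage filtering described above can be sketched as follows. This is an illustration only: plain Python lists stand in for the MySQL task list and the ClickHouse aggregation table, and all function names and the min/max match-rate parameters are assumptions that do not appear in the ticket.

```python
def filter_tasks(tasks, existing_filter):
    """Stage 1: stand-in for the existing task filters."""
    return [t for t in tasks if existing_filter(t)]


def match_filtered_rows(rows, task_guids, min_match_rate, max_match_rate):
    """Stage 2: stand-in for a ClickHouse query restricted to the stage-1 GUIDs."""
    guids = set(task_guids)
    return [
        r for r in rows
        if r["taskGuid"] in guids
        and min_match_rate <= r["matchRate"] <= max_match_rate
    ]


def tasks_for_index(tasks, rows):
    """Index action: keep only tasks that still have rows after stage 2."""
    remaining = {r["taskGuid"] for r in rows}
    return [t for t in tasks if t["taskGuid"] in remaining]


def kpi_averages(rows):
    """KPI action: average duration and distance over the filtered rows."""
    if not rows:
        return {"avgDuration": None, "avgLevenshteinOriginal": None}
    n = len(rows)
    return {
        "avgDuration": sum(r["duration"] for r in rows) / n,
        "avgLevenshteinOriginal": sum(r["levenshteinOriginal"] for r in rows) / n,
    }
```

      Passing the stage-1 GUIDs into the second query keeps the existing filter logic untouched and limits the ClickHouse lookup to tasks the user could see anyway.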

      8. Rename "Assigned role" to "Workflow step" in "Advanced filters" window

      9. Task removal: add deletion of ClickHouse data to TaskRemover

      10. Maintenance / Deployment Todos

      • Addition to step 2 (migration script): convert it into a CLI command that processes all segments in batches, so that the installation stays usable while the script runs. The command should be re-runnable for recalculation and should resume the calculation after a crash.
      • The t5 task:clean command should list orphaned ClickHouse data that belongs to deleted tasks and provide a script to delete it
      • Add a class similar to Models_SystemRequirement_Modules_Database that checks the ClickHouse connection and, in future, other things needed for the ClickHouse DB
      • Reminder: add the Docker config of ClickHouse to the public Docker file and add instructions to the important release notes of TRANSLATE-3535
      • Set up the ClickHouse DB in the instance:create command in the hosting scripts
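
      The batched, resumable hydration from the first bullet above could follow a checkpoint pattern like the one sketched here. The four callbacks (fetch_batch, write_batch, load_checkpoint, save_checkpoint) are hypothetical placeholders for the real history query, the ClickHouse insert, and some persistent marker. None of them are named in the ticket.

```python
def hydrate_in_batches(fetch_batch, write_batch, load_checkpoint,
                       save_checkpoint, batch_size=1000):
    """Process rows ordered by id in batches, remembering the last finished id.

    fetch_batch(last_id, limit) must return rows with id > last_id, ordered by id.
    Returns the number of rows written during this run.
    """
    last_id = load_checkpoint()  # 0 on the very first run
    written = 0
    while True:
        rows = fetch_batch(last_id, batch_size)
        if not rows:
            break  # nothing left: hydration is complete
        write_batch(rows)
        last_id = rows[-1]["id"]
        save_checkpoint(last_id)  # persist progress only after a successful write
        written += len(rows)
    return written
```

      Because the checkpoint is saved after each write, a crash between the two would replay one batch on restart, so the write would need to be idempotent or the aggregation rows for that batch deleted first. A full recalculation would simply reset the checkpoint to 0.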

      11. Migration script that adds the ClickHouse credentials to installation.ini; ENV is not enough here

            People

              sanya@mittagqi.com Sanya Mikhliaiev
              marcmittag Marc Mittag [Administrator]
              Thomas Lauria