translate5 / TRANSLATE-1405 TermPortal as terminology management solution / TRANSLATE-1925

BUG: Parallelism of running workers is not implemented correctly

Details

    • High
    • Enhancement: Set more workers to "waiting" in the wakeupScheduled call, independently of the calling worker, to improve the parallelism of running workers

    Description

      This is a bug that makes sense to fix while we develop the next stage of the TermPortal. It has therefore been turned into a sub-task of TRANSLATE-1405, although it has nothing to do with the new features that have to be implemented there.

      problem

      While investigating the worker deadlock problem (TRANSLATE-1673), a bug in worker parallelism was discovered. Workers can have the following states:

      State      Meaning
      prepare    The worker has been added to the worker table but is not ready to run yet (for example, some other workers are still missing from the worker table, or the task to be imported has been created but not uploaded yet). Call schedulePrepared to mark the prepared workers of a taskGuid or worker group as scheduled.
      scheduled  The worker is scheduled to run. The wakeupScheduled call sets it to waiting after its worker dependencies have been checked.
      waiting    The worker is ready to run. Waiting workers may be started (set to running) in parallel, limited by maxRunProcesses and the slot/resource blocking mechanisms.
      running    The worker is running.
      defunct    The worker (or one of its sub-workers) crashed.
      done       The worker has finished its work successfully.
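The states and the calls that move a worker between them can be read as a small state machine. The following Python sketch is only an illustration derived from the table above: the transition map is an assumption (the real code may, for example, also mark workers defunct from other states or reset them), and the names WorkerState, ALLOWED_TRANSITIONS and transition are hypothetical.

```python
from enum import Enum


class WorkerState(Enum):
    """Worker states as listed in the table (string values assumed)."""
    PREPARE = "prepare"
    SCHEDULED = "scheduled"
    WAITING = "waiting"
    RUNNING = "running"
    DEFUNCT = "defunct"
    DONE = "done"


# Hypothetical transition map derived from the state descriptions; the real
# translate5 worker code may allow more transitions than these.
ALLOWED_TRANSITIONS = {
    WorkerState.PREPARE: {WorkerState.SCHEDULED},    # via schedulePrepared
    WorkerState.SCHEDULED: {WorkerState.WAITING},    # via wakeupScheduled
    WorkerState.WAITING: {WorkerState.RUNNING},      # started by the scheduler
    WorkerState.RUNNING: {WorkerState.DONE, WorkerState.DEFUNCT},
    WorkerState.DONE: set(),                         # final state
    WorkerState.DEFUNCT: set(),                      # final state
}


def transition(current: WorkerState, target: WorkerState) -> WorkerState:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Note that a scheduled worker can never jump straight to running in this model: it always passes through waiting, which is exactly where the parallel start logic applies.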

      Currently, each wakeupScheduled call sets only ONE worker from scheduled to waiting. According to the worker specification, however, multiple workers can be in the state waiting, and a separate algorithm (depending on the maxParallelProcesses config and the slot and resource logic) then sets the maximum permitted number of them to running. Since only one scheduled worker is set to waiting per call, this logic is effectively never used.
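A toy simulation makes the consequence concrete. The Worker class, the function names and the max_parallel parameter are hypothetical stand-ins (max_parallel plays the role of maxParallelProcesses), and slot and resource blocking are left out. After a single one-at-a-time wakeup, the start step has only one waiting worker to choose from, so only one worker runs even though the limit would allow three.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Worker:
    """Toy worker record; the fields are hypothetical, not translate5's schema."""
    worker_id: int
    worker_type: str
    state: str = "scheduled"


def wakeup_scheduled_one(workers: list[Worker]) -> None:
    """Current behaviour: at most ONE scheduled worker is set to waiting per call."""
    for w in workers:
        if w.state == "scheduled":
            w.state = "waiting"
            return


def start_waiting(workers: list[Worker], max_parallel: int) -> None:
    """Start waiting workers until max_parallel are running (slot/resource checks omitted)."""
    running = sum(1 for w in workers if w.state == "running")
    for w in workers:
        if running >= max_parallel:
            break
        if w.state == "waiting":
            w.state = "running"
            running += 1


if __name__ == "__main__":
    workers = [Worker(i, "import") for i in range(5)]
    wakeup_scheduled_one(workers)
    start_waiting(workers, max_parallel=3)
    print(sum(w.state == "running" for w in workers))  # 1, although 3 would be allowed
```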

      On the other hand, wakeupScheduled is called very often, so in practice more than one scheduled worker does end up waiting, and the problem is not that grave. However, this excessive use of wakeupScheduled could be one reason for the deadlocks mentioned in TRANSLATE-1673.

      solution

      Change the wakeupScheduled call so that all of the next available workers of the same worker type are set from scheduled to waiting, instead of only one.
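A minimal sketch of the proposed change, assuming the worker states live in a database table (the "worker table" mentioned above). It uses Python's built-in sqlite3 module; the table and column names (worker, worker_dependency, task_guid, worker_type, depends_on) and the dependency rule (every worker of a depended-on type in the same task must be done) are assumptions for illustration, not translate5's actual schema or wakeupScheduled implementation. The point is that one UPDATE moves every eligible scheduled worker of the given type to waiting.

```python
import sqlite3

# Hypothetical, simplified schema; translate5's real worker tables differ.
SCHEMA = """
CREATE TABLE worker (
    id INTEGER PRIMARY KEY,
    task_guid TEXT NOT NULL,
    worker_type TEXT NOT NULL,
    state TEXT NOT NULL
);
CREATE TABLE worker_dependency (
    worker_type TEXT NOT NULL,  -- the dependent worker type
    depends_on TEXT NOT NULL    -- type that must be done before worker_type may wake up
);
"""

# Wake up ALL scheduled workers of one type in one task whose dependencies are done.
WAKEUP_ALL_SQL = """
UPDATE worker SET state = 'waiting'
WHERE state = 'scheduled'
  AND task_guid = :task_guid
  AND worker_type = :worker_type
  AND NOT EXISTS (
      SELECT 1
      FROM worker AS blocker
      JOIN worker_dependency AS dep ON dep.depends_on = blocker.worker_type
      WHERE dep.worker_type = worker.worker_type
        AND blocker.task_guid = worker.task_guid
        AND blocker.state <> 'done'
  )
"""


def wakeup_scheduled(conn: sqlite3.Connection, task_guid: str, worker_type: str) -> int:
    """Set every eligible scheduled worker to waiting; return how many were woken up."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(WAKEUP_ALL_SQL, {"task_guid": task_guid, "worker_type": worker_type})
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO worker_dependency VALUES ('segment', 'import')")
    conn.executemany(
        "INSERT INTO worker (task_guid, worker_type, state) VALUES (?, ?, ?)",
        [("t1", "import", "done")] + [("t1", "segment", "scheduled")] * 3,
    )
    print(wakeup_scheduled(conn, "t1", "segment"))  # 3
```

With all eligible workers waiting at once, the existing start logic (maxParallelProcesses, slots, resources) decides how many of them actually run, which is the division of labour the worker specification describes.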

      Main concern: since the workers are core functionality, this change must be tested thoroughly. One way would be to import the three test tasks mentioned in TRANSLATE-1673.

            People

              axelbecher Axel Becher
              tlauria Thomas Lauria