Details
-
Sub-task
-
Resolution: Fixed
-
None
-
High
-
Enhancement: Setting more workers to "waiting" in the "wakeupScheduled" call independently of the calling worker to improve the parallelism of running workers
-
Description
This is a bug that makes sense to solve while we develop the next stage of Termportal. It has therefore been turned into a sub-task, although it actually has nothing to do with the new features that have to be implemented for TRANSLATE-1405.
problem
While investigating the worker deadlock problem TRANSLATE-1673, a bug in worker parallelism was discovered. Workers can have different states:
State | Meaning |
---|---|
prepare | The worker has been added to the worker table but is not ready to run (for example, some other workers are still missing in the worker table, or an importable task has been created but not uploaded yet). Call schedulePrepared to mark the prepared workers of a taskGuid or worker group as scheduled. |
scheduled | The worker is scheduled to be used. It is set to waiting by the wakeupScheduled call after its worker dependencies have been checked. |
waiting | waiting workers are ready to run, and may be started (set to running) in parallel, restricted by maxRunProcesses and slot / resource blocking mechanisms |
running | the worker is running |
defunct | the worker (or a sub worker) crashed |
done | the worker has successfully finished its work |
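The state table can be read as a small state machine. The following is a minimal sketch of that reading, not the project's actual implementation: the names `WorkerState`, `TRANSITIONS` and `can_transition` are hypothetical, and the assumption that only a running worker can become defunct or done is not stated in this issue.

```python
from enum import Enum


class WorkerState(Enum):
    PREPARE = "prepare"
    SCHEDULED = "scheduled"
    WAITING = "waiting"
    RUNNING = "running"
    DEFUNCT = "defunct"
    DONE = "done"


# Allowed transitions as read from the table above.
# Assumption: defunct and done are reached from running only.
TRANSITIONS = {
    WorkerState.PREPARE: {WorkerState.SCHEDULED},   # via schedulePrepared
    WorkerState.SCHEDULED: {WorkerState.WAITING},   # via wakeupScheduled
    WorkerState.WAITING: {WorkerState.RUNNING},     # limited by process/slot rules
    WorkerState.RUNNING: {WorkerState.DONE, WorkerState.DEFUNCT},
    WorkerState.DONE: set(),
    WorkerState.DEFUNCT: set(),
}


def can_transition(current: WorkerState, target: WorkerState) -> bool:
    """Return True if the table allows moving from current to target."""
    return target in TRANSITIONS[current]
```

For example, `can_transition(WorkerState.SCHEDULED, WorkerState.WAITING)` holds, while a prepared worker cannot jump directly to running.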
Regarding the wakeupScheduled call: only ONE worker per call is set from scheduled to waiting. According to the worker specification, multiple workers can have the state waiting, and another algorithm (depending on the maxParallelProcesses config and the slot and resource logic) sets the maximum allowed number of these workers to running. Since only one scheduled worker is set to waiting per call, this logic is effectively not used.
On the one hand, wakeupScheduled is called very often, so in practice more than one scheduled worker ends up being set to waiting, and the problem is not that grave. On the other hand, the excessive usage of wakeupScheduled could be one reason for the deadlocks mentioned in TRANSLATE-1673.
solution
Change the wakeupScheduled call so that all next available workers of the same worker type are set from scheduled to waiting, not only one worker.
Main problem here: since the workers are a core functionality, this change must be tested thoroughly. One way would be to import the 3 test tasks mentioned in TRANSLATE-1673.
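The difference between the current and the proposed behaviour can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions, not the real worker code: the `Worker` class, the function names, the worker type `exampleType` and the `max_parallel` parameter are hypothetical, the dependency check is omitted, and the slot and resource logic is reduced to a simple upper limit.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    id: int
    worker_type: str
    state: str = "scheduled"


def wakeup_scheduled_single(workers):
    """Behaviour described in the problem: only one scheduled worker is woken."""
    for w in workers:
        if w.state == "scheduled":
            w.state = "waiting"
            return 1
    return 0


def wakeup_scheduled_all(workers, worker_type):
    """Proposed behaviour: wake every scheduled worker of the given type.

    Assumption: dependency checks are left out of this sketch.
    """
    woken = 0
    for w in workers:
        if w.state == "scheduled" and w.worker_type == worker_type:
            w.state = "waiting"
            woken += 1
    return woken


def start_waiting(workers, max_parallel):
    """Set waiting workers to running until max_parallel workers are running."""
    running = sum(1 for w in workers if w.state == "running")
    started = 0
    for w in workers:
        if running + started >= max_parallel:
            break
        if w.state == "waiting":
            w.state = "running"
            started += 1
    return started


# Four scheduled workers, at most three may run in parallel.
old = [Worker(i, "exampleType") for i in range(4)]
wakeup_scheduled_single(old)
started_old = start_waiting(old, max_parallel=3)

new = [Worker(i, "exampleType") for i in range(4)]
wakeup_scheduled_all(new, "exampleType")
started_new = start_waiting(new, max_parallel=3)
```

With a single wakeup call, `started_old` is 1 even though three processes would be allowed; with the proposed change, `started_new` reaches the limit of 3 and the fourth worker stays in waiting.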
Issue Links
- causes
-
TRANSLATE-1974 Term-collection language resources import worker should run only one at a time
- Done