Added a semaphore to NomadJobManager so that concurrent dispatch job requests don't exceed the urllib3 connection pool size limit used by the Nomad Python library.
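For reviewers who want the idea in isolation, here is a minimal sketch of bounding in-flight dispatch requests with a semaphore. It is not the code in this PR: `BoundedDispatcher`, `dispatch_fn`, and the `POOL_SIZE` value are hypothetical names and numbers chosen for illustration, and the real limit should be whatever pool size the Nomad client is actually configured with.

```python
import threading

# Assumption: illustrative limit only; use the pool size the client really has.
POOL_SIZE = 10


class BoundedDispatcher:
    """Hypothetical stand-in for the dispatch path in NomadJobManager."""

    def __init__(self, dispatch_fn, max_in_flight=POOL_SIZE):
        # dispatch_fn is a hypothetical callable that performs one dispatch request.
        self._dispatch_fn = dispatch_fn
        # BoundedSemaphore raises ValueError if released more often than acquired,
        # which surfaces bookkeeping bugs early.
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def dispatch(self, job_id, payload):
        # Block until a slot is free, so that no more than max_in_flight
        # requests hold a pooled connection at the same time.
        with self._slots:
            return self._dispatch_fn(job_id, payload)
```

Callers on any number of threads can share one `BoundedDispatcher`; excess callers wait for a slot instead of opening connections beyond the pool limit.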
Edited nomad_job_manager.py to work with polling instead of the Nomad event stream. Also modified pipeline_stages.py so that a stage is marked failed if a job in that stage is lost by the Nomad API
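As a rough illustration of the polling approach, the sketch below checks each pending job on an interval and fails the stage as soon as any job reports a failed or lost status. The function name, the `fetch_status` callable, and the status strings are assumptions made for this example; they are not taken from nomad_job_manager.py or pipeline_stages.py.

```python
import time

# Assumption: illustrative status strings, not necessarily the ones used in this PR.
TERMINAL_OK = {"complete"}
TERMINAL_FAILED = {"failed", "lost"}


def poll_stage(job_ids, fetch_status, interval_s=5.0, sleep=time.sleep):
    """Poll until every job in a stage finishes; return "complete" or "failed".

    fetch_status is a hypothetical callable that takes a job ID and returns
    its current status string.
    """
    pending = set(job_ids)
    while pending:
        # Iterate over a sorted copy so the set can be modified inside the loop.
        for job_id in sorted(pending):
            status = fetch_status(job_id)
            if status in TERMINAL_FAILED:
                # Fail the whole stage as soon as one job is failed or lost.
                return "failed"
            if status in TERMINAL_OK:
                pending.discard(job_id)
        if pending:
            sleep(interval_s)
    return "complete"
```

Returning on the first failed or lost job lets the caller stop the stage early rather than waiting for the remaining jobs to finish.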
While cherry-picking changes to other files in src into this branch, I accidentally copied over the main.py changes. Reverted them, since src/main.py changes are being tracked in the update-entrypoint branch.
This PR encompasses changes associated with the pipeline's submission and tracking of Nomad jobs. The biggest changes are to nomad_job_manager.py.

nomad_job_manager.py was changed to:
- [...] pipeline_stages.py in a better way, so that pipelines can be marked failed earlier and not waste compute time

pipeline_stages.py was also changed to stagger job submissions within each stage. This was another move designed to reduce the load placed on the Nomad server when many pipeline jobs run at once.
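The staggering described above can be sketched as follows. This is only an assumption about its general shape, with a fixed delay between consecutive submissions; `submit_staggered`, `submit`, and `delay_s` are hypothetical names, and the actual spacing strategy in pipeline_stages.py may differ.

```python
import time


def submit_staggered(jobs, submit, delay_s=1.0, sleep=time.sleep):
    """Submit jobs one at a time, pausing delay_s seconds between submissions.

    submit is a hypothetical callable that sends one job to the server and
    returns whatever the caller wants to keep (for example, a job ID).
    """
    results = []
    for index, job in enumerate(jobs):
        if index > 0:
            # Space out requests so the server is not hit by a burst.
            sleep(delay_s)
        results.append(submit(job))
    return results
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.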