Avoid duplicating PATH #5425

Merged
adamnovak merged 6 commits into DataBiosphere:master from mnijhuis-tos:path-patch
Feb 4, 2026

Conversation

@mnijhuis-tos
Contributor

When using the singleMachine batch system, `environment` contains the saved contents of `os.environ`.

This code then effectively concatenates the original PATH saved in `environment` with the PATH in `os.environ`, which is identical. As a result, the PATH in `os.environ` is extended with a copy of its own contents.

In my case, my PATH was quite long. After Toil concatenated it several times to `os.environ`, `os.environ` contained 7 copies of the original PATH. Later, an "Argument list too long" error occurred while executing a simple `stat` call.
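
The growth described above can be reproduced with a small simulation. This is a minimal sketch, not Toil's actual code: `apply_saved_path` is a hypothetical stand-in for the worker's PATH handling, it works on plain dictionaries instead of `os.environ`, and both the order of concatenation and the `else` branch are assumptions.

```python
def apply_saved_path(current: dict, saved: dict) -> None:
    """Hypothetical stand-in for the pre-fix PATH handling (assumed shape)."""
    if "PATH" in current and current["PATH"] != "":
        # Assumption: the saved PATH is placed in front of the current one.
        current["PATH"] = saved["PATH"] + ":" + current["PATH"]
    else:
        # Assumption: with no usable current PATH, the saved one is used as-is.
        current["PATH"] = saved["PATH"]


original = "/usr/local/bin:/usr/bin:/bin"
saved = {"PATH": original}    # plays the role of `environment`
current = {"PATH": original}  # plays the role of `os.environ`

# Each application adds one more copy of the original PATH.
for _ in range(6):
    apply_saved_path(current, saved)

copies = current["PATH"].split(":").count("/usr/bin")
print(copies, len(current["PATH"]))
```

With identical starting values, six applications leave seven copies of every directory, and the string grows by one full PATH per call.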

If `environment["PATH"]` already contains `os.environ["PATH"]` (which is also true if `os.environ["PATH"]` is empty or if it equals `environment["PATH"]`), the concatenation is not needed, so we can skip it and avoid the "Argument list too long" error.
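
The cases named above all follow from how Python's `in` operator works on strings: it is a substring test. A minimal sketch, where `needs_concatenation` is a hypothetical helper that only mirrors the proposed condition and is not a function in Toil:

```python
def needs_concatenation(worker_path: str, saved_path: str) -> bool:
    """Mirror of the proposed check (hypothetical helper): concatenate only
    if the worker's PATH is not already a substring of the saved PATH."""
    return worker_path not in saved_path


saved = "/usr/local/bin:/usr/bin:/bin"

equal_case = needs_concatenation(saved, saved)
empty_case = needs_concatenation("", saved)  # "" is a substring of any string
contained_case = needs_concatenation("/usr/bin:/bin", saved)
distinct_case = needs_concatenation("/opt/tools/bin", saved)

print(equal_case, empty_case, contained_case, distinct_case)
```

The first three calls return `False`, so the concatenation is skipped; only the last one, where the worker's PATH does not occur in the saved PATH, returns `True`.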

Changelog Entry

To be copied to the draft changelog by merger:

  • PR submitter writes their recommendation for a changelog entry here

Reviewer Checklist

  • Make sure it is coming from issues/XXXX-fix-the-thing in the Toil repo, or from an external repo.
    • If it is coming from an external repo, make sure to pull it in for CI with:
      contrib/admin/test-pr otheruser theirbranchname issues/XXXX-fix-the-thing
      
    • If there is no associated issue, create one.
  • Read through the code changes. Make sure that it doesn't have:
    • Addition of trailing whitespace.
    • New variable or member names in camelCase that want to be in snake_case.
    • New functions without type hints.
    • New functions or classes without informative docstrings.
    • Changes to semantics not reflected in the relevant docstrings.
    • New or changed command line options for Toil workflows that are not reflected in docs/running/{cliOptions,cwl,wdl}.rst
    • New features without tests.
  • Comment on the lines of code where problems exist with a review comment. You can shift-click the line numbers in the diff to select multiple lines.
  • Finish the review with an overall description of your opinion.

Merger Checklist

  • Make sure the PR passed tests, including the GitLab tests, for the most recent commit in its branch.
  • Make sure the PR has been reviewed. If not, review it. If it has been reviewed and any requested changes seem to have been addressed, proceed.
  • Merge with the GitHub "Squash and merge" feature.
    • If there are multiple authors' commits, add Co-authored-by to give credit to all contributing authors.
  • Copy its recommended changelog entry to the Draft Changelog.
  • Append the issue number in parentheses to the changelog entry.

Member

@adamnovak adamnovak left a comment

I think we can fix this, but I am going to try and reimplement the code change because what's here looks breakable.

  # Handle path specially. Sometimes e.g. leader may not include
  # /bin, but the Toil appliance needs it.
- if i in os.environ and os.environ[i] != "":
+ if i in os.environ and os.environ[i] not in environment[i]:

I don't think this is quite the right code for this; if the worker has PATH=/bin and the workflow is trying to apply PATH=/usr/bin:/bin:/home/username/bin, then it won't make any changes to the path.
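
The concern can be checked by evaluating the proposed condition on these values. A minimal sketch using plain strings rather than `os.environ`; the variable names are illustrative only.

```python
worker_path = "/bin"
workflow_path = "/usr/bin:/bin:/home/username/bin"

# The proposed condition is a substring test, so it is False here and the
# concatenation branch is skipped.
concatenate = worker_path not in workflow_path

# A substring test does not compare directories: "/bin" is also found inside
# "/usr/bin", even when "/bin" itself is not an entry of the PATH.
substring_hit = "/bin" in "/usr/bin:/home/username/bin"
entry_hit = "/bin" in "/usr/bin:/home/username/bin".split(":")

print(concatenate, substring_hit, entry_hit)
```

Comparing whole entries after splitting on `:` avoids the false match that the raw substring test produces.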

Probably what we really want is a real union of the directory sets, with the new ones first.
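
One way such a union could look is sketched below. This is an illustration under stated assumptions, not necessarily what was merged: `merge_paths` is a hypothetical helper, the separator is hard-coded to `:` as on POSIX systems, and empty entries are dropped.

```python
def merge_paths(new_path: str, existing_path: str) -> str:
    """Ordered union of two PATH strings (hypothetical helper).

    Entries from new_path come first; an entry already seen is not repeated.
    """
    seen = set()
    merged = []
    for entry in (new_path + ":" + existing_path).split(":"):
        # Assumption: empty entries are dropped rather than preserved.
        if entry and entry not in seen:
            seen.add(entry)
            merged.append(entry)
    return ":".join(merged)


workflow_path = "/usr/bin:/bin:/home/username/bin"

merged_once = merge_paths(workflow_path, "/bin")
merged_twice = merge_paths(workflow_path, merged_once)
merged_new = merge_paths("/opt/bin", "/usr/bin:/bin")

print(merged_once)
print(merged_twice)
print(merged_new)
```

Because duplicates are dropped by entry, applying the merge repeatedly leaves the result unchanged, which avoids the unbounded growth described in the PR. Dropping empty entries is a simplification; on POSIX an empty PATH entry refers to the current directory, so a real implementation might need to treat it differently.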

@adamnovak
Member

I've pulled this in for testing in our path-patch branch.

@adamnovak adamnovak merged commit 7b6693a into DataBiosphere:master Feb 4, 2026
3 checks passed