Update htcondor instructions #459
base: main
…gging I was tired of hacking around wanting verbose logging in the HTCondor Snakemake executor, so I added some plumbing to pass Snakemake's '--verbose' flag through 'snakemake_long.py' to snakemake itself. Additionally, I added '--env-manager' so I could run things with my preferred mamba env instead of conda (which is too slow to rebuild).
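A minimal sketch of the kind of flag pass-through described above, not the actual contents of `snakemake_long.py`. The helper name `build_snakemake_command`, the wrapper's `--profile` option, and its default value are assumptions for illustration. Only Snakemake's own `--verbose` and `--profile` flags are taken as given.

```python
import argparse


def build_snakemake_command(argv):
    """Translate wrapper arguments into a snakemake command line.

    Hypothetical helper: the real wrapper's interface may differ.
    """
    parser = argparse.ArgumentParser(description="Sketch of a wrapper around snakemake")
    # Assumed wrapper option and default, used here only for illustration.
    parser.add_argument("--profile", default="spras_profile")
    parser.add_argument("--verbose", action="store_true",
                        help="forward --verbose to snakemake")
    args = parser.parse_args(argv)

    command = ["snakemake", "--profile", args.profile]
    if args.verbose:
        # Pass the flag through unchanged so snakemake itself logs verbosely.
        command.append("--verbose")
    return command
```

Calling `build_snakemake_command(["--verbose"])` yields a command list that ends in `--verbose`, which could then be handed to `subprocess.run`.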
The executor has matured quite a bit since these instructions were first drafted, and it's my hope that these changes remove a lot of the headache for running jobs. Now, you can edit config files in `config/` and use the `input/` directory directly. Workflows should be submitted directly from the repository root.
tristan-f-r
left a comment
[I'll have to restore my HTCondor access to follow this.]
tristan-f-r
left a comment
Some code nitpicks
Co-authored-by: Tristan F.-R. <pub.tristanf@gmail.com>
agitter
left a comment
I'm testing the Snakemake long execution mode. The first time my jobs went on hold because I put my spras-v0.6.0.sif file in the htcondor/ directory instead of the root directory. That should have been obvious based on the comment in the .yaml file.
On the second attempt my jobs went on hold with
Transfer output files failure at execution point slot1_24@e2591.chtc.wisc.edu while sending files to access point ap2001. Details: 1 total failures: first failure: reading from file /var/lib/condor/execute/slot1/dir_3699332/scratch/output: (errno 2) No such file or directory
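The first hold described above (image placed in `htcondor/` rather than the repository root) could be caught before submission with a small pre-flight check. This is a sketch only: `check_image_location` is a hypothetical helper, not part of SPRAS, and it assumes the layout described in this thread, where the `.sif` file is expected in the repository root.

```python
from pathlib import Path


def check_image_location(repo_root, image_name):
    """Report where a container image sits relative to the repository root.

    Returns "ok" if the image is in the root, "misplaced" if it is only in
    htcondor/, and "missing" if it is in neither location.
    """
    root = Path(repo_root)
    if (root / image_name).is_file():
        return "ok"
    if (root / "htcondor" / image_name).is_file():
        return "misplaced"
    return "missing"
```

Running it with the repository root and `spras-v0.6.0.sif` before calling `condor_submit` would flag the misplaced image without waiting for the job to go on hold. It does not address the second failure, where the `output` directory was missing at the execution point.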
Before submitting all SPRAS jobs to a single remote Execution Point (EP),
you'll need to set up three things:
1. You'll need to modify ``htcondor/spras.sub`` to point at your container
This appears on the same line as the previous text when rendered: https://spras--459.org.readthedocs.build/en/459/htcondor.html
following:
Parallelizing SPRAS workflows with HTCondor requires much of the same setup
as the previous section, but with several additions.
1. Build/activate the SPRAS conda/mamba environment and ``pip install`` the SPRAS module
Same issue
@@ -63,64 +63,54 @@ image does not use a "v" in the tag.
Submitting All Jobs to a Single EP
When talking to colleagues about the single EP versus parallel modes, an initial point of confusion was which files in this htcondor directory are used in each situation. Summarizing that in one place up front for both scenarios could be helpful. For example, the .sub file is not used when running in parallel.
requires editing the SPRAS profile in ``spras_profile/config.yaml``.
Make sure you specify the correct container, and change any other config
values needed by your workflow (defaults are fine in most cases).
3. Instead of editing ``spras.sub`` to define the workflow, this scenario
This language does a good job with respect to my comment above about making it explicit which files are used where. It may still be useful to itemize that all in one place.
#container_image = <your spras image>.sif
container_image = instructions-overhaul.sif
# container_image = docker://reedcompbio/spras:v0.2.0
Do we need both of the commented lines?
log = logs/spras_$(Cluster)_$(Process).log
output = logs/spras_$(Cluster)_$(Process).out
error = logs/spras_$(Cluster)_$(Process).err
log = htcondor/logs/spras_$(Cluster)_$(Process).log
Do we want one log per cluster or one per cluster_process pair?
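To make the trade-off in that question concrete, here is a small sketch that expands the `$(Cluster)` and `$(Process)` macros the way the submit file's `log` line would for a cluster of several jobs. The `expand` and `log_files` helpers are hypothetical and only do literal string substitution, and the per-cluster template is an assumed alternative, not something in the diff.

```python
def expand(template, cluster, process):
    """Substitute $(Cluster) and $(Process) literally (simplified macro expansion)."""
    return template.replace("$(Cluster)", str(cluster)).replace("$(Process)", str(process))


def log_files(template, cluster, n_procs):
    """Return the distinct log paths produced for processes 0..n_procs-1."""
    return sorted({expand(template, cluster, p) for p in range(n_procs)})


# Template from the diff: one log per (cluster, process) pair.
per_pair = "htcondor/logs/spras_$(Cluster)_$(Process).log"
# Assumed alternative: one event log shared by every process in the cluster.
per_cluster = "htcondor/logs/spras_$(Cluster).log"
```

With three processes in cluster 42, `log_files(per_pair, 42, 3)` gives three separate files while `log_files(per_cluster, 42, 3)` gives a single shared one. The shared form keeps the job event log in one place; `output` and `error` would still normally stay per process so jobs do not overwrite each other.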
This largely reformats the directory structure needed to run SPRAS workflows with HTCondor. In particular, it moves a lot of the helper code/submit files out of `docker-wrappers/SPRAS/` into a top-level `htcondor/` directory. I can do this now that the HTCondor executor has matured significantly, and can handle all the paths as they're configured in this diff.
To run a test SPRAS workflow, try following along with the instructions in `docs/htcondor.rst`. If anything is confusing, or you get hung up on any of the steps, let's discuss what I can do to make things more clear.