Update htcondor instructions #459

Running with HTCondor
=====================

The folder `htcondor/ <https://github.com/Reed-CompBio/spras/tree/main/htcondor>`_
inside the SPRAS git repository contains several files that can be used to
run workflows with this container on HTCondor. To use the ``spras``
image in this environment, first log in to an HTCondor Access Point (AP).

Submitting All Jobs to a Single EP
----------------------------------

Running all SPRAS steps on a single remote Execution Point (EP) is a good way
to get started with HTCondor, but it is significantly less efficient than using
HTCondor's distributed capabilities. This approach is best suited for
workflows that are not computationally intensive, or for testing and
debugging purposes.

Before submitting all SPRAS jobs to a single remote EP, you'll need to set up
three things:

1. Modify ``htcondor/spras.sub`` to point at your container image, along with
   any other configuration changes you want to make, like choosing a logging
   directory or toggling OSPool submission. Note that all paths in the submit
   file are relative to the directory from which you run ``condor_submit``,
   which will typically be the root of the SPRAS repository.
2. Ensure your SPRAS configuration file has a few key values set, including
   ``unpack_singularity: true`` and ``containers.framework: singularity``
   (a sketch of both edits follows this list).
3. Create the logging directory configured in the submit file before
   submitting the job, e.g. to create the default log directory, run
   ``mkdir htcondor/logs`` from the root of the repository.
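
As a minimal sketch of those edits (the image name ``spras.sif`` is
illustrative, and the nested YAML layout is an assumption inferred from the
dotted name ``containers.framework``), the submit file and configuration
changes might look like:

::

    # htcondor/spras.sub -- point HTCondor at the .sif image you built
    container_image = spras.sif

.. code:: yaml

    # SPRAS configuration file (illustrative excerpt)
    unpack_singularity: true      # unpack the image rather than running it directly
    containers:
      framework: singularity      # assumed nesting for ``containers.framework``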

Once these steps are complete, you can submit the job from the root of
the SPRAS repository by running ``condor_submit htcondor/spras.sub``.
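
For example, a typical submit-and-monitor session from the repository root
might look like the following (queue output varies by pool):

.. code:: bash

    condor_submit htcondor/spras.sub   # submit the single-EP job
    condor_q                           # check the job's status in the queue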

When the job completes, the ``output`` directory from the workflow should be
returned as ``output``.

Submitting Parallel Jobs
------------------------

Parallelizing SPRAS workflows with HTCondor requires much of the same setup
as the previous section, but with several additions:

1. Build and activate the SPRAS conda/mamba environment and ``pip install``
   the SPRAS module (via ``pip install .`` inside the SPRAS directory).
2. Install the `HTCondor Snakemake
   executor <https://github.com/htcondor/snakemake-executor-plugin-htcondor>`__; once your
   SPRAS conda/mamba environment is activated and SPRAS is ``pip install``-ed,
   you can install the executor with the following:

   .. code:: bash

      pip install git+https://github.com/htcondor/snakemake-executor-plugin-htcondor.git

3. Instead of editing ``spras.sub`` to define the workflow, this scenario
   requires editing the SPRAS profile in ``htcondor/spras_profile/config.yaml``.
   Make sure you specify the correct container, and change any other config
   values needed by your workflow (defaults are fine in most cases); a sketch
   of this profile follows the list.
4. Modify your SPRAS configuration file to set ``unpack_singularity: true`` and
   ``containers.framework: singularity``.
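
As a rough sketch of that profile (the ``executor`` key is an assumption based
on typical Snakemake executor-plugin profiles, and the container key name is
hypothetical; treat the shipped ``htcondor/spras_profile/config.yaml`` as
authoritative):

.. code:: yaml

    # htcondor/spras_profile/config.yaml (illustrative, not verbatim)
    executor: htcondor               # assumed: route rule execution through the HTCondor plugin
    htcondor-jobdir: htcondor/logs   # where workflow logs are written
    default-resources:
      container_image: "spras.sif"   # hypothetical key name; point at your image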

Then, to start the workflow with HTCondor in the CHTC pool, there are
two options:

The first option is to run Snakemake in a way that ties its execution to
your terminal. This is good for testing short workflows and running
short jobs. The downside is that closing your terminal causes the
process to exit, removing any unfinished jobs. To use this option,
invoke Snakemake directly from the repository root by running:

.. code:: bash

    snakemake --profile htcondor/spras_profile/

**Note**: Running the workflow in this way requires that your terminal
session stays active. Closing the terminal will suspend ongoing jobs, but
Snakemake will handle picking up where any previously-completed jobs left off
when you restart the workflow.

Long Running Snakemake Jobs (Managed by HTCondor)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The second option is to let HTCondor manage the Snakemake process, which
allows the jobs to run as long as needed. Instead of seeing Snakemake
output directly in your terminal, you'll be able to see it in a
specified log file. To use this option, run from the repository root:

.. code:: bash

    ./htcondor/snakemake_long.py --profile htcondor/spras_profile/

A convenience script called ``run_htcondor.sh`` is also provided in the
repository root. You can execute this script by running:

.. code:: bash

    ./run_htcondor.sh

When executed in this mode, all log files for the workflow will be placed
into the logging directory (``htcondor/logs`` by default). In particular,
Snakemake's stdout/stderr outputs containing your workflow's progress can
be found split between ``htcondor/logs/snakemake.err`` and
``htcondor/logs/snakemake.out``. These will also log each rule and what
HTCondor job ID was submitted for that rule (see the
`troubleshooting section <#troubleshooting>`__ for information on how to
use these extra log files).
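
To follow your workflow's progress while it runs, you can tail these files
(the paths are the defaults described above):

.. code:: bash

    tail -f htcondor/logs/snakemake.out htcondor/logs/snakemake.err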

**Note**: While you're in the initial stages of developing/debugging your
workflow, it's very useful to invoke Snakemake with the ``--verbose`` flag.
This can be passed to Snakemake via the ``snakemake_long.py`` script by
adding it to the script's argument list, e.g.:

.. code:: bash

    ./htcondor/snakemake_long.py --profile htcondor/spras_profile/ --verbose

If you use mamba instead of conda for environment management, you can specify
this with the ``--env-manager`` flag:

.. code:: bash

    ./htcondor/snakemake_long.py --profile htcondor/spras_profile/ --env-manager mamba

Adjusting Resources
-------------------

Resource requirements can be adjusted as needed in
``htcondor/spras_profile/config.yaml``, and HTCondor logs for this workflow
can be found in your log directory. You can set a different log directory by
changing the configured ``htcondor-jobdir`` in the profile's configuration.
Alternatively, you can pass a different log directory when invoking Snakemake
with the ``--htcondor-jobdir`` argument.
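
For example (the path shown is illustrative):

.. code:: bash

    ./htcondor/snakemake_long.py --profile htcondor/spras_profile/ --htcondor-jobdir /path/to/logdir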

To run this same workflow in the OSPool, add the following to the
profile's default-resources block:

.. code:: yaml

    requirements: |
      '(HAS_SINGULARITY == True) && (Poolname =!= "CHTC")'
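
In context, that snippet sits under the ``default-resources`` key of the
profile, e.g. (a sketch; keep whatever other keys your profile already has):

.. code:: yaml

    default-resources:
      requirements: |
        '(HAS_SINGULARITY == True) && (Poolname =!= "CHTC")'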

**Note**: If you encounter an error that says
``No module named 'spras'``, make sure you've ``pip install``-ed the
SPRAS module into your conda environment.

To monitor the state of the job, you can use a second terminal to run
``condor_watch_q`` for realtime updates.

Upon completion, the ``output`` directory from the workflow should be
returned as ``output``, along with several files containing the workflow's
logging information (anything that matches ``htcondor/logs/spras_*`` and
ending in ``.out``, ``.err``, or ``.log``). If the job was unsuccessful,
these files should contain useful debugging clues about what may have
gone wrong.
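
As a quick way to scan those logs for failures (a sketch; adjust the pattern
and paths to your run):

.. code:: bash

    # List any SPRAS log files that mention an error, case-insensitively
    grep -il error htcondor/logs/spras_*.err htcondor/logs/spras_*.out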

**Note**: If you want to run the workflow with a different version of
SPRAS, or one that contains development updates you've made, rebuild
your SPRAS container image first.

----

Reviewer comment: When talking to colleagues about the single EP versus
parallel modes, an initial point of confusion was what files in this
``htcondor`` directory are used in each situation. Summarizing that in one
place up front for both scenarios could be helpful. For example, the
``.sub`` file is not used when running in parallel.