9 changes: 9 additions & 0 deletions .gitignore
@@ -144,3 +144,12 @@ TempMat.mat

# Singularity cache
unpacked

# HTCondor logs
htcondor/logs/
*.err
*.out
*.log

# Any sif files
*.sif
154 changes: 0 additions & 154 deletions docker-wrappers/SPRAS/example_config.yaml

This file was deleted.

161 changes: 86 additions & 75 deletions docs/htcondor.rst
@@ -1,7 +1,7 @@
Running with HTCondor
=====================

The folder `docker-wrappers/SPRAS <https://github.com/Reed-CompBio/spras/tree/main/docker-wrappers/SPRAS>`_
The folder `htcondor/ <https://github.com/Reed-CompBio/spras/tree/main/htcondor>`_
inside the SPRAS git repository contains several files that can be used to
run workflows with this container on HTCondor. To use the ``spras``
image in this environment, first login to an HTCondor Access Point (AP).
@@ -63,64 +63,54 @@ image does not use a "v" in the tag.
Submitting All Jobs to a Single EP
----------------------------------

**Reviewer comment:** When talking to colleagues about the single EP versus
parallel modes, an initial point of confusion was what files in this
``htcondor`` directory are used in each situation. Summarizing that in one
place up front for both scenarios could be helpful. For example, the ``.sub``
file is not used when running in parallel.

Navigate to the ``spras/docker-wrappers/SPRAS`` directory and create the
``logs/`` directory (``mkdir logs``). Next, modify ``spras.sub`` so that
it uses the SPRAS apptainer image you created:

::

   container_image = < your spras image >.sif

Make sure to modify the configuration file to have
``unpack_singularity`` set to ``true``, and ``containers.framework`` set
to ``singularity``; otherwise, the workflow will likely fail.

Then run ``condor_submit spras.sub``, which will submit SPRAS to
HTCondor as a single job with as many cores as indicated by the
``NUM_PROCS`` line in ``spras.sub``, using the value of
``EXAMPLE_CONFIG`` as the SPRAS configuration file. By default, the
``example_config.yaml`` runs everything except for ``cytoscape``, which
appears to fail periodically in HTCondor.

**Note**: The ``spras.sub`` submit file is an example of how this
workflow could be submitted from a CHTC Access Point (AP) to the OSPool.
To run in the local CHTC pool, omit the ``+WantGlideIn`` and
``requirements`` lines.
Running all SPRAS steps on a single remote Execution Point (EP) is a good way
to get started with HTCondor, but it is significantly less efficient than using
HTCondor's distributed capabilities. This approach is best suited for
workflows that are not computationally intensive, or for testing and
debugging purposes.

Before submitting all SPRAS jobs to a single remote Execution Point (EP),
you'll need to set up three things:

1. You'll need to modify ``htcondor/spras.sub`` to point at your container
   image, along with any other configuration changes you want to make, like
   choosing a logging directory or toggling OSPool submission. Note that all
   paths in the submit file are relative to the directory from which you run
   ``condor_submit``, which will typically be the root of the SPRAS repository.

   **Reviewer comment:** This appears on the same line as the previous text
   when rendered: https://spras--459.org.readthedocs.build/en/459/htcondor.html

2. You'll need to ensure your SPRAS configuration file has a few key values
   set, including ``unpack_singularity: true`` and
   ``containers.framework: singularity``.
3. Finally, it's best practice to create the logging directory configured in
   the submit file before submitting the job, e.g. to create the default log
   directory, run ``mkdir htcondor/logs`` from the root of the repository.

Once these steps are complete, you can submit the job from the root of the
SPRAS repository by running ``condor_submit htcondor/spras.sub``.
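
For reference, a minimal submission sequence from the repository root looks
like the following (assuming the default logging directory from the submit
file):

.. code:: bash

   # Run from the root of the SPRAS repository
   mkdir -p htcondor/logs            # step 3: default logging directory
   condor_submit htcondor/spras.sub  # submit SPRAS as a single HTCondor job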

When the job completes, the ``output`` directory from the workflow should be
returned to your submission directory as ``output``.

Submitting Parallel Jobs
------------------------

Parallelizing SPRAS workflows with HTCondor requires the same setup as
the previous section, but with two additions. First, it requires an
activated SPRAS conda environment with a ``pip install``-ed version of
the SPRAS module (via ``pip install .`` inside the SPRAS directory).

Second, it requires an experimental executor for HTCondor that has been
forked from the upstream `HTCondor Snakemake
executor <https://github.com/htcondor/snakemake-executor-plugin-htcondor>`__.

After activating your ``spras`` conda environment and ``pip``-installing
SPRAS, you can install the HTCondor Snakemake executor with the
following:
Parallelizing SPRAS workflows with HTCondor requires much of the same setup
as the previous section, but with several additions (a combined sketch of the
environment setup appears after this list):

1. Build/activate the SPRAS conda/mamba environment and ``pip install`` the
   SPRAS module (via ``pip install .`` inside the SPRAS directory).

   **Reviewer comment:** Same issue [the list renders on the same line as the
   preceding text].

2. Install the `HTCondor Snakemake
   executor <https://github.com/htcondor/snakemake-executor-plugin-htcondor>`__; once your
   SPRAS conda/mamba environment is activated and SPRAS is ``pip install``-ed,
   you can install the HTCondor Snakemake executor with the following:

.. code:: bash

   pip install git+https://github.com/htcondor/snakemake-executor-plugin-htcondor.git

Currently, this executor requires that all input to the workflow is
scoped to the current working directory. Therefore, you'll need to copy
the Snakefile and your input directory (as specified by
``example_config.yaml``) to this directory:

.. code:: bash

   cp ../../Snakefile . && \
   cp -r ../../input .

Instead of editing ``spras.sub`` to define the workflow, this scenario
requires editing the SPRAS profile in ``spras_profile/config.yaml``.
Make sure you specify the correct container, and change any other config
values needed by your workflow (defaults are fine in most cases).
3. Instead of editing ``spras.sub`` to define the workflow, this scenario
   requires editing the SPRAS profile in ``htcondor/spras_profile/config.yaml``.
   Make sure you specify the correct container, and change any other config
   values needed by your workflow (defaults are fine in most cases).

   **Reviewer comment:** This language does a good job with respect to my
   comment above about making it explicit which files are used where. It may
   still be useful to itemize that all in one place.
4. Modify your SPRAS configuration file to set ``unpack_singularity: true`` and
   ``containers.framework: singularity``.
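
As a quick reference, the environment setup from steps 1 and 2 looks roughly
like the following (the environment name ``spras`` is an assumption; adjust
it to match your own setup):

.. code:: bash

   # Sketch of the parallel-mode environment setup; assumes a conda/mamba
   # environment named "spras" already exists.
   conda activate spras
   pip install .   # run from the root of the SPRAS repository
   pip install git+https://github.com/htcondor/snakemake-executor-plugin-htcondor.git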

Then, to start the workflow with HTCondor in the CHTC pool, there are
two options:
@@ -132,42 +122,69 @@ The first option is to run Snakemake in a way that ties its execution to
your terminal. This is good for testing short workflows and running
short jobs. The downside is that closing your terminal causes the
process to exit, removing any unfinished jobs. To use this option,
invoke Snakemake directly by running:
invoke Snakemake directly from the repository root by running:

.. code:: bash

   snakemake --profile spras_profile
   snakemake --profile htcondor/spras_profile/

**Note**: Running the workflow in this way requires that your terminal
session stays active. Closing the terminal will suspend ongoing jobs, but
Snakemake will handle picking up where any previously-completed jobs left off
when you restart the workflow.

Long Running Snakemake Jobs (Managed by HTCondor)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The second option is to let HTCondor manage the Snakemake process, which
allows the jobs to run as long as needed. Instead of seeing Snakemake
output directly in your terminal, you'll be able to see it in a
specified log file. To use this option, make sure ``snakemake_long.py``
is executable (you can run ``chmod +x snakemake_long.py`` from the AP to
make sure it is), and then run:
specified log file. To use this option, run from the repository root:

::
.. code:: bash

   ./htcondor/snakemake_long.py --profile htcondor/spras_profile/

   ./snakemake_long.py --profile spras_profile --htcondor-jobdir <path/to/logging/directory>
A convenience script called ``run_htcondor.sh`` is also provided in the
repository root. You can execute this script by running:

.. code:: bash

   ./run_htcondor.sh

When run in this mode, all log files for the workflow will be placed
into the path you provided for the logging directory. In particular,
Snakemake's outputs with job progress can be found split between
``<logdir>/snakemake-long.err`` and ``<logdir>/snakemake-long.out``.

When executed in this mode, all log files for the workflow will be placed
into the logging directory (``htcondor/logs`` by default). In particular,
Snakemake's stdout/stderr outputs containing your workflow's progress can
be found split between ``htcondor/logs/snakemake.err`` and ``htcondor/logs/snakemake.out``.
These will also log each rule and what HTCondor job ID was submitted for
that rule (see the `troubleshooting section <#troubleshooting>`__ for
information on how to use these extra log files).
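
For example, one way to follow the workflow's progress while it runs is to
tail both files (paths assume the default ``htcondor/logs`` directory):

.. code:: bash

   tail -f htcondor/logs/snakemake.out htcondor/logs/snakemake.err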

**Note**: While you're in the initial stages of developing/debugging your
workflow, it's very useful to invoke Snakemake with the ``--verbose`` flag.
This can be passed to Snakemake via the ``snakemake_long.py`` script by
adding it to the script's argument list, e.g.:

.. code:: bash

   ./htcondor/snakemake_long.py --profile htcondor/spras_profile/ --verbose

If you use mamba instead of conda for environment management, you can specify
this with the ``--env-manager`` flag:

.. code:: bash

   ./htcondor/snakemake_long.py --profile htcondor/spras_profile/ --env-manager mamba

Adjusting Resources
-------------------

Resource requirements can be adjusted as needed in
``spras_profile/config.yaml``, and HTCondor logs for this workflow can
be found in ``.snakemake/htcondor``. You can set a different log
directory by adding ``htcondor-jobdir: /path/to/dir`` to the profile's
configuration.
``htcondor/spras_profile/config.yaml``, and HTCondor logs for this workflow
can be found in your log directory. You can set a different log
directory by changing the configured ``htcondor-jobdir`` in the profile's
configuration. Alternatively, you can pass a different log directory
when invoking Snakemake with the ``--htcondor-jobdir`` argument.
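
Concretely, the two options look like this (``/path/to/dir`` is a placeholder
for your preferred logging location):

.. code:: bash

   # Option 1: set the log directory in htcondor/spras_profile/config.yaml:
   #   htcondor-jobdir: /path/to/dir
   # Option 2: pass it on the command line instead:
   ./htcondor/snakemake_long.py --profile htcondor/spras_profile/ --htcondor-jobdir /path/to/dir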

To run this same workflow in the OSPool, add the following to the
profile's default-resources block:
@@ -178,11 +195,6 @@ profile's default-resources block:
   requirements: |
       '(HAS_SINGULARITY == True) && (Poolname =!= "CHTC")'

**Note**: This workflow requires that the terminal session responsible
for running snakemake stays active. Closing the terminal will suspend
jobs, but the workflow can use Snakemake's checkpointing to pick up any
jobs where they left off.

**Note**: If you encounter an error that says
``No module named 'spras'``, make sure you've ``pip install``-ed the
SPRAS module into your conda environment.
@@ -195,11 +207,10 @@ To monitor the state of the job, you can use a second terminal to run
``condor_watch_q`` for realtime updates.

Upon completion, the ``output`` directory from the workflow should be
returned as ``spras/docker-wrappers/SPRAS/output``, along with several
files containing the workflow's logging information (anything that
matches ``logs/spras_*`` and ending in ``.out``, ``.err``, or ``.log``).
If the job was unsuccessful, these files should contain useful debugging
clues about what may have gone wrong.
returned as ``output``, along with several files containing the workflow's
logging information (anything that matches ``htcondor/logs/spras_*`` and
ending in ``.out``, ``.err``, or ``.log``). If the job was unsuccessful,
these files should contain useful debugging clues about what may have gone wrong.
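
If a run does fail, one quick way to skim those files for clues is something
like:

.. code:: bash

   # List any HTCondor log files that mention an error (case-insensitive)
   grep -il error htcondor/logs/spras_*.err htcondor/logs/spras_*.out htcondor/logs/spras_*.log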

**Note**: If you want to run the workflow with a different version of
SPRAS, or one that contains development updates you've made, rebuild