Run

This repository contains the setup for running text processing jobs on Hopper, using SLURM to reserve nodes and GNU parallel to assign file names to the nodes and to enable resuming interrupted jobs.

This setup works best when jobs are long and need to be distributed across multiple nodes. If all of the jobs can run on a single node using parallel, doing so may be faster because it avoids the overhead of establishing ssh connections.

Use

Clone this repository to ~/run:

cd
git clone git@github.com:textproclab/run.git

Install the clustershell Python module:

pip install clustershell

In the repository with your code, make a new SLURM script. For an example, see count-words.slurm in this repository.
In your code's documentation, add instructions directing users to this repository.
Submit the SLURM script:

sbatch count-words.slurm \
    /data/textproclab/corpora/Gutenberg/text-clean \
    /data/textproclab/tmp

In addition to the files explicitly created (e.g., the -words.txt files with word counts), this creates three files related to the job:

count-words-output.txt is the log file containing any output written to the standard output or error streams by the jobs.
count-words-jobs.txt is used by parallel to keep track of which of the sub-jobs (files being processed) completed successfully. If the job is interrupted, this can be used to re-run only those parts that weren't successfully completed. If you want to run a new job or re-run everything from scratch, delete this file.
count-words-hosts.txt is used by parallel to keep track of the nodes it's running on. It can be deleted when the job is finished, even if you want to resume later.
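
Before resuming an interrupted job, it can help to check what the job log recorded. The following is a minimal sketch, not part of this repository: it assumes count-words-jobs.txt uses GNU parallel's standard tab-separated job log layout (Seq, Host, Starttime, JobRuntime, Send, Receive, Exitval, Signal, Command), and the function names failed_jobs and completed_count are hypothetical helpers introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a GNU parallel job log.
# Assumption: the log is tab-separated with a one-line header, and
# column 7 is Exitval, column 8 is Signal, column 9 is Command.

# Print the command of every sub-job that exited non-zero or was
# killed by a signal.
failed_jobs() {
    awk -F '\t' 'NR > 1 && ($7 != 0 || $8 != 0) { print $9 }' "$1"
}

# Print how many sub-jobs are recorded as having completed successfully.
completed_count() {
    awk -F '\t' 'NR > 1 && $7 == 0 && $8 == 0 { n++ } END { print n + 0 }' "$1"
}

# Example use, after sourcing this file:
#   completed_count count-words-jobs.txt
#   failed_jobs count-words-jobs.txt
```

These helpers only read the log. Whether a failed sub-job is retried when the job is resumed depends on the options passed to parallel in the SLURM script, so check those before relying on this output.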

Acknowledgements

This setup is derived from the one made for the University of Connecticut HPC by Pariksheet Nanda: https://github.uconn.edu/HPC/parallel-slurm
