Skip to content

Comments

Update the SLURM script for the new Integration cluster environment#67

Open
zhengtaocui wants to merge 9 commits intongwpc-candidatefrom
SFINCS_Hawaii_setup_updates
Open

Update the SLURM script for the new Integration cluster environment#67
zhengtaocui wants to merge 9 commits intongwpc-candidatefrom
SFINCS_Hawaii_setup_updates

Conversation

@zhengtaocui
Copy link

The SLURM script, sing_run.bash, is updated to adapt the new environment on the Integration cluster.

@zhengtaocui zhengtaocui requested a review from cheginit January 29, 2026 21:06
@zhengtaocui zhengtaocui self-assigned this Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RAW_DOWNLOAD_DIR is not defined so this command fails.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is defined in the config file, schism_calib.cfg.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. In the version that I had and was using to test this PR, it was not defined. Please add a comment before sourcing the config file about the expected vars to avoid such issues.

Copy link
Author

@zhengtaocui zhengtaocui Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That configure file is created by the SFINCS config and download script. It should not be here. It causes confusions. I deleted it from the repo.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fails because FCST_LENGTH_HRS is set as a float in the config file, and bash arithmetic only supports integers. You either need to use export FCST_LENGTH_HRS=3 in config file, or bc like so end_itime=$(echo "$start_itime + $FCST_LENGTH_HRS * 3600 + 3600" | bc | cut -d. -f1)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not float. This is my config for example.
export STARTPDY=20120611
export STARTCYC=00
export FCST_LENGTH_HRS=3
export HOT_START_FILE=''
export USE_TPXO="NO"
export COASTAL_DOMAIN=hawaii
export METEO_SOURCE=NWM_RETRO
export COASTAL_WORK_DIR=/ngen-test/coastal/schism_config_test_hawaii_stofs_nwm_retro/schism_2012-06-11T00-00-00Z
export RAW_DOWNLOAD_DIR=/ngen-test/coastal/schism_config_test_hawaii_stofs_nwm_retro/data/raw

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the version that I was testing this PR with, it was a float. So, to avoid this issue it's better to use bc.

Copy link

@cheginit cheginit Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even in the branch that you submitted this PR, the config file located here has the same issues (No raw directory and float hours). Please modify the config file so it can be a part of this PR. Also, COASTAL_WORK_DIR needs to be user specific to avoid write permission issue. If I run this, I will get a permission error. So, it needs to be something like this: export COASTAL_WORK_DIR=/ngen-test/coastal/${USER}/schism_config_test_hawaii_stofs_nwm_retro/schism_2012-06-11T00-00-00Z.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

STOFS is not available prior to 2020-12-30 as it's noted here. So, I don't think this config will work.

Copy link
Author

@zhengtaocui zhengtaocui Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. This file doesn't work. You need to run the SFINCS config and download script first to create all the download files and the config file. This config file doesn't belong here. I removed it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still need a config file that I can test the script and make sure it runs.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as before with the bash arithmetic, this needs to use bc too: itime=$(echo "${LENGTH_HRS} * 3600 + $start_timestamp" | bc | cut -d. -f1)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my response above.

#SBATCH --ntasks-per-node=18 #numebr of cores per node
#SBATCH --exclusive

export NODES=2 #this must match the number of nodes defined above by slurm
export NCORES=18 #this must match the number of cores per node defined above by slurm
export NPROCS=$((NODES*NCORES))

set -x
set -euox pipefail

#load the configuration file
. ./schism_calib.cfg
Copy link

@cheginit cheginit Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add this check for env vars to ensure all the required vars are set:

# Check string variables
for var in STARTPDY STARTCYC COASTAL_DOMAIN METEO_SOURCE COASTAL_WORK_DIR RAW_DOWNLOAD_DIR; do
    if [[ -z "${!var}" ]]; then
        echo "ERROR: $var is not defined in config file"
        exit 1
    fi
done

# Check numeric variables
for var in FCST_LENGTH_HRS; do
    if [[ -z "${!var}" ]] || ! [[ "${!var}" =~ ^[0-9]+$ ]]; then
        echo "ERROR: $var must be a positive integer"
        exit 1
    fi
done

# Check YES/NO variables
if [[ "${USE_TPXO}" != "YES" ]] && [[ "${USE_TPXO}" != "NO" ]]; then
    echo "ERROR: USE_TPXO must be YES or NO"
    exit 1
fi

# Check that HOT_START_FILE is defined (but allow empty string)
if [[ ! -v HOT_START_FILE ]]; then
    echo "ERROR: HOT_START_FILE must be defined (can be empty string '')"
    exit 1
fi

# Validate HOT_START_FILE exists if provided
if [[ -n "${HOT_START_FILE}" ]] && [[ ! -f "${HOT_START_FILE}" ]]; then
    echo "ERROR: HOT_START_FILE specified but file does not exist: ${HOT_START_FILE}"
    exit 1
fi

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed the changes. Thanks for the comments.

@cheginit
Copy link

cheginit commented Feb 5, 2026

Thanks for addressing my comments. This looks good to me.

@zhengtaocui zhengtaocui changed the base branch from development to ngwpc-candidate February 10, 2026 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants