Skip to content

A different way of executing GEOS from SWELL#643

Closed
Dooruk wants to merge 17 commits intodevelopfrom
origin/feature/exec_geos_direct_develop
Closed

A different way of executing GEOS from SWELL#643
Dooruk wants to merge 17 commits intodevelopfrom
origin/feature/exec_geos_direct_develop

Conversation

@Dooruk
Copy link
Collaborator

@Dooruk Dooruk commented Oct 17, 2025

To address #625 I worked on a different way of executing GEOSgcm in SWELL. I had to make a lot of changes but i will try to explain the essential part of this PR.

Let's consider gcm_run.j in 4 stages:

  1. SLURM & node assignment
  2. Preprocessing
  3. Execution
  4. Postprocessing

In the current implementation, SWELL handles 2 & 3 via python and subprocess and 1 is assumed to be set properly by the user, which caused trouble with the NCCS. For DA purposes 4, postprocessing is explicitly handled by SWELL but that is not the focus of this PR.

In this proposed implementation, the main difference is that we rely on gcm_run.j for 2 and 3 by conducting surgical edits via PrepCoupledGeosRundir at few locations and running gcm_run.j directly from Cylc (which doesn't capture failed exit status):

    [[RunGeos]]
        script = "{{experiment_path}}/forecast/gcm_run.j"
        platform = {{platform}}
        [[[job]]]
            shell = /bin/csh
        [[[directives]]]
        {%- for key, value in scheduling["RunGeos"]["directives"]["all"].items() %}
            --{{key}} = {{value}}
        {%- endfor %}

I created the 3dfgat_coupled_cycle suite for testing, should work by default if anyone has time to check it out.

Sorry I'm tagging a lot of people but I would like to get some feedback on this, but not sure if my explanation was sufficient here. Otherwise i can explain it during the next Swell Development Tag-up.

Also it has the idea of model-differentiated tasks (inspired by the Skylab workflow and @mranst has a PR for it already), so perhaps having tasks/model/ might be better way of organizing the model specific tasks like he proposed.

@Dooruk Dooruk requested review from DavidSRussell, Copilot, jeromebarre, mer-a-o, mranst, rtodling and vrx- and removed request for rtodling October 17, 2025 20:44
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

A comprehensive refactoring of SWELL's GEOS execution approach, transitioning from Python subprocess-based execution to direct execution via Cylc with surgical modifications to GEOS configuration files.

Key changes include:

  • Replacing subprocess-based GEOS execution with direct gcm_run.j script execution from Cylc
  • Modifying forecast directory structure to be experiment-level rather than cycle-level
  • Adding new coupled GEOS tasks for improved DA workflow integration

Reviewed Changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/swell/utilities/slurm.py Updates SLURM task handling for new RunGeos task
src/swell/utilities/geos.py Enhanced RC file parsing, checkpoint handling, and error handling
src/swell/tasks/prep_coupled_geos_run_dir.py New task for preparing coupled GEOS run directories with surgical edits
src/swell/tasks/base/task_base.py Modified forecast directory structure to be experiment-level
src/swell/suites/3dfgat_coupled_cycle/ New complete coupled cycling suite with direct GEOS execution
Various move/restart tasks Updated for new directory structure and improved error handling
Comments suppressed due to low confidence (1)

src/swell/tasks/move_erase_da_restart.py:1

  • Inconsistent destination directory - this line uses 'FORECAST' while other MOM restart operations use 'RESTART'. This appears to be a typo.
# (C) Copyright 2021- United States Government as represented by the Administrator of the

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

ask_question: bool = True
prompt: str = ("Since this suite is non-cycling choose how "
"many hours the workflow can run ahead?")
prompt: str = ("How many additional cycles can be run"
Copy link

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing space before parenthesis in prompt text.

Suggested change
prompt: str = ("How many additional cycles can be run"
prompt: str = ("How many additional cycles can be run "

Copilot uses AI. Check for mistakes.
number_of_states = int(isodate.parse_duration(window_length)
/ isodate.parse_duration(background_frequency)) + 1
self.logger.info('Number of states: ', str(number_of_states-1))
self.logger.info('Number of states: ' + str(number_of_states-1))
Copy link

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] String concatenation should use f-string formatting for consistency with modern Python practices.

Suggested change
self.logger.info('Number of states: ' + str(number_of_states-1))
self.logger.info(f'Number of states: {number_of_states-1}')

Copilot uses AI. Check for mistakes.

with open(rcfile, "w") as f:
yaml.dump(rcdict, f, default_flow_style=False, sort_keys=False)
self.geos.write_rc(rcdict, rcfile)
Copy link

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method now uses self.geos.write_rc() instead of yaml.dump(), but this change may affect the output format. Ensure the new method maintains backward compatibility with existing RC file consumers.

Copilot uses AI. Check for mistakes.
# be very model specific
req_files = ['AGCM.rc', 'CAP.rc', 'data_table', 'diag_table', 'fvcore_layout.rc',
'gcm_emip.setup', 'gcm_run.j', 'HISTORY.rc', '__init__.py', 'input.nml',
'logging.yaml' , 'ice_in', 'MOM_input', 'MOM_override']
Copy link

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra space before comma in list item.

Suggested change
'logging.yaml' , 'ice_in', 'MOM_input', 'MOM_override']
'logging.yaml', 'ice_in', 'MOM_input', 'MOM_override']

Copilot uses AI. Check for mistakes.

[[RunGeosExecutable]]
script = "swell task RunGeosExecutable $config -d $datetime"
script = "{{experiment_path}}/forecast/gcm_run.j"
Copy link

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Direct script execution bypasses error handling that would be provided by the SWELL task wrapper. Consider adding error checking or validation mechanisms.

Suggested change
script = "{{experiment_path}}/forecast/gcm_run.j"
script = "swell task RunGeosExecutable $config -d $datetime"

Copilot uses AI. Check for mistakes.

# Launch the job monitor
self.logger.critical('Launching the TUI, press \'q\' at any time to exit the TUI')
self.logger.critical('Press any key to launch the TUI. To exit TUI, press \'q\' at any time.')
Copy link

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing space after period in the sentence.

Copilot uses AI. Check for mistakes.
@Dooruk Dooruk changed the title A proposed way of executing GEOS from SWELL A different way of executing GEOS from SWELL Oct 17, 2025
@DavidSRussell
Copy link
Contributor

Looks like the default 3dfgat_coupled_cycle ran fine for me. The only obvious differences I see are the location of the forecast directory and the time stamp on cycle_done. Not sure how well I understand the pros and cons of the new approach; maybe we can go over it some time?

@DavidSRussell
Copy link
Contributor

Just a reminder that we also talked about including time series of the mean observation error, which would be useful for tracking changes in the altimetry processing.

@jeromebarre
Copy link

Hi, it seems that some commits of the PR address changes that are not part the actual feature proposed.
For example:
7caae0c

would it be possible to do a cleanup (or recreate a new branch with just the correct cherry picked commits) so it's easier to review.

Thanks a lot!

@Dooruk
Copy link
Collaborator Author

Dooruk commented Dec 15, 2025

see #677

@Dooruk Dooruk closed this Dec 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments