A different way of executing GEOS from SWELL#643
Conversation
There was a problem hiding this comment.
Pull Request Overview
A comprehensive refactoring of SWELL's GEOS execution approach, transitioning from Python subprocess-based execution to direct execution via Cylc with surgical modifications to GEOS configuration files.
Key changes include:
- Replacing subprocess-based GEOS execution with direct
gcm_run.jscript execution from Cylc - Modifying forecast directory structure to be experiment-level rather than cycle-level
- Adding new coupled GEOS tasks for improved DA workflow integration
Reviewed Changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/swell/utilities/slurm.py | Updates SLURM task handling for new RunGeos task |
| src/swell/utilities/geos.py | Enhanced RC file parsing, checkpoint handling, and error handling |
| src/swell/tasks/prep_coupled_geos_run_dir.py | New task for preparing coupled GEOS run directories with surgical edits |
| src/swell/tasks/base/task_base.py | Modified forecast directory structure to be experiment-level |
| src/swell/suites/3dfgat_coupled_cycle/ | New complete coupled cycling suite with direct GEOS execution |
| Various move/restart tasks | Updated for new directory structure and improved error handling |
Comments suppressed due to low confidence (1)
src/swell/tasks/move_erase_da_restart.py:1
- Inconsistent destination directory - this line uses 'FORECAST' while other MOM restart operations use 'RESTART'. This appears to be a typo.
# (C) Copyright 2021- United States Government as represented by the Administrator of the
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| ask_question: bool = True | ||
| prompt: str = ("Since this suite is non-cycling choose how " | ||
| "many hours the workflow can run ahead?") | ||
| prompt: str = ("How many additional cycles can be run" |
There was a problem hiding this comment.
Missing space before parenthesis in prompt text.
| prompt: str = ("How many additional cycles can be run" | |
| prompt: str = ("How many additional cycles can be run " |
| number_of_states = int(isodate.parse_duration(window_length) | ||
| / isodate.parse_duration(background_frequency)) + 1 | ||
| self.logger.info('Number of states: ', str(number_of_states-1)) | ||
| self.logger.info('Number of states: ' + str(number_of_states-1)) |
There was a problem hiding this comment.
[nitpick] String concatenation should use f-string formatting for consistency with modern Python practices.
| self.logger.info('Number of states: ' + str(number_of_states-1)) | |
| self.logger.info(f'Number of states: {number_of_states-1}') |
|
|
||
| with open(rcfile, "w") as f: | ||
| yaml.dump(rcdict, f, default_flow_style=False, sort_keys=False) | ||
| self.geos.write_rc(rcdict, rcfile) |
There was a problem hiding this comment.
The method now uses self.geos.write_rc() instead of yaml.dump(), but this change may affect the output format. Ensure the new method maintains backward compatibility with existing RC file consumers.
| # be very model specific | ||
| req_files = ['AGCM.rc', 'CAP.rc', 'data_table', 'diag_table', 'fvcore_layout.rc', | ||
| 'gcm_emip.setup', 'gcm_run.j', 'HISTORY.rc', '__init__.py', 'input.nml', | ||
| 'logging.yaml' , 'ice_in', 'MOM_input', 'MOM_override'] |
There was a problem hiding this comment.
Extra space before comma in list item.
| 'logging.yaml' , 'ice_in', 'MOM_input', 'MOM_override'] | |
| 'logging.yaml', 'ice_in', 'MOM_input', 'MOM_override'] |
|
|
||
| [[RunGeosExecutable]] | ||
| script = "swell task RunGeosExecutable $config -d $datetime" | ||
| script = "{{experiment_path}}/forecast/gcm_run.j" |
There was a problem hiding this comment.
[nitpick] Direct script execution bypasses error handling that would be provided by the SWELL task wrapper. Consider adding error checking or validation mechanisms.
| script = "{{experiment_path}}/forecast/gcm_run.j" | |
| script = "swell task RunGeosExecutable $config -d $datetime" |
|
|
||
| # Launch the job monitor | ||
| self.logger.critical('Launching the TUI, press \'q\' at any time to exit the TUI') | ||
| self.logger.critical('Press any key to launch the TUI. To exit TUI, press \'q\' at any time.') |
There was a problem hiding this comment.
Missing space after period in the sentence.
|
Looks like the default |
|
Just a reminder that we also talked about including time series of the mean observation error, which would be useful for tracking changes in the altimetry processing. |
|
Hi, it seems that some commits of the PR address changes that are not part the actual feature proposed. would it be possible to do a cleanup (or recreate a new branch with just the correct cherry picked commits) so it's easier to review. Thanks a lot! |
|
see #677 |
To address #625 I worked on a different way of executing GEOSgcm in SWELL. I had to make a lot of changes but i will try to explain the essential part of this PR.
Let's consider
gcm_run.jin 4 stages:In the current implementation, SWELL handles 2 & 3 via python and
subprocessand 1 is assumed to be set properly by the user, which caused trouble with the NCCS. For DA purposes 4, postprocessing is explicitly handled by SWELL but that is not the focus of this PR.In this proposed implementation, the main difference is that we rely on
gcm_run.jfor 2 and 3 by conducting surgical edits viaPrepCoupledGeosRundirat few locations and runninggcm_run.jdirectly from Cylc (which doesn't capture failed exit status):I created the
3dfgat_coupled_cyclesuite for testing, should work by default if anyone has time to check it out.Sorry I'm tagging a lot of people but I would like to get some feedback on this, but not sure if my explanation was sufficient here. Otherwise i can explain it during the next Swell Development Tag-up.
Also it has the idea of model-differentiated tasks (inspired by the Skylab workflow and @mranst has a PR for it already), so perhaps having
tasks/model/might be better way of organizing the model specific tasks like he proposed.