Refactor configuration handling on system initialization#80
Open
Refactor configuration handling on system initialization#80
Conversation
This change aligns the interface for initialization better with utils.utils.Config
config_facade.ConfigFacade was not a "Facade" that facilitates access to some subcomponents, but rather functions as a "Adapter" that allows the non-refactored code to interact with the refactored configuration. Accordingly it was renamed to config_adapter.Config and all references to a "Facade" in the documentation/ test function names where updated as well.
clessig
reviewed
Jan 27, 2025
| @@ -0,0 +1,4 @@ | |||
| { | |||
| "input_data": "/p/scratch/atmo-rep/data/era5_1deg/months/", | |||
Owner
There was a problem hiding this comment.
The paths here should only be base paths, e.g. /p/scratch/atmo-rep/. The rest should be consistent across infras.
| def train_continue( wandb_id, epoch, Trainer, epoch_continue = -1) : | ||
|
|
||
| def initialize_atmorep(with_ddp): | ||
| atmorep_project_dir = Path(os.environ["SLURM_SUBMIT_DIR"]) |
Owner
There was a problem hiding this comment.
The ddp init should be in a separate file from the path setup.
|
|
||
| dates = args['dates'] | ||
| evaluator = Evaluator.load( cf, model_id, model_epoch, devices) | ||
| print("after loading:", evaluator.cf.user_config) |
Owner
There was a problem hiding this comment.
That's probably a debug comment?
| except : | ||
| if __name__ == "__main__": | ||
| train_fresh = False |
Owner
There was a problem hiding this comment.
It's not pretty how it was before, admittedly, but have an additional variable doesn't help either.
|
|
||
| if 0 == cf.par_rank : | ||
| directory = Path( config.path_results, 'id{}'.format( cf.wandb_id)) | ||
| directory = self.user_config.results / f"id{cf.wandb_id}" |
Owner
There was a problem hiding this comment.
Consistent usage of quotation marks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This merge request addresses most of the pain points with the configuration raised in issue #62.
A new configuration backend (
utils.config) has been introduced consisting of a hierarchy of dataclasses. The purpose of this backend is to document all available configuration options, allow comfortable access (especially to nested parameter lists such as"fields"), and provide serialization/deserialization to/from the already usedmodel_id<wandb_id>.jsonfiles to track configuration.The new configuration backend is designed to to replace
utils.utils.Config. To maintain compatibility during this migrationutils.config_adapter.Configis designed as a drop-in replacement supporting all the same semantics asutils.utils.Config(except assigning unknown configuration options), while already taking advantage of the new backend. This enables a bunch of tools for exploring and refactoring the code base that rely on static code analysis.Information about important file paths such as input data, pretrained models and experiment output was very hardwired throughout the system. It is now more flexible to handle:
config.confignow includes the dataclasses andPlatformConfigandUserConfig.PlatformConfigincludes the path to the input data as well as a location for pretrained models. These platform unique parameters have to be provided by a simple json file for each platform.UserConfigspecifies all output pathes. Under the assumption that atmorep will be run on a HPC platform using slurm, the output directories will be subdirectories of$SLURM_SUBMIT_DIR.Initialization was streamlined in
core.evaluator,core.train,core.train_multi:core.evaluator,core.train,core.train_multiwas consolidated into thecore.train.initialize_atmorepfunction. Initialization includes:UserConfigfrom$SLURM_SUBMIT_DIRutils.utils.init_torch,utils.utils.setup_ddputils.config_adapter.Configobjectdevices,par_rank,par_size,configFor a summary of the changes:
config.configfrom the branch grasse-62-refactor_path_configutils.config_adapter.Configclass, as well as including small fixes for (missing options etc.) for the configuration backend.core.evaluator,core.train,core.train_multirelated to the handling of renamed options (now handled by the configuration backend) and initialization (now handled centrally bycore.train.initialize_atmorep)