Implementing a VVUQ campaign into the FabNESO plugin #43
Conversation
matt-graham
left a comment
Thanks @dleggat for working on this.
The FabNESO/run_vvuq_instance.sh script seems necessary because of the way that the VVUQ campaign attempts to run the NESO solver after encoding. There are two options for running the simulation within the VVUQ code, ExecuteLocal or ExecutePython. The former executes a bash command, whilst the latter executes a Python function.
I had initially assumed that the latter would be the most useful, since we already have Python functions for running NESO set up, but VVUQ gives no way of determining which run you're doing, or from what location in this case; the only piece of information from the campaign that you can access is the current parameter values.
On the other hand, ExecuteLocal runs a command in the current VVUQ run directory, so by calling a Python single_run_vvuq task from the bash script using this, we can do the location manipulation that we need for the decoder to extract the information that we need.
Thanks for the explanation. What you've proposed makes sense, I think, if we are going the route of using the EasyVVUQ Campaign class to orchestrate dispatching the runs. However, given the constraints this seems to impose, I'm not sure that is the best solution here. In particular, it looks like at the moment the neso tasks are dispatched sequentially and synchronously by default, even when running on a batch scheduler like on ARCHER2? Given there are no dependencies between the different runs, ideally we'd like to be able to schedule these all to run in parallel (unlike the VBMC case, where we needed to wait for each run to complete to decide on the parameters to use for the next run), as this would be a significant time saver when running on a system like ARCHER2. The nested calls to a FabSim task via the command-line interface within another FabSim task also create quite verbose output, as the FabSim output from both the outer and inner level tasks is woven together.
An alternative to using the whole EasyVVUQ campaign infrastructure is to just use the components we need to perform a polynomial chaos expansion (PCE) as we are doing here - specifically the PCESampler and PCEAnalysis classes. The former allows constructing sets of parameters to evaluate the model at, and the latter can be used to process the model outputs to fit the coefficients of the PCE. If we use the parameter dictionaries output by the PCESampler to create a set of conditions files in a 'sweep' directory, we can then use the FabSim run_ensemble function to dispatch jobs for each of these parameter sets. This means we can reuse the existing infrastructure we have for overriding parameters in the conditions files without needing to replicate it using an EasyVVUQ encoder, and then have a separate task which deals with performing the analysis step using the PCEAnalysis class once the ensemble of runs has completed and the results have been fetched back.
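To make the suggested flow concrete, here is a minimal, self-contained sketch of the "write one conditions file per parameter set into a 'sweep' directory" step. The function name `write_sweep_dir`, the conditions-file format, and the hard-coded parameter sets are all illustrative assumptions, not FabNESO or EasyVVUQ API; the parameter dictionaries stand in for what `PCESampler` would yield.

```python
# Hypothetical sketch: given parameter dicts such as those yielded by
# EasyVVUQ's PCESampler, write one conditions file per parameter set into a
# 'sweep' directory, so FabSim's run_ensemble could dispatch a job for each.
from pathlib import Path
from tempfile import TemporaryDirectory


def write_sweep_dir(
    sweep_root: Path, parameter_sets: list[dict[str, float]]
) -> list[Path]:
    """Write one conditions file per parameter set (file format is illustrative)."""
    run_dirs = []
    for i, params in enumerate(parameter_sets):
        run_dir = sweep_root / f"run_{i}"
        run_dir.mkdir(parents=True)
        lines = [
            f'<P PARAM="{name}" VALUE="{value}" />' for name, value in params.items()
        ]
        (run_dir / "conditions.xml").write_text("\n".join(lines))
        run_dirs.append(run_dir)
    return run_dirs


# Example parameter sets, shaped like sampler output for two varied parameters
parameter_sets = [
    {"particle_charge": 10.0, "particle_mass": 0.5},
    {"particle_charge": 20.0, "particle_mass": 1.5},
]
with TemporaryDirectory() as tmp:
    dirs = write_sweep_dir(Path(tmp) / "sweep", parameter_sets)
    print(len(dirs))  # 2
```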
I've had a go at implementing this alternative approach on a branch mmg/pce-task. This works both locally and on ARCHER2, and on the latter allows jobs to be submitted asynchronously and in parallel. It's a shame given all your efforts on this, but as it seems like it would be quite a lot of work to get the approach here working, I'd lean towards closing this PR and opening a new PR with this alternative approach instead. Apologies about this, I should have looked into this more previously and put more guidance in the issue.
```python
def _parse_vvuq_parameters(
    vvuq_param_file: Path,
) -> tuple[dict[Any, dict[str, float]], dict[str, Any], list[str]]:
    # read the inpurt file
```
```diff
-    # read the inpurt file
+    # read the input file
```
```python
    with Path.open(vvuq_param_file) as f:
        data = f.read()
        d = literal_eval(data)
```
Looking at the example vvuq_parameters.txt file, it looks like this could be assumed to be in JSON format? If so, it would be better to use the load function from the json module in the standard library (and also probably change the file extension of the example parameters file to vvuq_parameters.json).
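A quick sketch of the suggested change, assuming the file is valid JSON; the file contents here are a hypothetical example (the real vvuq_parameters.txt is not shown in this thread), with each parameter mapped to `[default]` or `[default, lower_bound, upper_bound]` as the parsing code in this PR expects.

```python
# Sketch: json.load is stricter and more standard than ast.literal_eval when
# the parameters file is valid JSON.
import json
from pathlib import Path
from tempfile import NamedTemporaryFile

# Hypothetical parameters file content (names are illustrative)
content = '{"particle_charge": [10.0, 5.0, 15.0], "num_steps": [100]}'

with NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    f.write(content)
    path = Path(f.name)

with path.open() as f:
    d = json.load(f)

print(d["particle_charge"])  # [10.0, 5.0, 15.0]
path.unlink()  # clean up the temporary file
```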
```python
        vvuq_params[parameter] = {
            "type": "float",
            "default": param_values[0],
        }
        vary_length = 3
        if len(param_values) == vary_length:
            # Make the chaospy distribution. Possibly to be able to edit this later?
            parameter_variations[parameter] = cp.Uniform(
                param_values[1], param_values[2]
            )
```
```diff
-        vvuq_params[parameter] = {
-            "type": "float",
-            "default": param_values[0],
-        }
-        vary_length = 3
-        if len(param_values) == vary_length:
-            # Make the chaospy distribution. Possibly to be able to edit this later?
-            parameter_variations[parameter] = cp.Uniform(
-                param_values[1], param_values[2]
-            )
+        default_value, lower_bound, upper_bound = param_values
+        vvuq_params[parameter] = {
+            "type": "float",
+            "default": default_value,
+        }
+        parameter_variations[parameter] = cp.Uniform(lower_bound, upper_bound)
```
Rather than indexing into param_values and explicitly checking its length (and doing nothing if it isn't the correct length), I would say unpacking it would be better, as this (a) makes the code self-documenting about what the different indices into param_values are expected to correspond to and (b) will raise an error if param_values is of an unexpected length.
```python
    # If we're running remotely, tell the submission to wait until done
    if "archer2" in fab.env.remote:
        fab.update_environment(
            {
                "job_dispatch": "cd /work/$project/$project/$username ; sbatch --wait",
            }
        )
```
Do we need this in this case? Ideally we'd be able to send off batches of runs asynchronously here, as there are no sequential dependencies between jobs.
```python
    # Copy the mesh and conditions files here for FabSIM to read it properly
    shutil.copyfile(mesh_file, config_path / Path(mesh_file).name)
    shutil.copyfile("conditions.xml", config_path / "conditions.xml")
```
This seems to assume the conditions.xml file is in the local working directory? Is it always the case that the EasyVVUQ encoder will output it there? And will overwrite an existing file with that name? And will the file be removed after the job has finished? If not it would probably be better to have this written to a temporary file / directory which is automatically cleaned up.
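A minimal sketch of the temporary-directory option suggested above, using only the standard library; writing the conditions file into the temporary directory here is a stand-in for the encoder output, and `config_path` is just an example destination.

```python
# Sketch: write the generated conditions.xml into a temporary directory that
# is cleaned up automatically, instead of the current working directory.
import shutil
import tempfile
from pathlib import Path

config_path = Path(tempfile.mkdtemp())  # stand-in for the real config_path

with tempfile.TemporaryDirectory() as tmp_dir:
    conditions_file = Path(tmp_dir) / "conditions.xml"
    conditions_file.write_text("<CONDITIONS />")  # stand-in for encoder output
    # copy into the FabSim config directory as the existing code does
    shutil.copyfile(conditions_file, config_path / "conditions.xml")

# tmp_dir (and the generated conditions_file) no longer exist here
print(conditions_file.exists())  # False
```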
```python
    run_vvuq_sequential: Set to False to parallelise. Default is True because this
        can cause issues if unchecked.
```
Are the issues with run_vvuq_sequential=False that, if running on localhost, we will end up with too many jobs trying to run in parallel? If so, this presumably would not be the case when running on a system with a scheduler like ARCHER2, where we would probably not want to submit jobs sequentially, so this needs some clarification.
```python
    campaign = uq.Campaign(name=output_dir, params=params, actions=actions)

    # Make and attach a sampler to the campaign
    my_sampler = uq.sampling.PCESampler(vary)
```
Ideally we would probably want to be able to pass in polynomial_order as an argument to the task, as this controls the fidelity of the expansion and the computational cost (the number of parameter sets the model is evaluated for will be (polynomial_order + 1)**n_parameter_varied, where n_parameter_varied is the number of parameters varied).
```python
    )

    # Make the VVUQ campaign
    campaign = uq.Campaign(name=output_dir, params=params, actions=actions)
```
If output_dir does not exist we get a FileNotFoundError at this point. Ideally we'd either automatically deal with creating the directory if necessary or document that this needs to be done before running the task.
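The "create it if necessary" option is a one-liner with the standard library; a sketch (output_dir here is just an example path in a temporary location):

```python
# Sketch: create the campaign working directory up front to avoid the
# FileNotFoundError; exist_ok=True makes the call safe to repeat.
import tempfile
from pathlib import Path

output_dir = Path(tempfile.mkdtemp()) / "vvuq_campaign"
output_dir.mkdir(parents=True, exist_ok=True)
output_dir.mkdir(parents=True, exist_ok=True)  # idempotent: no error if it exists
print(output_dir.is_dir())  # True
```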
```python
    campaign.set_sampler(my_sampler)

    # Run VVUQ
    campaign.execute(sequential=run_vvuq_sequential).collate()
```
```diff
-    campaign.execute(sequential=run_vvuq_sequential).collate()
+    campaign.execute(sequential=bool(run_vvuq_sequential)).collate()
```
I think this might be necessary to ensure a string input, parsed from command-line options passed to the task, is converted to a bool.
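One caveat worth flagging: plain `bool()` on a non-empty string is always `True`, including for the string `"False"`, so command-line strings may need explicit parsing. A sketch with a hypothetical `parse_bool` helper (not FabSim/FabNESO API):

```python
# Sketch: any non-empty string is truthy, so bool("False") is True.
def parse_bool(value) -> bool:
    """Illustrative string-aware boolean parser for command-line options."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in {"true", "1", "yes"}


print(bool("False"))        # True  -- the pitfall
print(parse_bool("False"))  # False
print(parse_bool("True"))   # True
```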
```python
    # Run an instance of NESO
    neso(
        temporary_config_dir,
```
Passing temporary_config_dir as the config argument to neso works when running on localhost, but as it's an absolute path on the local filesystem, it doesn't work when running on a remote system such as ARCHER2: we get permission denied errors when FabSim tries to create the config directory on the remote filesystem using a file path from the local filesystem.
Resolves #5
Implementing a VVUQ campaign into the FabNESO plugin. The necessary inputs from the user are a conditions template and, in a config directory, a txt file containing the parameter substitutions and the outputs to read.
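For reference, a hypothetical example of what such a parameters file might look like, inferred from the parsing code in this PR (parameter names are illustrative; each entry maps a parameter to `[default]` or `[default, lower_bound, upper_bound]`, with a uniform distribution constructed over the bounds when three values are given; the outputs-to-read portion of the file is not shown in this excerpt):

```json
{
    "particle_charge": [10.0, 5.0, 15.0],
    "particle_mass": [0.5]
}
```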