Avoid data race condition and formalize preprocess.sh #108

@rajeeja

The current Singularity HPO workflow calls train.sh, which downloads the data. When the data is not already present, every process involved in HPO tries to download it at the same time, resulting in a data race.

Possible Fix:
HPO spawns two jobs:

  1. preprocess.sh
    • runs on a single processor, or however the model's data download in preprocess.sh is set up
    • takes args to fetch and format the data as required
  2. launch the HPO job
    • uses the procs and run configuration specified for the HPO workflow
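
A minimal sketch of the two-stage flow described above, assuming a bash driver. The function name run_two_stage and the way the commands receive the data directory are hypothetical and only illustrate the ordering; the real preprocess.sh and HPO launcher interfaces may differ.

```shell
# Hypothetical driver: stage 1 prepares the data once, stage 2 launches HPO.
# Arguments: <data_dir> <preprocess_cmd> <hpo_cmd> (all names are assumptions).
run_two_stage() {
    local data_dir="$1" preprocess_cmd="$2" hpo_cmd="$3"

    # Stage 1: a single process downloads and formats the data.
    # Nothing else is running yet, so no concurrent download can happen.
    if ! "$preprocess_cmd" "$data_dir"; then
        echo "preprocess failed; not launching HPO" >&2
        return 1
    fi

    # Stage 2: HPO workers start only after the data is in place.
    "$hpo_cmd" "$data_dir"
}

# Example call (paths are placeholders):
# run_two_stage /path/to/data ./preprocess.sh ./launch_hpo.sh
```

Because the HPO launch is gated on the exit status of the preprocessing step, the workers can assume the data already exists and skip any download logic of their own.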

Note: The design should take into account that this infrastructure will also serve the cross-study workflows,
where we run multiple HPO jobs, each differing in its preprocess.sh inputs. The cross-study workflow also has to run infer.sh on all of the combinations.
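
One way the cross-study case could be layered on the same idea, as a rough sketch. It assumes that "combinations" means every (source study, target study) pair and that each command takes study names as arguments; both are guesses for illustration, and run_cross_study is a hypothetical name.

```shell
# Hypothetical cross-study driver.
# Arguments: <preprocess_cmd> <hpo_cmd> <infer_cmd> <study>...
run_cross_study() {
    local preprocess_cmd="$1" hpo_cmd="$2" infer_cmd="$3"
    shift 3
    local studies=("$@")
    local src tgt

    # One preprocess pass followed by one HPO run per study.
    # Preprocessing always finishes before the matching HPO run starts.
    for src in "${studies[@]}"; do
        "$preprocess_cmd" "$src" || return 1
        "$hpo_cmd" "$src" || return 1
    done

    # Inference over every (source, target) combination.
    for src in "${studies[@]}"; do
        for tgt in "${studies[@]}"; do
            "$infer_cmd" "$src" "$tgt" || return 1
        done
    done
}

# Example call (script and study names are placeholders):
# run_cross_study ./preprocess.sh ./launch_hpo.sh ./infer.sh studyA studyB
```

With N studies this runs N preprocess steps, N HPO jobs, and N×N inference calls, so the preprocess outputs need to be kept per study rather than overwritten.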
