@brandonjnelsonFDA
Member

Detects stalled parallel simulations by checking if log files have not been updated for 8 hours. Marks them as errors to prevent the progress monitoring loop from hanging indefinitely. Implements robust ID matching to handle varying zero-padding in case IDs.
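A minimal sketch of the two ideas above, for illustration only and not the merged implementation. It assumes each case writes one `<case_id>.log` file into a shared directory; the function names `normalize_case_id`, `is_stalled`, and `find_stalled_cases`, and the log naming scheme, are hypothetical.

```python
import re
import time
from pathlib import Path

STALL_TIMEOUT_S = 8 * 60 * 60  # 8 hours, as stated in the description


def normalize_case_id(case_id):
    """Reduce an ID to its trailing integer so 'case_007' matches 'case_7'.

    Hypothetical helper: IDs without trailing digits are returned unchanged.
    """
    text = str(case_id)
    match = re.search(r"(\d+)$", text)
    return str(int(match.group(1))) if match else text


def is_stalled(log_path, now=None, timeout_s=STALL_TIMEOUT_S):
    """True if the log file's modification time is older than the timeout."""
    now = time.time() if now is None else now
    return now - Path(log_path).stat().st_mtime > timeout_s


def find_stalled_cases(log_dir, pending_ids, now=None, timeout_s=STALL_TIMEOUT_S):
    """Return normalized IDs of pending cases whose logs have gone quiet.

    Assumes one '<case_id>.log' file per case in log_dir (an assumption).
    """
    pending = {normalize_case_id(cid) for cid in pending_ids}
    stalled = []
    for log_path in sorted(Path(log_dir).glob("*.log")):
        cid = normalize_case_id(log_path.stem)
        if cid in pending and is_stalled(log_path, now, timeout_s):
            stalled.append(cid)
    return stalled
```

A monitoring loop could call `find_stalled_cases` on each pass and move the returned IDs from "pending" to "error", so the loop's exit condition can still be reached when a worker dies silently.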

google-labs-jules bot and others added 3 commits December 24, 2025 20:29
Detects stalled parallel simulations by checking if log files have not been updated for 8 hours. Marks them as errors to prevent the progress monitoring loop from hanging indefinitely. Implements robust ID matching to handle varying zero-padding in case IDs.
Implements an 8-hour inactivity timeout for parallel simulation logs to detect stalled jobs. All simulation errors (including timeouts) are now logged to 'study_errors.log' in the study root directory to facilitate debugging. Prevents infinite stalling of the progress monitor loop.
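As a rough illustration of the error logging described in this commit, the sketch below appends one line per failure to `study_errors.log` in the study root. The function name `log_study_error` and the tab-separated line format are assumptions, not taken from the repository.

```python
from datetime import datetime
from pathlib import Path


def log_study_error(study_root, case_id, message):
    """Append one timestamped error line to <study_root>/study_errors.log.

    Hypothetical helper; the real log format may differ.
    """
    log_path = Path(study_root) / "study_errors.log"
    stamp = datetime.now().isoformat(timespec="seconds")
    # Append mode keeps earlier errors, so timeouts and crashes share one file.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(f"{stamp}\tcase {case_id}\t{message}\n")
    return log_path
```

Opening the file in append mode for each write means the log survives an interrupted monitor and can be tailed while the study is still running.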
@brandonjnelsonFDA brandonjnelsonFDA merged commit 5883137 into DIDSR:master Dec 25, 2025
1 check passed
@brandonjnelsonFDA brandonjnelsonFDA deleted the monitor-timeout-1442522189065789345 branch December 25, 2025 23:57