Copilot AI commented Jan 7, 2026

Reused Branchers older than ~1 week may no longer have ensure_app/ensure_copied_app flows in their logbook, so logbook polling returns 404s until the timeout is reached and a TimeoutException is raised.

Changes

  • Check SSH first for reused Branchers: New waitForReusedAvailability() method attempts SSH connectivity before checking logbook flows
  • Extract reusable SSH check: checkSshReachability() method consolidates SSH polling logic with proper timeout tracking
  • Fixed 10-second initial SSH check: The initial SSH check uses a fixed 10-second timeout before falling back to the logbook wait
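
A minimal sketch of the kind of SSH polling loop described above, assuming a probe callable and an injectable clock and sleep for testing. The names checkSshReachabilitySketch and tcpProbe, and all parameters, are hypothetical; this is not the actual checkSshReachability() implementation from this pull request. It measures elapsed time from the start time and only sleeps when there is enough time left for another check.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: poll a probe until it succeeds or the timeout elapses.
 * Elapsed time is measured from the start time, and the loop only sleeps
 * when another check still fits inside the timeout.
 */
function checkSshReachabilitySketch(
    callable $probe,          // returns true once SSH is reachable
    int $timeoutSeconds,
    int $intervalSeconds = 1,
    ?callable $sleep = null,  // injectable for tests (assumption)
    ?callable $clock = null   // injectable for tests (assumption)
): bool {
    $sleep = $sleep ?? static function (int $seconds): void {
        sleep($seconds);
    };
    $clock = $clock ?? static function (): float {
        return microtime(true);
    };

    $start = $clock();
    while (true) {
        if ($probe()) {
            return true;
        }
        $elapsed = $clock() - $start;
        if ($elapsed + $intervalSeconds > $timeoutSeconds) {
            // Not enough time left for another attempt: give up without sleeping.
            return false;
        }
        $sleep($intervalSeconds);
    }
}

/** Example probe (assumption): try to open a TCP connection to the SSH port. */
function tcpProbe(string $host, int $port = 22): callable
{
    return static function () use ($host, $port): bool {
        $conn = @fsockopen($host, $port, $errno, $errstr, 3.0);
        if ($conn === false) {
            return false;
        }
        fclose($conn);
        return true;
    };
}
```

Passing the probe as a callable keeps the polling loop reusable for both the initial short check and the later full-timeout check, which is the code-reuse concern raised in the issue.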

Flow

Before (all Branchers):

  1. Poll logbook for ensure_app/ensure_copied_app flows → 404s on old Branchers
  2. Check SSH connectivity

After (reused Branchers only):

  1. Check SSH connectivity (10 seconds) → Return immediately if reachable
  2. If not reachable, fall back to logbook polling + SSH check (full timeout)

// DeployRunner now tracks reuse status
$isReused = false;
if ($reuseBrancher && $brancherApp = $this->brancherHypernodeManager->reuseExistingBrancherHypernode($parentApp, $labels)) {
    $isReused = true;
}

// And calls appropriate method
if ($isReused) {
    $this->brancherHypernodeManager->waitForReusedAvailability($brancherApp, ...);
} else {
    $this->brancherHypernodeManager->waitForAvailability($brancherApp, ...);
}

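
The SSH-first flow for reused Branchers can be sketched roughly as follows. This is an illustration under stated assumptions, not the method added in this pull request: the function name, the REUSED_SSH_CHECK_TIMEOUT constant, the callable parameters, the returned string, and capping the short check with min() are all hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical constant name; the 10-second value comes from the review discussion.
const REUSED_SSH_CHECK_TIMEOUT = 10;

/**
 * Hypothetical sketch of the reused-Brancher flow:
 * 1. Try SSH for a short, fixed period and return immediately if reachable.
 * 2. Otherwise fall back to the regular availability wait with the full timeout.
 *
 * Returns which path made the Brancher available ('ssh' or 'logbook').
 */
function waitForReusedAvailabilitySketch(
    callable $checkSsh,            // function (int $timeout): bool
    callable $waitForAvailability, // function (int $timeout): void, may throw on timeout
    int $timeout
): string {
    // Assumption: never spend longer on the short check than the overall timeout.
    $sshTimeout = min(REUSED_SSH_CHECK_TIMEOUT, $timeout);

    if ($checkSsh($sshTimeout)) {
        return 'ssh';
    }

    // Fallback: logbook polling followed by the SSH check, using the full timeout.
    $waitForAvailability($timeout);

    return 'logbook';
}
```

With this shape, an old Brancher whose logbook no longer contains the expected flows is accepted as soon as SSH answers, and the logbook is only polled when SSH is not reachable within the short window.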
Original prompt

This section details the original issue you should resolve

<issue_title>Reusing older brancher node fails</issue_title>
<issue_description>Sometimes, when reusing a Brancher older than a week, there is no ensure_copied_app / ensure_app flow in the logbook anymore. So it will 404 all the time, until it reaches the timeout, then it raises TimeoutException.

When reusing a Brancher, we should try to poll for SSH first, then if it isn't reachable, do the wait for availability process.

Example job: https://github.com/ByteInternet/hypernode-docs-next/actions/runs/20754105263/job/59658811645</issue_description>

<agent_instructions>Please when doing extra SSH polling, take code reuse into account.</agent_instructions>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 5 commits January 7, 2026 10:57
- Extract SSH reachability check into reusable method checkSshReachability()
- Add waitForReusedAvailability() method that checks SSH first
- Refactor waitForAvailability() to use internal method
- Update DeployRunner to use waitForReusedAvailability() for reused Branchers
- When reusing old Branchers, SSH is checked first to avoid 404 errors on missing logbook flows

Co-authored-by: tdgroot <1165302+tdgroot@users.noreply.github.com>
- Allocate 50% of timeout for initial SSH check to leave time for fallback
- Adjust remaining timeout based on time spent on SSH check
- Fix time elapsed calculation to use start time instead of incremental approach

Co-authored-by: tdgroot <1165302+tdgroot@users.noreply.github.com>
- Remove unnecessary empty line in checkSshReachability loop
- Add REUSED_SSH_CHECK_TIMEOUT_RATIO constant for timeout allocation

Co-authored-by: tdgroot <1165302+tdgroot@users.noreply.github.com>
- Only sleep if there's enough time remaining for another check
- Avoid wasting timeout on unnecessary sleeps near the timeout boundary

Co-authored-by: tdgroot <1165302+tdgroot@users.noreply.github.com>
Co-authored-by: tdgroot <1165302+tdgroot@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix reusing older brancher node to prevent errors" to "Fix timeout when reusing older Brancher nodes" Jan 7, 2026
Copilot AI requested a review from tdgroot January 7, 2026 11:05
);

// Allocate a portion of timeout for initial SSH check to leave time for fallback
$sshCheckTimeout = (int) ($timeout * self::REUSED_SSH_CHECK_TIMEOUT_RATIO);
Member

No need for fancy calculation, just try this for 10 seconds.

Author

Done in 7e30c4b - simplified to use a fixed 10-second timeout for the initial SSH check.

Replace percentage-based timeout calculation with fixed 10-second timeout
for initial SSH check on reused Branchers, as suggested by @tdgroot

Co-authored-by: tdgroot <1165302+tdgroot@users.noreply.github.com>
Copilot AI requested a review from tdgroot January 7, 2026 11:10

Development

Successfully merging this pull request may close these issues.

Reusing older brancher node fails

2 participants