
Fix API contract validation workflow backend startup race condition#3

Merged
cliff-de-tech merged 3 commits into main from copilot/update-api-contracts-workflow
Jan 26, 2026

Conversation

Contributor

Copilot AI commented Jan 26, 2026

The API contract validation workflow fails with ECONNREFUSED ::1:8000 because the FastAPI backend isn't ready when the health check runs, and no logs are captured to debug startup failures or crashes.

Changes

  • Capture backend logs - Redirect uvicorn output to ${{ github.workspace }}/backend.log for post-mortem debugging
  • Extend startup grace period - 15s → 30s sleep before health check
  • Validate process liveness - Check backend PID still exists before attempting health check
  • Increase retry budget - 5 retries @ 2s → 10 retries @ 3s delays
  • Fail with context - Print backend logs when process dies or health check fails

Implementation

- name: Start FastAPI backend
  run: |
    cd backend
    BACKEND_LOG="${{ github.workspace }}/backend.log"
    
    # Start with logging
    python -m uvicorn app:app --host 0.0.0.0 --port 8000 > "$BACKEND_LOG" 2>&1 &
    BACKEND_PID=$!
    
    sleep 30
    
    # Detect crashes early
    if ! ps -p $BACKEND_PID > /dev/null; then
      echo "❌ Backend process died! Logs:"
      cat "$BACKEND_LOG"
      exit 1
    fi
    
    # Health check with retries
    curl --retry 10 --retry-delay 3 --retry-connrefused http://localhost:8000/health || {
      echo "❌ Backend health check failed! Logs:"
      cat "$BACKEND_LOG"
      exit 1
    }

This addresses both timing issues (insufficient startup time) and observability gaps (silent failures).

Original prompt

Problem

The API Contract Validation workflow is failing with connection refused errors when trying to connect to the FastAPI backend at http://localhost:8000. The job logs show:

❌ Error: connect ECONNREFUSED ::1:8000
💡 Make sure the backend is running: cd backend && python app.py

Job URL: https://github.com/cliff-de-tech/Post-Bot/actions/runs/21345140930/job/61431194379

Root Cause

The backend is started in the background but either:

  1. Takes longer than 15 seconds to fully start
  2. Crashes silently without proper logging
  3. May have dependency or configuration issues

Required Changes

Update .github/workflows/api-contracts.yml to fix the backend startup issues:

  1. Add backend logging - Redirect uvicorn output to a log file so we can see what's happening
  2. Increase wait time - Change from 15 to 30 seconds to give the backend more time to start
  3. Add retry logic - Increase curl retries from 5 to 10 with longer delays
  4. Add better error handling - Print backend logs if the health check fails
  5. Verify backend is actually running - Check the process is still alive before proceeding

Specific changes needed in the "Start FastAPI backend" step (lines 74-90):

- name: Start FastAPI backend
  run: |
    cd backend
    # Start backend with logging
    python -m uvicorn app:app --host 0.0.0.0 --port 8000 > ../backend.log 2>&1 &
    BACKEND_PID=$!
    echo "Backend started with PID: $BACKEND_PID"
    
    # Wait for backend to be ready
    echo "Waiting for backend to start..."
    sleep 30
    
    # Check if process is still running
    if ! ps -p $BACKEND_PID > /dev/null; then
      echo "❌ Backend process died! Logs:"
      cat ../backend.log
      exit 1
    fi
    
    # Verify backend is responding with more retries
    echo "Checking backend health..."
    curl --retry 10 --retry-delay 3 --retry-connrefused http://localhost:8000/health || {
      echo "❌ Backend health check failed! Logs:"
      cat ../backend.log
      exit 1
    }
    echo "✅ Backend is ready!"
  env:
    LINKEDIN_CLIENT_ID: test_client_id
    LINKEDIN_CLIENT_SECRET: test_client_secret
    GROQ_API_KEY: test_groq_key
    DATABASE_URL: sqlite+aiosqlite:///./test.db
    ENCRYPTION_KEY: Ag45Scx9q_Q6w3xF8Lz5j2p7n9v0k1m3b5v7c9x1z3m=
    CLERK_ISSUER: https://test-clerk.accounts.dev

This will provide better debugging information and increase the reliability of the backend startup process.

This pull request was created from Copilot chat.



Copilot AI and others added 2 commits January 26, 2026 03:25
- Add backend logging to backend.log for debugging
- Increase wait time from 15 to 30 seconds
- Add process validation to check if backend is still running
- Increase curl retries from 5 to 10 with 3-second delays
- Add error handling to print backend logs on failure

Co-authored-by: cliff-de-tech <137389025+cliff-de-tech@users.noreply.github.com>
- Define BACKEND_LOG variable to avoid duplication
- Use absolute path with github.workspace for better reliability
- Quote variable references for safety

Co-authored-by: cliff-de-tech <137389025+cliff-de-tech@users.noreply.github.com>
Copilot AI changed the title from [WIP] Update API contract validation workflow for backend startup issues to Fix API contract validation workflow backend startup race condition on Jan 26, 2026
Copilot AI requested a review from cliff-de-tech January 26, 2026 03:28
@cliff-de-tech cliff-de-tech marked this pull request as ready for review January 26, 2026 03:34
Copilot AI review requested due to automatic review settings January 26, 2026 03:34
@cliff-de-tech cliff-de-tech merged commit 84e13ae into main Jan 26, 2026
7 of 8 checks passed

Copilot AI left a comment


Pull request overview

This PR addresses a race condition in the API Contract Validation workflow where the FastAPI backend wasn't ready when health checks ran, causing ECONNREFUSED errors. The changes improve startup reliability and debugging capabilities.

Changes:

  • Capture backend logs to a file for post-mortem debugging
  • Increase startup grace period from 15s to 30s
  • Add process liveness check before health check attempt
  • Increase curl retry budget from 5 to 10 attempts with 3-second delays
  • Print backend logs when process dies or health check fails

# Verify backend is responding with more retries
echo "Checking backend health..."
curl --retry 10 --retry-delay 3 --retry-connrefused http://localhost:8000/health || {

Copilot AI Jan 26, 2026


The curl command is missing the --fail flag. By default, curl returns success (exit code 0) even when receiving HTTP error codes like 404 or 500. Add --fail to ensure that only 2xx HTTP responses are treated as successful. Without this flag, the health check could pass even if the backend is returning error responses, masking startup issues.

Suggested change
curl --retry 10 --retry-delay 3 --retry-connrefused http://localhost:8000/health || {
curl --fail --retry 10 --retry-delay 3 --retry-connrefused http://localhost:8000/health || {


# Verify backend is responding with more retries
echo "Checking backend health..."
curl --retry 10 --retry-delay 3 --retry-connrefused http://localhost:8000/health || {

Copilot AI Jan 26, 2026


The curl command uses 'localhost' which can resolve to either IPv4 (127.0.0.1) or IPv6 (::1). The original error message showed connection refused to ::1:8000 (IPv6). While the backend binds to 0.0.0.0, consider explicitly using http://127.0.0.1:8000/health instead of localhost to avoid any IPv6 resolution issues and ensure consistent IPv4 connection attempts.

Suggested change
curl --retry 10 --retry-delay 3 --retry-connrefused http://localhost:8000/health || {
curl --retry 10 --retry-delay 3 --retry-connrefused http://127.0.0.1:8000/health || {


# Verify backend is responding with more retries
echo "Checking backend health..."
curl --retry 10 --retry-delay 3 --retry-connrefused http://localhost:8000/health || {

Copilot AI Jan 26, 2026


Consider adding the --max-time flag to curl to prevent indefinite hangs if the backend responds but doesn't complete the request. For example: curl --retry 10 --retry-delay 3 --retry-connrefused --max-time 5 http://localhost:8000/health. This ensures each individual attempt times out after 5 seconds, preventing the health check from hanging indefinitely on a partially responsive backend.

Suggested change
curl --retry 10 --retry-delay 3 --retry-connrefused http://localhost:8000/health || {
curl --retry 10 --retry-delay 3 --retry-connrefused --max-time 5 http://localhost:8000/health || {

Comment on lines 74 to +102
  - name: Start FastAPI backend
    run: |
      cd backend
-     python -m uvicorn app:app --host 0.0.0.0 --port 8000 &
+     BACKEND_LOG="${{ github.workspace }}/backend.log"
+
+     # Start backend with logging
+     python -m uvicorn app:app --host 0.0.0.0 --port 8000 > "$BACKEND_LOG" 2>&1 &
      BACKEND_PID=$!
      echo "Backend started with PID: $BACKEND_PID"

      # Wait for backend to be ready
      echo "Waiting for backend to start..."
-     sleep 15
-     # Verify backend is responding
-     curl --retry 5 --retry-delay 2 --retry-connrefused http://localhost:8000/health || exit 1
-     echo "Backend is ready!"
+     sleep 30
+
+     # Check if process is still running
+     if ! ps -p $BACKEND_PID > /dev/null; then
+       echo "❌ Backend process died! Logs:"
+       cat "$BACKEND_LOG"
+       exit 1
+     fi
+
+     # Verify backend is responding with more retries
+     echo "Checking backend health..."
+     curl --retry 10 --retry-delay 3 --retry-connrefused http://localhost:8000/health || {
+       echo "❌ Backend health check failed! Logs:"
+       cat "$BACKEND_LOG"
+       exit 1
+     }
+     echo "✅ Backend is ready!"

Copilot AI Jan 26, 2026


The backend logs are printed to the console on failure, but they are not uploaded as a GitHub Actions artifact. Consider adding a separate step with if: failure() to upload the backend.log file as an artifact, similar to how generated types are uploaded on lines 160-168. This would preserve logs even if the console output is truncated and make debugging easier. The step should use actions/upload-artifact@v4 with the backend.log path and should be added after this "Start FastAPI backend" step.
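
A minimal sketch of such a step, assuming the ${{ github.workspace }}/backend.log path used in this PR (the step and artifact names here are illustrative, not taken from the workflow):

- name: Upload backend logs
  if: failure()
  uses: actions/upload-artifact@v4
  with:
    name: backend-logs                              # illustrative artifact name
    path: ${{ github.workspace }}/backend.log
    if-no-files-found: ignore                       # don't fail the upload if the backend never wrote a log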

# Verify backend is responding
curl --retry 5 --retry-delay 2 --retry-connrefused http://localhost:8000/health || exit 1
echo "Backend is ready!"
sleep 30

Copilot AI Jan 26, 2026


The fixed 30-second sleep always waits the full duration even if the backend becomes ready sooner. Consider using a polling loop instead that checks the backend health every few seconds and breaks early when ready. This would make the workflow faster when the backend starts quickly while still handling slow starts. The current approach guarantees a minimum 30-second delay regardless of actual startup time.
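
A rough sketch of such a polling loop replacing the fixed sleep, reusing the BACKEND_PID and BACKEND_LOG variables from this step (the 30-iteration / 2-second values and the 5-second per-request timeout are illustrative assumptions, not values from this PR):

READY=0
for i in $(seq 1 30); do
  # Bail out immediately if uvicorn has already crashed
  if ! ps -p "$BACKEND_PID" > /dev/null; then
    echo "❌ Backend process died! Logs:"
    cat "$BACKEND_LOG"
    exit 1
  fi
  # Probe the health endpoint; --fail treats non-2xx responses as errors
  if curl --fail --silent --max-time 5 http://127.0.0.1:8000/health > /dev/null; then
    READY=1
    echo "✅ Backend is ready after ~$((i * 2))s"
    break
  fi
  sleep 2
done
if [ "$READY" -ne 1 ]; then
  echo "❌ Backend never became healthy! Logs:"
  cat "$BACKEND_LOG"
  exit 1
fi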

@cliff-de-tech cliff-de-tech deleted the copilot/update-api-contracts-workflow branch January 26, 2026 04:03