Fix API contract validation workflow backend startup race condition #3
- Add backend logging to backend.log for debugging
- Increase wait time from 15 to 30 seconds
- Add process validation to check if backend is still running
- Increase curl retries from 5 to 10 with 3-second delays
- Add error handling to print backend logs on failure
Co-authored-by: cliff-de-tech <137389025+cliff-de-tech@users.noreply.github.com>

- Define BACKEND_LOG variable to avoid duplication
- Use absolute path with github.workspace for better reliability
- Quote variable references for safety
Co-authored-by: cliff-de-tech <137389025+cliff-de-tech@users.noreply.github.com>
Pull request overview
This PR addresses a race condition in the API Contract Validation workflow where the FastAPI backend wasn't ready when health checks ran, causing ECONNREFUSED errors. The changes improve startup reliability and debugging capabilities.
Changes:
- Capture backend logs to a file for post-mortem debugging
- Increase startup grace period from 15s to 30s
- Add process liveness check before health check attempt
- Increase curl retry budget from 5 to 10 attempts with 3-second delays
- Print backend logs when process dies or health check fails
  # Verify backend is responding with more retries
  echo "Checking backend health..."
  curl --retry 10 --retry-delay 3 --retry-connrefused http://localhost:8000/health || {
The curl command is missing the --fail flag. By default, curl exits with code 0 even when the server returns an HTTP error status such as 404 or 500. Add --fail so that any response with status 400 or higher produces a non-zero exit code (22). Without this flag, the health check could pass even if the backend is returning error responses, masking startup issues.
- curl --retry 10 --retry-delay 3 --retry-connrefused http://localhost:8000/health || {
+ curl --fail --retry 10 --retry-delay 3 --retry-connrefused http://localhost:8000/health || {
  # Verify backend is responding with more retries
  echo "Checking backend health..."
  curl --retry 10 --retry-delay 3 --retry-connrefused http://localhost:8000/health || {
The curl command uses 'localhost', which can resolve to either IPv4 (127.0.0.1) or IPv6 (::1). The original error message showed connection refused to ::1:8000 (IPv6). Because the backend binds to 0.0.0.0, which is an IPv4-only address, a connection attempt to ::1 is refused even when the backend is up. Consider explicitly using http://127.0.0.1:8000/health instead of localhost to avoid IPv6 resolution issues and ensure consistent IPv4 connection attempts.
- curl --retry 10 --retry-delay 3 --retry-connrefused http://localhost:8000/health || {
+ curl --retry 10 --retry-delay 3 --retry-connrefused http://127.0.0.1:8000/health || {
  # Verify backend is responding with more retries
  echo "Checking backend health..."
  curl --retry 10 --retry-delay 3 --retry-connrefused http://localhost:8000/health || {
Consider adding the --max-time flag to curl to prevent indefinite hangs if the backend responds but doesn't complete the request. For example: curl --retry 10 --retry-delay 3 --retry-connrefused --max-time 5 http://localhost:8000/health. This ensures each individual attempt times out after 5 seconds, preventing the health check from hanging indefinitely on a partially responsive backend.
- curl --retry 10 --retry-delay 3 --retry-connrefused http://localhost:8000/health || {
+ curl --retry 10 --retry-delay 3 --retry-connrefused --max-time 5 http://localhost:8000/health || {
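The three curl suggestions above (--fail, an explicit 127.0.0.1 address, and --max-time) are independent and could be applied together. A minimal sketch of the combined command, wrapped in a small function: the function name check_health and the CURL_BIN override are hypothetical and not part of the PR, and --silent --show-error is an extra assumption added only to keep the step log quiet.

```shell
# check_health [URL]: single health probe combining the reviewers' suggestions.
# CURL_BIN is a hypothetical override so the command can be stubbed out in tests.
check_health() {
  url="${1:-http://127.0.0.1:8000/health}"
  # --fail:              exit non-zero on HTTP status >= 400
  # --retry-connrefused: treat "connection refused" as retryable
  # --max-time 5:        cap each transfer at 5 seconds
  "${CURL_BIN:-curl}" --fail --silent --show-error \
    --retry 10 --retry-delay 3 --retry-connrefused \
    --max-time 5 \
    "$url"
}
```

In the workflow step this might replace the bare curl call, for example `check_health || { cat "$BACKEND_LOG"; exit 1; }`.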
  - name: Start FastAPI backend
    run: |
      cd backend
-     python -m uvicorn app:app --host 0.0.0.0 --port 8000 &
+     BACKEND_LOG="${{ github.workspace }}/backend.log"
+
+     # Start backend with logging
+     python -m uvicorn app:app --host 0.0.0.0 --port 8000 > "$BACKEND_LOG" 2>&1 &
+     BACKEND_PID=$!
+     echo "Backend started with PID: $BACKEND_PID"
+
      # Wait for backend to be ready
      echo "Waiting for backend to start..."
-     sleep 15
-     # Verify backend is responding
-     curl --retry 5 --retry-delay 2 --retry-connrefused http://localhost:8000/health || exit 1
-     echo "Backend is ready!"
+     sleep 30
+
+     # Check if process is still running
+     if ! ps -p $BACKEND_PID > /dev/null; then
+       echo "❌ Backend process died! Logs:"
+       cat "$BACKEND_LOG"
+       exit 1
+     fi
+
+     # Verify backend is responding with more retries
+     echo "Checking backend health..."
+     curl --retry 10 --retry-delay 3 --retry-connrefused http://localhost:8000/health || {
+       echo "❌ Backend health check failed! Logs:"
+       cat "$BACKEND_LOG"
+       exit 1
+     }
+     echo "✅ Backend is ready!"
The backend logs are printed to the console on failure, but they are not uploaded as a GitHub Actions artifact. Consider adding a separate step with if: failure() to upload the backend.log file as an artifact, similar to how generated types are uploaded on lines 160-168. This would preserve logs even if the console output is truncated and make debugging easier. The step should use actions/upload-artifact@v4 with the backend.log path and should be added after this "Start FastAPI backend" step.
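A sketch of what such a step could look like, assuming it is placed directly after the "Start FastAPI backend" step. The step name and the artifact name backend-log are placeholders chosen for illustration, not taken from the PR.

```yaml
# Hypothetical follow-up step: runs only when an earlier step in the job failed.
- name: Upload backend log
  if: failure()
  uses: actions/upload-artifact@v4
  with:
    name: backend-log                          # placeholder artifact name
    path: ${{ github.workspace }}/backend.log  # same file BACKEND_LOG points to
    if-no-files-found: ignore                  # don't error if the backend never started
```

Setting if-no-files-found to ignore keeps this step from adding a second failure when the job fails before the log file is created.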
- # Verify backend is responding
- curl --retry 5 --retry-delay 2 --retry-connrefused http://localhost:8000/health || exit 1
- echo "Backend is ready!"
+ sleep 30
The fixed 30-second sleep always waits the full duration even if the backend becomes ready sooner. Consider using a polling loop instead that checks the backend health every few seconds and breaks early when ready. This would make the workflow faster when the backend starts quickly while still handling slow starts. The current approach guarantees a minimum 30-second delay regardless of actual startup time.
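One way the polling loop described above might look, as a minimal sketch rather than the PR's implementation. The function name wait_until_ready is hypothetical; BACKEND_PID is reused from the workflow step and is treated as optional here.

```shell
# wait_until_ready MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, sleeping DELAY_SECONDS between attempts.
# Returns 0 when ready, 1 when attempts run out, 2 if the backend process exited.
wait_until_ready() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Stop early if the backend has already died (only checked when BACKEND_PID is set).
    if [ -n "${BACKEND_PID:-}" ] && ! kill -0 "$BACKEND_PID" 2>/dev/null; then
      echo "backend process exited" >&2
      return 2
    fi
    # The readiness probe; its output is discarded, only the exit status matters.
    if "$@" >/dev/null 2>&1; then
      echo "ready after $attempt attempt(s)"
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "not ready after $max_attempts attempt(s)" >&2
  return 1
}

# Possible use in the workflow step (up to 15 probes, 2 seconds apart):
# wait_until_ready 15 2 curl --fail --silent --max-time 5 http://127.0.0.1:8000/health \
#   || { cat "$BACKEND_LOG"; exit 1; }
```

With this shape the step finishes as soon as the first probe succeeds, and the attempt count and delay bound the worst case in place of a fixed sleep.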
The API contract validation workflow fails with ECONNREFUSED ::1:8000 because the FastAPI backend isn't ready when the health check runs. No logs are captured to debug startup failures or crashes.

Changes
- Capture backend logs to ${github.workspace}/backend.log for post-mortem debugging

Implementation
This addresses both timing issues (insufficient startup time) and observability gaps (silent failures).
This pull request was created from Copilot chat.