Increase service keepAliveTimeout from 72s to 905s #7677
Conversation
Code Review
This pull request increases the keepAliveTimeout to 350 seconds to mitigate upstream disconnects. While the intention is good, the current implementation applies this change globally to all services using the common createServer function, whereas the justification is specific to usage-ingestor. I've left a comment suggesting how to scope this change correctly to avoid potential side-effects on other services.
We are using Cloudflare, which recommends a keep alive of 900 seconds. Is it sufficient to only configure this on the Node.js services, or should we increase the keep alive of Envoy for all services behind Cloudflare?
Great find. I didn't realize all of our requests were going through Cloudflare. Knowing that now, I believe we should increase the keep alive of all our services to 905_000 (905 seconds). |
```ts
trustProxy: true,
// If a connection is idle for 905 seconds or more, the connection times out.
// The default for fastify is 72_000 (72s), but this is meant for more dynamic clients.
// Requests to Hive's services are proxied through Cloudflare, which has a 900s
// keep alive timeout.
```
While Cloudflare has 900s configured, it's still only one part of the chain: cf -> cluster (envoy) -> usage-service.
Envoy has a limit of 1h by default, so adjusting this to 905s should be ok, I think? (900s -> 1h -> 905s)
Also, this change assumes all users go through CF, which might not be correct for self-hosters, so maybe it should be adjustable?
1hr seems incredibly long.
In reviewing Envoy's docs, it looks like we can adjust this, but they combined the idle and keepalive timeouts into a single idle_timeout option.
How do we feel about lowering the envoy idle_timeout to 905s and setting our services to 910s? Also I will make this adjustable.
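Making the value adjustable could look something like the sketch below. The env var name `KEEP_ALIVE_TIMEOUT_MS` and the 910s fallback are assumptions for illustration, not Hive's actual configuration:

```javascript
// Hypothetical helper: resolve the keep-alive timeout from the environment,
// falling back to 910s (sitting above a proposed 905s Envoy idle_timeout).
function resolveKeepAliveTimeout(env = process.env) {
  // KEEP_ALIVE_TIMEOUT_MS is an assumed variable name, not an existing one.
  const parsed = Number.parseInt(env.KEEP_ALIVE_TIMEOUT_MS ?? '', 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : 910_000;
}
```

Self-hosters not fronted by Cloudflare could then tune the timeout to whatever their own proxy chain expects.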
I'm still uncovering some details. It looks like we actually use Contour for our proxy and it has a 60s idle timeout
https://projectcontour.io/docs/main/configuration/#timeout-configuration
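For reference, Contour exposes a per-route idle timeout through the HTTPProxy `timeoutPolicy`; a hedged sketch of raising it to 905s (resource and host names are illustrative, not Hive's actual manifests):

```yaml
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: usage-service        # illustrative name
spec:
  virtualhost:
    fqdn: usage.example.com  # illustrative host
  routes:
    - services:
        - name: usage-service
          port: 80
      timeoutPolicy:
        idle: 905s           # overrides Contour's shorter default idle timeout
```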
Contour is just the K8s integration layer for Envoy, but it's good to know that they configure some things differently from Envoy's defaults.
Background
One of our customers was encountering upstream disconnects. This should help us reduce the likelihood that these 503s happen. Note that these errors do get retried by the hive-client, but it's much more efficient to maintain connections slightly longer. This also matches the idle timeout for a number of AWS services such as NAT Gateway.
Description
Increase keepAliveTimeout for the services and the Contour (Envoy) proxy. I also added an inline comment, because this configuration is often misunderstood or overlooked.