Skip to content

Implement Hash-Based Routing#519

Draft
hoffmaen wants to merge 17 commits intocloudfoundry:developfrom
sap-contributions:hash-based-routing-with-overload
Draft

Implement Hash-Based Routing#519
hoffmaen wants to merge 17 commits intocloudfoundry:developfrom
sap-contributions:hash-based-routing-with-overload

Conversation

@hoffmaen
Copy link
Contributor

@hoffmaen hoffmaen commented Nov 11, 2025

Summary

This Pull-Request implements #505.

  • Implement hash-based routing in Gorouter
    • Gorouter routes requests to endpoints based on the value of a configured request header
    • Gorouter distributes requests to different predefined application instances, when the current request count to an app instance exceeds the balance factor
    • Gorouter routes requests to the "next instance" in the lookup table on retries
  • Follow decisions documented in this comment

Backward Compatibility

Breaking Change? Yes/No

@b1tamara b1tamara force-pushed the hash-based-routing-with-overload branch from e9cbe0c to 7b456f8 Compare December 3, 2025 09:32
@hoffmaen hoffmaen force-pushed the hash-based-routing-with-overload branch from 9f21f6b to 513c575 Compare December 9, 2025 08:24
Copy link
Contributor

@peanball peanball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First set of comments. I have not looked at the implementation of maglev and the hash based instance determination.

I've focussed on the integration with the existing Gorouter code, route lookup, LB algo handling, etc.

}

func (p *EndpointPool) Endpoints(logger *slog.Logger, initial string, mustBeSticky bool, azPreference string, az string) EndpointIterator {
func (p *EndpointPool) Endpoints(logger *slog.Logger, initial string, mustBeSticky bool, azPreference string, az string, globalLB string, request *http.Request) EndpointIterator {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're passing in the full request to then dynamically fetch the hash based header. Why not pass an optional reference to *HashRoutingProperties? Parsing then can be done at the call site where you have the request anyway.

logger.Debug("endpoint-iterator-with-round-robin-lb-algo")
return NewRoundRobin(logger, p, initial, mustBeSticky, azPreference == config.AZ_PREF_LOCAL, az)
}
return NewRoundRobin(logger, p, initial, mustBeSticky, azPreference == config.AZ_PREF_LOCAL, az)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could also move this to the default: to make it clear in the switch. There is also no debug log for this branch.

if e != nil {
e.RLock()
defer e.RUnlock()
return e.endpoint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e might not be valid anymore, as it is retrieved without lock. You then acquire a lock and should re-fetch the value to be sure that it's still there.

@hoffmaen hoffmaen force-pushed the hash-based-routing-with-overload branch from 5030bbc to 560f935 Compare January 16, 2026 14:29
Copy link
Contributor

@peanball peanball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comments, now on the hash_based.go

balanceFactor := h.pool.HashRoutingProperties.BalanceFactor

if isEndpointOverloaded {
h.logger.Debug("hash-based-routing-endpoint-overloaded", slog.String("host", h.pool.host), slog.String("endpoint-id", endpoint.PrivateInstanceId), slog.Int64("endpoint-connections", currentInFlightRequestCount))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as debug log is usually not on, for performance's sake it's better to guard it by checking if debug level is enabled at all in the logger.

something like

if !h.logger.Enabled(context.Background(), slog.LevelDebug) {
    h.logger.Debug(...)
}

return nil
}

if !h.mustBeSticky {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this set of conditions could be refactored to be more readable.

The code is also almost 1:1 from leastconnection.go (Next()), maybe we can consolidate into one piece of code?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See commit: 6b7daaa

@b1tamara b1tamara requested a review from peanball January 21, 2026 07:22
@peanball
Copy link
Contributor

LGTM. Once we have the performance tests, we should move to community review.

@hoffmaen
Copy link
Contributor Author

hoffmaen commented Jan 28, 2026

Good news - we evaluated the performance of gorouter with hash-based routing support.

Our test setup constists of scaled out HAProxy (as a load balancer in front of gorouter), scaled out Diego Cells and a very simple and performant application with 30 instances. This allows us to performance test gorouter via external requests, where gorouter is the bottleneck.

We compared the performance of the latest stable routing-release, and the same release including changes from this Pull-Request. Using oha (inofficial successor of hey), we put various load patterns on the application and measured throughput, CPU usage of gorouter, and latency. In summary,

  • both versions performed absolutely the same when using round-robin
  • we achieved the same performance using hash-based routing and round-robin, when requests are distributed across all instances (achieved by running oha with multiple instances, each using a different hash_header value)
  • various mixed load scenarios (load on a route with round-robin parallel to load on a route with hash) showed no impact on performance

We also explored that requests are routed to next instances as expected, when the max. connection limit of one app instance was reached. The hash_balance factor is effective and allows distribution of requests to next instances before overloading it.

Please feel free to reach out for details on the results!

peanball

This comment was marked as outdated.

@b1tamara b1tamara requested a review from peanball January 28, 2026 13:35
@peanball peanball dismissed their stale review January 28, 2026 14:00

Changes were addressed. As this is still subject to change, I cannot give "final" approval yet.

@hoffmaen hoffmaen marked this pull request as ready for review January 28, 2026 14:53
@hoffmaen hoffmaen requested a review from a team as a code owner January 28, 2026 14:53
@a18e a18e force-pushed the hash-based-routing-with-overload branch from 21cc8b7 to b414a38 Compare January 30, 2026 10:31
@b1tamara
Copy link
Contributor

b1tamara commented Feb 5, 2026

During performance tests, we figured out that the Gorouter memory consumption is higher than expected if we keep the permutation table (table which is used to fill the final maglev lookup table) in memory. We want to investigate this finding further and optimize it.
Moving PR back to Draft.

@b1tamara b1tamara marked this pull request as draft February 5, 2026 08:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

5 participants