Description
I have an application with two pods and a client inside the same cluster connecting to them through a Service. As far as I know, this does connection-level multiplexing.
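In plain gRPC terms, I understand that to mean roughly the following (the gRPC port 50051 is my assumption here; only the metrics port 8002 appears in the test below):

```python
import grpc

# Sketch of the connection-level behaviour I am assuming: one channel maps
# to one long-lived HTTP/2 connection, so the Service only picks a pod when
# the connection is first established.
channel = grpc.insecure_channel("riva-api.riva:50051")  # port assumed
grpc.channel_ready_future(channel).result(timeout=10)

# Every RPC issued through stubs built on this channel now goes to the same
# pod until the connection is closed.
channel.close()
```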
There is no reason for the workload to be consistently higher on one pod than on the other, yet over a period of 3 hours I can see one of the pods receiving nearly 3 times more load than the other.
The pod with more load was already running when the other pod started.
My first hypothesis was session stickiness, but a quick test shows that the connections are balanced:
```sh
for _ in $(seq 300); do
  # Each request goes through the Service; the first field of the metric line
  # differs per pod (it carries the GPU UUID label), so counting the unique
  # values shows how the requests were distributed across the pods.
  curl -b cookies.txt -c cookies.txt -s riva-api.riva:8002/metrics | grep '^nv_gpu_utilization'
  sleep 0.1
done | awk '{print $1}' | sort | uniq -c
```

My new hypothesis is that the Python Riva client is reusing connections. Does that make sense, or are we guaranteed to start a new connection when calling riva.client.ASRService(auth)?
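For reference, the client side looks roughly like the sketch below; the Auth arguments and the gRPC port are assumptions on my part, the only call confirmed above is riva.client.ASRService(auth).

```python
import riva.client

# Assumption: the gRPC channel is created inside Auth, so every ASRService
# built from the same auth object shares one TCP connection and therefore
# always talks to the same pod.
auth = riva.client.Auth(uri="riva-api.riva:50051")  # uri/port assumed
asr_a = riva.client.ASRService(auth)
asr_b = riva.client.ASRService(auth)  # presumably reuses auth's channel

# If that assumption holds, getting a fresh load-balancing decision from the
# Service would require a fresh Auth (i.e. a fresh channel) per service object:
asr_c = riva.client.ASRService(riva.client.Auth(uri="riva-api.riva:50051"))
```

If the channel really does live in Auth, reusing a single auth object across many requests would explain why most of the traffic sticks to one pod.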
Here are some snippets of the configuration:
riva-api (pod) partial definition
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: riva-api
  namespace: riva
  labels:
    app: riva-api
    release: riva-api
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: riva-api
      release: riva-api
  template:
    metadata:
      labels:
        app: riva-api
        release: riva-api
      ...
    spec:
      ...
      containers:
        - name: riva-api
          image: nvcr.io/nvidia/riva/riva-speech:2.14.0
          ...
```

riva-api-online definition
```yaml
apiVersion: v1
kind: Service
metadata:
  name: riva-api
  namespace: riva
spec:
  ports:
    ...
  selector:
    app: riva-api
    release: riva-api
```
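In case it is useful, here is a sketch of how I can double-check that both pods are actually selected as endpoints of the Service (kubernetes Python client, kubeconfig access assumed):

```python
from kubernetes import client, config

# List the endpoint addresses behind the riva-api Service to confirm that
# both pods match the selector above (assumes local kubeconfig access).
config.load_kube_config()
v1 = client.CoreV1Api()
endpoints = v1.read_namespaced_endpoints(name="riva-api", namespace="riva")
for subset in endpoints.subsets or []:
    for address in subset.addresses or []:
        pod = address.target_ref.name if address.target_ref else "?"
        print(address.ip, pod)
```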