Load balancing at a connection level #110

@o-alexandre-felipe

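I have an application running in two pods, and clients inside the same cluster connect to them through a Service. As far as I know, the Service balances at the connection level: each new connection is assigned to one pod, and every request sent over that connection goes to the same pod.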

There is no reason for the workload to be consistently higher on one pod than on the other, yet over a period of 3 hours one of the pods received nearly 3 times more load than the other.

image

The pod with more load was already running when the other pod started.
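To illustrate why this ordering could matter, here is a minimal simulation sketch of connection-level balancing with long-lived connections. It is not a measurement of this cluster: the pod names, the client counts, and the uniform random backend choice are all assumptions made for illustration.

```python
import random
from collections import Counter


def simulate(early_clients, late_clients, requests_per_client, seed=0):
    """Return requests served per pod under connection-level balancing.

    early_clients connect while only pod-a exists; late_clients connect
    after pod-b is ready. Every client keeps its single connection open
    and sends all of its requests over it (assumption).
    """
    rng = random.Random(seed)
    load = Counter({"pod-a": 0, "pod-b": 0})
    # Connections opened before pod-b started can only land on pod-a.
    for _ in range(early_clients):
        load["pod-a"] += requests_per_client
    # Later connections are spread over both pods (modeled as uniform random).
    for _ in range(late_clients):
        load[rng.choice(["pod-a", "pod-b"])] += requests_per_client
    return load


if __name__ == "__main__":
    print(simulate(early_clients=10, late_clients=10, requests_per_client=100))
```

With equal numbers of early and late clients, the older pod is expected to serve about three quarters of the requests in this model, which is the same order of imbalance as the one observed. Connections that are never reopened never get rebalanced.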

My first hypothesis was session stickiness, but a quick test shows that new connections are balanced:

# Each curl invocation opens a new connection; tally the first field of the matching metric line.
for _ in $(seq 300); do
   curl -b cookies.txt -c cookies.txt -s riva-api.riva:8002/metrics | grep '^nv_gpu_utilization'
   sleep 0.1
done | awk '{print $1}' | sort | uniq -c

My new hypothesis is that the Python Riva client is reusing connections. Does that make sense, or are we guaranteed to start a new connection when calling riva.client.ASRService(auth)?
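One way to test this hypothesis would be to force the client to open a fresh connection periodically and check whether the load evens out. The sketch below is a generic, hypothetical helper (`RotatingChannel` is not part of the Riva client). It assumes that whatever `factory` returns owns the underlying connection, so calling `factory` again opens a new one that the Service can route to either pod.

```python
import time


class RotatingChannel:
    """Hand out a channel-like object and replace it after max_age seconds."""

    def __init__(self, factory, max_age, clock=time.monotonic):
        self._factory = factory  # callable that opens a NEW connection (assumption)
        self._max_age = max_age  # seconds before the connection is replaced
        self._clock = clock      # injectable clock, useful for testing
        self._channel = None
        self._created = None
        self.opened = 0          # number of connections opened so far

    def get(self):
        now = self._clock()
        if self._channel is None or now - self._created >= self._max_age:
            old = self._channel
            self._channel = self._factory()
            self._created = now
            self.opened += 1
            # Close the previous object if it exposes close(); otherwise drop it.
            close = getattr(old, "close", None)
            if callable(close):
                close()
        return self._channel


# Hypothetical usage, assuming riva.client.Auth opens the connection:
#   rotating = RotatingChannel(lambda: riva.client.Auth(uri="riva-api.riva:<grpc-port>"), max_age=60)
#   asr = riva.client.ASRService(rotating.get())
```

If the imbalance disappears with a short `max_age`, that would support the connection-reuse explanation. Note that this sketch closes the old object immediately, so it is not safe to rotate while requests are still in flight on it.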

Here are some snippets of the configuration:

riva-api Deployment (pods), partial definition

apiVersion: apps/v1
kind: Deployment
metadata:
  name: riva-api
  namespace: riva
  labels:
    app: riva-api
    release: riva-api
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: riva-api
      release: riva-api
  template:
    metadata:
      labels:
        app: riva-api
        release: riva-api
    ...
    spec:
      ...
      containers:
        - name: riva-api
          image: nvcr.io/nvidia/riva/riva-speech:2.14.0
          ...

riva-api-online definition

apiVersion: v1
kind: Service
metadata:
  name: riva-api
  namespace: riva
spec:
  ports:
      ...
  selector:
    app: riva-api
    release: riva-api
