Description
I have an application with two pods and a client inside the same cluster connecting to them through a Service. As far as I know, this does connection-level multiplexing.
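In plain gRPC terms, I understand that to mean roughly the following (the gRPC port 50051 is my assumption here; only the metrics port 8002 appears in the test below):

```python
import grpc

# Sketch of the connection-level behaviour I am assuming: one channel maps
# to one long-lived HTTP/2 connection, so the Service only picks a pod when
# the connection is first established.
channel = grpc.insecure_channel("riva-api.riva:50051")  # port assumed
grpc.channel_ready_future(channel).result(timeout=10)

# Every RPC issued through stubs built on this channel now goes to the same
# pod until the connection is closed.
channel.close()
```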
There is no reason for the workload to be consistently higher on one pod than on the other, yet over a period of 3 hours I can see one of the pods receiving nearly 3 times more load than the other.
The pod with more load was already running when the other pod started.
My first hypothesis was session stickiness, but a quick test shows that the connections are balanced:
```sh
for _ in $(seq 300); do
  # Each request goes through the Service; the first field of the metric line
  # differs per pod (it carries the GPU UUID label), so counting the unique
  # values shows how the requests were distributed across the pods.
  curl -b cookies.txt -c cookies.txt -s riva-api.riva:8002/metrics | grep '^nv_gpu_utilization'
  sleep 0.1
done | awk '{print $1}' | sort | uniq -c
```

My new hypothesis is that the Python Riva client is reusing connections. Does that make sense, or are we guaranteed to start a new connection when calling riva.client.ASRService(auth)?
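For reference, the client side looks roughly like the sketch below; the Auth arguments and the gRPC port are assumptions on my part, the only call confirmed above is riva.client.ASRService(auth).

```python
import riva.client

# Assumption: the gRPC channel is created inside Auth, so every ASRService
# built from the same auth object shares one TCP connection and therefore
# always talks to the same pod.
auth = riva.client.Auth(uri="riva-api.riva:50051")  # uri/port assumed
asr_a = riva.client.ASRService(auth)
asr_b = riva.client.ASRService(auth)  # presumably reuses auth's channel

# If that assumption holds, getting a fresh load-balancing decision from the
# Service would require a fresh Auth (i.e. a fresh channel) per service object:
asr_c = riva.client.ASRService(riva.client.Auth(uri="riva-api.riva:50051"))
```

If the channel really does live in Auth, reusing a single auth object across many requests would explain why most of the traffic sticks to one pod.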
Here are some snippets of the configuration:
riva-api (pod) partial definition
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: riva-api
  namespace: riva
  labels:
    app: riva-api
    release: riva-api
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: riva-api
      release: riva-api
  template:
    metadata:
      labels:
        app: riva-api
        release: riva-api
      ...
    spec:
      ...
      containers:
        - name: riva-api
          image: nvcr.io/nvidia/riva/riva-speech:2.14.0
          ...
```

riva-api-online definition
```yaml
apiVersion: v1
kind: Service
metadata:
  name: riva-api
  namespace: riva
spec:
  ports:
    ...
  selector:
    app: riva-api
    release: riva-api
```
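In case it is useful, here is a sketch of how I can double-check that both pods are actually selected as endpoints of the Service (kubernetes Python client, kubeconfig access assumed):

```python
from kubernetes import client, config

# List the endpoint addresses behind the riva-api Service to confirm that
# both pods match the selector above (assumes local kubeconfig access).
config.load_kube_config()
v1 = client.CoreV1Api()
endpoints = v1.read_namespaced_endpoints(name="riva-api", namespace="riva")
for subset in endpoints.subsets or []:
    for address in subset.addresses or []:
        pod = address.target_ref.name if address.target_ref else "?"
        print(address.ip, pod)
```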