From 7bf73600d00a007ddc2d86de60bdb96195974b24 Mon Sep 17 00:00:00 2001
From: protonicage
Date: Fri, 19 Dec 2025 16:04:21 +0100
Subject: [PATCH] Update README.md

Add TensorRT-LLM hint
---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 93e97b54..716640fc 100644
--- a/README.md
+++ b/README.md
@@ -1522,6 +1522,8 @@ to load the model after the server has been started.
 The model loading API is currently not supported during the
 `auto_complete_config` and `finalize` functions.
 
+The model loading API applies only to repository-managed models.
+TensorRT-LLM models must be launched via the TensorRT-LLM launcher and cannot be instantiated via `pb_utils.load_model(files=...)`.
 
 ## Using BLS with Stateful Models
 
 [Stateful models](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/architecture.md#stateful-models)