[Bugfix] Respect explicit deploymentMode when minReplicas=0 #446

jskswamy · 2025-12-15T10:55:57Z

What this PR does

Fixes the bug where minReplicas: 0 on a component (Engine or Router) ignores the explicit deploymentMode: RawDeployment annotation.

Changes

File	Change
`deployment.go`	Added `shouldInferServerless()` helper to centralize logic; updated `DetermineEngineDeploymentMode()` and `DetermineDeploymentModes()` signatures
`controller.go`	Pass global `deploymentMode` to `DetermineDeploymentModes()`
`ingress/reconciler.go`	Get global mode from annotations and pass it
`components/engine.go`	Updated call to new function signature
`utils_test.go`	Updated existing test for new signature
`deployment_test.go`	Added 12 new tests for the fix

Key Implementation

Added a centralized helper to avoid duplicate logic:

// shouldInferServerless checks if minReplicas=0 should trigger Serverless mode.
// Returns false if globalDeploymentMode explicitly blocks it (e.g., RawDeployment).
func shouldInferServerless(minReplicas *int, globalDeploymentMode constants.DeploymentModeType) bool {
    if globalDeploymentMode == constants.RawDeployment {
        return false
    }
    return minReplicas != nil && *minReplicas == 0
}

Why we need it

When users explicitly set deploymentMode: RawDeployment via annotation, they expect that mode to be respected. However, setting minReplicas: 0 on Engine or Router would override this to Serverless mode.

This caused:

Unexpected Knative Service creation instead of Deployment
Confusing errors when Knative is not installed: no kind is registered for the type v1.Service in scheme
User's explicit configuration being silently ignored
Inability to use external autoscalers (like KEDA) with scale-to-zero

Before

annotations:
  ome.io/deployment-mode: RawDeployment # Ignored!
spec:
  predictor:
    engine:
      minReplicas: 0 # Triggers Serverless anyway

After

Global deploymentMode: RawDeployment is respected. minReplicas: 0 only triggers Serverless when the global mode is not explicitly set to RawDeployment.

Fixes #445

Checklist

Tests added/updated (if applicable)
Docs updated (if applicable) - N/A, internal bug fix
make fmt passes locally
make vet passes locally
make tidy passes locally

* clean up deps * update router image * add router config to sglang runtimes * add pd runtimes * add decoderconfig to pd runtimes * add -pd as suffix to runtime names * use --model-path for consistancy * replace --tp with --tp-size for consistency * use latest sgl image in runtimes * fix image repo * fix ome manager img name * cleanup dep * add pd to runtime kustomize

* [doc] add missing benchmark in readme * [misc] add pr labeler

Bumps [actions/configure-pages](https://github.com/actions/configure-pages) from 4 to 5. - [Release notes](https://github.com/actions/configure-pages/releases) - [Commits](actions/configure-pages@v4...v5) --- updated-dependencies: - dependency-name: actions/configure-pages dependency-version: '5' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Bumps [softprops/action-gh-release](https://github.com/softprops/action-gh-release) from 1 to 2. - [Release notes](https://github.com/softprops/action-gh-release/releases) - [Changelog](https://github.com/softprops/action-gh-release/blob/master/CHANGELOG.md) - [Commits](softprops/action-gh-release@v1...v2) --- updated-dependencies: - dependency-name: softprops/action-gh-release dependency-version: '2' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…ct#112)

* remove llama 3.3 70B FP8 from models * change llama 3.2 90B rt to bf16 version * remove 11b 90b pd rt * add llama 3.3 pd isvc sample * sgl runtime fixes * update sample isvc * fix inference service status propagation * fix API version * Revert "fix API version" This reverts commit 326f4b6. * Revert "fix inference service status propagation" This reverts commit 46fe309.

…ct#119)

…ng (sgl-project#121) - Fix progress logging by reordering HubConfig options to prevent logger override - Fix model deletion using consistent path calculation with getDestPath - Add proper nil checks for progressManager.logger to prevent panics - Add HTTP 429 rate limiting handling with exponential backoff - Enable detailed progress logging for model downloads - Fix RBAC permissions to allow model-agent to get secrets

* [release] release 0.1.2

…gl-project#126)

…gl-project#127) - Add support for AWS S3, Azure Blob, GCS, and GitHub storage providers - Extend storage package with parse and validation functions for each provider - Update BuildStorageArgs to generate appropriate CLI arguments for genai-bench - Add comprehensive test coverage for all storage types - Update benchmark documentation with multi-cloud examples and best practices

sgl-project#128) [misc] Add support for 20+ new model architectures to HuggingFace model config parser

…project#131)

Replace deprecated beta.kubernetes.io/instance-type label with the GA node.kubernetes.io/instance-type label for node instance type detection. The code now tries the newer label first using the ok value from map lookup and falls back to the deprecated beta label for backward compatibility. This ensures the code works with both old and new Kubernetes versions while preferring the non-deprecated label when available. The deprecated label constant has been moved to pkg/constants for better maintainability.

…ject#430) * [chart] Update InferenceService template for PD mode support - Remove runtime spec (auto-selected by operator) - Add PD mode auto-detection (model names ending with "-pd") - PD mode: requires engine, decoder, and router - Non-PD mode: requires engine, optional router (no decoder) - Support both flat (minReplicas) and nested (engine.minReplicas) config * [chart] Replace -pd model entries with pdMode configuration option - Remove 10 -pd model entries from values.yaml (they're not separate models) - Remove 10 -pd entries from model registry in _helpers.tpl - Add pdMode field: when true, InferenceService includes decoder and router - Update InferenceService to use explicit pdMode flag (not model name suffix) - Update README with pdMode documentation and supported PD models list - Update model count from 176 to 165 PD mode supported models: kimi-k2-instruct, deepseek-rdma, llama-3-1-70b-instruct, llama-3-2-1b-instruct, llama-3-2-3b-instruct, llama-3-3-70b-instruct, llama-4-maverick-17b-128e-instruct-fp8, llama-4-scout-17b-16e-instruct, mistral-7b-instruct, mixtral-8x7b-instruct

…ct#440)

…ce creation (sgl-project#441) - Add createModel option to control ClusterBaseModel/BaseModel creation - Add createRuntime option to control ClusterServingRuntime creation - Both default to true for backward compatibility - Allows creating runtime-only or model-only deployments - Update README with examples for both use cases

…t coming soon (sgl-project#442)

- Remove fake models (deepseek-rdma) - Remove variant models (-grpc, -bf16, -pd-grpc) - Rename models to match config/models metadata names - Add missing config/models entries for gemma-2-27b-it and Qwen1.5 models

…ject#444) - Add 40 missing models to values.yaml from config/models - Remove obsolete gpt-oss-120b-bf16 and gpt-oss-20b-bf16 configs - Total: 196 models now in sync

…#411) * Add model capabilities for the supported endpoints * Add new model capacities and endpoints * Use ModelAPICapability as the name

Fix bug where minReplicas=0 on Engine or Router components would incorrectly trigger Serverless deployment mode, ignoring the user's explicit deploymentMode: RawDeployment annotation. Changes: - Add shouldInferServerless() helper to centralize the logic - Pass globalDeploymentMode through DetermineDeploymentModes() - Update DetermineEngineDeploymentMode() signature - Add comprehensive tests for the fix When deploymentMode: RawDeployment is explicitly set, minReplicas=0 now correctly creates a Deployment instead of a Knative Service.

gemini-code-assist · 2025-12-15T10:56:15Z

Summary of Changes

Hello @jskswamy, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue where user-defined RawDeployment mode was being silently ignored if a component specified minReplicas: 0, leading to unexpected Serverless deployments. The changes ensure that explicit deployment mode annotations are always honored, preventing misconfigurations, confusing errors, and enabling proper use of external autoscalers with scale-to-zero capabilities.

Highlights

Bug Fix: Deployment Mode Override: Resolved a bug where minReplicas: 0 on an Engine or Router component would incorrectly override an explicitly set deploymentMode: RawDeployment annotation, forcing Serverless mode.
Centralized Logic for Serverless Inference: Introduced a new helper function, shouldInferServerless(), to centralize the logic for determining if minReplicas=0 should trigger Serverless mode, ensuring it respects the global deployment mode.
Updated Function Signatures: Modified the signatures of DetermineEngineDeploymentMode() and DetermineDeploymentModes() to accept a globalDeploymentMode parameter, allowing the system to properly consider explicit global settings.
Enhanced Test Coverage: Added 12 new tests in deployment_test.go to thoroughly validate the fix and ensure that deployment mode inference correctly respects explicit global settings.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request effectively addresses the bug where minReplicas: 0 incorrectly overrides an explicit deploymentMode: RawDeployment annotation. The introduction of the shouldInferServerless helper function is a great improvement, centralizing the decision logic and making the code cleaner and more maintainable. The function signatures have been updated appropriately to propagate the global deployment mode, and the changes are well-tested with a comprehensive new test suite. I've found one issue in the new test file that will cause a compilation failure, which I've commented on. Otherwise, this is a solid fix.

pkg/controller/v1beta1/inferenceservice/utils/deployment_test.go

slin1237 and others added 30 commits June 27, 2025 18:13

[ci] fix release (sgl-project#87)

fc92bc3

[docs] fix concept site references (sgl-project#88)

0fd5742

[bugfix] fix gateway api hyperlink in readme (sgl-project#91)

442a59c

Slin/doc fix 2 (sgl-project#92)

75d42fb

* [doc] add missing benchmark in readme * [misc] add pr labeler

[oep] add multi cloud support proposal (sgl-project#93)

365d7b7

[misc] add auth interfaces (sgl-project#104)

799f857

[misc] add oci auth for up, ip, and rp (sgl-project#105)

b6b9762

[misc] add aws auth for accesskey, ip, assumerole, wi, etc (sgl-proje…

ac78419

…ct#112)

[misc] ad gcp auth module (sgl-project#115)

81de6df

[misc] ad azure auth module (sgl-project#117)

8aa4900

[bugfix] fixes the model agent daemonset deployment volume (sgl-proje…

1edc643

…ct#119)

[docs] updates the serving runtime documentation (sgl-project#120)

574f39d

[bugfix] fix hf token fetch and deletion logic (sgl-project#122)

52edf70

[release] release 0.1.3 (sgl-project#123)

dc72b52

Release v0.1.3 (sgl-project#124)

07085a3

* [release] release 0.1.2

Renames SRT to SGLang for clarity (sgl-project#125)

c5bd92b

[BUG] Fix Inference Service Status Propagation (sgl-project#89)

6edc59a

[bugfix] fix: resolve model agent download loops and timeout issues (s…

8e2356f

…gl-project#126)

[misc] Add support for 20+ new model architectures to HuggingFace mod… (

43ff6b0

sgl-project#128) [misc] Add support for 20+ new model architectures to HuggingFace model config parser

feat(helm): Add imagePullSecrets support to ome-resources chart (sgl-…

3638db0

…project#131)

[docs] update intallation document (sgl-project#130)

5bf5cc7

Fix YAML syntax error in helm workflow (sgl-project#133)

50cc7d6

Fix typo: rename Scount to Scout (sgl-project#135)

712494d

Xinyue Zhang and others added 12 commits December 13, 2025 04:56

[core] Add node affinity for benchmark job pod (sgl-project#428)

b990c51

Bump actions/download-artifact from 6 to 7 (sgl-project#435)

e3c20ff

Bump actions/cache from 4 to 5 (sgl-project#432)

f8f1bee

Bump actions/upload-artifact from 5 to 6 (sgl-project#434)

5034435

[helm] Add PD (Prefill-Decode) disaggregation mode support (sgl-proje…

57098e8

…ct#440)

[helm] Add createRuntime: true to all models and document vLLM suppor…

c3a7e6c

…t coming soon (sgl-project#442)

[helm] Sync values.yaml models with config/models (sgl-project#443)

6a58139

- Remove fake models (deepseek-rdma) - Remove variant models (-grpc, -bf16, -pd-grpc) - Rename models to match config/models metadata names - Add missing config/models entries for gemma-2-27b-it and Qwen1.5 models

[helm] Sync all models between values.yaml and config/models (sgl-pro…

9d1ed38

…ject#444) - Add 40 missing models to values.yaml from config/models - Remove obsolete gpt-oss-120b-bf16 and gpt-oss-20b-bf16 configs - Total: 196 models now in sync

[API] Add model capabilities for the supported endpoints (sgl-project…

462005f

…#411) * Add model capabilities for the supported endpoints * Add new model capacities and endpoints * Use ModelAPICapability as the name

jskswamy requested review from CatherineSue, YouNeedCryDear, beiguo218, pallasathena92 and slin1237 as code owners December 15, 2025 10:55

github-actions bot added inferenceservice InferenceService controller changes controller Controller changes ingress Ingress/networking changes tests Test changes labels Dec 15, 2025

gemini-code-assist bot reviewed Dec 15, 2025

View reviewed changes

pkg/controller/v1beta1/inferenceservice/utils/deployment_test.go Show resolved Hide resolved

slin1237 force-pushed the main branch from f630cdf to eabe72d Compare December 22, 2025 22:40

slin1237 requested review from XinyueZhang369, chengjieyao, davidsnahm, frankzhouhr, heymrbox and speedmancs as code owners December 22, 2025 22:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bugfix] Respect explicit deploymentMode when minReplicas=0 #446

[Bugfix] Respect explicit deploymentMode when minReplicas=0 #446

Uh oh!

jskswamy commented Dec 15, 2025

Uh oh!

gemini-code-assist bot commented Dec 15, 2025

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants

[Bugfix] Respect explicit deploymentMode when minReplicas=0 #446

Are you sure you want to change the base?

[Bugfix] Respect explicit deploymentMode when minReplicas=0 #446

Uh oh!

Conversation

jskswamy commented Dec 15, 2025

What this PR does

Changes

Key Implementation

Why we need it

Before

After

Checklist

Uh oh!

gemini-code-assist bot commented Dec 15, 2025

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants