-
Notifications
You must be signed in to change notification settings - Fork 57
[Bugfix] Respect explicit deploymentMode when minReplicas=0 #446
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
* clean up deps * update router image * add router config to sglang runtimes * add pd runtimes * add decoderconfig to pd runtimes * add -pd as suffix to runtime names * use --model-path for consistancy * replace --tp with --tp-size for consistency * use latest sgl image in runtimes * fix image repo * fix ome manager img name * cleanup dep * add pd to runtime kustomize
* [doc] add missing benchmark in readme * [misc] add pr labeler
Bumps [actions/configure-pages](https://github.com/actions/configure-pages) from 4 to 5. - [Release notes](https://github.com/actions/configure-pages/releases) - [Commits](actions/configure-pages@v4...v5) --- updated-dependencies: - dependency-name: actions/configure-pages dependency-version: '5' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [softprops/action-gh-release](https://github.com/softprops/action-gh-release) from 1 to 2. - [Release notes](https://github.com/softprops/action-gh-release/releases) - [Changelog](https://github.com/softprops/action-gh-release/blob/master/CHANGELOG.md) - [Commits](softprops/action-gh-release@v1...v2) --- updated-dependencies: - dependency-name: softprops/action-gh-release dependency-version: '2' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* remove llama 3.3 70B FP8 from models * change llama 3.2 90B rt to bf16 version * remove 11b 90b pd rt * add llama 3.3 pd isvc sample * sgl runtime fixes * update sample isvc * fix inference service status propagation * fix API version * Revert "fix API version" This reverts commit 326f4b6. * Revert "fix inference service status propagation" This reverts commit 46fe309.
…ng (sgl-project#121) - Fix progress logging by reordering HubConfig options to prevent logger override - Fix model deletion using consistent path calculation with getDestPath - Add proper nil checks for progressManager.logger to prevent panics - Add HTTP 429 rate limiting handling with exponential backoff - Enable detailed progress logging for model downloads - Fix RBAC permissions to allow model-agent to get secrets
* [release] release 0.1.2
…gl-project#127) - Add support for AWS S3, Azure Blob, GCS, and GitHub storage providers - Extend storage package with parse and validation functions for each provider - Update BuildStorageArgs to generate appropriate CLI arguments for genai-bench - Add comprehensive test coverage for all storage types - Update benchmark documentation with multi-cloud examples and best practices
sgl-project#128) [misc] Add support for 20+ new model architectures to HuggingFace model config parser
Replace deprecated beta.kubernetes.io/instance-type label with the GA node.kubernetes.io/instance-type label for node instance type detection. The code now tries the newer label first using the ok value from map lookup and falls back to the deprecated beta label for backward compatibility. This ensures the code works with both old and new Kubernetes versions while preferring the non-deprecated label when available. The deprecated label constant has been moved to pkg/constants for better maintainability.
…ject#430) * [chart] Update InferenceService template for PD mode support - Remove runtime spec (auto-selected by operator) - Add PD mode auto-detection (model names ending with "-pd") - PD mode: requires engine, decoder, and router - Non-PD mode: requires engine, optional router (no decoder) - Support both flat (minReplicas) and nested (engine.minReplicas) config * [chart] Replace -pd model entries with pdMode configuration option - Remove 10 -pd model entries from values.yaml (they're not separate models) - Remove 10 -pd entries from model registry in _helpers.tpl - Add pdMode field: when true, InferenceService includes decoder and router - Update InferenceService to use explicit pdMode flag (not model name suffix) - Update README with pdMode documentation and supported PD models list - Update model count from 176 to 165 PD mode supported models: kimi-k2-instruct, deepseek-rdma, llama-3-1-70b-instruct, llama-3-2-1b-instruct, llama-3-2-3b-instruct, llama-3-3-70b-instruct, llama-4-maverick-17b-128e-instruct-fp8, llama-4-scout-17b-16e-instruct, mistral-7b-instruct, mixtral-8x7b-instruct
…ce creation (sgl-project#441) - Add createModel option to control ClusterBaseModel/BaseModel creation - Add createRuntime option to control ClusterServingRuntime creation - Both default to true for backward compatibility - Allows creating runtime-only or model-only deployments - Update README with examples for both use cases
- Remove fake models (deepseek-rdma) - Remove variant models (-grpc, -bf16, -pd-grpc) - Rename models to match config/models metadata names - Add missing config/models entries for gemma-2-27b-it and Qwen1.5 models
…ject#444) - Add 40 missing models to values.yaml from config/models - Remove obsolete gpt-oss-120b-bf16 and gpt-oss-20b-bf16 configs - Total: 196 models now in sync
…#411) * Add model capabilities for the supported endpoints * Add new model capacities and endpoints * Use ModelAPICapability as the name
Fix bug where minReplicas=0 on Engine or Router components would incorrectly trigger Serverless deployment mode, ignoring the user's explicit deploymentMode: RawDeployment annotation. Changes: - Add shouldInferServerless() helper to centralize the logic - Pass globalDeploymentMode through DetermineDeploymentModes() - Update DetermineEngineDeploymentMode() signature - Add comprehensive tests for the fix When deploymentMode: RawDeployment is explicitly set, minReplicas=0 now correctly creates a Deployment instead of a Knative Service.
Summary of ChangesHello @jskswamy, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical issue where user-defined Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request effectively addresses the bug where minReplicas: 0 incorrectly overrides an explicit deploymentMode: RawDeployment annotation. The introduction of the shouldInferServerless helper function is a great improvement, centralizing the decision logic and making the code cleaner and more maintainable. The function signatures have been updated appropriately to propagate the global deployment mode, and the changes are well-tested with a comprehensive new test suite. I've found one issue in the new test file that will cause a compilation failure, which I've commented on. Otherwise, this is a solid fix.
What this PR does
Fixes the bug where
minReplicas: 0on a component (Engine or Router) ignores the explicitdeploymentMode: RawDeploymentannotation.Changes
deployment.goshouldInferServerless()helper to centralize logic; updatedDetermineEngineDeploymentMode()andDetermineDeploymentModes()signaturescontroller.godeploymentModetoDetermineDeploymentModes()ingress/reconciler.gocomponents/engine.goutils_test.godeployment_test.goKey Implementation
Added a centralized helper to avoid duplicate logic:
Why we need it
When users explicitly set
deploymentMode: RawDeploymentvia annotation, they expect that mode to be respected. However, settingminReplicas: 0on Engine or Router would override this to Serverless mode.This caused:
no kind is registered for the type v1.Service in schemeBefore
After
Global
deploymentMode: RawDeploymentis respected.minReplicas: 0only triggers Serverless when the global mode is not explicitly set to RawDeployment.Fixes #445
Checklist
make fmtpasses locallymake vetpasses locallymake tidypasses locally