
@bcfre (Owner) commented Jan 14, 2026

What this PR does

Why we need it

Fixes #

How to test

Checklist

  • Tests added/updated (if applicable)
  • Docs updated (if applicable)
  • make test passes locally

@github-actions bot added the documentation, controller, inferenceservice, autoscaling, oep, tests, and dependencies labels Jan 14, 2026

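Before the workload strategy layer was introduced, OME's workload deployment logic was tightly coupled to each role's Reconciler. Every component independently created its own Deployment, Service, Knative Service, and other resources. This architecture has the following problems: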

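1. **Limited extensibility**: each role can only own its dedicated resources. Multiple roles cannot share a single workload, which makes it hard to support All-in-One workload types such as RBG and Grove.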

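  1. Insufficient operability:
  • No in-place image upgrades: existing workloads must recreate Pods whenever the image changes. This causes service interruption or delay and slows down the overall release process.
  • No control over targeted warm-up (pre-warming).
  2. Weak multi-role coordination:
  • No unified cross-role orchestration: each role is reconciled independently and cannot see the other roles' versions, status, or dependencies. During scale-out or upgrades, nothing guarantees version consistency or ratio coordination across roles (for example, policies such as "upgrade A before B" or "A:B = 2:1"). This can easily lead to compatibility problems or service anomalies.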
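The Go sketch below makes the two cross-role policies from the comment concrete. Go is chosen because it is the usual language for Kubernetes controllers. The sketch shows decisions that require visibility across roles, which independent per-role Reconcilers do not have: deriving one role's replica count from another's under an "A:B = 2:1" ratio, and gating an "upgrade A before B" rollout. `RoleStatus`, `scaleByRatio`, and `canUpgrade` are hypothetical names. They are not part of OME's API or of this proposal's design.

```go
package main

import (
	"errors"
	"fmt"
)

// RoleStatus is a hypothetical, simplified view of one role's rollout state
// as a cross-role strategy layer might observe it.
type RoleStatus struct {
	Name            string
	Version         string // version the role's Pods are being rolled to
	Replicas        int32  // desired replicas
	UpdatedReplicas int32  // replicas already running Version
}

// scaleByRatio derives role B's replica count from role A's under an A:B
// ratio policy such as 2:1, rounding up so B keeps at least one replica
// whenever A has any.
func scaleByRatio(aReplicas, ratioA, ratioB int32) (int32, error) {
	if ratioA <= 0 || ratioB <= 0 {
		return 0, errors.New("ratio terms must be positive")
	}
	if aReplicas <= 0 {
		return 0, nil
	}
	// Integer ceiling of aReplicas * ratioB / ratioA.
	return (aReplicas*ratioB + ratioA - 1) / ratioA, nil
}

// canUpgrade implements an "upgrade A before B" gate: the next role may start
// rolling to target only after every predecessor runs target on all replicas.
func canUpgrade(predecessors []RoleStatus, target string) bool {
	for _, p := range predecessors {
		if p.Version != target || p.UpdatedReplicas < p.Replicas {
			return false
		}
	}
	return true
}

func main() {
	b, err := scaleByRatio(4, 2, 1)
	if err != nil {
		panic(err)
	}
	fmt.Println("B replicas:", b) // B replicas: 2

	a := RoleStatus{Name: "A", Version: "v2", Replicas: 4, UpdatedReplicas: 3}
	fmt.Println("B may upgrade:", canUpgrade([]RoleStatus{a}, "v2")) // B may upgrade: false

	a.UpdatedReplicas = 4
	fmt.Println("B may upgrade:", canUpgrade([]RoleStatus{a}, "v2")) // B may upgrade: true
}
```

Rounding up in `scaleByRatio` is only one possible choice. A real policy might instead reject replica counts that do not divide evenly by the ratio.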

@github-actions bot added the config label Jan 15, 2026



3 participants