[WIP] Feat: update existing resources instead of wholesale recreation #92
Draft
Changed the default resource "identifier" to a resource id that includes an input config name rather than being generated from config args, so a logical resource isn't defined by its config (and we can change config for the same resource).
Added a "resource hash" to the base sls resource class so we can detect changes to input config args and update resources in place instead of redeploying.
Added update methods to serverless resources.
Changed the deploy method to add platform-related state (e.g. durable resource ids) back to pickled state and config objects at runtime, so we can fetch and interact with runpod sls endpoints created via tetra.
Added update template methods to the sls resource so we can update template-only variables via gql (e.g. env vars).
Changed the defaults for some sls resource configs to reflect existing defaults in runpod.
Added an update path to the resource manager class for when the existing and new config have different resource hashes.
Changed how the gpu and gpuIds fields are synced, fixing a bug where gpus would always get created and pickled as the ANY gpu group.
Overview: update existing resources instead of wholesale recreation
Objective: Enable in-place resource updates instead of costly redeploys when configuration changes are detected.
Previously, any configuration change would trigger a complete resource teardown and redeploy cycle.
Solution Overview
This PR introduces an update system that:
Detects configuration changes using content-based hashing
Updates resources in-place via platform APIs when possible
Handles complex updates that may require both template and endpoint modifications
Key Changes
resource_id: Now represents a logical, human-readable identifier (ResourceType_name)
Provides stable identity across configuration changes
Enables resource reuse and update tracking
Replaces the previous hash-based approach for resource identification
resource_hash: Content-based hash for change detection
Built from _hashed_fields - only mutable configuration parameters
Excludes platform state (IDs, deployment metadata) to focus on user-controllable config
Triggers update flow when hash changes between runs
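The split between a stable resource_id and a content-based resource_hash can be sketched as follows. This is a minimal illustration, not the PR's implementation: the class name SketchResource, the fields workers_max, env, and endpoint_id, and the choice of JSON plus SHA-256 are all assumptions made for the example.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import ClassVar, Dict, Optional, Tuple


@dataclass
class SketchResource:
    """Hypothetical stand-in for the base serverless resource class."""

    # Assumed field names; the real _hashed_fields contents are not shown here.
    _hashed_fields: ClassVar[Tuple[str, ...]] = ("workers_max", "env")

    name: str
    workers_max: int = 3
    env: Dict[str, str] = field(default_factory=dict)
    endpoint_id: Optional[str] = None  # platform state, excluded from the hash

    @property
    def resource_id(self) -> str:
        # Logical identity: resource type plus user-supplied name only.
        return f"{type(self).__name__}_{self.name}"

    @property
    def resource_hash(self) -> str:
        # Serialize only the hashed fields, with sorted keys for determinism.
        payload = {f: getattr(self, f) for f in self._hashed_fields}
        encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()


# Same logical resource, different config: the id matches, the hash differs.
deployed = SketchResource(name="worker", workers_max=3, endpoint_id="abc123")
requested = SketchResource(name="worker", workers_max=5)
same_config = SketchResource(name="worker", workers_max=3)
```

Here deployed and requested share a resource_id but have different hashes, which is the condition that would route to the update path. deployed and same_config hash identically even though only one of them carries an endpoint_id.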
New GraphQL operations: update_endpoint() and update_template() mutations
Granular updates: System determines whether template, endpoint, or both need updating
State preservation: Maintains platform IDs and deployment metadata across updates
_hashed_fields: Class-level definition of configuration fields that trigger updates
fields_to_update: Runtime tracking of specific changes to optimize update operations
sync_config_with_deployed_resource(): Transfers deployment state between resource instances
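To make the interplay of _hashed_fields and fields_to_update concrete, here is a rough sketch of how changed fields might be collected and then used to decide which mutations are needed. The helper names changed_fields and plan_update, the dict-based configs, and the TEMPLATE_FIELDS set are hypothetical. Only the idea that env vars are template-level comes from the commit message; workers_max is an assumed endpoint-level field.

```python
from typing import Any, Dict, FrozenSet, Iterable, Set, Tuple

# Assumption: env vars live on the template; other hashed fields on the endpoint.
TEMPLATE_FIELDS: FrozenSet[str] = frozenset({"env"})


def changed_fields(
    old: Dict[str, Any], new: Dict[str, Any], hashed_fields: Iterable[str]
) -> Set[str]:
    """Return the hashed field names whose values differ between two configs."""
    return {name for name in hashed_fields if old.get(name) != new.get(name)}


def plan_update(fields_to_update: Set[str]) -> Tuple[bool, bool]:
    """Return (needs_template_update, needs_endpoint_update)."""
    needs_template = bool(fields_to_update & TEMPLATE_FIELDS)
    needs_endpoint = bool(fields_to_update - TEMPLATE_FIELDS)
    return needs_template, needs_endpoint


old_cfg = {"workers_max": 3, "env": {"MODE": "dev"}, "endpoint_id": "abc123"}
new_cfg = {"workers_max": 3, "env": {"MODE": "prod"}}

# endpoint_id is not a hashed field, so it never shows up as a change.
to_update = changed_fields(old_cfg, new_cfg, ("workers_max", "env"))
template_needed, endpoint_needed = plan_update(to_update)
```

In this example only env differs, so the plan is a template-only update and the endpoint mutation could be skipped.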
Bug Fixes
GPU configuration persistence: Fixed issue where gpuIds wasn't being properly stored in pickled resource state
Template ID tracking: Ensures template relationships are maintained through update cycles
Logic flow for resource update/creation
flowchart TD
    A[get_or_deploy_resource called] --> B[Acquire resource lock]
    B --> C{Resource exists?}
    C -->|No| D[Deploy new resource]
    D --> E[Add to manager & save]
    E --> F[Return deployed resource]
    C -->|Yes| G{Is resource deployed?}
    G -->|No| H[Remove invalid resource]
    H --> I[Deploy new resource]
    I --> J[Add to manager & save]
    J --> K[Return deployed resource]
    G -->|Yes| L{resource_hash changed?}
    L -->|No| M[Resource unchanged]
    M --> N[Return existing resource]
    L -->|Yes| O[Config change detected]
    O --> P[Compare _hashed_fields]
    P --> Q[Identify changed fields]
    Q --> R[Add to fields_to_update set]
    R --> S[sync_config_with_deployed_resource]
    S --> T[Call resource.update]
    T --> U{Pod template needs update?}
    U -->|Yes| V[Update template via GraphQL]
    V --> W{Template-only changes?}
    W -->|Yes| X[Return updated resource]
    W -->|No| Y[Update endpoint via GraphQL]
    U -->|No| Y
    Y --> Z[Remove old resource]
    Z --> AA[Add updated resource]
    AA --> BB[Return updated resource]

In the future, we'll have to integrate with durable Tetra state on the server side.
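The top-level branching in the flowchart (deploy, redeploy, unchanged, update) can be sketched with an in-memory manager. Everything here is a simplified assumption: FakeResource and SketchManager are hypothetical, the code is synchronous, persistence is omitted, and update() only carries the platform ID forward rather than calling any GraphQL mutation.

```python
import threading
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class FakeResource:
    """Hypothetical resource: logical id, config hash, optional platform id."""

    resource_id: str
    resource_hash: str
    endpoint_id: Optional[str] = None

    def is_deployed(self) -> bool:
        return self.endpoint_id is not None

    def deploy(self) -> "FakeResource":
        # Stand-in for a real deploy call; invents a platform id.
        return replace(self, endpoint_id=f"ep-{self.resource_id}")

    def update(self, new: "FakeResource") -> "FakeResource":
        # Adopt the new config but keep the existing platform id.
        return replace(new, endpoint_id=self.endpoint_id)


class SketchManager:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._resources: Dict[str, FakeResource] = {}

    def get_or_deploy_resource(self, config: FakeResource) -> Tuple[FakeResource, str]:
        with self._lock:
            existing = self._resources.get(config.resource_id)
            if existing is None:
                result, action = config.deploy(), "deployed"
            elif not existing.is_deployed():
                # Tracked but invalid: drop it and deploy from scratch.
                del self._resources[config.resource_id]
                result, action = config.deploy(), "redeployed"
            elif existing.resource_hash == config.resource_hash:
                return existing, "unchanged"
            else:
                result, action = existing.update(config), "updated"
            self._resources[config.resource_id] = result
            return result, action


manager = SketchManager()
first, first_action = manager.get_or_deploy_resource(FakeResource("Endpoint_worker", "h1"))
again, again_action = manager.get_or_deploy_resource(FakeResource("Endpoint_worker", "h1"))
changed, changed_action = manager.get_or_deploy_resource(FakeResource("Endpoint_worker", "h2"))
```

The third call finds a deployed resource with a different hash, so it takes the update branch and keeps the endpoint ID assigned by the first deploy.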