feat(gpu): refactor gpu plugins #2
Open
JustinChengLZ wants to merge 22 commits into luomingmeng:dev/support-gpu-memory-qrm-plugin from
chore: add unit tests
force-pushed from a6c0da9 to 01a1035
feat: introduce rdma state and allow states to be shared within gpu sub-plugins
Dev/support rdma state
feat: implement rdma custom device plugin and implement logic for accompany resource allocation
force-pushed from a4a23f1 to 9442945
feat: implement allocation of accompany resource first before device
- Remove unused ResourcePluginsNames field and related configurations
- Add DefaultAccompanyResourceName method to CustomDevicePlugin interface
- Make registry maps private and add getter functions
- Improve error handling and cleanup in StaticPolicy allocation
- Simplify device topology initialization and allocation logic
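The interface and registry changes listed above could look roughly like the sketch below. Only `CustomDevicePlugin` and `DefaultAccompanyResourceName` are named in the commit; `DeviceName`, `examplePlugin`, `RegisterCustomDevicePlugin`, `GetCustomDevicePlugins` and the resource names are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// CustomDevicePlugin is a trimmed-down, hypothetical version of the interface
// named in the commit; DeviceName is an assumed helper method.
type CustomDevicePlugin interface {
	DeviceName() string
	// DefaultAccompanyResourceName names the resource that is allocated
	// together with the device.
	DefaultAccompanyResourceName() string
}

// examplePlugin is a placeholder implementation (assumption).
type examplePlugin struct{}

func (examplePlugin) DeviceName() string                   { return "example-device" }
func (examplePlugin) DefaultAccompanyResourceName() string { return "example-resource" }

// customDevicePlugins is unexported, so other packages cannot mutate it directly.
var customDevicePlugins = map[string]CustomDevicePlugin{}

// RegisterCustomDevicePlugin adds a plugin under its device name.
func RegisterCustomDevicePlugin(p CustomDevicePlugin) {
	customDevicePlugins[p.DeviceName()] = p
}

// GetCustomDevicePlugins returns a copy, so callers cannot change the registry.
func GetCustomDevicePlugins() map[string]CustomDevicePlugin {
	out := make(map[string]CustomDevicePlugin, len(customDevicePlugins))
	for name, p := range customDevicePlugins {
		out[name] = p
	}
	return out
}

func main() {
	RegisterCustomDevicePlugin(examplePlugin{})
	for name, p := range GetCustomDevicePlugins() {
		fmt.Printf("%s is allocated together with %s\n", name, p.DefaultAccompanyResourceName())
	}
}
```

Returning a copy from the getter is one way to keep the map private in practice as well as in name; whether the PR copies or returns the map itself is not stated.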
force-pushed from b361407 to 7ea9a29
refactor(gpu): restructure device plugin and resource management
- introduce a new strategy framework for GPU allocation with filtering, sorting and binding components
- add helper functions for GPU memory and device allocation
- remove redundant checks and simplify allocation logic
Restructure the GPU allocation strategy into separate packages for better maintainability. Move the filtering, sorting and binding strategies to dedicated directories and implement a unified generic allocation strategy. Update the manager to use the new strategy structure and rename the default strategy constant.
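A minimal sketch of how a filter, sort and bind pipeline might compose into one generic strategy is shown here. All type and function names (`Device`, `Request`, `FilterFunc`, `SortFunc`, `BindFunc`, `filterByMemory`, and so on) are assumptions for illustration, apart from `GenericAllocationStrategy`, which a later commit in this PR mentions; the PR's actual packages and signatures are not shown on this page.

```go
package main

import (
	"fmt"
	"sort"
)

// Device and Request are hypothetical stand-ins for the PR's real types.
type Device struct {
	ID         string
	FreeMemory int // free memory, arbitrary units
}

type Request struct {
	Count     int // number of devices wanted
	MinMemory int // minimum free memory per device
}

type (
	FilterFunc func(devs []Device, req Request) []Device
	SortFunc   func(devs []Device) []Device
	BindFunc   func(devs []Device, req Request) ([]string, error)
)

// GenericAllocationStrategy chains the three stages: filter, sort, bind.
type GenericAllocationStrategy struct {
	filters []FilterFunc
	sorter  SortFunc
	binder  BindFunc
}

func (s *GenericAllocationStrategy) Allocate(devs []Device, req Request) ([]string, error) {
	candidates := devs
	for _, f := range s.filters {
		candidates = f(candidates, req)
	}
	return s.binder(s.sorter(candidates), req)
}

// filterByMemory drops devices with too little free memory.
func filterByMemory(devs []Device, req Request) []Device {
	var out []Device
	for _, d := range devs {
		if d.FreeMemory >= req.MinMemory {
			out = append(out, d)
		}
	}
	return out
}

// sortByFreeMemoryDesc orders a copy of the candidates, most free memory first.
func sortByFreeMemoryDesc(devs []Device) []Device {
	out := append([]Device(nil), devs...)
	sort.SliceStable(out, func(i, j int) bool { return out[i].FreeMemory > out[j].FreeMemory })
	return out
}

// bindFirstN takes the first req.Count candidates.
func bindFirstN(devs []Device, req Request) ([]string, error) {
	if len(devs) < req.Count {
		return nil, fmt.Errorf("need %d devices, only %d candidates", req.Count, len(devs))
	}
	ids := make([]string, 0, req.Count)
	for _, d := range devs[:req.Count] {
		ids = append(ids, d.ID)
	}
	return ids, nil
}

func newExampleStrategy() *GenericAllocationStrategy {
	return &GenericAllocationStrategy{
		filters: []FilterFunc{filterByMemory},
		sorter:  sortByFreeMemoryDesc,
		binder:  bindFirstN,
	}
}

func main() {
	devs := []Device{
		{ID: "gpu-0", FreeMemory: 8},
		{ID: "gpu-1", FreeMemory: 32},
		{ID: "gpu-2", FreeMemory: 16},
	}
	ids, err := newExampleStrategy().Allocate(devs, Request{Count: 2, MinMemory: 10})
	fmt.Println(ids, err)
}
```

Keeping each stage behind its own function type lets a filter, sorter or binder be swapped without touching the other stages.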
Convert public strategy fields to private and provide getter/setter methods to maintain encapsulation while allowing controlled access to the strategies
Introduce DeviceAffinityGroup field to DeviceInfo struct to support device affinity grouping with priority levels.
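One possible shape for such a field is sketched below, assuming that each priority level maps to the ID of the group a device belongs to at that level. The `AffinityPriority` type, the map layout and the `groupByPriority` helper are hypothetical; the PR's actual definition is not shown on this page.

```go
package main

import "fmt"

// AffinityPriority orders affinity levels; 0 is assumed to be the strongest.
type AffinityPriority int

// DeviceInfo is a hypothetical, trimmed-down version of the struct.
type DeviceInfo struct {
	ID string
	// DeviceAffinityGroup maps a priority level to the group this device
	// belongs to at that level (assumed layout).
	DeviceAffinityGroup map[AffinityPriority]string
}

// groupByPriority buckets device IDs by their affinity group at one priority.
func groupByPriority(devs []DeviceInfo, p AffinityPriority) map[string][]string {
	groups := map[string][]string{}
	for _, d := range devs {
		if g, ok := d.DeviceAffinityGroup[p]; ok {
			groups[g] = append(groups[g], d.ID)
		}
	}
	return groups
}

func main() {
	devs := []DeviceInfo{
		{ID: "gpu-0", DeviceAffinityGroup: map[AffinityPriority]string{0: "pair-a", 1: "node"}},
		{ID: "gpu-1", DeviceAffinityGroup: map[AffinityPriority]string{0: "pair-a", 1: "node"}},
		{ID: "gpu-2", DeviceAffinityGroup: map[AffinityPriority]string{0: "pair-b", 1: "node"}},
	}
	fmt.Println(groupByPriority(devs, 0))
	fmt.Println(groupByPriority(devs, 1))
}
```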
feat(gpu): implement strategy-based GPU allocation framework
feat: implement device affinity strategy
feat(npu): develop device affinity binding and filtering strategies
force-pushed from f4ff416 to 5e84d3a
feat: when device affinity of first priority is unable to decide allocation, go to next priority to allocate
fix: simplify logic of unallocated devices and change name of field
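The fallback between priority levels can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: it treats a level as "unable to decide" when no single group at that level has enough available devices, and the `allocateByAffinity` function and its argument layout are hypothetical.

```go
package main

import "fmt"

// allocateByAffinity walks affinity levels from the highest priority (index 0)
// downwards. levels[p] lists the device groups at priority p. A level decides
// the allocation when one of its groups has at least count available devices;
// otherwise the next level is tried.
func allocateByAffinity(levels [][][]string, available map[string]bool, count int) ([]string, error) {
	for _, groups := range levels {
		for _, group := range groups {
			var free []string
			for _, id := range group {
				if available[id] {
					free = append(free, id)
				}
			}
			if len(free) >= count {
				return free[:count], nil
			}
		}
		// No group at this priority could satisfy the request; fall through.
	}
	return nil, fmt.Errorf("no affinity group can provide %d devices", count)
}

func main() {
	levels := [][][]string{
		{{"gpu-0", "gpu-1"}, {"gpu-2", "gpu-3"}}, // priority 0: tightly coupled pairs
		{{"gpu-0", "gpu-1", "gpu-2", "gpu-3"}},   // priority 1: one wider group
	}
	available := map[string]bool{"gpu-1": true, "gpu-2": true, "gpu-3": true}

	// Two devices fit inside one priority-0 pair.
	fmt.Println(allocateByAffinity(levels, available, 2))
	// Three devices do not fit in any pair, so priority 1 decides.
	fmt.Println(allocateByAffinity(levels, available, 3))
}
```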
feat(gpu): implement device affinity binding strategy
- introduce DefaultResourceStateGeneratorRegistry for resource state generation
- add SetResourceState method to state interface
- move strategy registry to separate package
- enhance GenericAllocationStrategy with dynamic strategy selection
- update device topology registry with thread-safe operations
- consolidate GPU and RDMA device plugin initialization
- improve state checkpoint handling with resource state generators
- add custom strategy configuration options
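As a rough illustration of the registry items above, a thread-safe registry of per-resource state generators might look like this. The `ResourceState`, `ResourceStateGenerator` and `generatorRegistry` types and their methods are assumptions made for the sketch; only the idea of a generator registry guarded for concurrent use comes from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// ResourceState and ResourceStateGenerator are hypothetical stand-ins.
type ResourceState map[string]int

type ResourceStateGenerator func() (ResourceState, error)

// generatorRegistry guards its map with a RWMutex so that sub-plugins can
// register and look up generators concurrently.
type generatorRegistry struct {
	mu         sync.RWMutex
	generators map[string]ResourceStateGenerator
}

func newGeneratorRegistry() *generatorRegistry {
	return &generatorRegistry{generators: map[string]ResourceStateGenerator{}}
}

// Register stores the generator for one resource name.
func (r *generatorRegistry) Register(resource string, g ResourceStateGenerator) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.generators[resource] = g
}

// Generate looks up the generator under a read lock, then runs it unlocked.
func (r *generatorRegistry) Generate(resource string) (ResourceState, error) {
	r.mu.RLock()
	g, ok := r.generators[resource]
	r.mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("no state generator registered for %q", resource)
	}
	return g()
}

func main() {
	reg := newGeneratorRegistry()
	var wg sync.WaitGroup
	for _, name := range []string{"gpu", "rdma"} {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			reg.Register(name, func() (ResourceState, error) {
				return ResourceState{name + "-0": 0}, nil
			})
		}(name)
	}
	wg.Wait()

	state, err := reg.Generate("rdma")
	fmt.Println(state, err)
}
```

Releasing the read lock before calling the generator avoids holding the registry lock while a potentially slow generator runs.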
refactor gpu plugin state and allocation strategy manager
What type of PR is this?
What this PR does / why we need it:
Which issue(s) this PR fixes:
Special notes for your reviewer: