feat: add device-api-server with NVML fallback provider #720
Open: ArangoGutierrez wants to merge 1 commit into `device-api` from `device-api-server`.

# NVIDIA Device API: Development Guide

This guide covers the development setup and workflows for contributing to the NVIDIA Device API.

## Module Structure

This repository is a monorepo containing several Go modules:

| Module | Path | Description |
|--------|------|-------------|
| `github.com/nvidia/nvsentinel` | `/` | Device API Server implementation |
| `github.com/nvidia/nvsentinel/api` | `/api` | API definitions (protobuf and Go types) |
| `github.com/nvidia/nvsentinel/client-go` | `/client-go` | Kubernetes-style gRPC clients |
| `github.com/nvidia/nvsentinel/code-generator` | `/code-generator` | Code generation tools |

The `api` module is designed to be imported on its own by consumers who only need the type definitions.

## Architecture

This project bridges **gRPC** (for node-local performance) with **Kubernetes API Machinery** (for developer experience).

1. **Definitions**: `api/proto` (wire format) and `api/device` (Go types).
2. **Conversion**: `api/device/${version}/converter.go` maps gRPC messages to Kubernetes-style structs.
3. **Generation**: A pipeline driven by `code-generator/kube_codegen.sh` uses a modified `client-gen` to produce gRPC-backed Kubernetes clients in the `client-go` module.

---

## Code Generation Pipeline

The NVIDIA Device API uses a multi-stage pipeline to bridge gRPC with Kubernetes API machinery. For module-specific details, see the [client-go Development Guide](./client-go/DEVELOPMENT.md).

```mermaid
graph TD
    API["API Definitions<br/>(nvidia/nvsentinel/api)"] -->|Input| CG(client-gen<br/>*Custom Build*)
    API -->|Input| LG(lister-gen)

    CG -->|Generates| CLIENT[client/versioned]
    LG -->|Generates| LISTERS[listers/]

    CLIENT & LISTERS -->|Input| IG(informer-gen)
    IG -->|Generates| INFORMERS[informers/]

    CLIENT & LISTERS & INFORMERS -->|Final Output| SDK[Ready-to-use SDK]
```

### Build Sequence

When you run `make code-gen` from the repository root, the following steps run in order:

1. **Protoc**: Compiles the `.proto` files into Go gRPC stubs in `api/gen/`.
2. **DeepCopy**: Generates the deep-copy methods (including `DeepCopyObject`) that let the API types implement `runtime.Object`, as Kubernetes machinery requires.
3. **Goverter**: Generates the type conversion logic between the Protobuf messages and the Go structs.
4. **Custom client-gen**: `code-generator/kube_codegen.sh` runs it together with `lister-gen` and `informer-gen` to produce the versioned Clientset, Listers, and Informers in `client-go/`.

---

## Development Workflow

1. **Modify**: Edit the Protobuf definitions in `api/proto` or the Go types in `api/device`.
2. **Update**: If necessary, update the conversion logic in `api/device/${version}/converter.go` to handle the changes.
3. **Generate**: Run `make code-gen` from the repository root. This regenerates the gRPC stubs, helper methods, and the `client-go` SDK.
4. **Verify**: Run `make verify-codegen` to confirm that the generated code in your workspace is up to date.
5. **Test**: Add tests to the affected module and run `make test` from the repository root.

> [!NOTE]
> Use the fake clients in `client-go/client/versioned/fake` to test controllers without a real gRPC server.

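The generate, verify, and test steps of the workflow only make sense in order, because each relies on the previous one having succeeded. A minimal local sketch that chains them and stops at the first failure; `run_steps` is a hypothetical helper, not a target in the repository's Makefile, and the usage line assumes the `make` targets named in the workflow:

```bash
#!/usr/bin/env bash
# Hypothetical helper: run each command line in order and stop at the first failure.
run_steps() {
  local step
  for step in "$@"; do
    echo "==> ${step}"
    # Each step is a full command line; eval lets it carry its own arguments.
    if ! eval "${step}"; then
      echo "FAILED: ${step}" >&2
      return 1
    fi
  done
  echo "all steps passed"
}

# Usage, assuming the targets from the workflow steps:
# run_steps "make code-gen" "make verify-codegen" "make test"
```

Stopping early means a failed regeneration is reported as such instead of surfacing later as a confusing test failure.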
---

## Code Standards & Compliance

### Commit Messages & Signing (DCO)

We follow the [Conventional Commits](https://www.conventionalcommits.org) specification. In addition, every commit **must** be signed off (`git commit -s`) to comply with the Developer Certificate of Origin (DCO).

```bash
# Example types: feat, fix, docs, chore, refactor
git commit -s -m "feat: add new GPU condition type"
```

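Both rules can be checked locally before pushing. A minimal sketch, assuming the allowed types are exactly the five listed in the example comment and that the sign-off has the `Signed-off-by: Name <email>` form that `git commit -s` writes; `check_commit_msg` is a hypothetical helper, not part of the repository:

```bash
#!/usr/bin/env bash
# Hypothetical check: Conventional Commits subject plus a DCO sign-off trailer.
# The type list mirrors the example above; extend it to match project policy.
check_commit_msg() {
  local msg="$1"
  local subject
  subject="$(printf '%s\n' "$msg" | head -n 1)"

  # type, optional (scope), optional "!" for breaking changes, then ": description"
  local pattern='^(feat|fix|docs|chore|refactor)(\([a-z0-9._-]+\))?!?: .+'
  if [[ ! "$subject" =~ $pattern ]]; then
    echo "subject is not a Conventional Commit: ${subject}" >&2
    return 1
  fi

  # DCO sign-off line of the form "Signed-off-by: Name <email>"
  if ! printf '%s\n' "$msg" | grep -Eq '^Signed-off-by: .+ <[^>]+>$'; then
    echo "missing DCO Signed-off-by trailer" >&2
    return 1
  fi
}
```

Git passes a `commit-msg` hook the path of the file holding the proposed message as its first argument, so a hook could call `check_commit_msg "$(cat "$1")"`.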
### License Headers

Every source file (`.go`, `.proto`, `.sh`, and `Makefile`) must include the Apache 2.0 license header.

- **Go/Proto template**: See `api/hack/boilerplate.go.txt`.
- **Year**: Ensure the copyright year is current.

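A quick audit can flag files that break either rule. A minimal sketch, assuming the header contains the standard phrase `Licensed under the Apache License, Version 2.0` within the first 20 lines and that "current" means the current calendar year; the authoritative text is `api/hack/boilerplate.go.txt`, and `check_license_headers` is a hypothetical helper (you may want to exclude generated directories):

```bash
#!/usr/bin/env bash
# Hypothetical audit: list source files missing the Apache 2.0 header,
# or whose header does not mention the current year.
check_license_headers() {
  local root="${1:-.}"
  local year status=0 file
  year="$(date +%Y)"

  while IFS= read -r -d '' file; do
    if ! head -n 20 "$file" | grep -q 'Licensed under the Apache License, Version 2.0'; then
      echo "missing license header: ${file}"
      status=1
    elif ! head -n 20 "$file" | grep -q "Copyright.*${year}"; then
      echo "stale copyright year: ${file}"
      status=1
    fi
  done < <(find "$root" -type f \( -name '*.go' -o -name '*.proto' -o -name '*.sh' -o -name 'Makefile' \) -print0)

  return "$status"
}
```

The null-delimited `find ... -print0` and `read -d ''` pair keeps file names with spaces intact.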
---

## Troubleshooting

### Tooling Not Found

We use `.versions.yaml` to pin tool versions. The Makefile uses tools already on your system `PATH` and otherwise downloads them to your Go bin directory.

- **Verify installation**: `which protoc` or `which yq`.
- **Fix**: Ensure that `$(go env GOPATH)/bin` is in your `$PATH`:
  ```bash
  export PATH=$PATH:$(go env GOPATH)/bin
  ```

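To check several tools at once, a small preflight can report every missing one instead of stopping at the first. A minimal sketch; `check_tools` is a hypothetical helper, and it uses the POSIX-specified `command -v` rather than `which`, whose behavior varies between systems:

```bash
#!/usr/bin/env bash
# Hypothetical preflight: print each requested tool that is not on PATH.
# Returns non-zero if anything is missing.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: ${tool}"
      missing=1
    fi
  done
  return "$missing"
}

# Usage, after exporting the PATH shown above:
# check_tools protoc yq go
```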
### Generated Code Out of Sync

If the build fails or `make verify-codegen` reports an error, your generated artifacts are likely stale.

```bash
# Clean all generated files across the monorepo
make clean

# Re-run the full pipeline
make code-gen
```

### Dependency Issues

If you see "module not found" or checksum errors:

```bash
# Tidy all modules
make tidy
```

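Because each module has its own `go.mod`, tidying the monorepo means running `go mod tidy` once per module directory. A minimal sketch, assuming modules are discovered by locating `go.mod` files outside `vendor/`; the Makefile's `tidy` target may select modules differently, and `list_modules` and `tidy_all` are hypothetical helpers:

```bash
#!/usr/bin/env bash
# Hypothetical helpers: find every Go module under a root and tidy each one.
list_modules() {
  local root="${1:-.}"
  # A directory is a module root if it contains go.mod; skip vendored copies.
  find "$root" -name go.mod ! -path '*/vendor/*' -exec dirname {} \; | LC_ALL=C sort
}

tidy_all() {
  local dir
  while IFS= read -r dir; do
    echo "==> go mod tidy in ${dir}"
    # Subshell keeps the cd local; stop at the first module that fails.
    ( cd "$dir" && go mod tidy ) || return 1
  done < <(list_modules "${1:-.}")
}
```

Discovering modules from the file tree avoids maintaining a hand-written module list that can drift out of date.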
---

## Getting Help

- **Issues**: [Create an issue](https://github.com/NVIDIA/device-api/issues/new)
- **Questions**: [Start a discussion](https://github.com/NVIDIA/device-api/discussions)
- **Security**: To report a vulnerability, see [SECURITY](SECURITY.md).

---
Repository: NVIDIA/NVSentinel
**`MODULES` variable is not defined.**

The targets `build-modules`, `test-modules`, `lint`, `clean`, and `tidy` all iterate over `$(MODULES)`, but this variable is not defined in the Makefile, so these targets silently do nothing.

🐛 **Proposed fix to define `MODULES`**

Add the `MODULES` definition in the Configuration section: