2 changes: 1 addition & 1 deletion .github/headers/LICENSE
@@ -1,4 +1,4 @@
Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
13 changes: 13 additions & 0 deletions .gitignore
@@ -26,6 +26,13 @@ code-quality-report.json
go.work
go.work.sum

# Local tool binaries (managed by api/Makefile)
api/bin/*

# Server binary output
bin/
/device-api-server

# ==============================================================================
# IDE & Editor Configurations
# ==============================================================================
@@ -48,3 +55,9 @@ go.work.sum
# Emacs
*~
\#*\#


# ==============================================================================
# Git Worktrees
# ==============================================================================
.worktrees/
4 changes: 2 additions & 2 deletions .versions.yaml
@@ -1,4 +1,4 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -34,6 +34,6 @@ go_tools:

# Protocol Buffers / gRPC
protobuf:
protobuf: 'v33.0'
protobuf: 'v33.4'
protoc_gen_go: 'v1.36.10'
protoc_gen_go_grpc: 'v1.5.1'
132 changes: 122 additions & 10 deletions DEVELOPMENT.md
@@ -1,18 +1,130 @@
# Development Guide
# NVIDIA Device API: Development Guide

This guide covers the development setup and workflows for contributing to the NVIDIA Device API.

## Module Structure

This repository is a monorepo containing multiple Go modules:

| Module | Path | Description |
|--------|------|-------------|
| `github.com/nvidia/nvsentinel` | `/` | Device API Server implementation |
| `github.com/nvidia/nvsentinel/api` | `/api` | API definitions (protobuf and Go types) |
| `github.com/nvidia/nvsentinel/client-go` | `/client-go` | Kubernetes-style gRPC clients |
| `github.com/nvidia/nvsentinel/code-generator` | `/code-generator` | Code generation tools |

The API module is designed to be imported independently by consumers who only need the type definitions.

## Architecture

This project bridges **gRPC** (for node-local performance) with **Kubernetes API Machinery** (for developer experience).

1. **Definitions**: `api/proto` (wire format) and `api/device` (Go types).
2. **Conversion**: `api/device/${version}/converter.go` maps gRPC messages to K8s-style structs.
3. **Generation**: A pipeline driven by `code-generator/kube_codegen.sh`, which uses a modified `client-gen` to produce gRPC-backed Kubernetes clients in the `client-go` module.
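To make the conversion step concrete, here is a minimal hand-written sketch of the kind of mapping function Goverter generates for `api/device/${version}`. The types `GPUProto`, `GPU`, `GPUSpec`, and `ObjectMeta` are hypothetical stand-ins (the real protobuf and API types are not shown here), and using the UUID as the object name is an assumption made only for illustration.

```go
package main

import "fmt"

// GPUProto is a hypothetical stand-in for a protoc-generated message.
type GPUProto struct {
	Uuid        string
	ProductName string
	MemoryBytes uint64
}

// ObjectMeta is a trimmed, hypothetical version of Kubernetes object metadata.
type ObjectMeta struct {
	Name string
}

// GPUSpec and GPU are hypothetical Kubernetes-style API types.
type GPUSpec struct {
	UUID        string
	ProductName string
	MemoryBytes uint64
}

type GPU struct {
	ObjectMeta
	Spec GPUSpec
}

// FromProto maps the wire-format message onto the Kubernetes-style struct.
// Goverter emits functions of this general shape from an annotated
// interface; this one is written by hand.
func FromProto(in *GPUProto) *GPU {
	if in == nil {
		return nil
	}
	return &GPU{
		ObjectMeta: ObjectMeta{Name: in.Uuid}, // assumption: UUID doubles as name
		Spec: GPUSpec{
			UUID:        in.Uuid,
			ProductName: in.ProductName,
			MemoryBytes: in.MemoryBytes,
		},
	}
}

func main() {
	gpu := FromProto(&GPUProto{Uuid: "GPU-1234", ProductName: "Example GPU", MemoryBytes: 80 << 30})
	fmt.Printf("%s %s %d\n", gpu.Name, gpu.Spec.ProductName, gpu.Spec.MemoryBytes)
}
```

Keeping this mapping generated rather than hand-written is what lets a `.proto` change surface as a compile error in the converter instead of a silently dropped field.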

---

## Code Generation Pipeline

The NVIDIA Device API uses a multi-stage pipeline to bridge gRPC with Kubernetes API machinery. For module-specific details, see the [client-go Development Guide](./client-go/DEVELOPMENT.md).

```mermaid
graph TD
API["API Definitions<br/>(nvidia/nvsentinel/api)"] -->|Input| CG(client-gen<br/>*Custom Build*)
API -->|Input| LG(lister-gen)

CG -->|Generates| CLIENT[client/versioned]
LG -->|Generates| LISTERS[listers/]

CLIENT & LISTERS -->|Input| IG(informer-gen)
IG -->|Generates| INFORMERS[informers/]

CLIENT & LISTERS & INFORMERS -->|Final Output| SDK[Ready-to-use SDK]
```

### Build Sequence

When you run `make code-gen` from the root, the following sequence is executed:

1. **Protoc**: Compiles `.proto` into Go gRPC stubs in `api/gen/`.
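2. **DeepCopy**: Generates `DeepCopy` and `DeepCopyInto` methods, plus `DeepCopyObject` for top-level types so they satisfy `runtime.Object` as required for K8s compatibility.
3. **Goverter**: Generates type conversion logic between Protobuf and Go structs.
4. **Client, lister, and informer generation**: Orchestrated by `code-generator/kube_codegen.sh`, which runs the custom `client-gen` together with `lister-gen` and `informer-gen` to produce the versioned Clientset, Listers, and Informers in `client-go/`.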
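The DeepCopy step matters because copying a Go struct by value still shares the backing storage of its slices and maps. The sketch below shows, for a hypothetical `GPUStatus` type, the general shape of the `DeepCopyInto`/`DeepCopy` pair that deepcopy-gen writes to `zz_generated.deepcopy.go`; `DeepCopyObject` is omitted to keep the example free of Kubernetes dependencies.

```go
package main

import "fmt"

// GPUStatus is a hypothetical API type with reference-typed fields.
type GPUStatus struct {
	Conditions []string
	Labels     map[string]string
}

// DeepCopyInto copies the receiver into out, allocating fresh storage for
// the slice and map so the two values share no memory.
func (in *GPUStatus) DeepCopyInto(out *GPUStatus) {
	*out = *in
	if in.Conditions != nil {
		out.Conditions = make([]string, len(in.Conditions))
		copy(out.Conditions, in.Conditions)
	}
	if in.Labels != nil {
		out.Labels = make(map[string]string, len(in.Labels))
		for k, v := range in.Labels {
			out.Labels[k] = v
		}
	}
}

// DeepCopy returns a newly allocated deep copy, or nil for a nil receiver.
func (in *GPUStatus) DeepCopy() *GPUStatus {
	if in == nil {
		return nil
	}
	out := new(GPUStatus)
	in.DeepCopyInto(out)
	return out
}

func main() {
	orig := &GPUStatus{Conditions: []string{"Ready"}, Labels: map[string]string{"node": "a"}}
	cp := orig.DeepCopy()
	cp.Conditions[0] = "NotReady"
	cp.Labels["node"] = "b"
	fmt.Println(orig.Conditions[0], orig.Labels["node"]) // Ready a
}
```

Informer caches hand out shared pointers, which is why consumers are expected to call `DeepCopy` before mutating an object they obtained from a lister.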

---

## Development Workflow

1. **Modify**: Edit the Protobuf definitions in `api/proto` or Go types in `api/device`.
2. **Update**: Update the conversion logic in `api/device/${version}/converter.go` to handle changes, if necessary.
3. **Generate**: Run `make code-gen` from the root. This updates the gRPC stubs, helper methods, and the `client-go` SDK.
4. **Verify**: Run `make verify-codegen` to ensure the workspace is consistent.
5. **Test**: Add tests to the affected module and run `make test` from the root.

> [!NOTE]
> Use the fake clients in `client-go/client/versioned/fake` for testing controllers without a real gRPC server.

---

## Code Standards & Compliance

### Commit Messages & Signing (DCO)

We follow the [Conventional Commits](https://www.conventionalcommits.org) specification. Additionally, all commits **must** be signed off to comply with the Developer Certificate of Origin (DCO).

```bash
# Example: feat, fix, docs, chore, refactor
git commit -s -m "feat: add new GPU condition type"
```
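As an illustration of what these two rules require, here is a minimal sketch of a commit-message check. `CheckCommitMessage` is hypothetical (not a tool in this repository), the accepted types are limited to the examples listed above, and the lowercase scope pattern is a simplification of the Conventional Commits specification.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// headerRE matches "type(scope)!: description", with scope and "!" optional.
// Assumption: only the example types from this guide are accepted.
var headerRE = regexp.MustCompile(`^(feat|fix|docs|chore|refactor)(\([a-z0-9-]+\))?!?: \S.*$`)

// CheckCommitMessage verifies the header line and the DCO trailer that
// `git commit -s` appends ("Signed-off-by: Name <email>").
func CheckCommitMessage(msg string) error {
	lines := strings.Split(strings.TrimSpace(msg), "\n")
	if !headerRE.MatchString(lines[0]) {
		return fmt.Errorf("header %q is not a Conventional Commits header", lines[0])
	}
	for _, l := range lines[1:] {
		if strings.HasPrefix(l, "Signed-off-by: ") {
			return nil
		}
	}
	return errors.New("missing Signed-off-by trailer (use git commit -s)")
}

func main() {
	msg := "feat: add new GPU condition type\n\nSigned-off-by: Jane Doe <jane@example.com>"
	fmt.Println(CheckCommitMessage(msg)) // <nil>
}
```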

### License Headers

Every source file (`.go`, `.proto`, `.sh`, `Makefile`) must include the Apache 2.0 license header.

- **Go/Proto Template**: See `api/hack/boilerplate.go.txt`.
- **Year**: Ensure the copyright year is current.
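Missing or stale headers can be caught with a small check over the first lines of each file. The sketch below is hypothetical, not the repository's tooling; it assumes the header wording used in this PR (`Copyright (c) <year>, NVIDIA CORPORATION. All rights reserved.` followed by the Apache 2.0 line) and lets the comment prefix vary between `//` and `#`.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// copyrightRE captures the year; the comment prefix before it is not matched.
var copyrightRE = regexp.MustCompile(`Copyright \(c\) (\d{4}), NVIDIA CORPORATION\. All rights reserved\.`)

const apacheLine = "Licensed under the Apache License, Version 2.0"

// CheckHeader inspects the first 20 lines (an assumed limit) of a source
// file for the NVIDIA copyright line, its year, and the Apache 2.0 line.
func CheckHeader(src string, wantYear int) error {
	lines := strings.Split(src, "\n")
	if len(lines) > 20 {
		lines = lines[:20]
	}
	head := strings.Join(lines, "\n")

	m := copyrightRE.FindStringSubmatch(head)
	if m == nil {
		return errors.New("missing NVIDIA copyright line")
	}
	if year, _ := strconv.Atoi(m[1]); year != wantYear {
		return fmt.Errorf("copyright year %d, want %d", year, wantYear)
	}
	if !strings.Contains(head, apacheLine) {
		return errors.New("missing Apache 2.0 license line")
	}
	return nil
}

func main() {
	src := "// Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.\n//\n" +
		"// Licensed under the Apache License, Version 2.0 (the \"License\");\npackage main\n"
	fmt.Println(CheckHeader(src, 2026)) // <nil>
	fmt.Println(CheckHeader(src, 2027)) // copyright year 2026, want 2027
}
```

In a real check, `wantYear` would come from `time.Now().Year()`.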

---

## Code Generation
## Troubleshooting

This project relies heavily on generated code to ensure consistency with the Kubernetes API machinery.
### Tooling Not Found

We use `.versions.yaml` to pin tool versions. Our Makefile attempts to use tools from your system path or download them to your Go bin directory.

- **Verify Installation**: `which protoc` or `which yq`.
- **Fix**: Ensure your `GOPATH/bin` is in your system `$PATH`:
```bash
export PATH=$PATH:$(go env GOPATH)/bin
```

### Generated Code Out of Sync

If the build fails or `make verify-codegen` returns an error, your generated artifacts are likely stale.

```bash
# Clean all generated files across the monorepo
make clean

# Re-run the full pipeline
make code-gen
```

### Dependency Issues

If you see "module not found" or checksum errors:

```bash
# Tidy all modules
make tidy
```

---

### Generation Pipeline
The `make code-gen` command orchestrates several tools:
## Getting Help

1. **Protoc**: Generates gRPC Go bindings from `api/proto`.
2. **Goverter**: Generates type-safe conversion logic between internal gRPC types and the Kubernetes-style API types defined in `api/device/`.
3. **K8s Code-Gen**:
- Generates `DeepCopy` methods for API types to support standard Kubernetes object manipulation.
- Generates a versioned, typed **clientset**, along with **listers** and **informers**, providing a native `client-go` experience for consumers.
- **Issues**: [Create an issue](https://github.com/NVIDIA/device-api/issues/new)
- **Questions**: [Start a discussion](https://github.com/NVIDIA/device-api/discussions)
- **Security**: Please refer to [SECURITY](SECURITY.md) for reporting vulnerabilities.

---
170 changes: 142 additions & 28 deletions Makefile
@@ -1,4 +1,4 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -21,14 +21,28 @@
SHELL = /usr/bin/env bash -o pipefail
.SHELLFLAGS = -ec

VERSION_PKG = github.com/nvidia/nvsentinel/pkg/util/version
GIT_VERSION := $(shell git describe --tags --always --dirty)
GIT_COMMIT := $(shell git rev-parse HEAD)
BUILD_DATE := $(shell date -u +'%Y-%m-%dT%H:%M:%SZ')

LDFLAGS := -X $(VERSION_PKG).GitVersion=$(GIT_VERSION) \
-X $(VERSION_PKG).GitCommit=$(GIT_COMMIT) \
-X $(VERSION_PKG).BuildDate=$(BUILD_DATE)
# Go build settings
GOOS ?= $(shell go env GOOS)
GOARCH ?= $(shell go env GOARCH)
VERSION ?= $(shell git describe --tags --always --dirty 2>/dev/null || echo "dev")
GIT_COMMIT ?= $(shell git rev-parse --short HEAD 2>/dev/null || echo "unknown")
GIT_TREE_STATE ?= $(shell if git diff --quiet 2>/dev/null; then echo "clean"; else echo "dirty"; fi)
BUILD_DATE ?= $(shell date -u +"%Y-%m-%dT%H:%M:%SZ")

# Version package path for ldflags
VERSION_PKG = github.com/nvidia/nvsentinel/pkg/version

# Container settings
CONTAINER_RUNTIME ?= docker
IMAGE_REGISTRY ?= ghcr.io/nvidia/nvsentinel
DOCKERFILE := deployments/container/Dockerfile

# Linker flags
LDFLAGS = -s -w \
-X $(VERSION_PKG).Version=$(VERSION) \
-X $(VERSION_PKG).GitCommit=$(GIT_COMMIT) \
-X $(VERSION_PKG).GitTreeState=$(GIT_TREE_STATE) \
-X $(VERSION_PKG).BuildDate=$(BUILD_DATE)

# ==============================================================================
# Targets
@@ -59,34 +73,134 @@ verify-codegen: code-gen ## Verify generated code is up-to-date.
exit 1; \
fi

.PHONY: tidy
tidy: ## Run go mod tidy
go mod tidy

##@ Build & Test
##@ Build

.PHONY: build
build: ## Build the device-apiserver binary.
go build -ldflags "$(LDFLAGS)" -o bin/device-apiserver ./cmd/device-apiserver
build: build-modules build-server ## Build all modules and server.

.PHONY: build-modules
build-modules: ## Build all modules.
@for mod in $(MODULES); do \
if [ -f $$mod/Makefile ]; then \
$(MAKE) -C $$mod build; \
fi \
done
Comment on lines +83 to +87
⚠️ Potential issue | 🔴 Critical

MODULES variable is not defined.

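The targets build-modules, test-modules, lint, clean, and tidy all iterate over $(MODULES), but this variable is not defined anywhere in the Makefile. Make expands an undefined variable to an empty string, so each for-loop runs zero times and every submodule is silently skipped; only the root-level commands in lint, clean, and tidy (go vet, rm -rf, go mod tidy) still run.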

🐛 Proposed fix to define MODULES

Add the MODULES definition in the Configuration section:

 # Go build settings
 GOOS ?= $(shell go env GOOS)
 GOARCH ?= $(shell go env GOARCH)
+
+# Modules with their own Makefiles
+MODULES ?= cmd/nvml-provider
+
 VERSION ?= $(shell git describe --tags --always --dirty 2>/dev/null || echo "dev")

.PHONY: build-server
build-server: ## Build the Device API Server
@echo "Building device-api-server..."
@mkdir -p bin
CGO_ENABLED=0 GOOS=$(GOOS) GOARCH=$(GOARCH) go build \
-ldflags "$(LDFLAGS)" \
-o bin/device-api-server \
./cmd/device-api-server
@echo "Built bin/device-api-server"

.PHONY: build-nvml-provider
build-nvml-provider: ## Build the NVML Provider sidecar (requires CGO)
@echo "Building nvml-provider..."
@mkdir -p bin
CGO_ENABLED=1 GOOS=$(GOOS) GOARCH=$(GOARCH) go build \
-tags=nvml \
-ldflags "$(LDFLAGS)" \
-o bin/nvml-provider \
./cmd/nvml-provider
@echo "Built bin/nvml-provider"

##@ Testing

.PHONY: test
test: ## Run unit tests.
GOTOOLCHAIN=go1.25.5+auto go test -v $$(go list ./... | grep -vE '/pkg/client-go/(client|informers|listers)|/internal/generated/|/test/integration/|/examples/') -cover cover.out
test: test-modules test-server ## Run tests in all modules.

.PHONY: test-modules
test-modules: ## Run tests in all modules.
@for mod in $(MODULES); do \
if [ -f $$mod/Makefile ]; then \
$(MAKE) -C $$mod test; \
fi \
done

.PHONY: test-server
test-server: ## Run server tests only
go test -race -v ./pkg/...

.PHONY: test-integration
test-integration: ## Run integration tests.
test-integration: ## Run integration tests
go test -v ./test/integration/...

##@ Linting

.PHONY: lint
lint: ## Run golangci-lint.
golangci-lint run ./...
lint: ## Run linting on all modules.
@for mod in $(MODULES); do \
if [ -f $$mod/Makefile ]; then \
$(MAKE) -C $$mod lint; \
fi \
done
go vet ./...

##@ Container Images

.PHONY: docker-build
docker-build: docker-build-server docker-build-nvml-provider ## Build all container images

.PHONY: docker-build-server
docker-build-server: ## Build device-api-server container image
$(CONTAINER_RUNTIME) build \
--target device-api-server \
--build-arg VERSION=$(VERSION) \
--build-arg GIT_COMMIT=$(GIT_COMMIT) \
--build-arg GIT_TREE_STATE=$(GIT_TREE_STATE) \
--build-arg BUILD_DATE=$(BUILD_DATE) \
-t $(IMAGE_REGISTRY)/device-api-server:$(VERSION) \
-f $(DOCKERFILE) .

.PHONY: docker-build-nvml-provider
docker-build-nvml-provider: ## Build nvml-provider container image
$(CONTAINER_RUNTIME) build \
--target nvml-provider \
--build-arg VERSION=$(VERSION) \
--build-arg GIT_COMMIT=$(GIT_COMMIT) \
--build-arg GIT_TREE_STATE=$(GIT_TREE_STATE) \
--build-arg BUILD_DATE=$(BUILD_DATE) \
-t $(IMAGE_REGISTRY)/nvml-provider:$(VERSION) \
-f $(DOCKERFILE) .

.PHONY: docker-push
docker-push: ## Push all container images
$(CONTAINER_RUNTIME) push $(IMAGE_REGISTRY)/device-api-server:$(VERSION)
$(CONTAINER_RUNTIME) push $(IMAGE_REGISTRY)/nvml-provider:$(VERSION)

##@ Helm

.PHONY: helm-lint
helm-lint: ## Lint Helm chart
helm lint deployments/helm/device-api-server

.PHONY: helm-template
helm-template: ## Render Helm chart templates
helm template device-api-server deployments/helm/device-api-server

.PHONY: helm-package
helm-package: ## Package Helm chart
@mkdir -p dist/
helm package deployments/helm/device-api-server -d dist/

##@ Cleanup

.PHONY: clean
clean: ## Remove generated artifacts.
@echo "Cleaning generated artifacts..."
clean: ## Clean generated artifacts in all modules.
@for mod in $(MODULES); do \
if [ -f $$mod/Makefile ]; then \
$(MAKE) -C $$mod clean; \
fi \
done
rm -rf bin/
rm -rf internal/generated/
rm -rf pkg/client-go/client/ pkg/client-go/informers/ pkg/client-go/listers/
find api/ -name "zz_generated.deepcopy.go" -delete
find api/ -name "zz_generated.goverter.go" -delete
rm -f cover.out

.PHONY: tidy
tidy: ## Run go mod tidy on all modules.
@for mod in $(MODULES); do \
echo "Tidying $$mod..."; \
(cd $$mod && go mod tidy); \
done
go mod tidy