Conversation

@slin1237 slin1237 commented Jan 2, 2026

Move P2P coordination lease creation from model-agent to the basemodel
controller to eliminate race conditions when multiple agents try to
create/acquire the same lease simultaneously.

Changes:

  • Controller now creates P2P leases using resource UUID as lease name
  • Model-agent only acquires existing leases (no longer creates them)
  • Updated RBAC: controller gets create/delete, agent gets only get/update
  • Lease cleanup added to model deletion flow

This ensures only one lease is created per model (by controller) and
agents simply race to acquire it, reducing API pressure from 13+ agents
all trying to create the same lease.
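For illustration, a minimal sketch of the controller-side creation using client-go's coordination/v1 types. The namespace literal and label key are assumptions, the signature is simplified relative to the ensureP2PLease quoted in the review below, and GetP2PLeaseName is the project helper referenced later in the diff:

```go
package controller

import (
	"context"

	coordinationv1 "k8s.io/api/coordination/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
	// plus the project's constants package for GetP2PLeaseName
)

// ensureP2PLease creates the coordination lease for a model if it does not
// already exist; deriving the name from the resource UID keeps it stable
// across reconciles and identical for the controller and every agent.
func ensureP2PLease(ctx context.Context, c client.Client, uid types.UID, ownerName string) error {
	lease := &coordinationv1.Lease{
		ObjectMeta: metav1.ObjectMeta{
			Name:      constants.GetP2PLeaseName(uid),
			Namespace: "ome",                                            // namespace assumed
			Labels:    map[string]string{"ome.io/p2p-model": ownerName}, // label key assumed
		},
		// Spec is left empty: the agent that wins the acquisition race fills in
		// HolderIdentity and renews the lease while it downloads from HuggingFace.
	}
	if err := c.Create(ctx, lease); err != nil && !apierrors.IsAlreadyExists(err) {
		return err
	}
	return nil
}
```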

Add BitTorrent library dependency (anacrolix/torrent) and define
constants for P2P model distribution:
- Lease coordination constants (prefix, labels, durations)
- Default configuration values (ports, rates, timeouts)
- Environment variable keys for P2P configuration
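For orientation, the constants could look roughly like this. P2PLeasePrefix, GetP2PLeaseName, and the 120s/30s timings appear elsewhere in this PR; the prefix value, the remaining names, and the 6881/8081 defaults mirror the environment-variable table further down and are otherwise assumptions:

```go
package constants

import "k8s.io/apimachinery/pkg/types"

const (
	// Lease coordination.
	P2PLeasePrefix              = "p2p-download-" // prefix value assumed
	P2PDefaultLeaseSeconds      = 120             // lease duration (seconds)
	P2PDefaultLeaseRenewSeconds = 30              // renewal interval (seconds)

	// Default ports (names assumed, values from the env-var table below).
	P2PDefaultTorrentPort  = 6881
	P2PDefaultMetainfoPort = 8081
)

// GetP2PLeaseName derives the lease name from the resource UID so the
// controller and every agent compute the same, stable name.
func GetP2PLeaseName(uid types.UID) string {
	return P2PLeasePrefix + string(uid)
}
```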

Introduce the P2P distributor package with:
- Config struct with validation and defaults
- ConfigFromEnv for environment-based configuration
- Comprehensive Prometheus metrics for P2P operations
  - Download metrics (total, duration, failures)
  - Peer discovery and connection metrics
  - Lease and seeding metrics
  - Metainfo server metrics
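A rough sketch of the configuration shape and one metric in the style listed above; all field and metric names are assumptions, and only LeaseDurationSeconds is confirmed as required by a later commit message:

```go
package distributor

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Config holds the P2P distributor settings.
type Config struct {
	DataDir              string // where model files and cached .torrent files live
	PeersService         string // headless Service DNS name used for peer discovery
	TorrentPort          int
	MetainfoPort         int
	MaxDownloadRate      int64 // bytes/s, 0 = unlimited
	MaxUploadRate        int64 // bytes/s, 0 = unlimited
	LeaseDurationSeconds int
}

// Validate rejects incomplete or out-of-range settings.
func (c *Config) Validate() error {
	if c.DataDir == "" {
		return fmt.Errorf("DataDir is required")
	}
	if c.TorrentPort <= 0 || c.TorrentPort > 65535 {
		return fmt.Errorf("invalid torrent port %d", c.TorrentPort)
	}
	if c.LeaseDurationSeconds <= 0 {
		return fmt.Errorf("LeaseDurationSeconds is required")
	}
	return nil
}

// Example Prometheus metric in the style described above (name illustrative).
var p2pDownloadsTotal = promauto.NewCounterVec(
	prometheus.CounterOpts{Name: "p2p_model_downloads_total", Help: "P2P model downloads by outcome."},
	[]string{"outcome"},
)
```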

Implement ModelDistributor for P2P model distribution:
- BitTorrent client management with rate limiting
- Peer discovery via Kubernetes headless service DNS
- Metainfo fetching from peers for torrent coordination
- Model seeding and download operations
- Active torrent tracking with proper cleanup
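Peer discovery via the headless Service reduces to a DNS lookup that returns one address per agent pod; a sketch with assumed function and parameter names:

```go
package distributor

import (
	"context"
	"fmt"
	"net"
)

// discoverPeers resolves the headless Service name and returns the metainfo
// HTTP endpoint of every peer pod.
func discoverPeers(ctx context.Context, peersService string, metainfoPort int) ([]string, error) {
	addrs, err := net.DefaultResolver.LookupIPAddr(ctx, peersService)
	if err != nil {
		return nil, fmt.Errorf("peer DNS lookup for %q failed: %w", peersService, err)
	}
	peers := make([]string, 0, len(addrs))
	for _, a := range addrs {
		// Each peer serves its torrent metainfo over HTTP on the metainfo port.
		peers = append(peers, fmt.Sprintf("http://%s:%d", a.IP.String(), metainfoPort))
	}
	return peers, nil
}
```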

Fix API compatibility with anacrolix/torrent v1.57.1:
- Use bencode.Marshal instead of info.MarshalBencode
- Use t.Complete().Bool() instead of t.Complete.Bool
- Handle PeerRemoteAddr type assertion for peer addresses

Implement MetainfoServer to enable peers to discover available models:
- GET /metainfo/{modelHash} - serve torrent metainfo
- GET /health - health check endpoint
- GET /stats - P2P distribution statistics
- GET /models - list available models with seeding status
- Graceful shutdown support
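The server wiring might look roughly like this; the handler method names are assumptions (handleMetainfo is the only one visible later in the CodeQL finding):

```go
package distributor

import (
	"context"
	"net/http"
	"time"
)

// MetainfoServer exposes the endpoints listed above on the metainfo port.
type MetainfoServer struct {
	srv *http.Server
}

func NewMetainfoServer(addr string, d *ModelDistributor) *MetainfoServer {
	mux := http.NewServeMux()
	mux.HandleFunc("/metainfo/", d.handleMetainfo) // GET /metainfo/{modelHash}
	mux.HandleFunc("/health", d.handleHealth)
	mux.HandleFunc("/stats", d.handleStats)
	mux.HandleFunc("/models", d.handleModels)
	return &MetainfoServer{srv: &http.Server{Addr: addr, Handler: mux}}
}

func (s *MetainfoServer) Start() error { return s.srv.ListenAndServe() }

// Shutdown drains in-flight requests before exiting (graceful shutdown).
func (s *MetainfoServer) Shutdown(ctx context.Context) error {
	ctx, cancel := context.WithTimeout(ctx, 10*time.Second)
	defer cancel()
	return s.srv.Shutdown(ctx)
}
```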

Fix API compatibility with anacrolix/torrent v1.57.1:
- Add exists() helper function
- Use bencode.Marshal instead of info.MarshalBencode

Add comprehensive tests for the distributor package:
- Config validation tests (valid, missing fields, invalid ports)
- ConfigWithDefaults tests
- Metrics recording tests
- Stats struct tests
- Test helper functions for integration tests

Fix test config to include required LeaseDurationSeconds field.

Implement P2PLeaseManager for coordinating model downloads:
- Lease acquisition with expired lease takeover
- Lease renewal for long-running downloads
- Complete/release lifecycle management
- Ensures only one node downloads from HuggingFace

Tests cover:
- Lease acquisition (new, existing, expired)
- Lease expiration detection
- Lease name generation with hash truncation
- Renewal and holder verification
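A sketch of acquisition with expired-lease takeover on top of client-go's coordination/v1 API; struct fields and helper names are assumed, and the real manager also exposes renew, complete, and release as listed above:

```go
package distributor

import (
	"context"
	"time"

	coordinationv1 "k8s.io/api/coordination/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

type P2PLeaseManager struct {
	client               kubernetes.Interface
	namespace            string
	identity             string // node or pod name
	leaseDurationSeconds int
}

// TryAcquire attempts to take the controller-created lease. It only updates
// the Lease object; creation is the controller's job in this PR.
func (m *P2PLeaseManager) TryAcquire(ctx context.Context, leaseName string) (bool, error) {
	lease, err := m.client.CoordinationV1().Leases(m.namespace).Get(ctx, leaseName, metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	holder := lease.Spec.HolderIdentity
	if holder != nil && *holder != "" && *holder != m.identity && !m.isExpired(lease) {
		return false, nil // another node holds a live lease
	}
	now := metav1.NewMicroTime(time.Now())
	dur := int32(m.leaseDurationSeconds)
	lease.Spec.HolderIdentity = &m.identity
	lease.Spec.LeaseDurationSeconds = &dur
	lease.Spec.AcquireTime = &now
	lease.Spec.RenewTime = &now
	if _, err := m.client.CoordinationV1().Leases(m.namespace).Update(ctx, lease, metav1.UpdateOptions{}); err != nil {
		if apierrors.IsConflict(err) {
			return false, nil // lost the update race to another agent
		}
		return false, err
	}
	return true, nil
}

// isExpired reports whether the holder stopped renewing the lease (e.g. crashed).
func (m *P2PLeaseManager) isExpired(l *coordinationv1.Lease) bool {
	if l.Spec.RenewTime == nil || l.Spec.LeaseDurationSeconds == nil {
		return true
	}
	return time.Since(l.Spec.RenewTime.Time) > time.Duration(*l.Spec.LeaseDurationSeconds)*time.Second
}
```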

Integrate P2P model distribution into the Gopher download workflow:
- Add P2P fields to Gopher struct (distributor, lease manager, timeout)
- EnableP2P() and SetP2PTimeout() configuration methods
- computeModelHash() for consistent model identification
- downloadWithP2P() orchestrates P2P-first download strategy
- downloadWithLeaseHeld() handles HF download with lease coordination
- waitForP2PAvailability() with exponential backoff for waiting nodes
- startSeeding() begins seeding after successful download

Flow: Check P2P peers → Try P2P download → Acquire lease → HF download → Seed
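The waiting side can be pictured as a capped exponential-backoff poll. This is a simplified sketch of the method body: the actual method also receives the task, model info, and lease name, while HasPeers and TryP2PDownload match the calls visible in the quoted diff further down:

```go
// waitForP2PAvailability polls until the lease holder starts seeding, then
// downloads via P2P; on timeout the caller falls back to HuggingFace.
func (s *Gopher) waitForP2PAvailability(ctx context.Context, modelHash, destPath string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	backoff := 5 * time.Second
	for time.Now().Before(deadline) {
		if s.p2pDistributor.HasPeers(ctx, modelHash) {
			return s.p2pDistributor.TryP2PDownload(ctx, modelHash, destPath, s.p2pTimeout)
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
		}
		if backoff < time.Minute {
			backoff *= 2 // exponential backoff, capped at one minute
		}
	}
	return fmt.Errorf("timed out waiting for a peer to seed model %s", modelHash)
}
```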

Add deployment configuration for P2P model distribution:
- Headless Service for peer discovery via DNS
- DaemonSet with P2P-enabled model-agent
- Documentation with architecture overview and usage instructions

Integrate P2P model distribution into the model-agent entry point:
- Add P2P configuration fields (ports, rates, encryption, timeout)
- Read P2P settings from environment variables
- Create ModelDistributor when P2P_ENABLED=true
- Create P2PLeaseManager for download coordination
- Call gopher.EnableP2P() to activate P2P download flow
- Start MetainfoServer for peer discovery
- Add graceful shutdown for P2P resources

Environment variables:
- P2P_ENABLED: Enable/disable P2P (default: false)
- PEERS_SERVICE: Headless service DNS for peer discovery
- P2P_TORRENT_PORT: BitTorrent port (default: 6881)
- P2P_METAINFO_PORT: Metainfo HTTP port (default: 8081)
- P2P_MAX_DOWNLOAD_RATE: Max download rate in bytes/s
- P2P_MAX_UPLOAD_RATE: Max upload rate in bytes/s
- P2P_ENCRYPTION_ENABLED: Enable BitTorrent encryption
- P2P_DOWNLOAD_TIMEOUT: P2P download timeout in seconds
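A sketch of the environment parsing for a couple of these variables, reusing the Config shape sketched earlier; note a later commit in this PR removes ConfigFromEnv in favor of --p2p-* CLI flags, and the defaults and error handling here are illustrative:

```go
import (
	"fmt"
	"os"
	"strconv"
)

// ConfigFromEnv builds a Config from the environment variables listed above
// (only a subset shown).
func ConfigFromEnv() (*Config, error) {
	cfg := &Config{
		PeersService: os.Getenv("PEERS_SERVICE"),
		TorrentPort:  6881,
		MetainfoPort: 8081,
	}
	if v := os.Getenv("P2P_TORRENT_PORT"); v != "" {
		p, err := strconv.Atoi(v)
		if err != nil {
			return nil, fmt.Errorf("invalid P2P_TORRENT_PORT %q: %w", v, err)
		}
		cfg.TorrentPort = p
	}
	if v := os.Getenv("P2P_MAX_DOWNLOAD_RATE"); v != "" {
		r, err := strconv.ParseInt(v, 10, 64)
		if err != nil {
			return nil, fmt.Errorf("invalid P2P_MAX_DOWNLOAD_RATE %q: %w", v, err)
		}
		cfg.MaxDownloadRate = r
	}
	return cfg, cfg.Validate()
}
```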

- Add dedicated deleteTaskChan for delete tasks to ensure deletions
  are never blocked by downloads (even with 100 concurrent downloads)
- Scout sends delete tasks to deleteChan instead of gopherChan
- Gopher runs dedicated runDeleteWorker for immediate delete processing
- Skip download if already seeding the model (prevents infinite loop
  when update events trigger DownloadOverride for already-downloaded models)

- Add --p2p-* CLI flags to model-agent (replaces P2P_* env vars)
- Fix pod restart: check if model exists on disk before P2P/lease logic
- Update daemonset to use CLI args instead of env vars
- Increase default rate limits from 500 MB/s to 2 GB/s
- Remove unused ConfigFromEnv() and P2P env var constants
- Update README documentation

@github-actions github-actions bot added the documentation, helm, rbac, model-agent, controller, tests, config, and dependencies labels on Jan 2, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @slin1237, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the P2P model distribution mechanism by centralizing the creation and management of coordination leases within the basemodel controller. This change aims to eliminate race conditions and significantly reduce the load on the Kubernetes API server by streamlining how model-agents interact with leases. It also introduces a robust BitTorrent-based P2P distribution system, complete with peer discovery, metainfo serving, and comprehensive monitoring, to optimize large model file transfers across the cluster.

Highlights

  • P2P Lease Creation Shift: The responsibility for creating P2P coordination leases has been moved from the model-agent to the basemodel controller. This change centralizes lease creation, ensuring that only one lease is generated per model, which helps prevent race conditions.
  • Model-Agent Role Update: Model-agents will now only acquire existing P2P leases and no longer create them. This reduces the API pressure on the Kubernetes API server, as multiple agents will not simultaneously attempt to create the same lease.
  • RBAC Adjustments: Role-Based Access Control (RBAC) permissions have been updated. The controller now has create and delete permissions for leases resources, while the model-agent is restricted to get, list, update, and patch permissions.
  • Lease Cleanup on Model Deletion: A mechanism has been added to ensure that P2P leases are properly cleaned up when a model is deleted, preventing orphaned lease resources.
  • P2P Distribution Implementation: New components for P2P model distribution using BitTorrent have been introduced, including a ModelDistributor for peer discovery and metainfo handling, a MetainfoServer to serve torrent files, and Prometheus metrics for monitoring P2P activity.
  • Dedicated Delete Worker: A dedicated worker channel (deleteTaskChan) has been added to the Gopher in the model-agent to process deletion tasks immediately, preventing them from being blocked by long-running download operations.


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This is a significant and well-executed change that introduces a P2P model distribution mechanism to alleviate pressure on HuggingFace and improve download times in a cluster. The core logic is robust, handling various edge cases like pod restarts, lease holder crashes, and download cancellations. Moving lease creation to the controller and having agents only acquire them is a solid architectural decision that correctly addresses the race condition problem. The introduction of a dedicated channel for delete tasks is another great improvement that enhances responsiveness. The new distributor package is well-designed, with good optimizations like parallel hashing and metainfo caching. The documentation and tests are also comprehensive. I have a few suggestions to further improve the code, mostly around configuration and removing dead code.

Comment on lines +395 to +396
LeaseDurationSeconds: 120, // 2 minutes
LeaseRenewIntervalSeconds: 30, // renew every 30 seconds

medium

The lease duration and renewal interval are currently hardcoded. It would be beneficial to make these values configurable via command-line flags, similar to other P2P settings. This would provide greater flexibility for tuning the P2P coordination behavior in different cluster environments without requiring code changes.
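For example (using the standard flag package here; the agent's actual flag wiring may differ), the two values could be registered alongside the other --p2p-* flags:

```go
var (
	p2pLeaseDurationSeconds = flag.Int("p2p-lease-duration-seconds", 120,
		"How long the P2P coordination lease stays valid without renewal")
	p2pLeaseRenewIntervalSeconds = flag.Int("p2p-lease-renew-interval-seconds", 30,
		"How often the lease holder renews the lease while downloading")
)
```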

// ensureP2PLease creates a P2P coordination lease for the model if it doesn't exist.
// The lease is used to coordinate which node downloads from HuggingFace while others
// wait for P2P availability. Using the resource UID ensures a unique, stable lease name.
func ensureP2PLease(ctx context.Context, kubeClient client.Client, log logr.Logger, uid types.UID, ownerName string, isClusterScope bool) error {

medium

The isClusterScope parameter in the ensureP2PLease function signature is not used within the function body. To improve code clarity and simplify the function, this parameter should be removed from both the function definition and its call sites.

Suggested change
func ensureP2PLease(ctx context.Context, kubeClient client.Client, log logr.Logger, uid types.UID, ownerName string, isClusterScope bool) error {
func ensureP2PLease(ctx context.Context, kubeClient client.Client, log logr.Logger, uid types.UID, ownerName string) error {

Comment on lines +193 to +214
// runDeleteWorker is a dedicated worker for processing delete tasks.
// This worker runs separately from download workers to ensure deletions
// are never blocked by downloads (even with 100 concurrent downloads).
// Cancellation of active downloads is handled inside processTask.
func (s *Gopher) runDeleteWorker() {
for {
select {
case task, ok := <-s.deleteChan:
if ok {
err := s.processTask(task)
if err != nil {
s.logger.Errorf("Delete task failed with error: %s", err.Error())
}
} else {
s.logger.Info("delete channel closed, delete worker exits.")
return
}
default:
time.Sleep(500 * time.Millisecond)
}
}
}

medium

This dedicated worker for delete tasks is a great architectural improvement. It ensures that model deletions are processed promptly and are not blocked by long-running download tasks, which significantly improves the responsiveness and reliability of the cleanup process. The separation of concerns between download and delete workers is excellent.

Comment on lines +1299 to +1434
// downloadWithP2P orchestrates the model download with P2P support.
// The flow is:
// 1. If already seeding (in-memory state) → skip, we have it
// 2. Try to acquire lease first (determines if we're the HF downloader)
// 3. If lease acquired → download from HF (handles partial/resume) → start seeding
// 4. If lease not acquired → wait for P2P from lease holder
// 5. If P2P wait fails → fallback to HF download
//
// Key insight: The lease holder ALWAYS downloads from HF because HF handles
// partial file recovery and resume. We don't skip HF based on files existing
// on disk because they might be incomplete from an interrupted download.
func (s *Gopher) downloadWithP2P(ctx context.Context, task *GopherTask, baseModelSpec v1beta1.BaseModelSpec,
hfComponents *storage.HuggingFaceStorageComponents, destPath, modelHash, modelInfo, modelType, namespace, name string) error {

// If P2P is not enabled, go directly to HuggingFace
if !s.p2pEnabled || s.p2pDistributor == nil {
return s.downloadFromHuggingFace(ctx, task, baseModelSpec, hfComponents, destPath, modelInfo, modelType, namespace, name)
}

// Check if we're already seeding this model - means we already have it locally
// and verified (seeding state is only set after successful download + metainfo creation).
// This prevents re-downloading a model we just downloaded (e.g., from update events).
if s.p2pDistributor.IsSeeding(modelHash) {
s.logger.Infof("Already seeding model %s (hash: %s), skipping download", modelInfo, modelHash[:16])
return nil
}

// Check ConfigMap - if model is marked Ready, files are complete (pod restart recovery).
// ConfigMap is only updated to Ready AFTER successful download, so this is safe.
// We still need to start seeding since in-memory state was lost on restart.
if s.configMapReconciler.IsModelReady(task.BaseModel, task.ClusterBaseModel) {
if stat, err := os.Stat(destPath); err == nil && stat.IsDir() {
s.logger.Infof("Model %s marked Ready in ConfigMap (pod restart recovery), starting seeding", modelInfo)
s.startSeeding(destPath, modelHash, modelInfo)
return nil
}
// ConfigMap says Ready but files don't exist - this is unexpected, proceed with download
s.logger.Warnf("Model %s marked Ready in ConfigMap but files not found at %s, will re-download", modelInfo, destPath)
}

// Check if model already exists on disk - handles cases where ConfigMap cache is not populated
// (e.g., pod restart before cache was synced). This is a safety net for restart recovery.
// We check the hash directory path to see if it exists (either as symlink or directory).
// os.Stat follows symlinks, so IsDir() will be true if hash path is a symlink to a directory.
hashPath := filepath.Join(s.modelRootDir, modelHash)
if stat, err := os.Stat(hashPath); err == nil && stat.IsDir() {
s.logger.Infof("Model %s already exists at hash path %s (disk recovery), starting seeding", modelInfo, hashPath)
// Use destPath for seeding to maintain consistency with the rest of the codebase
s.startSeeding(destPath, modelHash, modelInfo)
return nil
}

s.logger.Infof("P2P enabled for model %s (hash: %s), acquiring lease", modelInfo, modelHash[:16])

// Step 1: Try to acquire lease FIRST
// The lease determines who downloads from HuggingFace vs who waits for P2P.
// We acquire lease before checking for peers because:
// - Lease holder always downloads from HF (handles partial recovery)
// - Non-lease holders wait for P2P from the lease holder
if s.p2pLeaseManager == nil {
// No lease manager, just download directly from HF
s.logger.Infof("No lease manager configured, downloading directly from HuggingFace for model %s", modelInfo)
if err := s.downloadFromHuggingFace(ctx, task, baseModelSpec, hfComponents, destPath, modelInfo, modelType, namespace, name); err != nil {
return err
}
if ctx.Err() != nil {
return ctx.Err()
}
s.startSeeding(destPath, modelHash, modelInfo)
return nil
}

s.logger.Infof("Successfully downloaded HuggingFace model %s to %s",
modelInfo, downloadPath)
// Use resource UID for lease name (matches what controller creates)
resourceUID := getModelUID(task)
leaseName := constants.GetP2PLeaseName(types.UID(resourceUID))
acquired, err := s.p2pLeaseManager.TryAcquire(ctx, leaseName)
if err != nil {
s.logger.Warnf("Failed to acquire P2P lease for model %s: %v, will try P2P or fallback", modelInfo, err)
}

// Step 2: If we acquired the lease, we're the designated HF downloader
if acquired {
s.logger.Infof("Acquired lease for model %s, downloading from HuggingFace", modelInfo)
return s.downloadWithLeaseHeld(ctx, task, baseModelSpec, hfComponents, destPath, modelHash, modelInfo, modelType, namespace, name, leaseName)
}

// Step 3: Lease held by another node - first check if P2P is already available
// (the lease holder might have already finished downloading and started seeding)
if s.p2pDistributor.HasPeers(ctx, modelHash) {
s.logger.Infof("Peers found for model %s, attempting P2P download", modelInfo)
if err := s.p2pDistributor.TryP2PDownload(ctx, modelHash, destPath, s.p2pTimeout); err == nil {
if ctx.Err() != nil {
return ctx.Err()
}
s.logger.Infof("Successfully downloaded model %s via P2P", modelInfo)
// Start seeding so we can serve other nodes
s.startSeeding(destPath, modelHash, modelInfo)
return nil
}
s.logger.Warnf("P2P download failed for model %s: %v, will wait for P2P availability", modelInfo, err)
}

// Step 4: Wait for P2P availability from the lease holder
s.logger.Infof("Lease held by another node for model %s, waiting for P2P availability", modelInfo)
if err := s.waitForP2PAvailability(ctx, task, modelHash, modelInfo, leaseName, destPath); err == nil {
if ctx.Err() != nil {
return ctx.Err()
}
s.logger.Infof("Model %s downloaded via P2P", modelInfo)
// Start seeding so we can serve other nodes
s.startSeeding(destPath, modelHash, modelInfo)
return nil
} else {
s.logger.Warnf("Wait for P2P failed for model %s: %v, falling back to HuggingFace download", modelInfo, err)
}

// Step 5: Final fallback - download directly from HuggingFace
// This happens when:
// - P2P wait timed out (lease holder took too long or crashed)
// - Lease expired (holder crashed before completing)
if ctx.Err() != nil {
return ctx.Err()
}

s.logger.Infof("Fallback: downloading model %s directly from HuggingFace", modelInfo)
if err := s.downloadFromHuggingFace(ctx, task, baseModelSpec, hfComponents, destPath, modelInfo, modelType, namespace, name); err != nil {
return err
}

if ctx.Err() != nil {
return ctx.Err()
}

s.startSeeding(destPath, modelHash, modelInfo)
return nil
}

medium

The P2P download logic is very robust. It correctly handles various scenarios, including pod restarts (by checking IsModelReady and disk state), lease acquisition races, and fallback to direct HuggingFace download if P2P fails. This resilience is crucial for a production system. The flow of trying to acquire a lease first, then deciding whether to download from HF or wait for P2P, is well-designed.

Comment on lines +46 to +53
// GetLeaseName returns the lease name for a model hash.
func (m *P2PLeaseManager) GetLeaseName(modelHash string) string {
// Truncate hash to fit Kubernetes name constraints
if len(modelHash) > 16 {
modelHash = modelHash[:16]
}
return constants.P2PLeasePrefix + modelHash
}

medium

This GetLeaseName method appears to be dead code, as the lease name is generated in gopher.go using constants.GetP2PLeaseName. Furthermore, its truncation logic is different from what's actually used and could potentially lead to hash collisions. To avoid confusion and prevent future bugs, this unused method should be removed.

renewCtx, cancel := context.WithCancel(ctx)

go func() {
ticker := time.NewTicker(time.Duration(constants.P2PDefaultLeaseRenewSeconds) * time.Second)

medium

The lease renewal interval is hardcoded using a constant. For better configurability and to keep the P2PLeaseManager self-contained, consider adding a leaseRenewInterval field to the struct, initialized from the distributor's configuration. This would be consistent with how leaseDurationSeconds is handled.
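Concretely, the manager could carry the interval as a field set from configuration, and the renewal goroutine would read it instead of the package constant (a sketch; Renew and the constructor wiring are assumed):

```go
type P2PLeaseManager struct {
	// ...existing fields...
	leaseRenewInterval time.Duration // from config, defaulting to the current constant
}

func (m *P2PLeaseManager) startRenewal(ctx context.Context, leaseName string) (stop func()) {
	renewCtx, cancel := context.WithCancel(ctx)
	go func() {
		ticker := time.NewTicker(m.leaseRenewInterval)
		defer ticker.Stop()
		for {
			select {
			case <-renewCtx.Done():
				return
			case <-ticker.C:
				if err := m.Renew(renewCtx, leaseName); err != nil {
					return // let the lease expire so another node can take over
				}
			}
		}
	}()
	return cancel
}
```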

Comment on lines +83 to +90
1. **Pod starts**: Model-agent initializes P2P distributor
2. **Model request**: Scout detects new BaseModel/ClusterBaseModel
3. **Check local**: If model exists on hostPath, seed it
4. **Try P2P**: Query peers for model via metainfo HTTP
5. **Lease coordination**: If no peers have it, try to acquire lease
6. **HF download**: Lease holder downloads from HuggingFace
7. **Seed**: Downloaded model is seeded to other nodes
8. **Complete**: All nodes have the model

medium

The workflow described in the 'How It Works' section does not fully match the implementation in gopher.go. The implementation first attempts to acquire a lease before checking for P2P peers, whereas the documentation suggests the opposite. To avoid confusion, please update this section to reflect the actual execution order:

  1. Pod starts
  2. Model request detected
  3. Check local (if already seeding or files exist)
  4. Try to acquire lease
  5. If lease acquired: Download from HuggingFace, then seed.
  6. If lease not acquired: Try to download from P2P peers. If peers are not ready, wait for them. If P2P fails, fallback to HuggingFace download.

// Returns nil, nil if file doesn't exist.
func (d *ModelDistributor) LoadMetainfoFromFile(modelHash string) (*metainfo.MetaInfo, error) {
filePath := d.metainfoFilePath(modelHash)
f, err := os.Open(filePath)

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression (High)

This path depends on a user-provided value.

Copilot Autofix

AI 23 days ago

In general, the fix is to validate or constrain the user-controlled modelHash before using it to construct a filesystem path. There are two common approaches: (1) enforce that modelHash is a single “filename-like” component (no path separators or ..), or (2) after joining it with the base directory, resolve the absolute path and verify that it still lies within that base directory. Here, the cleanest, least invasive fix is to ensure the value used in the path is a single component that cannot traverse directories.

The best targeted fix without changing existing functionality is to (a) validate modelHash in handleMetainfo so it cannot contain /, \, or .., and (b) defensively constrain how metainfoFilePath uses modelHash by stripping any directory components and using only the final base name. This preserves current behavior for legitimate hash strings (which do not contain separators), but prevents an attacker-supplied path from escaping d.dataDir. Concretely:

  • In pkg/distributor/metainfo_server.go, after computing modelHash from r.URL.Path, add a check that rejects any value containing /, \, or .. with 400 Bad Request.
  • In pkg/distributor/distributor.go, update metainfoFilePath to normalize modelHash to its last path element using filepath.Base before joining with d.dataDir. This ensures that even if other callers ever pass a path-like string, only the last component is used in the filename.

No new imports are required: both files already import path/filepath and strings.


Suggested changeset 2
pkg/distributor/distributor.go

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/pkg/distributor/distributor.go b/pkg/distributor/distributor.go
--- a/pkg/distributor/distributor.go
+++ b/pkg/distributor/distributor.go
@@ -517,7 +517,8 @@
 
 // metainfoFilePath returns the path to the cached .torrent file for a model.
 func (d *ModelDistributor) metainfoFilePath(modelHash string) string {
-	return filepath.Join(d.dataDir, modelHash+".torrent")
+	safeName := filepath.Base(modelHash)
+	return filepath.Join(d.dataDir, safeName+".torrent")
 }
 
 // saveMetainfoToFile caches the metainfo to a .torrent file for fast serving.
EOF
pkg/distributor/metainfo_server.go
Outside changed files

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/pkg/distributor/metainfo_server.go b/pkg/distributor/metainfo_server.go
--- a/pkg/distributor/metainfo_server.go
+++ b/pkg/distributor/metainfo_server.go
@@ -75,8 +75,11 @@
 	modelHash := strings.TrimPrefix(r.URL.Path, "/metainfo/")
 	modelHash = filepath.Clean(modelHash)
 
-	if modelHash == "" || modelHash == "." {
-		http.Error(w, "Model hash required", http.StatusBadRequest)
+	// Ensure modelHash is a single path component without traversal
+	if modelHash == "" || modelHash == "." ||
+		strings.Contains(modelHash, "/") || strings.Contains(modelHash, "\\") ||
+		strings.Contains(modelHash, "..") {
+		http.Error(w, "Invalid model hash", http.StatusBadRequest)
 		return
 	}
 
EOF