@slin1237
Collaborator

Integrate P2P model distribution into the model-agent entry point:

  • Add P2P configuration fields (ports, rates, encryption, timeout)
  • Read P2P settings from environment variables
  • Create ModelDistributor when P2P_ENABLED=true
  • Create P2PLeaseManager for download coordination
  • Call gopher.EnableP2P() to activate P2P download flow
  • Start MetainfoServer for peer discovery
  • Add graceful shutdown for P2P resources

Environment variables:

  • P2P_ENABLED: Enable/disable P2P (default: false)
  • PEERS_SERVICE: Headless service DNS for peer discovery
  • P2P_TORRENT_PORT: BitTorrent port (default: 6881)
  • P2P_METAINFO_PORT: Metainfo HTTP port (default: 8081)
  • P2P_MAX_DOWNLOAD_RATE: Max download rate in bytes/s
  • P2P_MAX_UPLOAD_RATE: Max upload rate in bytes/s
  • P2P_ENCRYPTION_ENABLED: Enable BitTorrent encryption
  • P2P_DOWNLOAD_TIMEOUT: P2P download timeout in seconds
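As a rough illustration of how these variables could be read at startup, here is a minimal sketch. The `P2PConfig` struct, its field names, and `loadP2PConfig` are hypothetical and not the PR's actual `ConfigFromEnv`. Only the three defaults documented above are used; treating an unset rate or timeout as zero and validating the port range are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// P2PConfig is a hypothetical struct mirroring the variables listed above.
type P2PConfig struct {
	Enabled           bool
	PeersService      string
	TorrentPort       int
	MetainfoPort      int
	MaxDownloadRate   int64 // bytes/s; 0 means "unset" in this sketch
	MaxUploadRate     int64 // bytes/s; 0 means "unset" in this sketch
	EncryptionEnabled bool
	DownloadTimeout   time.Duration
}

// loadP2PConfig reads settings through getenv (os.Getenv in production, a map
// lookup in tests) and reports the first parse or validation error.
func loadP2PConfig(getenv func(string) string) (P2PConfig, error) {
	var firstErr error
	note := func(key string, err error) {
		if err != nil && firstErr == nil {
			firstErr = fmt.Errorf("%s: %w", key, err)
		}
	}
	num := func(key string, def int64) int64 {
		raw := getenv(key)
		if raw == "" {
			return def
		}
		n, err := strconv.ParseInt(raw, 10, 64)
		note(key, err)
		return n
	}
	flag := func(key string) bool {
		raw := getenv(key)
		if raw == "" {
			return false
		}
		b, err := strconv.ParseBool(raw)
		note(key, err)
		return b
	}
	cfg := P2PConfig{
		Enabled:           flag("P2P_ENABLED"),
		PeersService:      getenv("PEERS_SERVICE"),
		TorrentPort:       int(num("P2P_TORRENT_PORT", 6881)),
		MetainfoPort:      int(num("P2P_METAINFO_PORT", 8081)),
		MaxDownloadRate:   num("P2P_MAX_DOWNLOAD_RATE", 0),
		MaxUploadRate:     num("P2P_MAX_UPLOAD_RATE", 0),
		EncryptionEnabled: flag("P2P_ENCRYPTION_ENABLED"),
		DownloadTimeout:   time.Duration(num("P2P_DOWNLOAD_TIMEOUT", 0)) * time.Second,
	}
	if firstErr != nil {
		return P2PConfig{}, firstErr
	}
	// Port range validation is an assumption of this sketch.
	for _, p := range []int{cfg.TorrentPort, cfg.MetainfoPort} {
		if p < 1 || p > 65535 {
			return P2PConfig{}, fmt.Errorf("port %d out of range", p)
		}
	}
	return cfg, nil
}

func main() {
	cfg, err := loadP2PConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid P2P configuration:", err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function as a parameter keeps the parsing testable without mutating the process environment.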

Add BitTorrent library dependency (anacrolix/torrent) and define
constants for P2P model distribution:
- Lease coordination constants (prefix, labels, durations)
- Default configuration values (ports, rates, timeouts)
- Environment variable keys for P2P configuration

Introduce the P2P distributor package with:
- Config struct with validation and defaults
- ConfigFromEnv for environment-based configuration
- Comprehensive Prometheus metrics for P2P operations
  - Download metrics (total, duration, failures)
  - Peer discovery and connection metrics
  - Lease and seeding metrics
  - Metainfo server metrics

Implement ModelDistributor for P2P model distribution:
- BitTorrent client management with rate limiting
- Peer discovery via Kubernetes headless service DNS
- Metainfo fetching from peers for torrent coordination
- Model seeding and download operations
- Active torrent tracking with proper cleanup
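Peer discovery through a headless service relies on the service name resolving to one address record per pod. The sketch below shows that idea only. `discoverPeers`, the `lookupFunc` type, and the `POD_IP` variable are hypothetical names, and filtering out the caller's own IP is an assumption rather than confirmed behavior of ModelDistributor.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"sort"
	"strconv"
)

// lookupFunc matches net.LookupHost so a fake resolver can be substituted.
type lookupFunc func(host string) ([]string, error)

// discoverPeers resolves the headless service name and returns "ip:port"
// metainfo endpoints for every address except the caller's own.
func discoverPeers(lookup lookupFunc, service, selfIP string, metainfoPort int) ([]string, error) {
	addrs, err := lookup(service)
	if err != nil {
		return nil, fmt.Errorf("resolve %s: %w", service, err)
	}
	peers := make([]string, 0, len(addrs))
	for _, ip := range addrs {
		if ip == selfIP {
			continue // skip ourselves (assumed behavior)
		}
		peers = append(peers, net.JoinHostPort(ip, strconv.Itoa(metainfoPort)))
	}
	sort.Strings(peers) // stable order makes logs and tests deterministic
	return peers, nil
}

func main() {
	// POD_IP is an assumed variable, e.g. injected through the downward API.
	peers, err := discoverPeers(net.LookupHost, os.Getenv("PEERS_SERVICE"), os.Getenv("POD_IP"), 8081)
	if err != nil {
		fmt.Println("peer discovery failed:", err)
		return
	}
	fmt.Println("peers:", peers)
}
```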

Fix API compatibility with anacrolix/torrent v1.57.1:
- Use bencode.Marshal instead of info.MarshalBencode
- Use t.Complete().Bool() instead of t.Complete.Bool
- Handle PeerRemoteAddr type assertion for peer addresses

Implement MetainfoServer to enable peers to discover available models:
- GET /metainfo/{modelHash} - serve torrent metainfo
- GET /health - health check endpoint
- GET /stats - P2P distribution statistics
- GET /models - list available models with seeding status
- Graceful shutdown support
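To make the endpoint layout concrete, here is a minimal sketch of the metainfo, health, and models routes using only the standard library. The `metainfoStore` type, `newHandler`, and the `get` helper are hypothetical, the `/stats` route is omitted, and the response formats (plain `ok`, a JSON array of hashes, the `application/x-bittorrent` content type) are assumptions rather than the PR's actual payloads.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
	"strings"
	"sync"
)

// metainfoStore is a hypothetical in-memory registry of bencoded metainfo,
// keyed by model hash.
type metainfoStore struct {
	mu    sync.RWMutex
	items map[string][]byte
}

func (s *metainfoStore) put(hash string, data []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.items == nil {
		s.items = make(map[string][]byte)
	}
	s.items[hash] = data
}

// newHandler wires the routes; response formats are assumptions.
func newHandler(s *metainfoStore) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	})
	mux.HandleFunc("/metainfo/", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		hash := strings.TrimPrefix(r.URL.Path, "/metainfo/")
		s.mu.RLock()
		data, ok := s.items[hash]
		s.mu.RUnlock()
		if !ok {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/x-bittorrent")
		w.Write(data)
	})
	mux.HandleFunc("/models", func(w http.ResponseWriter, r *http.Request) {
		s.mu.RLock()
		hashes := make([]string, 0, len(s.items))
		for h := range s.items {
			hashes = append(hashes, h)
		}
		s.mu.RUnlock()
		sort.Strings(hashes)
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(hashes)
	})
	return mux
}

// get issues an in-process GET request and returns the status code and body.
func get(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	store := &metainfoStore{}
	// Placeholder bencoded payload, not real torrent metainfo.
	store.put("abc123", []byte("d4:infod4:name5:modelee"))
	h := newHandler(store)
	for _, p := range []string{"/health", "/metainfo/abc123", "/metainfo/missing", "/models"} {
		code, body := get(h, p)
		fmt.Printf("%s -> %d %q\n", p, code, body)
	}
}
```

A real server would wrap this handler in an `http.Server` and call its `Shutdown` method to get the graceful shutdown mentioned above.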

Fix API compatibility with anacrolix/torrent v1.57.1:
- Add exists() helper function
- Use bencode.Marshal instead of info.MarshalBencode

Add comprehensive tests for the distributor package:
- Config validation tests (valid, missing fields, invalid ports)
- ConfigWithDefaults tests
- Metrics recording tests
- Stats struct tests
- Test helper functions for integration tests

Fix test config to include required LeaseDurationSeconds field.

Implement P2PLeaseManager for coordinating model downloads:
- Lease acquisition with expired lease takeover
- Lease renewal for long-running downloads
- Complete/release lifecycle management
- Ensures only one node downloads from HuggingFace

Tests cover:
- Lease acquisition (new, existing, expired)
- Lease expiration detection
- Lease name generation with hash truncation
- Renewal and holder verification
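The acquisition rules listed above (new lease, existing holder, expired takeover) and the truncated lease name can be sketched in memory as follows. The `lease` struct, `tryAcquire`, `leaseName`, and the `p2p-model-` prefix are all hypothetical. The real P2PLeaseManager presumably persists leases in the cluster, which this sketch does not attempt.

```go
package main

import (
	"fmt"
	"time"
)

// lease is a simplified, in-memory stand-in for a coordination lease record.
type lease struct {
	Holder    string
	RenewTime time.Time
	Duration  time.Duration
}

// expired reports whether the lease was not renewed within its duration.
func (l lease) expired(now time.Time) bool {
	return now.After(l.RenewTime.Add(l.Duration))
}

// leaseName joins a prefix and a model hash, truncating the hash so the name
// fits maxLen. If the prefix alone exceeds maxLen it is returned unchanged.
func leaseName(prefix, modelHash string, maxLen int) string {
	room := maxLen - len(prefix)
	if room < 0 {
		room = 0
	}
	if len(modelHash) > room {
		modelHash = modelHash[:room]
	}
	return prefix + modelHash
}

// tryAcquire returns the lease that should be stored and whether node holds it.
// A node wins when no lease exists, when it already holds the lease (renewal),
// or when the current lease has expired (takeover).
func tryAcquire(cur *lease, node string, d time.Duration, now time.Time) (lease, bool) {
	if cur == nil || cur.Holder == node || cur.expired(now) {
		return lease{Holder: node, RenewTime: now, Duration: d}, true
	}
	return *cur, false
}

func main() {
	now := time.Now()
	held, ok := tryAcquire(nil, "node-a", 60*time.Second, now)
	fmt.Println(leaseName("p2p-model-", "0123456789abcdef0123456789abcdef", 26), held.Holder, ok)

	_, ok = tryAcquire(&held, "node-b", 60*time.Second, now.Add(30*time.Second))
	fmt.Println("node-b before expiry:", ok)

	_, ok = tryAcquire(&held, "node-b", 60*time.Second, now.Add(2*time.Minute))
	fmt.Println("node-b after expiry:", ok)
}
```

Passing `now` explicitly keeps the expiry logic deterministic in tests. A cluster-backed version would also need a compare-and-swap style update so two nodes cannot both take over the same expired lease.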

Integrate P2P model distribution into the Gopher download workflow:
- Add P2P fields to Gopher struct (distributor, lease manager, timeout)
- EnableP2P() and SetP2PTimeout() configuration methods
- computeModelHash() for consistent model identification
- downloadWithP2P() orchestrates P2P-first download strategy
- downloadWithLeaseHeld() handles HF download with lease coordination
- waitForP2PAvailability() with exponential backoff for waiting nodes
- startSeeding() begins seeding after successful download

Flow: Check P2P peers → Try P2P download → Acquire lease → HF download → Seed
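The ordering in the flow line can be sketched with the dependencies reduced to plain functions. Everything here is illustrative: the `steps` struct, `downloadModel`, and `backoffDelays` are hypothetical names, not the Gopher methods listed above. Seeding after a successful P2P download, and returning an error when waiting times out, are assumptions about behavior the description does not spell out.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// steps bundles the operations the flow depends on so the ordering can be
// exercised without a cluster. All names are illustrative.
type steps struct {
	peersHaveModel func(hash string) bool
	p2pDownload    func(hash string) error
	acquireLease   func(hash string) (bool, error)
	hfDownload     func(hash string) error
	waitForPeers   func(hash string) bool
	seed           func(hash string)
}

// backoffDelays returns n delays that double from initial, capped at limit.
func backoffDelays(initial, limit time.Duration, n int) []time.Duration {
	delays := make([]time.Duration, 0, n)
	for d := initial; len(delays) < n; d *= 2 {
		if d > limit {
			d = limit
		}
		delays = append(delays, d)
	}
	return delays
}

// downloadModel returns the source that provided the model ("p2p" or "hf").
func downloadModel(s steps, hash string) (string, error) {
	// 1. Prefer peers that already seed the model.
	if s.peersHaveModel(hash) {
		if err := s.p2pDownload(hash); err == nil {
			s.seed(hash)
			return "p2p", nil
		}
	}
	// 2. Otherwise compete for the lease; the winner downloads from HuggingFace.
	acquired, err := s.acquireLease(hash)
	if err != nil {
		return "", fmt.Errorf("acquire lease: %w", err)
	}
	if acquired {
		if err := s.hfDownload(hash); err != nil {
			return "", fmt.Errorf("hf download: %w", err)
		}
		s.seed(hash)
		return "hf", nil
	}
	// 3. Another node holds the lease: wait until a peer starts seeding.
	if !s.waitForPeers(hash) {
		return "", errors.New("timed out waiting for a seeding peer")
	}
	if err := s.p2pDownload(hash); err != nil {
		return "", fmt.Errorf("p2p download: %w", err)
	}
	s.seed(hash)
	return "p2p", nil
}

func main() {
	seeded := false
	s := steps{
		peersHaveModel: func(string) bool { return false },
		p2pDownload:    func(string) error { return nil },
		acquireLease:   func(string) (bool, error) { return true, nil },
		hfDownload:     func(string) error { return nil },
		waitForPeers:   func(string) bool { return true },
		seed:           func(string) { seeded = true },
	}
	src, err := downloadModel(s, "abc123")
	fmt.Println(src, err, seeded)
	fmt.Println(backoffDelays(time.Second, 8*time.Second, 5))
}
```

Lease renewal and release around the HuggingFace download are left out to keep the ordering visible. A waiting node would sleep for each value from `backoffDelays` between peer checks.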

Add deployment configuration for P2P model distribution:
- Headless Service for peer discovery via DNS
- DaemonSet with P2P-enabled model-agent
- Documentation with architecture overview and usage instructions

@github-actions github-actions bot added documentation Documentation changes model-agent Model agent changes tests Test changes config Configuration changes dependencies Dependency updates labels Dec 31, 2025
@slin1237 slin1237 force-pushed the feature/p2p-model-distribution-n/9 branch 3 times, most recently from 65ebed9 to 8ef6179 Compare December 31, 2025 07:47
@slin1237 slin1237 force-pushed the feature/p2p-model-distribution-n/9 branch from 8ef6179 to 79ddf68 Compare December 31, 2025 08:25