Conversation
Adds a one-time provision key mechanism: admin creates a key tied to an agent ID, shares it with the agent operator, and the agent uses it to obtain certificates via REST — no admin API key needed. - Add in-memory KeyStore with create/validate/revoke/cleanup - Add admin endpoints (POST/GET/DELETE /api/v1/provision-keys) - Add agent endpoint (POST /api/v1/provision) authenticated by key - Add agent CLI subcommand: silo-proxy-agent provision --server --key - Add provision config section (disabled by default)
|
Caution Review failedThe pull request is closed. WalkthroughAdds a certificate provisioning subsystem and agent provisioning CLI support. Introduces an in-memory one-time provision KeyStore with TTL and cleanup, admin HTTP endpoints to create/list/revoke provision keys, and a provision HTTP endpoint that validates a key, signs a CSR via a CertSigner, marks keys used, and returns agent and CA certificates. Adds agent CLI Sequence Diagram(s)sequenceDiagram
participant AgentCLI as Agent CLI
participant ServerHTTP as Server HTTP
participant KeyStore as KeyStore
participant CertService as CertService
AgentCLI->>ServerHTTP: POST /api/v1/provision { key, csr }
ServerHTTP->>KeyStore: Validate(key)
KeyStore-->>ServerHTTP: provision key (valid) / error
alt key valid
ServerHTTP->>CertService: Sign CSR for agentID
CertService-->>ServerHTTP: agent_cert PEM
ServerHTTP->>KeyStore: MarkUsed(key)
ServerHTTP-->>AgentCLI: 200 { agent_id, cert_pem, key_pem, ca_cert_pem }
AgentCLI->>AgentCLI: write certs to disk
else invalid/expired/used
KeyStore-->>ServerHTTP: error
ServerHTTP-->>AgentCLI: 401/400 error
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
🤖 Fix all issues with AI agents
In `@cmd/silo-proxy-agent/provision.go`:
- Around line 37-42: Replace the use of the default http.Post with a bespoke
http.Client that has a timeout to avoid indefinite hangs: create an
http.Client{Timeout: time.Second * <N>} (choose appropriate N), build the
request for url (using http.NewRequest with reqBody) and call client.Do(req)
instead of http.Post, handle errors the same way and defer resp.Body.Close() as
before; update imports to include "time". Use the existing local symbols url,
reqBody and resp handling code in provision.go to slot this change in.
In `@cmd/silo-proxy-server/main.go`:
- Around line 96-107: When provisioning is enabled, validate
config.Provision.KeyTTLHours and config.Provision.CleanupIntervalMinutes before
creating provision.NewKeyStore and starting keyStore.StartCleanup: ensure both
values are > 0 (reject zero/negative), log a clear error via slog.Error and
os.Exit(1) if invalid, and only then compute ttl :=
time.Duration(config.Provision.KeyTTLHours)*time.Hour, cleanupInterval :=
time.Duration(config.Provision.CleanupIntervalMinutes)*time.Minute and proceed
to call provision.NewKeyStore and go keyStore.StartCleanup(context.Background(),
cleanupInterval); keep existing certService nil check intact.
In `@docs/cert-provisioning/overview.md`:
- Around line 22-54: The fenced code blocks in
docs/cert-provisioning/overview.md are missing language identifiers (MD040);
update each triple-backtick block (for example the large ASCII diagram and the
blocks containing "POST /api/v1/provision-keys", "GET /api/v1/provision-keys",
"DELETE /api/v1/provision-keys/:agent_id", "POST /api/v1/provision", and the
certs tree) to include a language tag such as text (e.g., ```text) so
markdownlint passes; scan the file for all similar blocks (noted around the
diagram and ranges 74-116, 122-149, 199-216) and add the same tag consistently.
- Around line 86-92: Replace any example secret-like values used in the examples
(e.g., the "provision_key" value "sk_a1b2c3d4e5f6..." and similar strings) with
clearly fake placeholders such as "sk_example_..." or "provision_key_example"
and likewise use non-secret example values for "agent_id" (e.g.,
"agent_example") in both occurrences mentioned; update all instances (including
the second occurrence) so example values cannot be mistaken for real secrets and
will not trigger secret scanners.
In `@docs/cert-provisioning/phase1.md`:
- Around line 52-59: Update the Validate documentation to correctly state the
error conditions: change the first bullet to "Key not found in map → 'invalid
provision key'", and rephrase bullets 2–3 to read "Key is expired → 'provision
key expired'" and "Key already used → 'provision key already used'"; ensure the
paragraph referencing Validate and the returned ProvisionKey reflects these
corrected conditions.
In `@internal/api/http/handler/provision.go`:
- Around line 82-147: The ProvisionHandler.Provision currently calls
keyStore.Validate(...) then later keyStore.MarkUsed(...), causing a race; add an
atomic KeyStore.Consume(key string) method that acquires the write lock,
verifies the key exists and is unused, marks it used, and returns the parsed
provision key (or an error) in one operation, then in Provision replace the
Validate(...) + MarkUsed(...) sequence with a single call to Consume(req.Key),
use the returned pk.AgentID as before, and handle/log errors exactly where
Validate/MarkUsed were handled.
In `@internal/provision/key_store.go`:
- Around line 66-81: The Validate function releases ks.mu.RLock() too early
causing a data race when MarkUsed modifies the same ProvisionKey; keep the read
lock held while checking pk.Used and pk.ExpiresAt, then return a shallow copy of
the ProvisionKey (not the pointer stored in ks.keys) before releasing the lock
so callers cannot mutate the internal struct; update KeyStore.Validate to
perform all reads under ks.mu.RLock()/RUnlock() and return a copy of the
ProvisionKey to avoid concurrent mutation.
🧹 Nitpick comments (4)
docs/cert-provisioning/phase2.md (1)
20-24: Add language specifier to fenced code block.The route listing block should have a language specifier for consistency with markdown best practices.
📝 Proposed fix
-``` +```text POST /api/v1/provision-keys → CreateProvisionKey GET /api/v1/provision-keys → ListProvisionKeys DELETE /api/v1/provision-keys/:id → RevokeProvisionKey</details> </blockquote></details> <details> <summary>docs/cert-provisioning/phase3.md (1)</summary><blockquote> `54-63`: **Minor: Clarify validity period calculation in pseudocode.** Line 58 shows `NotAfter: time.Now().Add(validityDays)` but `validityDays` is an `int`. The actual implementation should multiply by `24 * time.Hour` to convert days to duration. This is just pseudocode, but clarifying it would help implementers. <details> <summary>📝 Suggested clarification</summary> ```diff // - NotBefore: time.Now() -// - NotAfter: time.Now().Add(validityDays) +// - NotAfter: time.Now().Add(time.Duration(validityDays) * 24 * time.Hour)docs/cert-provisioning/phase4.md (1)
45-62: Add language specifier to flow block.The flow block should have a language specifier for consistency.
📝 Proposed fix
-``` +```text 1. Parse and validate request body - provision_key: required, non-empty - csr: required, non-empty ...internal/api/http/handler/provision_test.go (1)
145-181: Invalid/missing-key tests only hit the TLS-disabled branch.Both tests build the handler with a nil
certService, so they never reach key validation or missing-key handling. Consider renaming these tests or adding a non-nilcertService(or a lightweight test helper) to exercise invalid/missing key behavior.
| url := *server + "/api/v1/provision" | ||
| resp, err := http.Post(url, "application/json", bytes.NewBuffer(reqBody)) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to connect to server: %w", err) | ||
| } | ||
| defer resp.Body.Close() |
There was a problem hiding this comment.
Add HTTP client timeout to prevent indefinite hangs.
The http.Post call uses the default HTTP client which has no timeout. If the server is unresponsive, the agent will hang indefinitely.
🛡️ Proposed fix to add timeout
+ client := &http.Client{Timeout: 30 * time.Second}
url := *server + "/api/v1/provision"
- resp, err := http.Post(url, "application/json", bytes.NewBuffer(reqBody))
+ resp, err := client.Post(url, "application/json", bytes.NewBuffer(reqBody))
if err != nil {
return fmt.Errorf("failed to connect to server: %w", err)
}You'll also need to add "time" to imports.
🤖 Prompt for AI Agents
In `@cmd/silo-proxy-agent/provision.go` around lines 37 - 42, Replace the use of
the default http.Post with a bespoke http.Client that has a timeout to avoid
indefinite hangs: create an http.Client{Timeout: time.Second * <N>} (choose
appropriate N), build the request for url (using http.NewRequest with reqBody)
and call client.Do(req) instead of http.Post, handle errors the same way and
defer resp.Body.Close() as before; update imports to include "time". Use the
existing local symbols url, reqBody and resp handling code in provision.go to
slot this change in.
| var keyStore *provision.KeyStore | ||
| if config.Provision.Enabled { | ||
| if certService == nil { | ||
| slog.Error("Provisioning requires TLS to be enabled") | ||
| os.Exit(1) | ||
| } | ||
| ttl := time.Duration(config.Provision.KeyTTLHours) * time.Hour | ||
| keyStore = provision.NewKeyStore(ttl) | ||
| cleanupInterval := time.Duration(config.Provision.CleanupIntervalMinutes) * time.Minute | ||
| go keyStore.StartCleanup(context.Background(), cleanupInterval) | ||
| slog.Info("Provisioning enabled", "key_ttl_hours", config.Provision.KeyTTLHours) | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find the provision package and StartCleanup function
find . -path ./vendor -prune -o -type f -name "*.go" -print | xargs grep -l "func.*StartCleanup" | head -5Repository: EternisAI/silo-proxy
Length of output: 97
🏁 Script executed:
#!/bin/bash
# Search for StartCleanup implementation across the codebase
rg -n "func.*StartCleanup" -A 10 --type goRepository: EternisAI/silo-proxy
Length of output: 685
🏁 Script executed:
#!/bin/bash
# Check the config structure for Provision settings
rg -n "type.*Provision|KeyTTLHours|CleanupIntervalMinutes" --type go | head -20Repository: EternisAI/silo-proxy
Length of output: 1245
🏁 Script executed:
#!/bin/bash
# Check NewKeyStore implementation and how it handles TTL
rg -n "func NewKeyStore" -A 10 --type goRepository: EternisAI/silo-proxy
Length of output: 773
🏁 Script executed:
#!/bin/bash
# Check config loading and validation
rg -n "config.*Provision|viper|unmarshal" --type go | grep -i "provision\|config" | head -20Repository: EternisAI/silo-proxy
Length of output: 467
🏁 Script executed:
#!/bin/bash
# Search for any existing validation of these config values
rg -n "KeyTTLHours|CleanupIntervalMinutes" --type goRepository: EternisAI/silo-proxy
Length of output: 604
🏁 Script executed:
#!/bin/bash
# Check time.NewTicker behavior with documentation or test references
rg -n "time.NewTicker|StartCleanup" -B 2 -A 5 --type goRepository: EternisAI/silo-proxy
Length of output: 2172
🏁 Script executed:
#!/bin/bash
# Check the cleanup implementation and how it uses TTL
rg -n "func.*cleanup|ttl" -A 8 --type go internal/provision/key_store.goRepository: EternisAI/silo-proxy
Length of output: 917
🏁 Script executed:
#!/bin/bash
# Look at the complete ProvisionConfig definition
cat -n cmd/silo-proxy-server/config.go | head -40Repository: EternisAI/silo-proxy
Length of output: 1436
🏁 Script executed:
#!/bin/bash
# Check if there's any config validation layer
rg -n "validate|Validate|Config\(\)" -B 2 -A 5 --type go cmd/silo-proxy-server/Repository: EternisAI/silo-proxy
Length of output: 1366
🏁 Script executed:
#!/bin/bash
# Check Go time.NewTicker behavior with zero or negative duration
# Search for any tests or documentation about this
rg -n "NewTicker.*0|NewTicker.*-" --type go | head -10Repository: EternisAI/silo-proxy
Length of output: 46
🏁 Script executed:
#!/bin/bash
# Check if there's a default configuration file or environment setup
find . -type f \( -name "*.yaml" -o -name "*.yml" -o -name "*.env*" -o -name "application*" \) 2>/dev/null | head -10Repository: EternisAI/silo-proxy
Length of output: 236
🏁 Script executed:
#!/bin/bash
# Check what values are used in tests or default configs
rg -n "key_ttl_hours|cleanup_interval_minutes" -A 1 -B 1 --type go --type yamlRepository: EternisAI/silo-proxy
Length of output: 844
🏁 Script executed:
#!/bin/bash
# Check Go's time.NewTicker behavior - let's see if there's any panic handling or test refs
rg -n "panic|NewTicker" -B 2 -A 2 --type go | grep -A 5 -B 5 "NewTicker"Repository: EternisAI/silo-proxy
Length of output: 1375
🏁 Script executed:
#!/bin/bash
# Get the full main.go context around the provisioning setup
sed -n '96,110p' cmd/silo-proxy-server/main.goRepository: EternisAI/silo-proxy
Length of output: 638
Add validation for provisioning TTL and cleanup interval before initialization.
The code directly passes config.Provision.KeyTTLHours and config.Provision.CleanupIntervalMinutes to duration calculations without validation. If either is zero or negative, time.NewTicker will panic, and keys will be created with immediate expiry. Add explicit validation to fail fast when provisioning is enabled.
Suggested fix
if config.Provision.Enabled {
if certService == nil {
slog.Error("Provisioning requires TLS to be enabled")
os.Exit(1)
}
+ if config.Provision.KeyTTLHours <= 0 {
+ slog.Error("Provisioning requires key_ttl_hours to be > 0")
+ os.Exit(1)
+ }
+ if config.Provision.CleanupIntervalMinutes <= 0 {
+ slog.Error("Provisioning requires cleanup_interval_minutes to be > 0")
+ os.Exit(1)
+ }
ttl := time.Duration(config.Provision.KeyTTLHours) * time.Hour
keyStore = provision.NewKeyStore(ttl)
cleanupInterval := time.Duration(config.Provision.CleanupIntervalMinutes) * time.Minute
go keyStore.StartCleanup(context.Background(), cleanupInterval)
slog.Info("Provisioning enabled", "key_ttl_hours", config.Provision.KeyTTLHours)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| var keyStore *provision.KeyStore | |
| if config.Provision.Enabled { | |
| if certService == nil { | |
| slog.Error("Provisioning requires TLS to be enabled") | |
| os.Exit(1) | |
| } | |
| ttl := time.Duration(config.Provision.KeyTTLHours) * time.Hour | |
| keyStore = provision.NewKeyStore(ttl) | |
| cleanupInterval := time.Duration(config.Provision.CleanupIntervalMinutes) * time.Minute | |
| go keyStore.StartCleanup(context.Background(), cleanupInterval) | |
| slog.Info("Provisioning enabled", "key_ttl_hours", config.Provision.KeyTTLHours) | |
| } | |
| if config.Provision.Enabled { | |
| if certService == nil { | |
| slog.Error("Provisioning requires TLS to be enabled") | |
| os.Exit(1) | |
| } | |
| if config.Provision.KeyTTLHours <= 0 { | |
| slog.Error("Provisioning requires key_ttl_hours to be > 0") | |
| os.Exit(1) | |
| } | |
| if config.Provision.CleanupIntervalMinutes <= 0 { | |
| slog.Error("Provisioning requires cleanup_interval_minutes to be > 0") | |
| os.Exit(1) | |
| } | |
| ttl := time.Duration(config.Provision.KeyTTLHours) * time.Hour | |
| keyStore = provision.NewKeyStore(ttl) | |
| cleanupInterval := time.Duration(config.Provision.CleanupIntervalMinutes) * time.Minute | |
| go keyStore.StartCleanup(context.Background(), cleanupInterval) | |
| slog.Info("Provisioning enabled", "key_ttl_hours", config.Provision.KeyTTLHours) | |
| } |
🤖 Prompt for AI Agents
In `@cmd/silo-proxy-server/main.go` around lines 96 - 107, When provisioning is
enabled, validate config.Provision.KeyTTLHours and
config.Provision.CleanupIntervalMinutes before creating provision.NewKeyStore
and starting keyStore.StartCleanup: ensure both values are > 0 (reject
zero/negative), log a clear error via slog.Error and os.Exit(1) if invalid, and
only then compute ttl := time.Duration(config.Provision.KeyTTLHours)*time.Hour,
cleanupInterval :=
time.Duration(config.Provision.CleanupIntervalMinutes)*time.Minute and proceed
to call provision.NewKeyStore and go keyStore.StartCleanup(context.Background(),
cleanupInterval); keep existing certService nil check intact.
| ``` | ||
| Admin Server Agent Device | ||
| | | | | ||
| | POST /api/v1/ | | | ||
| | provision-keys | | | ||
| | {agent_id: "agent-5"} | | | ||
| |----------------------->| | | ||
| | provision_key: | | | ||
| | "sk_a1b2c3..." | | | ||
| |<-----------------------| | | ||
| | | | | ||
| | (copy key to device) | | | ||
| |~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~>| | ||
| | | | | ||
| | | 1. Generate private key | | ||
| | | 2. Create CSR | | ||
| | | | | ||
| | | POST /api/v1/provision | | ||
| | | {key, csr} | | ||
| | |<---------------------------| | ||
| | | | | ||
| | | 3. Validate key | | ||
| | | 4. Sign CSR with CA | | ||
| | | 5. Mark key as used | | ||
| | | | | ||
| | | {agent_cert, ca_cert} | | ||
| | |--------------------------->| | ||
| | | | | ||
| | | 6. Save certs to disk | | ||
| | | 7. Connect via mTLS | | ||
| | |<===========================| | ||
| | | (gRPC stream) | | ||
| ``` |
There was a problem hiding this comment.
Add language identifiers to fenced code blocks (MD040).
markdownlint flags these blocks as missing a language. Use text (or a more specific language) to satisfy MD040.
🔧 Proposed fix (representative blocks)
-```
+```text
Admin Server Agent Device
| | |
@@
-```
+```
-```
+```text
POST /api/v1/provision-keys
-```
+```
-```
+```text
GET /api/v1/provision-keys
-```
+```
-```
+```text
DELETE /api/v1/provision-keys/:agent_id
-```
+```
-```
+```text
POST /api/v1/provision
-```
+```
-```
+```text
~/.silo-proxy/certs/
├── agent-key.pem # generated locally
├── agent-cert.pem # signed by server CA
└── ca-cert.pem # for verifying server
-```
+```Also applies to: 74-116, 122-149, 199-216
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)
[warning] 22-22: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In `@docs/cert-provisioning/overview.md` around lines 22 - 54, The fenced code
blocks in docs/cert-provisioning/overview.md are missing language identifiers
(MD040); update each triple-backtick block (for example the large ASCII diagram
and the blocks containing "POST /api/v1/provision-keys", "GET
/api/v1/provision-keys", "DELETE /api/v1/provision-keys/:agent_id", "POST
/api/v1/provision", and the certs tree) to include a language tag such as text
(e.g., ```text) so markdownlint passes; scan the file for all similar blocks
(noted around the diagram and ranges 74-116, 122-149, 199-216) and add the same
tag consistently.
| Response: | ||
| ```json | ||
| { | ||
| "provision_key": "sk_a1b2c3d4e5f6...", | ||
| "agent_id": "agent-5", | ||
| "expires_at": "2026-02-10T12:00:00Z" | ||
| } |
There was a problem hiding this comment.
Use clearly fake placeholders for example provision keys.
The example sk_a1b2c3... strings trigger secret scanners. Swap to explicit placeholders (e.g., sk_example_...) to avoid false positives.
🧹 Proposed fix
- "provision_key": "sk_a1b2c3d4e5f6...",
+ "provision_key": "sk_example_123456...",
@@
- "provision_key": "sk_a1b2c3d4e5f6...",
+ "provision_key": "sk_example_123456...",Also applies to: 134-139
🧰 Tools
🪛 Gitleaks (8.30.0)
[high] 89-89: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🤖 Prompt for AI Agents
In `@docs/cert-provisioning/overview.md` around lines 86 - 92, Replace any example
secret-like values used in the examples (e.g., the "provision_key" value
"sk_a1b2c3d4e5f6..." and similar strings) with clearly fake placeholders such as
"sk_example_..." or "provision_key_example" and likewise use non-secret example
values for "agent_id" (e.g., "agent_example") in both occurrences mentioned;
update all instances (including the second occurrence) so example values cannot
be mistaken for real secrets and will not trigger secret scanners.
| ### Validate Logic | ||
|
|
||
| `Validate` checks three conditions and returns a specific error for each: | ||
| 1. Key exists in the map → `"invalid provision key"` | ||
| 2. Key not expired → `"provision key expired"` | ||
| 3. Key not already used → `"provision key already used"` | ||
|
|
||
| Returns the `ProvisionKey` on success. |
There was a problem hiding this comment.
Clarify validation error conditions.
Line 55 states "Key exists in the map → invalid provision key" which is confusing. The error "invalid provision key" should occur when the key does not exist in the map.
📝 Suggested fix
`Validate` checks three conditions and returns a specific error for each:
-1. Key exists in the map → `"invalid provision key"`
+1. Key does not exist in the map → `"invalid provision key"`
2. Key not expired → `"provision key expired"`
3. Key not already used → `"provision key already used"`The conditions on lines 56-57 also read awkwardly. Consider rewording to describe when errors occur:
1. Key not found in map → "invalid provision key"
2. Key is expired → "provision key expired"
3. Key already used → "provision key already used"
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ### Validate Logic | |
| `Validate` checks three conditions and returns a specific error for each: | |
| 1. Key exists in the map → `"invalid provision key"` | |
| 2. Key not expired → `"provision key expired"` | |
| 3. Key not already used → `"provision key already used"` | |
| Returns the `ProvisionKey` on success. | |
| ### Validate Logic | |
| `Validate` checks three conditions and returns a specific error for each: | |
| 1. Key does not exist in the map → `"invalid provision key"` | |
| 2. Key not expired → `"provision key expired"` | |
| 3. Key not already used → `"provision key already used"` | |
| Returns the `ProvisionKey` on success. |
🤖 Prompt for AI Agents
In `@docs/cert-provisioning/phase1.md` around lines 52 - 59, Update the Validate
documentation to correctly state the error conditions: change the first bullet
to "Key not found in map → 'invalid provision key'", and rephrase bullets 2–3 to
read "Key is expired → 'provision key expired'" and "Key already used →
'provision key already used'"; ensure the paragraph referencing Validate and the
returned ProvisionKey reflects these corrected conditions.
| func (h *ProvisionHandler) Provision(ctx *gin.Context) { | ||
| if h.certService == nil { | ||
| slog.Warn("Provision requested but TLS is disabled") | ||
| ctx.JSON(http.StatusBadRequest, gin.H{"error": "TLS is not enabled on this server"}) | ||
| return | ||
| } | ||
|
|
||
| var req dto.ProvisionRequest | ||
| if err := ctx.ShouldBindJSON(&req); err != nil { | ||
| ctx.JSON(http.StatusBadRequest, gin.H{"error": err.Error()}) | ||
| return | ||
| } | ||
|
|
||
| pk, err := h.keyStore.Validate(req.Key) | ||
| if err != nil { | ||
| slog.Warn("Provision key validation failed", "error", err) | ||
| ctx.JSON(http.StatusUnauthorized, gin.H{"error": err.Error()}) | ||
| return | ||
| } | ||
|
|
||
| agentID := pk.AgentID | ||
|
|
||
| if h.certService.AgentCertExists(agentID) { | ||
| slog.Warn("Certificate already exists for agent", "agent_id", agentID) | ||
| ctx.JSON(http.StatusConflict, gin.H{"error": "Certificate already exists for this agent"}) | ||
| return | ||
| } | ||
|
|
||
| agentCert, agentKey, err := h.certService.GenerateAgentCert(agentID) | ||
| if err != nil { | ||
| slog.Error("Failed to generate agent certificate", "error", err, "agent_id", agentID) | ||
| ctx.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to generate agent certificate"}) | ||
| return | ||
| } | ||
|
|
||
| certPEM, err := cert.CertToPEM(agentCert) | ||
| if err != nil { | ||
| slog.Error("Failed to encode certificate", "error", err, "agent_id", agentID) | ||
| ctx.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to encode certificate"}) | ||
| return | ||
| } | ||
|
|
||
| keyPEM, err := cert.KeyToPEM(agentKey) | ||
| if err != nil { | ||
| slog.Error("Failed to encode key", "error", err, "agent_id", agentID) | ||
| ctx.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to encode key"}) | ||
| return | ||
| } | ||
|
|
||
| caCertBytes, err := h.certService.GetCACert() | ||
| if err != nil { | ||
| slog.Error("Failed to read CA certificate", "error", err, "agent_id", agentID) | ||
| ctx.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to read CA certificate"}) | ||
| return | ||
| } | ||
|
|
||
| h.keyStore.MarkUsed(req.Key) | ||
|
|
||
| slog.Info("Agent provisioned successfully", "agent_id", agentID) | ||
| ctx.JSON(http.StatusOK, dto.ProvisionResponse{ | ||
| AgentID: agentID, | ||
| CertPEM: string(certPEM), | ||
| KeyPEM: string(keyPEM), | ||
| CACertPEM: string(caCertBytes), | ||
| }) | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find KeyStore interface and implementation
find . -type f -name "*.go" | xargs grep -l "type KeyStore" | head -20Repository: EternisAI/silo-proxy
Length of output: 97
🏁 Script executed:
#!/bin/bash
# Search for Validate and MarkUsed method definitions
rg "func.*Validate\(|func.*MarkUsed\(" --type go -B 2 -A 10Repository: EternisAI/silo-proxy
Length of output: 2072
🏁 Script executed:
#!/bin/bash
# Check if a Consume method already exists
rg "Consume\(" --type go -B 2 -A 5Repository: EternisAI/silo-proxy
Length of output: 46
🏁 Script executed:
#!/bin/bash
# Look at the provision handler file to understand context
cat -n internal/api/http/handler/provision.go | head -150Repository: EternisAI/silo-proxy
Length of output: 5337
Provision key race condition allows issuing multiple certificates for a single key.
Validate() releases its read lock immediately after checking the key is unused (line 95), then certificate generation, encoding, and CA cert retrieval occur without holding the lock (lines 110–136). Another concurrent request can pass Validate() during this window before the first request calls MarkUsed() (line 138), resulting in two requests using the same provision key.
Implement an atomic Consume(key string) method in KeyStore that acquires the write lock, validates the key exists and is unused, marks it used, and releases the lock—all as a single atomic operation. Replace both Validate() and MarkUsed() calls with this single operation.
🤖 Prompt for AI Agents
In `@internal/api/http/handler/provision.go` around lines 82 - 147, The
ProvisionHandler.Provision currently calls keyStore.Validate(...) then later
keyStore.MarkUsed(...), causing a race; add an atomic KeyStore.Consume(key
string) method that acquires the write lock, verifies the key exists and is
unused, marks it used, and returns the parsed provision key (or an error) in one
operation, then in Provision replace the Validate(...) + MarkUsed(...) sequence
with a single call to Consume(req.Key), use the returned pk.AgentID as before,
and handle/log errors exactly where Validate/MarkUsed were handled.
| func (ks *KeyStore) Validate(key string) (*ProvisionKey, error) { | ||
| ks.mu.RLock() | ||
| pk, exists := ks.keys[key] | ||
| ks.mu.RUnlock() | ||
|
|
||
| if !exists { | ||
| return nil, ErrKeyNotFound | ||
| } | ||
| if pk.Used { | ||
| return nil, ErrKeyAlreadyUsed | ||
| } | ||
| if time.Now().After(pk.ExpiresAt) { | ||
| return nil, ErrKeyExpired | ||
| } | ||
| return pk, nil | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
fd "key_store.go" --type fRepository: EternisAI/silo-proxy
Length of output: 95
🏁 Script executed:
cat -n internal/provision/key_store.go | head -100Repository: EternisAI/silo-proxy
Length of output: 2632
🏁 Script executed:
rg -n "func.*MarkUsed|func.*cleanup" internal/provision/Repository: EternisAI/silo-proxy
Length of output: 209
🏁 Script executed:
sed -n '137,170p' internal/provision/key_store.goRepository: EternisAI/silo-proxy
Length of output: 368
Data race in Validate due to unlock-before-read.
Validate releases the read lock at line 69 before checking pk.Used (line 74) and pk.ExpiresAt (line 77), while MarkUsed can acquire the write lock and modify pk.Used concurrently on the same pointer. This violates Go's memory model and produces a data race. Keep the lock held while reading all fields and return a copy to prevent inconsistent validation and external mutation.
🛠️ Proposed fix to keep reads under lock
func (ks *KeyStore) Validate(key string) (*ProvisionKey, error) {
- ks.mu.RLock()
- pk, exists := ks.keys[key]
- ks.mu.RUnlock()
-
- if !exists {
- return nil, ErrKeyNotFound
- }
- if pk.Used {
- return nil, ErrKeyAlreadyUsed
- }
- if time.Now().After(pk.ExpiresAt) {
- return nil, ErrKeyExpired
- }
- return pk, nil
+ ks.mu.RLock()
+ pk, exists := ks.keys[key]
+ if !exists {
+ ks.mu.RUnlock()
+ return nil, ErrKeyNotFound
+ }
+ if pk.Used {
+ ks.mu.RUnlock()
+ return nil, ErrKeyAlreadyUsed
+ }
+ if time.Now().After(pk.ExpiresAt) {
+ ks.mu.RUnlock()
+ return nil, ErrKeyExpired
+ }
+ pkCopy := *pk
+ ks.mu.RUnlock()
+ return &pkCopy, nil
}🤖 Prompt for AI Agents
In `@internal/provision/key_store.go` around lines 66 - 81, The Validate function
releases ks.mu.RLock() too early causing a data race when MarkUsed modifies the
same ProvisionKey; keep the read lock held while checking pk.Used and
pk.ExpiresAt, then return a shallow copy of the ProvisionKey (not the pointer
stored in ks.keys) before releasing the lock so callers cannot mutate the
internal struct; update KeyStore.Validate to perform all reads under
ks.mu.RLock()/RUnlock() and return a copy of the ProvisionKey to avoid
concurrent mutation.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@cmd/silo-proxy-agent/provision.go`:
- Around line 67-70: The code unmarshals into provResp (dto.ProvisionResponse)
but does not validate provResp.AgentID before using it to build file paths; add
a check immediately after json.Unmarshal to trim whitespace and ensure
provResp.AgentID is non-empty and contains no path separators (e.g., reject
values with "/" or "\"), returning a descriptive error if invalid, so subsequent
code that constructs certs/agents/<AgentID>-cert.pem or creates directories will
not operate on an empty or malicious agent id.
In `@internal/api/http/handler/provision.go`:
- Around line 111-115: When the provision handler detects an existing
certificate (!created) it currently returns 409 without marking the provision
key used; call the provision key store's MarkUsed method for the key (same key
that was validated earlier) before returning the conflict response so the key's
Used flag is set and cannot be reused—update the branch in the handler (the code
around the created check in provision.go handling agentID) to invoke
MarkUsed(key) then log and ctx.JSON the 409 response; no error handling is
required for MarkUsed since it returns void.
| var provResp dto.ProvisionResponse | ||
| if err := json.Unmarshal(body, &provResp); err != nil { | ||
| return fmt.Errorf("failed to parse response: %w", err) | ||
| } |
There was a problem hiding this comment.
Validate AgentID before using it in file paths.
If the server returns an empty AgentID, the code will create paths like certs/agents//-cert.pem and a directory named with an empty string, which could cause unexpected behavior.
🛡️ Proposed fix
var provResp dto.ProvisionResponse
if err := json.Unmarshal(body, &provResp); err != nil {
return fmt.Errorf("failed to parse response: %w", err)
}
+ if provResp.AgentID == "" {
+ return fmt.Errorf("server returned empty agent ID")
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| var provResp dto.ProvisionResponse | |
| if err := json.Unmarshal(body, &provResp); err != nil { | |
| return fmt.Errorf("failed to parse response: %w", err) | |
| } | |
| var provResp dto.ProvisionResponse | |
| if err := json.Unmarshal(body, &provResp); err != nil { | |
| return fmt.Errorf("failed to parse response: %w", err) | |
| } | |
| if provResp.AgentID == "" { | |
| return fmt.Errorf("server returned empty agent ID") | |
| } |
🤖 Prompt for AI Agents
In `@cmd/silo-proxy-agent/provision.go` around lines 67 - 70, The code unmarshals
into provResp (dto.ProvisionResponse) but does not validate provResp.AgentID
before using it to build file paths; add a check immediately after
json.Unmarshal to trim whitespace and ensure provResp.AgentID is non-empty and
contains no path separators (e.g., reject values with "/" or "\"), returning a
descriptive error if invalid, so subsequent code that constructs
certs/agents/<AgentID>-cert.pem or creates directories will not operate on an
empty or malicious agent id.
| if !created { | ||
| slog.Warn("Certificate already exists for agent", "agent_id", agentID) | ||
| ctx.JSON(http.StatusConflict, gin.H{"error": "Certificate already exists for this agent"}) | ||
| return | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Find key_store.go and examine MarkUsed signature and behavior
fd -a "key_store.go" -type fRepository: EternisAI/silo-proxy
Length of output: 234
🏁 Script executed:
# Also search for KeyStore interface/type definition
rg -n "type KeyStore|func.*MarkUsed|func.*Validate" --max-count 50 -A 3Repository: EternisAI/silo-proxy
Length of output: 3770
🏁 Script executed:
# Examine the provision.go file around the specified lines
cat -n internal/api/http/handler/provision.go | head -160 | tail -60Repository: EternisAI/silo-proxy
Length of output: 2347
🏁 Script executed:
cat -n internal/provision/key_store.goRepository: EternisAI/silo-proxy
Length of output: 3908
🏁 Script executed:
# Also check the Validate function to understand key consumption/reuse behavior
rg -n "func.*Validate|Used|func.*Revoke" internal/provision/key_store.go -A 5Repository: EternisAI/silo-proxy
Length of output: 1380
Mark the provision key as used before returning 409 when certificate already exists.
When the certificate already exists (!created at line 111), the handler returns 409 without calling MarkUsed, allowing the key to potentially be reused. The Validate function checks the Used flag and rejects already-used keys, so the key should be marked used before returning the error to prevent reuse attempts.
Note: MarkUsed is a void function that does not return errors, so error handling around it is not applicable. The current design relies on the Used flag to prevent reuse.
🤖 Prompt for AI Agents
In `@internal/api/http/handler/provision.go` around lines 111 - 115, When the
provision handler detects an existing certificate (!created) it currently
returns 409 without marking the provision key used; call the provision key
store's MarkUsed method for the key (same key that was validated earlier) before
returning the conflict response so the key's Used flag is set and cannot be
reused—update the branch in the handler (the code around the created check in
provision.go handling agentID) to invoke MarkUsed(key) then log and ctx.JSON the
409 response; no error handling is required for MarkUsed since it returns void.
Summary by CodeRabbit
New Features
Configuration
Documentation
Tests
Tools