Merged
36 changes: 31 additions & 5 deletions .claude/agents/actor.md
@@ -375,8 +375,12 @@ Document key decisions using this structure:
- [ ] Error cases (invalid input, failures)
- [ ] Security cases (injection, auth bypass) — if applicable

**Validation criteria → tests (MANDATORY when test_strategy is not N/A)**:
- For each `VCn:` item in `validation_criteria`, implement or update at least one automated test that would fail without your change and pass with it.
- Prefer naming tests with `vc<n>` (e.g., `test_vc1_*`, `TestVC1*`) so Monitor can deterministically confirm coverage.
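The `vc<n>` naming convention is what makes coverage mechanically checkable. A minimal sketch of such a check, assuming criteria are strings prefixed with `VCn:` and tests are identified by plain strings; `find_vc_tests` is a hypothetical helper for illustration, not part of the agents' tooling:

```python
import re


def find_vc_tests(criteria, test_names):
    """Map each 'VCn:' criterion to the test names that contain 'vc<n>'.

    Hypothetical helper: the matching rule is an assumption based on the
    naming convention (test_vc1_*, TestVC1*), not an existing API.
    """
    coverage = {}
    for criterion in criteria:
        match = re.match(r"\s*VC(\d+):", criterion)
        if match is None:
            continue  # no VCn: prefix, so there is nothing to map
        n = match.group(1)
        # Reject a trailing digit so that 'vc1' does not match 'vc10'.
        pattern = re.compile(rf"vc{n}(?!\d)", re.IGNORECASE)
        coverage[f"VC{n}"] = [name for name in test_names if pattern.search(name)]
    return coverage
```

A criterion whose list comes back empty has no test that follows the convention, which is the case Monitor would flag.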

**Format**:
```
```text
1. test_[function]_[scenario]_[expected]
Input: [specific input]
Expected: [specific output/behavior]
@@ -392,7 +396,18 @@ Document key decisions using this structure:
Expected: 409, {"error": "Email already registered"}
</example>

## 6. Used Patterns (ACE Learning)
## 6. Validation Criteria Coverage (Evidence)

If the subtask packet includes `validation_criteria`, list each `VCn:` and where it is enforced.

**Format**:
```text
VC1: <criterion text>
- Code: path/to/file.ext#SymbolOrLocation
- Tests: path/to/test_file.ext::test_name (or N/A with reason)
```
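Because the evidence format is line-oriented, it can be parsed without much machinery. The sketch below assumes each entry starts with a `VCn:` line followed by `- Code:` and `- Tests:` lines, as in the format above; `parse_vc_evidence` and `missing_evidence` are hypothetical names introduced for illustration only:

```python
import re


def parse_vc_evidence(text):
    """Parse 'VCn: ...' entries with '- Code:' / '- Tests:' lines into dicts."""
    entries = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if re.match(r"VC\d+:", line):
            # A new criterion starts a new entry.
            entries.append({"criterion": line, "code": None, "tests": None})
        elif entries and line.startswith("- Code:"):
            entries[-1]["code"] = line[len("- Code:"):].strip()
        elif entries and line.startswith("- Tests:"):
            entries[-1]["tests"] = line[len("- Tests:"):].strip()
    return entries


def missing_evidence(entries):
    """Return the criteria that lack a code location or a tests line."""
    return [e["criterion"] for e in entries if not e["code"] or not e["tests"]]
```

An entry that records `N/A` with a reason still counts as having a tests line here; judging whether the reason is acceptable is left to the reviewer.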

## 7. Used Patterns (ACE Learning)

**Format**: `["impl-0012", "sec-0034"]` or `[]` if none

@@ -403,7 +418,7 @@ Document key decisions using this structure:

**If no patterns match**: `[]` with note "No relevant patterns in current mem0"

## 7. Integration Notes (If Applicable)
## 8. Integration Notes (If Applicable)

Only include if changes affect:
- Database schema (migrations needed?)
@@ -449,6 +464,7 @@ Only include if changes affect:
- [ ] AAG contract stated BEFORE code (Section 1)
- [ ] Trade-offs documented with alternatives
- [ ] Test cases cover happy + edge + error paths
- [ ] Each `validation_criteria` item has at least one automated test (or explicit N/A with reason)
- [ ] Used patterns tracked (or `[]` if none)
- [ ] Template variables `{{...}}` preserved in generated code

@@ -518,6 +534,14 @@ with the following JSON content:
"summary": "<one-line description of what was implemented>",
"aag_contract": "<the AAG contract line>",
"files_changed": ["<list of modified file paths>"],
"tests_changed": ["<list of modified/added test file paths>"],
"validation_criteria_coverage": [
{
"criterion": "VC1: ...",
"tests": ["path/to/test_file.ext::test_name"],
"notes": "Short justification if tests are N/A or partial"
}
],
"status": "applied"
}
```
@@ -715,7 +739,10 @@ output:

{{feedback}}

**Action Required**: Address ALL issues above. Focus on:
**Action Required**: Address ALL issues above. Do NOT dismiss feedback as "out of scope" or "separate task".
If you believe an item should be deferred, STOP and ask the user for explicit approval to defer.

Focus on:
1. Specific line items mentioned
2. Quality checklist items that failed
3. Security or constraint violations
@@ -1083,4 +1110,3 @@ export class ReconnectingWebSocket {
**Used Bullets**: `[]` (No similar patterns in mem0. Novel implementation.)

</Actor_Reference_Examples>

50 changes: 44 additions & 6 deletions .claude/agents/monitor.md
@@ -821,15 +821,27 @@ When `{{requirements}}` or `{{subtask_description}}` includes `validation_criter
```
FOR each criterion in validation_criteria:
1. PARSE criterion into testable assertion
2. VERIFY assertion against {{solution}}
3. RECORD result: PASS | FAIL | PARTIAL | UNTESTABLE
2. VERIFY assertion against {{solution}} (code-path evidence)
3. VERIFY test coverage using test_strategy (if not N/A)
4. RECORD result: PASS | FAIL | PARTIAL | UNTESTABLE

CONTRACT_STATUS:
- ALL PASS → contract_compliant: true
- ANY FAIL → contract_compliant: false, list violations
- ANY UNTESTABLE → flag for clarification
```

### Test Coverage Rule (Executable Contracts)

Design constraints only become reliable when they are enforced by executable checks.

For each `VCn:` criterion:
- If `test_strategy` is provided and not `N/A`, require at least one concrete test case that covers it.
- Prefer deterministic mapping: test names include `vc<n>` (e.g., `test_vc1_*`, `TestVC1*`).
- Evidence MUST include both:
- **Code evidence** (where in code the behavior is implemented), and
- **Test evidence** (where in tests it is asserted).

### Contract Assertion Patterns

| Criterion Type | How to Verify | Example |
@@ -852,15 +864,31 @@ Include in JSON output when validation_criteria provided:
"failed": 1,
"untestable": 0,
"details": [
{"criterion": "Returns 401 for expired token", "status": "PASS", "evidence": "Line 45: if token.expired: return 401"},
{"criterion": "Creates audit log entry", "status": "FAIL", "evidence": "No audit.log() call found in create_user()"}
{
"criterion": "VC1: Returns 401 for expired token (auth/middleware.py:validate_token)",
"status": "PASS",
"code_evidence": "auth/middleware.py:45: if token.expired: return 401",
"test_coverage": "PASS",
"test_evidence": "tests/test_auth.py::test_vc1_expired_token_returns_401"
},
{
"criterion": "VC2: Creates audit log entry with user_id (audit/logger.py:log_event)",
"status": "FAIL",
"code_evidence": "No audit.log_event() call found in create_user()",
"test_coverage": "MISSING",
"test_evidence": "No test found matching vc2 or described in test_strategy"
}
]
},
"contract_compliant": false
}
```

**Decision Rule**: If `contract_compliant: false`, set `valid: false` unless ALL failed contracts are LOW severity (documentation, naming).
**Decision Rule**:
- If `contract_compliant: false`, set `valid: false` unless ALL failed contracts are LOW severity (documentation, naming).
- If any Behavioral/Integration/Edge-case criterion has `test_coverage != PASS` and test_strategy is not `N/A`:
- If `security_critical == true`: set `valid: false` (missing executable enforcement is a release blocker).
- Otherwise: add a **testability** issue and require Actor to add tests.
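The decision rule can be read as a small function. This is a sketch under stated assumptions, not the Monitor's actual logic: each entry in `details` is assumed to carry `status` and `test_coverage` as in the JSON above, plus hypothetical `severity` and `type` keys that the example output does not define. Only `FAIL` is treated as a contract violation here.

```python
# Criterion types that require executable coverage (per the rule above).
COVERAGE_REQUIRED_TYPES = {"behavioral", "integration", "edge-case"}


def decide_valid(details, test_strategy_is_na, security_critical):
    """Return (valid, issues) for a list of per-criterion results.

    Assumed keys per entry: 'criterion', 'status', 'test_coverage',
    and the hypothetical 'severity' and 'type'.
    """
    valid = True
    issues = []

    # Rule 1: a failed contract invalidates the result unless every
    # failure is LOW severity (documentation, naming).
    failed = [d for d in details if d["status"] == "FAIL"]
    if failed and not all(d.get("severity") == "LOW" for d in failed):
        valid = False

    # Rule 2: missing test coverage matters only when tests are expected.
    if not test_strategy_is_na:
        uncovered = [
            d for d in details
            if d.get("type") in COVERAGE_REQUIRED_TYPES
            and d.get("test_coverage") != "PASS"
        ]
        if uncovered and security_critical:
            valid = False  # missing executable enforcement blocks release
        else:
            issues.extend(
                f"testability: add a test for {d['criterion']}" for d in uncovered
            )

    return valid, issues
```

In this sketch a non-security-critical subtask with uncovered criteria stays `valid` but returns one testability issue per uncovered criterion for the Actor to address.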

</Monitor_Contract_Validation>

@@ -2495,6 +2523,10 @@ def check_rate_limit(user_id, action, limit=100, window=3600):
- Requirements unmet → valid=false
- Only MEDIUM/LOW issues → valid=true (with feedback)

**Hard-stop semantics**:
- If you set `valid=false`, the workflow MUST resolve the issues before proceeding.
- Do not accept "we'll do it later" reasoning as a resolution unless the user explicitly approves deferral.

</Monitor_Critical_Reminders>

### Evidence File (Artifact-Gated Validation)
@@ -2512,7 +2544,13 @@ with the following JSON content:
"timestamp": "<ISO 8601 UTC>",
"valid": true,
"issues_found": 0,
"recommendation": "approve|reject|revise"
"recommendation": "approve|reject|revise",
"validation_criteria_test_coverage": {
"total": 0,
"covered": 0,
"missing": 0,
"notes": "Optional: summarize VC→test coverage findings"
}
}
```

36 changes: 31 additions & 5 deletions .claude/agents/task-decomposer.md
@@ -248,8 +248,21 @@ Return **ONLY** valid JSON in this exact structure:
**subtasks[].complexity_rationale**: MUST reference factors: "Score N: factor (+X), factor (+Y)..."
**subtasks[].validation_criteria**: Array of **testable conditions** that prove completion
- REQUIRED: 2-4 specific, verifiable outcomes
- Good: "Returns 401 for expired token", "Creates audit log entry with user_id"
- Bad: "Works correctly", "Handles errors"
- Format (recommended): Prefix each item with `VC1:`, `VC2:`, ... for stable cross-agent reference.
- Each criterion MUST be both:
- **Behavior-/artifact-verifiable** (can be checked by reading code), and
- **Test-verifiable** (has at least one concrete test case planned in `test_strategy`).
- Each criterion SHOULD include a concrete anchor:
- endpoint/handler + route, OR
- function/class name + file path
- Good:
- "VC1: POST /users returns 201 and persists normalized email (users/routes.py:create_user)"
- "VC2: Returns 401 for expired token (auth/middleware.py:validate_token)"
- "VC3: Creates audit log entry with user_id (audit/logger.py:log_event)"
- Bad:
- "Works correctly"
- "Handles errors"
- "Tests pass"
**subtasks[].contracts**: Array of **executable assertion patterns** (optional but recommended for complexity_score ≥ 5)
- `type`: "precondition" | "postcondition" | "invariant"
- `assertion`: Executable pattern (e.g., "response.status == 401 WHEN token.expired")
@@ -260,15 +273,26 @@ Return **ONLY** valid JSON in this exact structure:
- This is the primary handoff artifact to the Actor agent
- Actor "compiles" this contract into code; Monitor verifies against it
- Format: `"<Actor> -> <Action>(params) -> <Goal with success criteria>"`
- **Integration is part of the contract**:
- Prefer describing the *entrypoint + call chain* that makes the behavior real (especially for validation, policy checks, auth, migrations).
- Avoid leaf-only contracts that are easy to satisfy in isolation but not wired into production code paths.
- Examples:
- `"AuthService -> validate(token) -> returns 401|200 with user_id"`
- `"ProjectModel -> add_field(archived_at: DateTime?) -> migration passes"`
- `"RateLimiter -> decorate(endpoint, 100/min) -> returns 429 when exceeded"`
- `"ConfigLoader -> load_policy(path) -> calls validate_risk_policy(); raises ConfigValidationError on contradictions"`
**subtasks[].implementation_hint**: Optional guidance for non-obvious implementations
- RECOMMENDED when: complexity_score >= 5 OR security_critical OR dependencies.length >= 2
- OMIT when: standard pattern with obvious implementation
- Example: "Use existing RateLimiter middleware, configure for /api/* routes"
**subtasks[].test_strategy**: Required object with unit/integration/e2e keys. Use "N/A" for levels not applicable.
- MUST map `validation_criteria` → tests:
- For each `VCn:` criterion, include at least one planned test name that covers it.
- Recommended naming: include `vc<n>` in the test name (e.g., `test_vc1_*`, `TestVC1*`) for deterministic grep-ability.
- Recommended format: `path/to/test_file.ext::test_name_or_symbol`
- "N/A" is acceptable ONLY when:
- The repository has no automated test harness, and adding one is out-of-scope for this subtask.
- In that case: either add a FOUNDATION subtask to introduce a minimal test harness, or document the gap explicitly in risks/assumptions.
**subtasks[].affected_files**: Precise file paths (NOT "backend", "frontend"); use [] if paths unknown

### Subtask Ordering
@@ -484,16 +508,18 @@ When invoked with `mode: "re_decomposition"` from the orchestrator, you receive
- [ ] Each subtask is atomic (independently implementable + testable)
- [ ] Each subtask has an aag_contract in `Actor -> Action(params) -> Goal` format
- [ ] AAG contracts are specific (not "does stuff" — name classes, methods, return types)
- [ ] AAG contracts include wiring/integration when relevant (entrypoint + validator/policy checks, not leaf-only helpers)
- [ ] All dependencies are explicit and accurate
- [ ] Subtasks ordered by dependency (foundations first)
- [ ] 5-8 subtasks (not too granular or too coarse)
- [ ] Titles are action-oriented (start with verb)
- [ ] Descriptions explain HOW, not just WHAT

**Acceptance Criteria**:
- [ ] Each subtask has 3-5 specific criteria
- [ ] Each subtask has 2-4 specific criteria
- [ ] Criteria are testable and measurable
- [ ] Criteria cover: functionality + edge cases + testing
- [ ] Criteria cover: functionality + edge cases (as applicable)
- [ ] Each VC has a concrete verification hook in test_strategy (at least one planned test per VC)
- [ ] No vague criteria ("works", "is good", "done")

**File Paths**:
@@ -510,7 +536,7 @@ When invoked with `mode: "re_decomposition"` from the orchestrator, you receive

**Test Strategy**:
- [ ] test_strategy object included for each subtask
- [ ] Unit tests specified (REQUIRED for all subtasks)
- [ ] Unit tests specified (default). If repo has no test harness: add a FOUNDATION subtask to introduce minimal tests or explicitly justify "N/A".
- [ ] Integration tests specified when subtask integrates multiple components
- [ ] E2e tests specified when subtask impacts user-facing functionality
- [ ] "N/A" used appropriately when test layer not applicable
6 changes: 5 additions & 1 deletion .claude/commands/map-debate.md
@@ -44,10 +44,14 @@ Task: $ARGUMENTS

Hard requirements:
- Use `blueprint.subtasks[].validation_criteria` (2-4 testable, verifiable outcomes)
- Prefix each criterion with `VC1:`, `VC2:`, ... (stable references for Actor/Monitor)
- Include a concrete anchor per VC (endpoint/function + file path)
- Use `blueprint.subtasks[].dependencies` (array of subtask IDs) and order subtasks by dependency
- Include `blueprint.subtasks[].complexity_score` (1-10) and `risk_level` (low|medium|high)
- Include `blueprint.subtasks[].security_critical` (true for auth/crypto/validation/data access)
- Include `blueprint.subtasks[].test_strategy` with unit/integration/e2e keys"
- Include `blueprint.subtasks[].test_strategy` with unit/integration/e2e keys
- Map every `VCn:` to ≥1 planned test case (prefer test name contains `vc<n>`)
- Recommended format: `path/to/test_file.ext::test_name_or_symbol`"
)
```

4 changes: 4 additions & 0 deletions .claude/commands/map-efficient.md
@@ -109,10 +109,14 @@ Task: $ARGUMENTS

Hard requirements:
- Use `blueprint.subtasks[].validation_criteria` (2-4 testable outcomes)
- Prefix each criterion with `VC1:`, `VC2:`, ... (stable references for Actor/Monitor)
- Include a concrete anchor per VC (endpoint/function + file path)
- Use `blueprint.subtasks[].dependencies` (array of subtask IDs)
- Include `complexity_score` (1-10) and `risk_level` (low|medium|high)
- Include `security_critical` (true for auth/crypto/validation)
- Include `test_strategy` with unit/integration/e2e keys
- Map every `VCn:` to ≥1 planned test case (prefer test name contains `vc<n>`)
- Recommended format: `path/to/test_file.ext::test_name_or_symbol`
- Include `aag_contract` (one-line pseudocode: Actor -> Action -> Goal)

AAG Contract format (REQUIRED per subtask):