[plan][test]: Code Review Agent Dogfooding Validation #490

# Problem Statement

The code-review agent test document shows every test case as untested and marked "To test (dogfooding)". The code-review agent created in issue #38 has therefore not been validated through real-world usage.

## Current Status

All test cases (TC-1 through TC-7) have unchecked validation boxes:
- **TC-1**: Agent Configuration - NOT verified
- **TC-2**: Agent Invocation - NOT verified  
- **TC-3**: Isolated Context Execution - NOT verified
- **TC-4**: Review Execution - NOT verified
- **TC-5**: Error Handling - NOT verified
- **TC-6**: Long Context Handling - NOT verified
- **TC-7**: Comparison with Command - NOT verified

## Dogfooding Validation Fields (All TBD)

- First Use Date: TBD
- PR Tested: TBD
- Agent initialization: TBD
- Review execution: TBD
- Report quality: TBD
- Context handling: TBD
- Issues Found: TBD
- Validation Notes: TBD

## Proposed Solution

### Goal

Complete dogfooding validation of the code-review agent by running actual code reviews on real PRs and documenting the findings.

### Success Criteria

- [ ] TC-1: Agent YAML frontmatter correctly configured (name, description, tools, model, skills)
- [ ] TC-2: Agent successfully invoked via Task tool
- [ ] TC-3: Agent operates in isolated context (no parent conversation access)
- [ ] TC-4: Agent performs complete code review (all 3 phases of review-standard skill)
- [ ] TC-5: Agent provides helpful error messages for edge cases
- [ ] TC-6: Agent handles large diffs (>10 files, >500 lines) within timeout
- [ ] TC-7: Agent review quality comparable to command review
- [ ] All validation fields in docs/test/code-review-agent.md populated with actual findings

### Implementation Steps

**Step 1: Select PR for Testing** (Estimated: 5 min)
- Choose a recent merged PR that has meaningful code changes
- Example candidates: PR #489 (permission ordering), PR #488 (plan for permission fix), or similar

**Step 2: Run Agent Review** (Estimated: 15 min)
- Invoke the code-review agent via the Task tool
- Provide the merged PR as context
- Capture full review output

**Step 3: Validate TC-1 (Configuration)** (Estimated: 5 min)
- Verify agent file 
- Check YAML frontmatter: name, description, tools, model, skills
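The frontmatter check in Step 3 could be scripted roughly as follows. This is a minimal sketch that uses only the standard library and assumes the agent file starts with a `---` delimited block of top-level `key: value` lines. The function names and the sample values are hypothetical; only the five key names come from TC-1.

```python
# Keys that TC-1 expects in the agent's YAML frontmatter.
REQUIRED_KEYS = ("name", "description", "tools", "model", "skills")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Return top-level `key: value` pairs from a leading '---' block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing delimiter reached
        # Only top-level keys matter here; skip indented, list, and comment lines.
        if line[:1].isspace() or line.lstrip().startswith(("#", "-")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat as missing frontmatter


def missing_keys(text: str) -> list[str]:
    """List the TC-1 keys that are absent from the frontmatter."""
    fields = parse_frontmatter(text)
    return [key for key in REQUIRED_KEYS if key not in fields]


# Hypothetical agent file content, for illustration only.
sample = """---
name: code-review
description: Reviews code changes
model: opus
---
Body text
"""
print(missing_keys(sample))  # ['tools', 'skills']
```

An empty list means all five keys are present. Checking that each value is valid (for example, that the listed skills exist) would still be a manual step.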

**Step 4: Validate TC-2 (Invocation)** (Estimated: 5 min)
- Confirm agent starts successfully
- Verify Opus model is used
- Confirm review-standard skill is loaded

**Step 5: Validate TC-3 (Isolated Context)** (Estimated: 5 min)
- Verify agent workspace is clean
- Confirm no parent conversation history leak
- Agent returns only final report

**Step 6: Validate TC-4 (Review Execution)** (Estimated: 10 min)
- Confirm branch validation (not main)
- Check changed files retrieved
- Verify full diff obtained
- Confirm all 3 review phases executed
- Review structured report quality

**Step 7: Validate TC-5 (Error Handling)** (Estimated: 10 min)
- Test main branch detection
- Test no-changes detection
- Test non-git-repo detection
- Verify error messages are clear
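To make "clear error messages" concrete when testing Step 7, it may help to write down what each edge case is expected to report. The sketch below is an assumption, not the agent's actual logic: the function name, the check order, and the message wording are all hypothetical.

```python
from typing import Optional, Sequence


def precondition_error(
    is_git_repo: bool,
    branch: str,
    changed_files: Sequence[str],
    main_branch: str = "main",
) -> Optional[str]:
    """Return a message for the first failed precondition, or None if all pass."""
    # Hypothetical messages: each names the problem and the next action.
    if not is_git_repo:
        return "Not a git repository: run the review from inside a repository."
    if branch == main_branch:
        return f"On '{main_branch}': switch to a feature branch before reviewing."
    if not changed_files:
        return f"No changes found relative to '{main_branch}': nothing to review."
    return None


print(precondition_error(True, "main", ["a.py"]))
print(precondition_error(True, "feature/x", []))
print(precondition_error(True, "feature/x", ["a.py"]))  # None
```

The three inputs can be gathered by hand with `git rev-parse --is-inside-work-tree`, `git rev-parse --abbrev-ref HEAD`, and `git diff --name-only main...HEAD`, then compared against what the agent actually reports.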

**Step 8: Validate TC-6 (Long Context)** (Estimated: 15 min)
- Use large PR (>10 files or >500 lines)
- Verify completion within 600s timeout
- Assess thoroughness of analysis
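Whether a candidate PR counts as "large" for Step 8 can be checked before the run. The sketch below parses the output of `git diff --numstat` (tab-separated added, deleted, path) and applies the thresholds from this step. It assumes the "or" reading used in Step 8 (more than 10 files or more than 500 lines); the function names are hypothetical.

```python
def diff_size(numstat: str) -> tuple[int, int]:
    """Return (files changed, lines added + deleted) from `git diff --numstat` output."""
    files = 0
    lines = 0
    for row in numstat.splitlines():
        parts = row.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed rows
        files += 1
        added, deleted = parts[0], parts[1]
        # Binary files are reported as "-" for both counts; count the file only.
        if added.isdigit() and deleted.isdigit():
            lines += int(added) + int(deleted)
    return files, lines


def is_large_diff(numstat: str, max_files: int = 10, max_lines: int = 500) -> bool:
    """Apply the Step 8 thresholds: more than 10 files or more than 500 lines."""
    files, lines = diff_size(numstat)
    return files > max_files or lines > max_lines


# Hypothetical numstat output, for illustration only.
sample = "120\t30\tsrc/a.py\n400\t10\tsrc/b.py\n-\t-\tassets/logo.png\n"
print(diff_size(sample))      # (3, 560)
print(is_large_diff(sample))  # True
```

Recording the measured file and line counts alongside the observed duration makes the TC-6 result easier to reproduce later.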

**Step 9: Validate TC-7 (Comparison)** (Estimated: 15 min)
- Run command-based review on same PR
- Compare Phase 1, 2, 3 findings
- Assess agent's context handling advantage

**Step 10: Document Findings** (Estimated: 15 min)
- Update docs/test/code-review-agent.md with:
  - First Use Date (actual date)
  - PR Tested (actual PR number)
  - Agent initialization findings
  - Review execution findings
  - Report quality assessment
  - Context handling assessment
  - Actual issues found (if any)
  - Validation notes
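Once the findings are written up, a quick check can confirm that no field was left as TBD. This sketch assumes the validation fields are stored as `- Field: value` bullets, as in the "Dogfooding Validation Fields" list above. The real layout of the test document may differ, and the function name and sample values are hypothetical.

```python
def unpopulated_fields(markdown: str) -> list[str]:
    """Return names of '- Field: value' bullets whose value is still TBD or empty."""
    pending = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped.startswith("- ") or ":" not in stripped:
            continue  # not a field bullet
        name, _, value = stripped[2:].partition(":")
        if value.strip().upper() in ("", "TBD"):
            pending.append(name.strip())
    return pending


# Hypothetical, partially filled-in fields for illustration only.
sample = "- First Use Date: TBD\n- PR Tested: #489\n- Issues Found: TBD\n"
print(unpopulated_fields(sample))  # ['First Use Date', 'Issues Found']
```

An empty result would indicate that every field bullet has a value, which corresponds to the last success criterion.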

### Test Execution Commands

./tests/test-all.sh
======================================
Running all Agentize SDK tests
Shell: bash
======================================

✓ test-agentize-cli-invalid-agentize-home
✓ test-agentize-cli-missing-agentize-home
✓ test-agentize-server-feat-request
✓ test-agentize-server-filtering
✗ test-agentize-server-module-exports FAILED
✓ test-agentize-server-pr-discovery
✓ test-agentize-server-refinement
✓ test-agentize-server-session-lookup
✓ test-agentize-server-telegram-notify
✗ test-agentize-server-workers FAILED
✓ test-handsoff-session-path
✗ test-hook-permission-matching FAILED
✓ test-install-script
✓ test-lol-claude-clean
✓ test-lol-command-functions-loaded
✓ test-lol-complete-commands
✓ test-lol-complete-flags
✓ test-lol-help-text
✓ test-lol-python-cli
✓ test-lol-serve-graphql
✓ test-lol-serve
✓ test-lol-usage
✓ test-lol-version
✓ test-test-all-strict-shells
✓ test-wt-bare-repo-required
✓ test-wt-clone-basic
✓ test-wt-complete-commands
✓ test-wt-complete-flags
✓ test-wt-complete-goto-targets
✓ test-wt-goto
✓ test-wt-pathto
✓ test-wt-purge
✓ test-wt-rebase
✓ test-wt-spawn-claims-status
✓ test-wt-spawn-headless-nonblocking
✓ test-wt-zsh-completion-crash
✓ test-external-consensus-doc-planning
✓ test-lol-zsh-completion-file
✓ test-makefile-setup-zsh-completion
✓ test-plugin-manifest-valid
✓ test-wt-zsh-completion-file
✗ test-ccr-logs-permission FAILED
✓ test-external-consensus-issue-interface
✗ test-gh-credential FAILED
✓ test-lint-documentation
✓ test-lol-project-associate
✓ test-lol-project-auto-field
✓ test-lol-project-automation-write
✓ test-lol-project-automation
✓ test-lol-project-create-user
✓ test-lol-project-create
✓ test-lol-project-help
✓ test-lol-project-metadata-preservation
✓ test-lol-project-missing-metadata
✓ test-open-issue-draft-non-plan
✓ test-open-issue-update-maintains-format
✓ test-open-issue-update-mode
✓ test-open-issue-with-draft
✓ test-open-issue-without-draft
✗ test-sandbox-build FAILED
✗ test-sandbox-run-cmd-option FAILED
✓ test-sandbox-run
✗ test-sandbox-session-management FAILED
✗ test-sandbox-volume-permissions FAILED
✓ test-worktree-flag-order-after-issue
✓ test-worktree-reject-description-arg
✓ test-worktree-spawn-yolo-no-agent
✓ test-wt-cross-init-creates-main
✓ test-wt-cross-invalid-agentize-home
✓ test-wt-cross-missing-agentize-home
✓ test-wt-cross-spawn-from-linked

======================================
Test Summary for bash
======================================
Total:  71
Passed: 62
Failed: 9
======================================

Some tests failed in bash!

======================================
Running all Agentize SDK tests
Shell: zsh
======================================

✓ test-agentize-cli-invalid-agentize-home
✓ test-agentize-cli-missing-agentize-home
✓ test-agentize-server-feat-request
✓ test-agentize-server-filtering
✗ test-agentize-server-module-exports FAILED
✓ test-agentize-server-pr-discovery
✓ test-agentize-server-refinement
✓ test-agentize-server-session-lookup
✓ test-agentize-server-telegram-notify
✗ test-agentize-server-workers FAILED
✓ test-handsoff-session-path
✓ test-hook-permission-matching (skipped: hook tests only run in bash)
✓ test-install-script
✓ test-lol-claude-clean
✓ test-lol-command-functions-loaded
✓ test-lol-complete-commands
✓ test-lol-complete-flags
✓ test-lol-help-text
✓ test-lol-python-cli
✓ test-lol-serve-graphql
✓ test-lol-serve
✓ test-lol-usage
✓ test-lol-version
✓ test-test-all-strict-shells
✓ test-wt-bare-repo-required
✓ test-wt-clone-basic
✓ test-wt-complete-commands
✓ test-wt-complete-flags
✓ test-wt-complete-goto-targets
✓ test-wt-goto
✓ test-wt-pathto
✓ test-wt-purge
✓ test-wt-rebase
✓ test-wt-spawn-claims-status
✓ test-wt-spawn-headless-nonblocking
✓ test-wt-zsh-completion-crash
✓ test-external-consensus-doc-planning
✓ test-lol-zsh-completion-file
✓ test-makefile-setup-zsh-completion
✓ test-plugin-manifest-valid
✓ test-wt-zsh-completion-file
✗ test-ccr-logs-permission FAILED
✓ test-external-consensus-issue-interface
✗ test-gh-credential FAILED
✓ test-lint-documentation
✓ test-lol-project-associate
✓ test-lol-project-auto-field
✓ test-lol-project-automation-write
✓ test-lol-project-automation
✓ test-lol-project-create-user
✓ test-lol-project-create
✓ test-lol-project-help
✓ test-lol-project-metadata-preservation
✓ test-lol-project-missing-metadata
✓ test-open-issue-draft-non-plan
✓ test-open-issue-update-maintains-format
✓ test-open-issue-update-mode
✓ test-open-issue-with-draft
✓ test-open-issue-without-draft
✗ test-sandbox-build FAILED
✗ test-sandbox-run-cmd-option FAILED
✓ test-sandbox-run
✗ test-sandbox-session-management FAILED
✗ test-sandbox-volume-permissions FAILED
✓ test-worktree-flag-order-after-issue
✓ test-worktree-reject-description-arg
✓ test-worktree-spawn-yolo-no-agent
✓ test-wt-cross-init-creates-main
✓ test-wt-cross-invalid-agentize-home
✓ test-wt-cross-missing-agentize-home
✓ test-wt-cross-spawn-from-linked

======================================
Test Summary for zsh
======================================
Total:  71
Passed: 63
Failed: 8
======================================

Some tests failed in zsh!
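The two runs above differ by one failure (9 in bash, 8 in zsh). A short script can pull the failing test names out of each log and show the difference. This is a sketch that assumes the `✗ <name> FAILED` line format shown in the output; the function name is hypothetical, and the sample logs are excerpts of the output above.

```python
def failed_tests(output: str) -> set[str]:
    """Collect test names from lines of the form '✗ <name> FAILED'."""
    failed = set()
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "✗" and parts[-1] == "FAILED":
            failed.add(parts[1])
    return failed


# Excerpts from the bash and zsh runs above.
bash_log = (
    "✓ test-handsoff-session-path\n"
    "✗ test-hook-permission-matching FAILED\n"
    "✗ test-gh-credential FAILED\n"
)
zsh_log = (
    "✓ test-hook-permission-matching (skipped: hook tests only run in bash)\n"
    "✗ test-gh-credential FAILED\n"
)

# Tests that fail in bash but not in zsh.
print(sorted(failed_tests(bash_log) - failed_tests(zsh_log)))
# ['test-hook-permission-matching']
```

On the full logs, the same comparison shows that the hook permission test accounts for the gap, because it is skipped in zsh.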

### Estimated Total Time: 80-100 minutes

### Files to Modify

| File | Purpose |
|------|---------|
|  | Update status from "To test" to validated, populate all TBD fields |

### Related Issues

- #38 - Original code-review agent creation
- #489 - Recent PR that can serve as dogfooding target

## Test Strategy

1. **Dogfooding-first validation**: Use agentize to validate itself
2. **Real PR testing**: Test on actual merged PRs with meaningful changes
3. **Structured validation**: Follow TC-1 through TC-7 systematically
4. **Documentation update**: Populate all validation fields for future reference

## Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Agent fails to start | L | M | Verify Task tool configuration before testing |
| Review takes too long | M | L | Use timeout and document observed duration |
| Review quality poor | M | M | Compare with command-based review (TC-7) |
| Edge cases not covered | L | L | Document as "known limitations" in findings |

