[plan][test]: Code Review Agent Dogfooding Validation #490

@Entropy-xcy

Problem Statement

The code-review agent test document marks every test case as untested ("To test (dogfooding)"). The agent created in issue #38 has therefore not yet been validated through real-world usage.

Current Status

All test cases (TC-1 through TC-7) have unchecked validation boxes:

  • TC-1: Agent Configuration - NOT verified
  • TC-2: Agent Invocation - NOT verified
  • TC-3: Isolated Context Execution - NOT verified
  • TC-4: Review Execution - NOT verified
  • TC-5: Error Handling - NOT verified
  • TC-6: Long Context Handling - NOT verified
  • TC-7: Comparison with Command - NOT verified

Dogfooding Validation Fields (All TBD)

  • First Use Date: TBD
  • PR Tested: TBD
  • Agent initialization: TBD
  • Review execution: TBD
  • Report quality: TBD
  • Context handling: TBD
  • Issues Found: TBD
  • Validation Notes: TBD

Proposed Solution

Goal

Complete dogfooding validation of the agent by running actual code reviews on real PRs and documenting the findings.

Success Criteria

  • TC-1: Agent YAML frontmatter correctly configured (name, description, tools, model, skills)
  • TC-2: Agent successfully invoked via Task tool
  • TC-3: Agent operates in isolated context (no parent conversation access)
  • TC-4: Agent performs complete code review (all 3 phases of review-standard skill)
  • TC-5: Agent provides helpful error messages for edge cases
  • TC-6: Agent handles large diffs (>10 files or >500 lines) within the 600s timeout
  • TC-7: Agent review quality comparable to command review
  • All validation fields in docs/test/code-review-agent.md populated with actual findings

Implementation Steps

Step 1: Select PR for Testing (Estimated: 5 min)

Step 2: Run Agent Review (Estimated: 15 min)

  • Invoke agent via Task tool
  • Provide the merged PR as context
  • Capture full review output

Step 3: Validate TC-1 (Configuration) (Estimated: 5 min)

  • Verify agent file
  • Check YAML frontmatter: name, description, tools, model, skills
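The frontmatter check in this step could be scripted roughly as follows. This is a minimal sketch, not the project's tooling: it assumes the agent file opens with a `---` delimited block of flat `key: value` lines, and the helper names and sample values are hypothetical. A full YAML parser would be needed for nested values.

```python
# Keys listed in TC-1; everything else in this sketch is an assumption.
REQUIRED_KEYS = ("name", "description", "tools", "model", "skills")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Return top-level `key: value` pairs from a leading '---' delimited block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Only unindented keys are recorded; nested YAML is ignored here.
        if line and not line[0].isspace() and ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return {}  # closing delimiter never found


def missing_keys(text: str) -> list[str]:
    """List the required keys that the frontmatter does not define."""
    fields = parse_frontmatter(text)
    return [key for key in REQUIRED_KEYS if key not in fields]


# Hypothetical agent file used only to exercise the check.
sample = """---
name: code-review
description: Reviews changes on the current branch
tools: Read, Grep
model: opus
---
Agent instructions go here.
"""
print(missing_keys(sample))  # prints ['skills']
```

An empty list from `missing_keys` would correspond to TC-1 passing for the key-presence part; the values themselves still need a manual read.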

Step 4: Validate TC-2 (Invocation) (Estimated: 5 min)

  • Confirm agent starts successfully
  • Verify Opus model is used
  • Confirm review-standard skill is loaded

Step 5: Validate TC-3 (Isolated Context) (Estimated: 5 min)

  • Verify agent workspace is clean
  • Confirm no parent conversation history leak
  • Confirm agent returns only the final report

Step 6: Validate TC-4 (Review Execution) (Estimated: 10 min)

  • Confirm branch validation (not main)
  • Check changed files retrieved
  • Verify full diff obtained
  • Confirm all 3 review phases executed
  • Review structured report quality

Step 7: Validate TC-5 (Error Handling) (Estimated: 10 min)

  • Test main branch detection
  • Test no-changes detection
  • Test non-git-repo detection
  • Verify error messages are clear
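The three edge cases above can be written down as a small decision function, which makes the expected behavior easy to compare against what the agent reports. This is an illustrative sketch only: the function name, its parameters, and the message wording are assumptions, not the agent's actual output.

```python
from typing import Optional


def precheck_error(
    is_git_repo: bool,
    branch: str,
    changed_files: list[str],
    main_branches: tuple[str, ...] = ("main",),
) -> Optional[str]:
    """Return a message for the first failing precondition, or None if a review can run."""
    if not is_git_repo:
        return "Not a git repository: run the review from inside a repository."
    if branch in main_branches:
        return f"On '{branch}': switch to a feature branch before requesting a review."
    if not changed_files:
        return f"No changes found on '{branch}' relative to the base branch."
    return None


# Hypothetical inputs, one per edge case listed in this step.
print(precheck_error(False, "", []))
print(precheck_error(True, "main", ["a.py"]))
print(precheck_error(True, "feature-x", []))
print(precheck_error(True, "feature-x", ["a.py"]))  # prints None
```

Checking the preconditions in this order means a non-repository directory is reported before any branch lookup is attempted.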

Step 8: Validate TC-6 (Long Context) (Estimated: 15 min)

  • Use large PR (>10 files or >500 lines)
  • Verify completion within 600s timeout
  • Assess thoroughness of analysis
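To decide whether a candidate PR meets the "large" threshold, the output of `git diff --numstat` can be summed. The sketch below assumes that output has already been captured as a string; the function names are hypothetical, and the thresholds are the ones stated in this step.

```python
def diff_size(numstat: str) -> tuple[int, int]:
    """Return (file_count, changed_line_count) from `git diff --numstat` output."""
    files = 0
    lines = 0
    for row in numstat.splitlines():
        parts = row.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed rows
        added, deleted = parts[0], parts[1]
        files += 1
        # Binary files are reported as "-" and contribute no line counts.
        if added.isdigit():
            lines += int(added)
        if deleted.isdigit():
            lines += int(deleted)
    return files, lines


def is_large_diff(numstat: str, min_files: int = 10, min_lines: int = 500) -> bool:
    """True when the diff exceeds either threshold (>10 files or >500 lines)."""
    files, lines = diff_size(numstat)
    return files > min_files or lines > min_lines


# Hypothetical numstat output: two text files and one binary file.
example = "120\t30\tsrc/a.py\n-\t-\tassets/logo.png\n400\t0\tdocs/b.md\n"
print(diff_size(example))      # prints (3, 550)
print(is_large_diff(example))  # prints True
```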

Step 9: Validate TC-7 (Comparison) (Estimated: 15 min)

  • Run command-based review on same PR
  • Compare Phase 1, 2, 3 findings
  • Assess agent's context handling advantage
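One way to make the phase-by-phase comparison concrete is to record each review's findings as sets keyed by phase and diff them. This is a sketch under the assumption that findings can be normalized to short strings by hand; the phase labels and findings below are placeholders, not real review output.

```python
def compare_findings(
    agent: dict[str, set[str]],
    command: dict[str, set[str]],
) -> dict[str, dict[str, set[str]]]:
    """For each phase, split findings into shared, agent-only, and command-only."""
    report: dict[str, dict[str, set[str]]] = {}
    for phase in sorted(set(agent) | set(command)):
        a = agent.get(phase, set())
        c = command.get(phase, set())
        report[phase] = {
            "both": a & c,
            "agent_only": a - c,
            "command_only": c - a,
        }
    return report


# Placeholder findings for illustration only.
agent_run = {"phase-1": {"missing docs"}, "phase-2": {"unused import", "long function"}}
command_run = {"phase-1": {"missing docs"}, "phase-2": {"unused import"}}

result = compare_findings(agent_run, command_run)
print(result["phase-2"]["agent_only"])  # prints {'long function'}
```

Findings that appear only on one side are the ones worth reading closely when judging whether the agent's review quality is comparable.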

Step 10: Document Findings (Estimated: 15 min)

  • Update docs/test/code-review-agent.md with:
    • First Use Date (actual date)
    • PR Tested (actual PR number)
    • Agent initialization findings
    • Review execution findings
    • Report quality assessment
    • Context handling assessment
    • Actual issues found (if any)
    • Validation notes
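After the document is updated, a quick scan can confirm that no field is still left as TBD. The sketch below assumes fields are written one per line as `Name: value`, optionally with a list bullet or markdown emphasis; the function name is hypothetical.

```python
def unpopulated_fields(text: str, placeholder: str = "TBD") -> list[str]:
    """Return the names of fields whose value is still the placeholder."""
    pending = []
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a `Name: value` line
        # Drop list bullets and markdown emphasis around the name and value.
        name = name.strip(" \t-*•")
        value = value.strip(" \t*")
        if name and value == placeholder:
            pending.append(name)
    return pending


# Illustrative excerpt; the filled-in value is a stand-in, not a real finding.
doc = """\
  • First Use Date: TBD
  • PR Tested: (filled in)
- **Report quality**: TBD
"""
print(unpopulated_fields(doc))  # prints ['First Use Date', 'Report quality']
```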

Test Execution Commands

./tests/test-all.sh

Running all Agentize SDK tests
Shell: bash

✓ test-agentize-cli-invalid-agentize-home
✓ test-agentize-cli-missing-agentize-home
✓ test-agentize-server-feat-request
✓ test-agentize-server-filtering
✗ test-agentize-server-module-exports FAILED
✓ test-agentize-server-pr-discovery
✓ test-agentize-server-refinement
✓ test-agentize-server-session-lookup
✓ test-agentize-server-telegram-notify
✗ test-agentize-server-workers FAILED
✓ test-handsoff-session-path
✗ test-hook-permission-matching FAILED
✓ test-install-script
✓ test-lol-claude-clean
✓ test-lol-command-functions-loaded
✓ test-lol-complete-commands
✓ test-lol-complete-flags
✓ test-lol-help-text
✓ test-lol-python-cli
✓ test-lol-serve-graphql
✓ test-lol-serve
✓ test-lol-usage
✓ test-lol-version
✓ test-test-all-strict-shells
✓ test-wt-bare-repo-required
✓ test-wt-clone-basic
✓ test-wt-complete-commands
✓ test-wt-complete-flags
✓ test-wt-complete-goto-targets
✓ test-wt-goto
✓ test-wt-pathto
✓ test-wt-purge
✓ test-wt-rebase
✓ test-wt-spawn-claims-status
✓ test-wt-spawn-headless-nonblocking
✓ test-wt-zsh-completion-crash
✓ test-external-consensus-doc-planning
✓ test-lol-zsh-completion-file
✓ test-makefile-setup-zsh-completion
✓ test-plugin-manifest-valid
✓ test-wt-zsh-completion-file
✗ test-ccr-logs-permission FAILED
✓ test-external-consensus-issue-interface
✗ test-gh-credential FAILED
✓ test-lint-documentation
✓ test-lol-project-associate
✓ test-lol-project-auto-field
✓ test-lol-project-automation-write
✓ test-lol-project-automation
✓ test-lol-project-create-user
✓ test-lol-project-create
✓ test-lol-project-help
✓ test-lol-project-metadata-preservation
✓ test-lol-project-missing-metadata
✓ test-open-issue-draft-non-plan
✓ test-open-issue-update-maintains-format
✓ test-open-issue-update-mode
✓ test-open-issue-with-draft
✓ test-open-issue-without-draft
✗ test-sandbox-build FAILED
✗ test-sandbox-run-cmd-option FAILED
✓ test-sandbox-run
✗ test-sandbox-session-management FAILED
✗ test-sandbox-volume-permissions FAILED
✓ test-worktree-flag-order-after-issue
✓ test-worktree-reject-description-arg
✓ test-worktree-spawn-yolo-no-agent
✓ test-wt-cross-init-creates-main
✓ test-wt-cross-invalid-agentize-home
✓ test-wt-cross-missing-agentize-home
✓ test-wt-cross-spawn-from-linked

======================================
Test Summary for bash

Total: 71
Passed: 62
Failed: 9

Some tests failed in bash!

======================================
Running all Agentize SDK tests
Shell: zsh

✓ test-agentize-cli-invalid-agentize-home
✓ test-agentize-cli-missing-agentize-home
✓ test-agentize-server-feat-request
✓ test-agentize-server-filtering
✗ test-agentize-server-module-exports FAILED
✓ test-agentize-server-pr-discovery
✓ test-agentize-server-refinement
✓ test-agentize-server-session-lookup
✓ test-agentize-server-telegram-notify
✗ test-agentize-server-workers FAILED
✓ test-handsoff-session-path
✓ test-hook-permission-matching (skipped: hook tests only run in bash)
✓ test-install-script
✓ test-lol-claude-clean
✓ test-lol-command-functions-loaded
✓ test-lol-complete-commands
✓ test-lol-complete-flags
✓ test-lol-help-text
✓ test-lol-python-cli
✓ test-lol-serve-graphql
✓ test-lol-serve
✓ test-lol-usage
✓ test-lol-version
✓ test-test-all-strict-shells
✓ test-wt-bare-repo-required
✓ test-wt-clone-basic
✓ test-wt-complete-commands
✓ test-wt-complete-flags
✓ test-wt-complete-goto-targets
✓ test-wt-goto
✓ test-wt-pathto
✓ test-wt-purge
✓ test-wt-rebase
✓ test-wt-spawn-claims-status
✓ test-wt-spawn-headless-nonblocking
✓ test-wt-zsh-completion-crash
✓ test-external-consensus-doc-planning
✓ test-lol-zsh-completion-file
✓ test-makefile-setup-zsh-completion
✓ test-plugin-manifest-valid
✓ test-wt-zsh-completion-file
✗ test-ccr-logs-permission FAILED
✓ test-external-consensus-issue-interface
✗ test-gh-credential FAILED
✓ test-lint-documentation
✓ test-lol-project-associate
✓ test-lol-project-auto-field
✓ test-lol-project-automation-write
✓ test-lol-project-automation
✓ test-lol-project-create-user
✓ test-lol-project-create
✓ test-lol-project-help
✓ test-lol-project-metadata-preservation
✓ test-lol-project-missing-metadata
✓ test-open-issue-draft-non-plan
✓ test-open-issue-update-maintains-format
✓ test-open-issue-update-mode
✓ test-open-issue-with-draft
✓ test-open-issue-without-draft
✗ test-sandbox-build FAILED
✗ test-sandbox-run-cmd-option FAILED
✓ test-sandbox-run
✗ test-sandbox-session-management FAILED
✗ test-sandbox-volume-permissions FAILED
✓ test-worktree-flag-order-after-issue
✓ test-worktree-reject-description-arg
✓ test-worktree-spawn-yolo-no-agent
✓ test-wt-cross-init-creates-main
✓ test-wt-cross-invalid-agentize-home
✓ test-wt-cross-missing-agentize-home
✓ test-wt-cross-spawn-from-linked

======================================
Test Summary for zsh

Total: 71
Passed: 63
Failed: 8

Some tests failed in zsh!
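The two runs differ by one failure, and a small parser makes that difference explicit. This sketch assumes the log format shown above (`✗ <test-name> FAILED` for failures); the function name is hypothetical and the sample logs are excerpts from the output above.

```python
def failed_tests(log: str) -> set[str]:
    """Collect test names from lines of the form '✗ <test-name> FAILED'."""
    failed = set()
    for line in log.splitlines():
        parts = line.strip().split()
        if len(parts) >= 2 and parts[0] == "✗":
            failed.add(parts[1])
    return failed


bash_log = """\
✓ test-handsoff-session-path
✗ test-hook-permission-matching FAILED
✗ test-gh-credential FAILED
"""
zsh_log = """\
✓ test-handsoff-session-path
✓ test-hook-permission-matching (skipped: hook tests only run in bash)
✗ test-gh-credential FAILED
"""

# Failures that occur in bash but not in zsh.
print(sorted(failed_tests(bash_log) - failed_tests(zsh_log)))
# prints ['test-hook-permission-matching']
```

Applied to the full logs, the same set difference accounts for the 9 versus 8 failure counts: the hook test is skipped under zsh, so it is reported as passing there.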

Estimated Total Time: 80-100 minutes

Files to Modify

| File | Purpose |
|------|---------|
|      | Update status from "To test" to validated, populate all TBD fields |

Related Issues

Test Strategy

  1. Dogfooding-first validation: Use agentize to validate itself
  2. Real PR testing: Test on actual merged PRs with meaningful changes
  3. Structured validation: Follow TC-1 through TC-7 systematically
  4. Documentation update: Populate all validation fields for future reference

Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Agent fails to start | L | M | Verify Task tool configuration before testing |
| Review takes too long | M | L | Use timeout and document observed duration |
| Review quality poor | M | M | Compare with command-based review (TC-7) |
| Edge cases not covered | L | L | Document as "known limitations" in findings |
