Problem
Claude marks tasks as "completed" without visual verification. Browser testing only occurs in:
- Phase 3: Quality Check (optional reviewer agents)
- Phase 4: Ship It (screenshots for PR)
- Later via /workflows:review
By then, multiple tasks may be marked "done" but actually be broken. During implementation Claude is effectively "blind" and can only guess whether a task is complete.
Current flow
Task loop → mark complete → next task → ... → Quality Check → Ship It
                  ↑                                 ↑
            (blind guess)                (first time seeing UI)
Proposed
Add browser verification IN the task loop for UI tasks:
Task loop:
- Implement
- Run tests
- IF UI task: agent-browser verify → fix if broken
- THEN mark complete
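The control flow of the proposed loop can be sketched roughly as follows. This is only an illustration of the ordering (implement, test, verify UI, then mark complete), not the plugin's actual implementation. `Task`, `run_task`, and the four callables are hypothetical names; `verify_in_browser` stands in for whatever agent-browser invocation the workflow uses, and the `max_fix_attempts` cap is an added assumption so the sketch cannot loop forever.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    name: str
    is_ui: bool
    completed: bool = False


def run_task(
    task: Task,
    implement: Callable[[Task], None],
    run_tests: Callable[[Task], bool],
    verify_in_browser: Callable[[Task], List[str]],  # hypothetical: returns visual issues
    fix: Callable[[Task, List[str]], None],
    max_fix_attempts: int = 3,  # assumption: not part of the original proposal
) -> bool:
    """Run one task; mark it complete only after tests and (for UI tasks) visual checks pass."""
    implement(task)
    if not run_tests(task):
        return False

    if task.is_ui:
        issues = verify_in_browser(task)
        attempts = 0
        while issues and attempts < max_fix_attempts:
            # Fix immediately instead of deferring to a later phase.
            fix(task, issues)
            attempts += 1
            if not run_tests(task):
                return False
            issues = verify_in_browser(task)
        if issues:
            # Still visually broken: leave the task incomplete.
            return False

    task.completed = True
    return True
```

Non-UI tasks skip the browser step entirely, so the extra cost is paid only where a visual check can actually catch something.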
Suggested Change
In Phase 2 "Task Execution Loop", after "Run tests after changes", add:
- For UI tasks: run agent-browser to verify visually before marking complete
- If visual issues found, fix immediately (don't mark complete yet)
This catches issues per-task instead of accumulating blind failures until Phase 4.