Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 2 additions & 7 deletions .github/workflows/publish.yml
Original file line number Diff line number Diff line change
Expand Up @@ -49,9 +49,8 @@ jobs:
fi
done

# TODO: Re-enable after fixing failing tests
# - name: Run tests
# run: bun test
- name: Run tests
run: bun test

- name: Run typecheck
run: bun run typecheck
Expand Down Expand Up @@ -84,10 +83,6 @@ jobs:
cp -r .claude config-staging/
cp -r .opencode config-staging/
mkdir -p config-staging/.github
cp -r .github/agents config-staging/.github/
cp -r .github/hooks config-staging/.github/
cp -r .github/prompts config-staging/.github/ 2>/dev/null || true
cp -r .github/scripts config-staging/.github/
cp -r .github/skills config-staging/.github/
cp CLAUDE.md config-staging/
cp AGENTS.md config-staging/
Expand Down
4 changes: 0 additions & 4 deletions package.json
Original file line number Diff line number Diff line change
Expand Up @@ -23,10 +23,6 @@
"src",
".claude",
".opencode",
".github/agents",
".github/hooks",
".github/prompts",
".github/scripts",
".github/skills",
"CLAUDE.md",
"AGENTS.md"
Expand Down
255 changes: 255 additions & 0 deletions research/docs/2026-02-12-bun-test-failures-root-cause-analysis.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,255 @@
---
date: 2026-02-12 03:57:25 UTC
researcher: Copilot CLI
git_commit: f9603b88b96c47859073b0647d2c6b7d95057f8d
branch: lavaman131/hotfix/opentui-distribution
repository: atomic
topic: "Root cause analysis of 104 bun test failures across 6 error categories"
tags: [research, testing, bun-errors, theme, agents, claude-sdk, tool-renderers, ui]
status: complete
last_updated: 2026-02-12
last_updated_by: Copilot CLI
---

# Research: Bun Test Failures Root Cause Analysis

## Research Question

Research how to resolve the 104 failing `bun test` errors documented in `bun_errors.txt`, by exploring the codebase in detail to identify root causes for each error category.

## Summary

104 tests fail across 6 distinct categories. In every case, **the source code was updated but the corresponding tests were not**. The tests contain stale expectations that no longer match the current implementation. Below is a category-by-category root cause analysis with exact file paths and line numbers.

## Detailed Findings

### Category 1: Builtin Agent `model` Field Missing (~30 tests)

**Root Cause**: The `AgentDefinition` interface defines an optional `model?: AgentModel` field, but **none of the builtin agent definitions include a `model` property**. Tests expect `agent.model` to be `"opus"` but it's `undefined`.

**Source**: [`src/ui/commands/agent-commands.ts`](https://github.com/flora131/atomic/blob/f9603b88b96c47859073b0647d2c6b7d95057f8d/src/ui/commands/agent-commands.ts)

- **Interface**: Lines 175-225 β€” `AgentDefinition` has `model?: AgentModel` (line ~205)
- **Agent Definitions**: The `BUILTIN_AGENTS` array contains agents like `debugger` (line ~1085), `codebase-analyzer`, `codebase-locator`, `codebase-pattern-finder`, `codebase-online-researcher`, `codebase-research-analyzer`, `codebase-research-locator` β€” **none include `model: "opus"`**
- **`getBuiltinAgent()`**: Lines 1158-1163 β€” correctly finds agents by name, but since `model` is absent from definitions, `agent.model` is `undefined`

**Affected Tests**:
- `tests/e2e/subagent-debugger.test.ts` β€” 14 tests checking `agent.model === "opus"`
- `tests/e2e/subagent-codebase-analyzer.test.ts` β€” 6 tests checking `agent.model === "opus"`
- `tests/ui/commands/agent-commands.test.ts` β€” ~10 tests across all agent types

**Resolution**: Either add `model: "opus"` to each builtin agent definition in `BUILTIN_AGENTS`, or update the tests to not expect `model` to be defined (if model selection is intended to be dynamic).

---

### Category 2: Sub-agent `sentMessages` Empty (~20 tests)

**Root Cause**: `createAgentCommand().execute()` calls `context.spawnSubagent()` (fire-and-forget via `void`) and returns `{ success: true }` immediately. It **never calls `context.sendMessage()` or `context.sendSilentMessage()`**. The mock context's `sentMessages` array only tracks those two methods, so it remains empty.

**Source**: [`src/ui/commands/agent-commands.ts`](https://github.com/flora131/atomic/blob/f9603b88b96c47859073b0647d2c6b7d95057f8d/src/ui/commands/agent-commands.ts)

- **`createAgentCommand()`**: Lines 1495-1532
```typescript
execute: (args, context) => {
void context.spawnSubagent({...}).then(...).catch(...);
return { success: true }; // Returns immediately
}
```
- **Mock Context**: `tests/e2e/subagent-debugger.test.ts` lines 121-191
- `sentMessages` tracks `sendMessage()` / `sendSilentMessage()` calls
- `spawnSubagent()` resolves successfully but doesn't add to `sentMessages`

**Affected Tests**:
- `tests/e2e/subagent-debugger.test.ts` β€” tests asserting `context.sentMessages.length > 0` or `context.sentMessages[0].toContain(...)`
- `tests/e2e/subagent-codebase-analyzer.test.ts` β€” same pattern

**Resolution**: Tests need to be updated to check `spawnSubagent` was called (e.g., via a spy or tracking array) instead of checking `sentMessages`. Alternatively, the mock's `spawnSubagent` could populate `sentMessages`.

---

### Category 3: Theme Color Mismatches (~12 tests)

**Root Cause**: The source theme uses **Catppuccin palette** colors, but tests expect **Tailwind CSS palette** colors. The theme was changed but tests were not updated.

**Source**: [`src/ui/theme.tsx`](https://github.com/flora131/atomic/blob/f9603b88b96c47859073b0647d2c6b7d95057f8d/src/ui/theme.tsx)

| Property | Source (Catppuccin) | Test Expected (Tailwind) |
|----------|-------------------|------------------------|
| **Dark Theme** | | |
| `background` | `#1e1e2e` (Mocha Base) | `black` |
| `foreground` | `#cdd6f4` (Mocha Text) | `#ecf2f8` |
| `error` | `#f38ba8` (Mocha Red) | `#fb7185` (Rose 400) |
| `success` | `#a6e3a1` (Mocha Green) | `#4ade80` (Green 400) |
| `warning` | `#f9e2af` (Mocha Yellow) | `#fbbf24` (Amber 400) |
| `userMessage` | `#89b4fa` (Mocha Blue) | `#60a5fa` (Blue 400) |
| `assistantMessage` | `#94e2d5` (Mocha Teal) | `#2dd4bf` |
| `systemMessage` | `#cba6f7` (Mocha Mauve) | `#a78bfa` (Violet 400) |
| `userBubbleBg` | `#313244` (Mocha Surface0) | `#3f3f46` |
| **Light Theme** | | |
| `background` | `#eff1f5` (Latte Base) | `white` |
| `foreground` | `#4c4f69` (Latte Text) | `#0f172a` |
| `error` | `#d20f39` (Latte Red) | `#e11d48` (Rose 600) |
| `success` | `#40a02b` (Latte Green) | `#16a34a` (Green 600) |
| `warning` | `#df8e1d` (Latte Yellow) | `#d97706` (Amber 600) |
| `userMessage` | `#1e66f5` (Latte Blue) | `#2563eb` (Blue 600) |
| `assistantMessage` | `#179299` (Latte Teal) | `#0d9488` |
| `systemMessage` | `#8839ef` (Latte Mauve) | `#7c3aed` (Violet 600) |
| `userBubbleBg` | `#e6e9ef` (Latte Mantle) | `#e2e8f0` |

**Affected Tests**:
- `tests/ui/theme.test.ts` β€” lines 59-155 (dark/light theme color assertions, getMessageColor)
- `tests/ui/components/tool-result.test.tsx` β€” lines 61-67 (error color assertions)

**Resolution**: Update all test color values to match the Catppuccin palette values from the source.

---

### Category 4: Tool Renderer Icon Mismatches (~8 tests)

**Root Cause**: Tool renderers use **ASCII/Unicode symbols** in source, but tests expect **emoji icons**. The source was changed but tests were not updated.

**Source**: [`src/ui/tools/registry.ts`](https://github.com/flora131/atomic/blob/f9603b88b96c47859073b0647d2c6b7d95057f8d/src/ui/tools/registry.ts)

| Tool | Source Icon (Actual) | Test Expected Icon |
|------|---------------------|-------------------|
| Read | `≑` (line 64) | `πŸ“„` |
| Edit | `β–³` (line 133) | `β–³` βœ… Match |
| Bash | `$` (line 187) | `πŸ’»` |
| Write | `β–Ί` (line 258) | `πŸ“` |
| Glob | `β—†` (line 314) | `πŸ”` |
| Grep | `β˜…` (line 402) | `πŸ”Ž` |
| Default | `β–Ά` (line 465) | `πŸ”§` |

**Affected Tests**:
- `tests/ui/tools/registry.test.ts` β€” lines 34, 134, 187, 249, 291, 331
- `tests/ui/components/tool-result.test.tsx` β€” lines 306, 314, 322, 330, 338, 346

**Resolution**: Update test expectations to match the actual Unicode symbol icons, or update the source to use emojis if that was the intended design.

---

### Category 5: Claude SDK / HITL Integration (~6 tests)

**Root Cause**: `createSession()` **no longer calls `query()`** internally. A previous refactoring removed the initial empty-prompt query to fix a leaked subprocess issue. The comment in source explains: _"Don't create an initial query here β€” send()/stream() each create their own query with the actual user message. Previously an empty-prompt query was spawned here, which leaked a Claude Code subprocess that was never consumed."_

**Source**: [`src/sdk/claude-client.ts`](https://github.com/flora131/atomic/blob/f9603b88b96c47859073b0647d2c6b7d95057f8d/src/sdk/claude-client.ts)

- **`createSession()`**: Lines 752-768 β€” calls `this.wrapQuery(null, sessionId, config)` without invoking `query()`
- **`query()` only called by**: `send()` (line 392), `stream()` (line 454), `summarize()` (line 599), `resumeSession()` (line 805)
- **`canUseTool` callback**: Created inside `buildSdkOptions()` (lines 237-297), only attached when `query()` is called
- **Mock setup**: `tests/sdk/claude-client.test.ts` line 166 β€” expects `mockQuery` called after `createSession()`, which no longer happens
- **HITL mock**: `tests/sdk/ask-user-question-hitl.test.ts` β€” captures `canUseToolCallback` during `query()` setup, but since `query()` isn't called during `createSession()`, callback remains `null`

**Affected Tests**:
- `tests/sdk/claude-client.test.ts` β€” 3 tests expecting `mockQuery.toHaveBeenCalled()` after `createSession()`
- `tests/sdk/ask-user-question-hitl.test.ts` β€” 3 tests expecting `canUseToolCallback` not null after `createSession()`

**Resolution**: Tests need to call `session.send()` or `session.stream()` after `createSession()` to trigger `query()`. Mock responses need to be set up so `query()` completes properly.

---

### Category 6: Misc UI Test Failures (~8 tests)

#### 6a. `truncate()` Function (2 tests)

**Root Cause**: `truncateText()` uses `"..."` (three periods) instead of `"…"` (single ellipsis character), and uses `maxLength - 3` for the slice which breaks at small limits.

**Source**: [`src/ui/utils/format.ts`](https://github.com/flora131/atomic/blob/f9603b88b96c47859073b0647d2c6b7d95057f8d/src/ui/utils/format.ts) lines 144-147
```typescript
export function truncateText(text: string, maxLength: number = 40): string {
if (text.length <= maxLength) return text;
return `${text.slice(0, maxLength - 3)}...`;
}
```
- Export alias at line 57: `export const truncate = truncateText;`

**Test**: `src/ui/__tests__/task-list-indicator.test.ts` lines 89-101
- Expects `truncate("Hello, World!", 5)` β†’ `"Hell…"` (gets `"He..."`)
- Expects `truncate("ab", 1)` β†’ `"…"` (gets `"..."` with negative slice)

**Resolution**: Either update the function to use `"…"` and handle edge cases, or update tests to match current `"..."` behavior.

#### 6b. `buildDisplayParts()` Duration Formatting (2 tests)

**Root Cause**: `formatDuration()` uses `Math.floor()` for seconds, discarding sub-second precision.

**Source**: [`src/ui/utils/format.ts`](https://github.com/flora131/atomic/blob/f9603b88b96c47859073b0647d2c6b7d95057f8d/src/ui/utils/format.ts) line ~68
```typescript
const seconds = Math.floor(ms / 1000); // 1500 β†’ 1, not 1.5
return { text: `${seconds}s`, ms };
```

**Test**: `tests/ui/components/timestamp-display.test.tsx` lines 74-87
- Expects `buildDisplayParts(ts, 1500)` to include `"1.5s"` β€” actually returns `"1s"`
- Expects `buildDisplayParts(ts, 1000)` edge case handling

**Resolution**: Either update `formatDuration()` to show decimal seconds (e.g., `(ms / 1000).toFixed(1)`), or update tests to match current floor-based behavior.

#### 6c. Command Registration (2 tests)

**Root Cause**: Tests expect a "commit" command with alias "ci" to be registered, but no such command exists in the codebase.

**Source**:
- [`src/ui/commands/builtin-commands.ts`](https://github.com/flora131/atomic/blob/f9603b88b96c47859073b0647d2c6b7d95057f8d/src/ui/commands/builtin-commands.ts) lines 551-560 β€” registered commands: help, theme, clear, compact, exit, model, mcp, context
- [`src/ui/commands/skill-commands.ts`](https://github.com/flora131/atomic/blob/f9603b88b96c47859073b0647d2c6b7d95057f8d/src/ui/commands/skill-commands.ts) lines 1113-1135 β€” skills: research-codebase, create-spec, explain-code

**Test**: `tests/ui/commands/index.test.ts` line 79
- Expects `globalRegistry.has("ci")` to be `true`
- Expects `globalRegistry.has("commit")` to be `true`

**Resolution**: Either add a "commit" command with "ci" alias, or remove the test assertion for a command that doesn't exist.

#### 6d. `globalRegistry` Population (1 test)

**Root Cause**: Related to 6c β€” `tests/ui/index.test.ts` line 596 expects `globalRegistry` to be populated with specific commands that may not all be registered.

**Resolution**: Align test expectations with actually registered commands.

---

## Code References

- `src/ui/commands/agent-commands.ts:175-225` β€” `AgentDefinition` interface with optional `model` field
- `src/ui/commands/agent-commands.ts:1085-1150` β€” Debugger agent definition (no `model` property)
- `src/ui/commands/agent-commands.ts:1158-1163` β€” `getBuiltinAgent()` function
- `src/ui/commands/agent-commands.ts:1495-1532` β€” `createAgentCommand()` function
- `src/ui/theme.tsx:219-271` β€” Dark and light theme color definitions (Catppuccin)
- `src/ui/tools/registry.ts:64-465` β€” Tool renderer definitions with ASCII icons
- `src/sdk/claude-client.ts:237-297` β€” `buildSdkOptions()` with `canUseTool`
- `src/sdk/claude-client.ts:752-768` β€” `createSession()` without `query()` call
- `src/ui/utils/format.ts:144-147` β€” `truncateText()` function
- `src/ui/utils/format.ts:68` β€” `formatDuration()` with `Math.floor()`
- `src/ui/commands/builtin-commands.ts:551-560` β€” Registered builtin commands
- `src/ui/commands/skill-commands.ts:1113-1135` β€” Skill definitions
- `src/ui/commands/registry.ts:64-118` β€” `CommandContext` interface

## Architecture Documentation

The test failures reveal a pattern of source code evolution without test synchronization:

1. **Theme system** migrated from Tailwind CSS palette to Catppuccin palette
2. **Tool renderers** changed from emoji icons to Unicode symbols for better terminal compatibility
3. **Claude SDK integration** was refactored to fix subprocess leaks by deferring `query()` to message sending
4. **Agent command system** moved from `sendMessage`-based to `spawnSubagent`-based execution
5. **Agent model field** was added to the type system but not populated in definitions
6. **Command registry** evolved but new/removed commands weren't reflected in tests

## Historical Context (from research/)

- `research/docs/2026-02-04-agent-subcommand-parity-audit.md` β€” Audit of agent subcommand parity
- `research/docs/2026-02-03-command-migration-notes.md` β€” Notes on command system migration
- `research/docs/2026-01-31-claude-agent-sdk-research.md` β€” Claude Agent SDK research
- `research/docs/2026-01-31-claude-implementation-analysis.md` β€” Claude implementation analysis
- `research/docs/2026-02-05-subagent-ui-opentui-independent-context.md` β€” Sub-agent UI context design

## Related Research

- `research/docs/2026-02-01-chat-tui-parity-implementation.md`
- `research/docs/2026-02-05-model-command-header-update-research.md`

## Open Questions

1. **Agent model intent**: Should all builtin agents have `model: "opus"`, or is model selection intended to be dynamic/configurable at runtime?
2. **Icon design**: Were emojis intentionally replaced with Unicode symbols for terminal compatibility, or was this an incomplete migration?
3. **Commit command**: Was the "commit"/"ci" command removed intentionally, or is it planned but not yet implemented?
4. **Duration precision**: Should `formatDuration()` show sub-second precision (e.g., `1.5s`), or is integer seconds the desired behavior?
Loading
Loading