perf(ccusage): add timestamp cache for faster file processing #766
Add a persistent timestamp cache to dramatically improve performance when loading usage data from JSONL files. Previously, every run needed to read all files to extract timestamps for sorting. Now timestamps are cached and only updated when files change.
Key optimizations:
- Cache file timestamps to ~/.config/claude/.ccusage/timestamp-cache.json
- Use file mtime to detect when cache entries are stale
- Early filter files by date range before sorting (using --since/--until)
- Only read first 4KB of files to extract timestamps
- Batch process files with controlled concurrency (50 at a time)
Performance improvement on 8600+ files:
- Full data query: 28.2s → 8.4s (3.4x faster)
- With --since filter: 11.4s → 8.1s (1.4x faster)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
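The "batch process files with controlled concurrency" item in the commit message can be illustrated with a minimal sketch. The helper name `processInBatches` and its signature are hypothetical and not taken from the PR; the batch size of 50 is the only detail that comes from the description.

```typescript
// Hypothetical helper (not from the PR): run `worker` over `items` in
// fixed-size batches so at most `batchSize` promises are in flight at once.
async function processInBatches<T, R>(
	items: readonly T[],
	batchSize: number,
	worker: (item: T) => Promise<R>,
): Promise<R[]> {
	if (!Number.isInteger(batchSize) || batchSize < 1) {
		throw new RangeError('batchSize must be a positive integer');
	}
	const results: R[] = [];
	for (let i = 0; i < items.length; i += batchSize) {
		const batch = items.slice(i, i + batchSize);
		// Wait for the whole batch to settle before starting the next one.
		const batchResults = await Promise.all(batch.map(async item => worker(item)));
		results.push(...batchResults);
	}
	return results;
}
```

With this shape, a call such as `processInBatches(files, 50, readTimestamps)` (where `readTimestamps` is a placeholder) would keep at most 50 files open at a time. A sliding-window pool would keep the pipeline fuller, but fixed batches are simpler to reason about.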
📝 Walkthrough
A new timestamp-cache module is introduced to optimize JSONL file processing with a persistent in-memory and disk-backed cache for per-file timestamp metadata. The data-loader module is updated to leverage this cache for date-range filtering and cached sorting across multiple data loading paths.
Sequence Diagram
sequenceDiagram
participant DataLoader as Data Loader
participant TimestampCache as Timestamp Cache
participant FileSystem as File System
DataLoader->>TimestampCache: filterFilesByDateRange(files, since, until)
activate TimestampCache
TimestampCache->>TimestampCache: Load or initialize cache from disk
loop For each file (batch with concurrency limit)
TimestampCache->>FileSystem: stat(file) to check mtime
alt Cache valid (mtime matches)
TimestampCache->>TimestampCache: Use cached timestamps
else Cache stale or missing
TimestampCache->>FileSystem: Read JSONL file (limited portions)
TimestampCache->>TimestampCache: Extract first and last timestamps
TimestampCache->>TimestampCache: Update in-memory cache entry
end
end
TimestampCache->>FileSystem: Debounced save of updated cache to disk
TimestampCache-->>DataLoader: Return filtered files within date range
deactivate TimestampCache
DataLoader->>TimestampCache: sortFilesByTimestampCached(files)
activate TimestampCache
TimestampCache->>TimestampCache: Use cached timestamp data from prior load
TimestampCache-->>DataLoader: Return files sorted by timestamp
deactivate TimestampCache
DataLoader->>DataLoader: Process sorted, filtered files
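The "cache valid (mtime matches)" branch in the diagram can be sketched as a small lookup. This is only an illustration: the entry shape and the names `TimestampCacheEntry` and `getFreshEntry` are assumptions, since the PR's actual cache schema is not shown here.

```typescript
// Assumed entry shape; the real schema in _timestamp-cache.ts may differ.
type TimestampCacheEntry = {
	mtimeMs: number; // file modification time recorded when the entry was cached
	firstTimestamp: string | null;
	lastTimestamp: string | null;
};

type TimestampCache = Map<string, TimestampCacheEntry>;

// Return the cached entry only if the file's current mtime matches the one
// stored with the entry; otherwise report a miss so the caller re-reads the file.
function getFreshEntry(
	cache: TimestampCache,
	filePath: string,
	currentMtimeMs: number,
): TimestampCacheEntry | null {
	const entry = cache.get(filePath);
	if (entry == null || entry.mtimeMs !== currentMtimeMs) {
		return null;
	}
	return entry;
}
```

An exact mtime comparison is the simplest staleness test. It treats any change to the file, including a touch without a content change, as a reason to re-extract the timestamps.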
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 2
🧹 Nitpick comments (2)
apps/ccusage/src/_timestamp-cache.ts (2)
42-43: Cache location doesn't respect the CLAUDE_CONFIG_DIR environment variable.
The cache path is hardcoded to ~/.config/claude/.ccusage/ using DEFAULT_CLAUDE_CONFIG_PATH. If users set CLAUDE_CONFIG_DIR to a custom location, the cache will still be stored in the default location rather than alongside their data.
Consider deriving the cache location dynamically based on the first valid Claude path, or document this as a known limitation.
245-247: Consider the implications of the Date.now() fallback for mtime.
If stat fails, using Date.now() as mtime means this entry will never be cache-hit on subsequent calls (since Date.now() will differ each time). This is likely acceptable as stat failures are rare, but worth noting that such files won't benefit from caching.
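One way to act on the cache-location nitpick is sketched below. The helper `resolveCacheDir` is hypothetical, and the sketch assumes CLAUDE_CONFIG_DIR may hold either a single path or a comma-separated list. It takes the first non-empty entry and does not verify that the directory exists, which a real implementation of "first valid Claude path" would need to do.

```typescript
import { join } from 'node:path';

// Hypothetical helper (not from the PR): pick the directory that would hold
// timestamp-cache.json. Assumes CLAUDE_CONFIG_DIR is one path or a
// comma-separated list, and uses the first non-empty entry.
function resolveCacheDir(
	env: Record<string, string | undefined>,
	defaultConfigPath: string,
): string {
	const firstCustomPath = (env.CLAUDE_CONFIG_DIR ?? '')
		.split(',')
		.map(p => p.trim())
		.find(p => p.length > 0);
	// Fall back to the default config path when the variable is unset or empty.
	return join(firstCustomPath ?? defaultConfigPath, '.ccusage');
}
```

Passing the environment in as a parameter, rather than reading `process.env` inside the function, keeps the helper easy to exercise in an in-source test.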
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
apps/ccusage/src/_timestamp-cache.ts
apps/ccusage/src/data-loader.ts
🧰 Additional context used
📓 Path-based instructions (7)
apps/ccusage/src/**/*.ts
📄 CodeRabbit inference engine (apps/ccusage/CLAUDE.md)
apps/ccusage/src/**/*.ts: Write tests in-source using if (import.meta.vitest != null) blocks instead of separate test files
Use Vitest globals (describe, it, expect) without imports in test blocks
In tests, use current Claude 4 models (sonnet-4, opus-4)
Use fs-fixture with createFixture() to simulate Claude data in tests
Only export symbols that are actually used by other modules
Do not use console.log; use the logger utilities from src/logger.ts instead
Files:
apps/ccusage/src/_timestamp-cache.ts
apps/ccusage/src/data-loader.ts
apps/ccusage/**/*.ts
📄 CodeRabbit inference engine (apps/ccusage/CLAUDE.md)
apps/ccusage/**/*.ts: NEVER use await import() dynamic imports anywhere (especially in tests)
Prefer @praha/byethrow Result type for error handling instead of try-catch
Use .ts extensions for local imports (e.g., import { foo } from './utils.ts')
Files:
apps/ccusage/src/_timestamp-cache.ts
apps/ccusage/src/data-loader.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx,js,jsx}: Use ESLint for linting and formatting with tab indentation and double quotes
No console.log allowed except where explicitly disabled with eslint-disable; use logger.ts instead
Use file paths with Node.js path utilities for cross-platform compatibility
Use variables starting with lowercase (camelCase) for variable names
Can use UPPER_SNAKE_CASE for constants
Files:
apps/ccusage/src/_timestamp-cache.ts
apps/ccusage/src/data-loader.ts
**/*.ts{,x}
📄 CodeRabbit inference engine (CLAUDE.md)
Use TypeScript with strict mode and bundler module resolution
Files:
apps/ccusage/src/_timestamp-cache.ts
apps/ccusage/src/data-loader.ts
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: Use .ts extensions for local file imports (e.g., import { foo } from './utils.ts')
Prefer @praha/byethrow Result type over traditional try-catch for functional error handling
Use Result.try() for wrapping operations that may throw (JSON parsing, etc.)
Use Result.isFailure() for checking errors (more readable than !Result.isSuccess())
Use early return pattern (if (Result.isFailure(result)) continue;) instead of ternary operators when checking Results
Keep traditional try-catch only for file I/O with complex error handling or legacy code that's hard to refactor
Always use Result.isFailure() and Result.isSuccess() type guards for better code clarity
Use uppercase (PascalCase) for type names
Only export constants, functions, and types that are actually used by other modules - internal constants used only within the same file should NOT be exported
In-source testing pattern: write tests directly in source files using if (import.meta.vitest != null) blocks
CRITICAL: DO NOT use await import() dynamic imports anywhere in the codebase - this causes tree-shaking issues
CRITICAL: Never use dynamic imports with await import() in vitest test blocks - this is particularly problematic for test execution
Vitest globals (describe, it, expect) are enabled and available without imports since globals are configured
Create mock data using fs-fixture with createFixture() for Claude data directory simulation in tests
All test files must use current Claude 4 models (claude-sonnet-4-20250514, claude-opus-4-20250514), not outdated Claude 3 models
Model names in tests must exactly match LiteLLM's pricing database entries
Files:
apps/ccusage/src/_timestamp-cache.ts
apps/ccusage/src/data-loader.ts
**/*.{ts,tsx,json}
📄 CodeRabbit inference engine (CLAUDE.md)
Claude model naming convention: claude-{model-type}-{generation}-{date} (e.g., claude-sonnet-4-20250514, NOT claude-4-sonnet-20250514)
Files:
apps/ccusage/src/_timestamp-cache.ts
apps/ccusage/src/data-loader.ts
**/data-loader.ts
📄 CodeRabbit inference engine (CLAUDE.md)
Silently skip malformed JSONL lines during parsing in data loading operations
Files:
apps/ccusage/src/data-loader.ts
🧠 Learnings (10)
📓 Common learnings
Learnt from: CR
Repo: ryoppippi/ccusage PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-25T14:42:34.734Z
Learning: Applies to **/data-loader.ts : Silently skip malformed JSONL lines during parsing in data loading operations
📚 Learning: 2025-09-18T16:06:37.474Z
Learnt from: CR
Repo: ryoppippi/ccusage PR: 0
File: apps/ccusage/CLAUDE.md:0-0
Timestamp: 2025-09-18T16:06:37.474Z
Learning: Applies to apps/ccusage/src/**/*.ts : Use `fs-fixture` with `createFixture()` to simulate Claude data in tests
Applied to files:
apps/ccusage/src/_timestamp-cache.ts
apps/ccusage/src/data-loader.ts
📚 Learning: 2025-11-25T14:42:34.734Z
Learnt from: CR
Repo: ryoppippi/ccusage PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-25T14:42:34.734Z
Learning: Applies to **/*.{ts,tsx} : Create mock data using `fs-fixture` with `createFixture()` for Claude data directory simulation in tests
Applied to files:
apps/ccusage/src/_timestamp-cache.ts
📚 Learning: 2025-09-18T16:06:37.474Z
Learnt from: CR
Repo: ryoppippi/ccusage PR: 0
File: apps/ccusage/CLAUDE.md:0-0
Timestamp: 2025-09-18T16:06:37.474Z
Learning: Applies to apps/ccusage/src/**/*.ts : Write tests in-source using `if (import.meta.vitest != null)` blocks instead of separate test files
Applied to files:
apps/ccusage/src/_timestamp-cache.ts
📚 Learning: 2025-09-18T16:07:16.293Z
Learnt from: CR
Repo: ryoppippi/ccusage PR: 0
File: apps/codex/CLAUDE.md:0-0
Timestamp: 2025-09-18T16:07:16.293Z
Learning: Tests should use fs-fixture with using to ensure cleanup
Applied to files:
apps/ccusage/src/_timestamp-cache.ts
📚 Learning: 2025-11-25T14:42:34.734Z
Learnt from: CR
Repo: ryoppippi/ccusage PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-25T14:42:34.734Z
Learning: Applies to **/*.{ts,tsx} : In-source testing pattern: write tests directly in source files using `if (import.meta.vitest != null)` blocks
Applied to files:
apps/ccusage/src/_timestamp-cache.ts
📚 Learning: 2025-09-17T18:29:15.764Z
Learnt from: CR
Repo: ryoppippi/ccusage PR: 0
File: apps/mcp/CLAUDE.md:0-0
Timestamp: 2025-09-17T18:29:15.764Z
Learning: Applies to apps/mcp/**/*.{test,spec}.ts : Use `fs-fixture` for mock data in tests of MCP server functionality
Applied to files:
apps/ccusage/src/_timestamp-cache.ts
📚 Learning: 2025-09-18T16:06:37.474Z
Learnt from: CR
Repo: ryoppippi/ccusage PR: 0
File: apps/ccusage/CLAUDE.md:0-0
Timestamp: 2025-09-18T16:06:37.474Z
Learning: Applies to apps/ccusage/src/**/*.ts : Use Vitest globals (`describe`, `it`, `expect`) without imports in test blocks
Applied to files:
apps/ccusage/src/_timestamp-cache.ts
📚 Learning: 2025-09-18T16:06:37.474Z
Learnt from: CR
Repo: ryoppippi/ccusage PR: 0
File: apps/ccusage/CLAUDE.md:0-0
Timestamp: 2025-09-18T16:06:37.474Z
Learning: Applies to apps/ccusage/**/*.ts : Use `.ts` extensions for local imports (e.g., `import { foo } from './utils.ts'`)
Applied to files:
apps/ccusage/src/data-loader.ts
📚 Learning: 2025-09-18T16:06:37.474Z
Learnt from: CR
Repo: ryoppippi/ccusage PR: 0
File: apps/ccusage/CLAUDE.md:0-0
Timestamp: 2025-09-18T16:06:37.474Z
Learning: Applies to apps/ccusage/src/**/*.ts : Do not use console.log; use the logger utilities from `src/logger.ts` instead
Applied to files:
apps/ccusage/src/data-loader.ts
🧬 Code graph analysis (2)
apps/ccusage/src/_timestamp-cache.ts (1)
apps/ccusage/src/_consts.ts (1)
DEFAULT_CLAUDE_CONFIG_PATH (56-56)
apps/ccusage/src/data-loader.ts (1)
apps/ccusage/src/_timestamp-cache.ts (2)
filterFilesByDateRange (291-337), sortFilesByTimestampCached (342-364)
🔇 Additional comments (8)
apps/ccusage/src/data-loader.ts (4)
47-50: LGTM! The import follows coding guidelines with .ts extension and imports only the required functions from the new timestamp-cache module.
771-780: LGTM! Good optimization pattern: filtering by date range first reduces the number of files that need to be sorted. The cached timestamp sorting then benefits from not needing to re-read files on subsequent runs.
920-939: LGTM! Good use of Set and Map for O(1) lookups when filtering and rejoining the files with their base directories. The pattern efficiently preserves the file-to-baseDir association through the filtering and sorting operations.
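The Set/Map pattern praised for lines 920-939 might look roughly like the sketch below. The names `FileWithBase`, `keepFiltered`, and `rejoinBaseDirs` are illustrative and do not come from data-loader.ts, whose actual code is not shown in this review.

```typescript
// Illustrative shape: a usage file together with the Claude data directory it came from.
type FileWithBase = { file: string; baseDir: string };

// Keep only entries whose path survived filtering; Set gives O(1) membership checks.
function keepFiltered(
	all: readonly FileWithBase[],
	filteredFiles: readonly string[],
): FileWithBase[] {
	const kept = new Set(filteredFiles);
	return all.filter(entry => kept.has(entry.file));
}

// Re-attach each sorted path to its baseDir; Map gives O(1) lookups and the
// output follows the order of `sortedFiles`.
function rejoinBaseDirs(
	all: readonly FileWithBase[],
	sortedFiles: readonly string[],
): FileWithBase[] {
	const baseDirByFile = new Map(all.map(({ file, baseDir }) => [file, baseDir] as const));
	const result: FileWithBase[] = [];
	for (const file of sortedFiles) {
		const baseDir = baseDirByFile.get(file);
		if (baseDir != null) {
			result.push({ file, baseDir });
		}
	}
	return result;
}
```

Both steps are linear in the number of files, whereas nested `Array.includes` or `Array.find` calls would be quadratic on a dataset of several thousand files.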
1368-1376: LGTM! Consistent application of the early date-range filtering and cached sorting pattern across all data loading paths.
apps/ccusage/src/_timestamp-cache.ts (4)
12-17: LGTM! Imports follow the established patterns in the codebase. Uses .ts extension for local imports as per coding guidelines.
54-82: LGTM! Good use of Result.try for error handling and version checking for cache compatibility. The lazy loading pattern with null check is appropriate.
291-337: LGTM! The date range filtering logic correctly handles:
- Early return when no filters specified
- File date range overlaps with filter range
- Files without timestamps (included for safety)
The YYYYMMDD parsing relies on caller validation, which is consistent with the existing codebase pattern.
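The three cases listed above can be expressed as a small overlap check. This is a sketch under assumptions: the function names are hypothetical, timestamps are taken to be ISO 8601 strings, and the date key is derived from the UTC date portion, which may differ from how the PR handles time zones.

```typescript
// Convert an ISO 8601 timestamp to a YYYYMMDD key using its leading date part
// (assumption: the timestamp starts with YYYY-MM-DD).
function toDateKey(isoTimestamp: string): string {
	return isoTimestamp.slice(0, 10).replace(/-/g, '');
}

// Decide whether a file whose entries span [firstTimestamp, lastTimestamp]
// can contain data inside the optional [since, until] range (YYYYMMDD strings).
function overlapsDateRange(
	firstTimestamp: string | null,
	lastTimestamp: string | null,
	since?: string,
	until?: string,
): boolean {
	// Early return when no filters are specified.
	if (since == null && until == null) {
		return true;
	}
	const start = firstTimestamp ?? lastTimestamp;
	const end = lastTimestamp ?? firstTimestamp;
	// Files without timestamps are included for safety.
	if (start == null || end == null) {
		return true;
	}
	// YYYYMMDD strings compare chronologically when compared lexicographically.
	if (since != null && toDateKey(end) < since) {
		return false;
	}
	if (until != null && toDateKey(start) > until) {
		return false;
	}
	return true;
}
```

The check excludes a file only when its whole span lies before `since` or after `until`, so a long-running session that straddles a boundary is still loaded and filtered per entry later.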
374-444: LGTM! Tests follow coding guidelines:
- In-source testing with if (import.meta.vitest != null) block
- Uses Vitest globals without imports
- Uses await using with fs-fixture for proper cleanup
- Clears memory cache in beforeEach for test isolation
async function extractFirstTimestamp(filePath: string): Promise<string | null> {
	const readResult = await Result.try({
		try: async () => {
			// Read only first 4KB - should contain multiple lines
			const fd = await readFile(filePath, { encoding: 'utf-8', flag: 'r' });
			const firstChunk = fd.slice(0, 4096);
			const lines = firstChunk.split('\n').filter(l => l.trim().length > 0);

			for (const line of lines) {
				try {
					const json = JSON.parse(line) as Record<string, unknown>;
					if (json.timestamp != null && typeof json.timestamp === 'string') {
						return json.timestamp;
					}
				}
				catch {
					// Skip invalid JSON lines
				}
			}
			return null;
		},
		catch: () => null,
	})();

	return Result.isSuccess(readResult) ? readResult.value : null;
}
readFile loads entire file into memory, defeating the 4KB optimization.
The comment states "Read only first 4KB" but readFile reads the entire file content, then slices it. For a 600MB JSONL file, this loads all 600MB into memory.
To truly read only the first 4KB, use a file handle with a limited read:
🔎 Proposed fix using file handle
 async function extractFirstTimestamp(filePath: string): Promise<string | null> {
 	const readResult = await Result.try({
 		try: async () => {
-			// Read only first 4KB - should contain multiple lines
-			const fd = await readFile(filePath, { encoding: 'utf-8', flag: 'r' });
-			const firstChunk = fd.slice(0, 4096);
+			// Read only first 4KB using file handle
+			const handle = await open(filePath, 'r');
+			// Declared outside the inner try so it is still in scope below
+			let firstChunk: string;
+			try {
+				const buffer = Buffer.alloc(4096);
+				const { bytesRead } = await handle.read(buffer, 0, 4096, 0);
+				firstChunk = buffer.toString('utf-8', 0, bytesRead);
+			} finally {
+				await handle.close();
+			}
 			const lines = firstChunk.split('\n').filter(l => l.trim().length > 0);
Note: open must be added to the top-level node:fs/promises imports rather than loaded with a dynamic import. As per coding guidelines, avoid await import().
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In apps/ccusage/src/_timestamp-cache.ts around lines 157 to 182, the code
currently calls readFile which loads the entire file into memory and then slices
it; replace that with opening the file via fs.open (imported at top), create a
Buffer of 4096 bytes, call handle.read to read up to 4096 bytes into the buffer,
convert only the bytesRead portion to string, split and parse lines to find the
timestamp, then close the handle in a finally block; ensure errors still return
null like before and avoid dynamic import by hoisting the fs/promises open
import to top-level.
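The prompt above can be followed with something like this minimal sketch. It is not the PR's code: the names `readFirstTimestamp` and `findTimestamp` are hypothetical, and plain try/catch stands in for the project's `Result.try` wrapper so the example has no third-party dependency.

```typescript
import { open } from 'node:fs/promises';

const HEAD_SIZE = 4096;

// Return the first string `timestamp` field found among the given JSONL lines.
function findTimestamp(lines: readonly string[]): string | null {
	for (const line of lines) {
		if (line.trim().length === 0) {
			continue;
		}
		try {
			const json = JSON.parse(line) as Record<string, unknown> | null;
			const timestamp = json?.timestamp;
			if (typeof timestamp === 'string') {
				return timestamp;
			}
		}
		catch {
			// Skip invalid JSON, including a line cut off at the 4KB boundary.
		}
	}
	return null;
}

// Read at most the first 4KB of the file through a file handle, so memory use
// does not grow with file size. Returns null on any I/O error.
async function readFirstTimestamp(filePath: string): Promise<string | null> {
	try {
		const handle = await open(filePath, 'r');
		let chunk: string;
		try {
			const buffer = Buffer.alloc(HEAD_SIZE);
			const { bytesRead } = await handle.read(buffer, 0, HEAD_SIZE, 0);
			chunk = buffer.toString('utf-8', 0, bytesRead);
		}
		finally {
			await handle.close();
		}
		return findTimestamp(chunk.split('\n'));
	}
	catch {
		return null;
	}
}
```

One limitation of any fixed-size read: if the first JSONL entry is itself longer than 4KB, no complete line fits in the chunk and the function returns null, so callers need a fallback for that case.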
async function extractLastTimestamp(filePath: string): Promise<string | null> {
	const readResult = await Result.try({
		try: async () => {
			const content = await readFile(filePath, 'utf-8');
			// Read last 4KB for latest timestamp
			const lastChunk = content.slice(-4096);
			const lines = lastChunk.split('\n').filter(l => l.trim().length > 0).reverse();

			for (const line of lines) {
				try {
					const json = JSON.parse(line) as Record<string, unknown>;
					if (json.timestamp != null && typeof json.timestamp === 'string') {
						return json.timestamp;
					}
				}
				catch {
					// Skip invalid JSON lines
				}
			}
			return null;
		},
		catch: () => null,
	})();

	return Result.isSuccess(readResult) ? readResult.value : null;
}
Same issue: reads entire file to extract last 4KB.
Like extractFirstTimestamp, this reads the entire file content. For the last timestamp, you could use file stats to get the size, then seek to size - 4096 and read from there.
This is the same pattern issue as the first timestamp extraction.
🤖 Prompt for AI Agents
In apps/ccusage/src/_timestamp-cache.ts around lines 187 to 212, the current
extractLastTimestamp implementation reads the entire file into memory then
slices the last 4KB; change it to use file system calls to avoid loading the
whole file: use fs.promises.stat to get file size, compute start = Math.max(0,
size - 4096), open the file with fs.promises.open, allocate a Buffer of length
(size - start) or 4096, use filehandle.read to read only that range into the
buffer, close the handle, convert the buffer to string, split into lines and
scan in reverse for a JSON-parsable object with a string timestamp, and preserve
the existing Result.try/Result.isSuccess error handling (return null on errors).
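A minimal sketch of the tail read described in the prompt follows. The name `readLastTimestamp` is hypothetical, and plain try/catch is used here in place of the project's `Result.try` wrapper to keep the example free of third-party dependencies.

```typescript
import { open } from 'node:fs/promises';

const TAIL_SIZE = 4096;

// Read at most the last 4KB of the file and return the last string `timestamp`
// field found there. Returns null on any I/O error or when nothing matches.
async function readLastTimestamp(filePath: string): Promise<string | null> {
	try {
		const handle = await open(filePath, 'r');
		let chunk: string;
		try {
			const { size } = await handle.stat();
			const start = Math.max(0, size - TAIL_SIZE);
			const length = size - start; // never more than TAIL_SIZE
			const buffer = Buffer.alloc(TAIL_SIZE);
			const { bytesRead } = await handle.read(buffer, 0, length, start);
			chunk = buffer.toString('utf-8', 0, bytesRead);
		}
		finally {
			await handle.close();
		}
		// Scan from the end; the first line of the chunk may be a fragment of a
		// longer line and simply fails to parse.
		const lines = chunk.split('\n').reverse();
		for (const line of lines) {
			if (line.trim().length === 0) {
				continue;
			}
			try {
				const json = JSON.parse(line) as Record<string, unknown> | null;
				const timestamp = json?.timestamp;
				if (typeof timestamp === 'string') {
					return timestamp;
				}
			}
			catch {
				// Skip invalid or truncated JSON lines.
			}
		}
		return null;
	}
	catch {
		return null;
	}
}
```

Calling `stat` on the already-open handle avoids a second path lookup and guarantees that the size and the read refer to the same file.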
Summary
This PR adds a persistent timestamp cache to dramatically improve performance when loading usage data from JSONL files.
Problem: With large numbers of JSONL files (8600+ files, 863MB), ccusage was very slow because it needed to read every file to extract timestamps for sorting on each run.
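Solution:
- Cache file timestamps to ~/.config/claude/.ccusage/timestamp-cache.json
- Use file mtime to detect when cache entries are stale
- Early filter files by date range before sorting (using --since/--until)
- Only read first 4KB of files to extract timestamps
- Batch process files with controlled concurrency (50 at a time)
Performance Results
Tested on 8642 JSONL files (863MB):
- Full data query: 28.2s → 8.4s (3.4x faster)
- With --since filter: 11.4s → 8.1s (1.4x faster)
Changes
- apps/ccusage/src/_timestamp-cache.ts - New cache module with tests (445 lines)
- apps/ccusage/src/data-loader.ts - Integrate cache into data loading functions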