Add self-contained error tracking system #21

Merged
mensfeld merged 3 commits into master from error-tracking-system on Jan 28, 2026
@mensfeld
Owner

Overview

Implements a comprehensive error tracking system for hookshot that captures application errors (not operational dispatch failures), with deduplication, aggregation, and an admin UI with resolution workflow.

Key Principle: This tracker focuses on unexpected application failures in controllers, non-dispatch jobs, and services. Dispatch errors remain operational cases handled by existing Delivery model logic.

Features

Core Components

  1. Error Capture Service (Admin::Errors::Capture)

    • Generates fingerprint from error class + normalized message
    • Deduplicates by fingerprint with occurrence counting
    • Cleans backtraces (removes gem paths, limits to 50 lines)
    • Sanitizes context (removes passwords/tokens, 10KB limit)
    • Defensive error handling prevents capture failures from breaking app
  2. Background Job (Admin::ErrorCaptureJob)

    • Async error capture via Solid Queue (non-blocking)
    • Uses discard_on to prevent infinite retry loops
    • Survives Puma restarts and is monitored via Mission Control
  3. Database Model (Admin::ErrorRecord)

    • Stores: error_class, message, backtrace, context (JSON), fingerprint
    • Tracks: occurrences_count, first_occurred_at, last_occurred_at, resolved_at
    • Scopes: unresolved, resolved, recent_first
    • Indexes on fingerprint (unique), resolved_at, last_occurred_at
  4. Admin UI (Available at /errors)

    • Three-tab view: Unresolved / Resolved / All
    • List view with occurrence badges (red if >10, yellow otherwise)
    • Detail view with backtrace highlighting, context JSON, timeline
    • Actions: Resolve, Unresolve, Delete, Delete All Resolved
  5. Application Integration

    • Controllers: rescue_from in ApplicationController (production only)
    • Jobs: around_perform in ApplicationJob (excludes DispatchJob)
    • Exclusions: DispatchJob errors are operational, tracked via Delivery model
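
The deduplication described for the capture service can be illustrated with a minimal in-memory sketch. `ErrorRow` and `InMemoryErrorStore` are hypothetical stand-ins for the `error_records` table, not classes from this PR; the real service presumably performs the equivalent lookup and increment through the database, backed by the unique fingerprint index.

```ruby
# Hypothetical in-memory stand-in for the error_records table.
ErrorRow = Struct.new(
  :fingerprint, :error_class, :message,
  :occurrences_count, :first_occurred_at, :last_occurred_at,
  keyword_init: true
)

class InMemoryErrorStore
  def initialize
    @rows = {} # fingerprint => ErrorRow
  end

  # First sighting creates a row; repeats only bump the counter and timestamp.
  def capture(fingerprint:, error_class:, message:, now: Time.now)
    row = @rows[fingerprint]
    if row
      row.occurrences_count += 1
      row.last_occurred_at = now
    else
      row = ErrorRow.new(
        fingerprint: fingerprint, error_class: error_class, message: message,
        occurrences_count: 1, first_occurred_at: now, last_occurred_at: now
      )
      @rows[fingerprint] = row
    end
    row
  end

  def size
    @rows.size
  end
end
```

Two captures with the same fingerprint leave a single row with `occurrences_count` of 2, which is the behaviour the occurrence badges in the list view rely on.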

Maintenance

  • Rake task: rake errors:cleanup deletes resolved errors older than 30 days
  • Suggested cron: 0 2 * * * cd /app && bundle exec rake errors:cleanup
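
As a rough sketch of what the cleanup selects, the plain-Ruby snippet below filters records by a 30-day cutoff. It assumes "older than 30 days" is measured from `resolved_at`, which the description does not state explicitly; the `stale_resolved` helper and the hash-based records are illustrative only.

```ruby
RETENTION_SECONDS = 30 * 24 * 60 * 60 # 30 days

# Returns the records the cleanup would delete:
# resolved, and resolved before the cutoff (assumption: cutoff uses resolved_at).
def stale_resolved(records, now: Time.now)
  cutoff = now - RETENTION_SECONDS
  records.select { |r| r[:resolved_at] && r[:resolved_at] < cutoff }
end

# Inside the Rails app this would more likely be a single query, for example
# (an assumption, not taken from the PR):
#   ErrorRecord.resolved.where("resolved_at < ?", 30.days.ago).delete_all
```

Unresolved errors are never selected, regardless of age, so the task only trims records that have already been through the resolution workflow.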

Implementation Details

Fingerprint Normalization

  • Numbers → N
  • UUIDs → UUID
  • Hex addresses → 0xHEX
  • Temp paths → /tmp/PATH
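
The normalization rules above might be sketched as follows. `FingerprintSketch`, the exact regular expressions, and the choice of SHA-256 are assumptions made for illustration; the PR only states that the fingerprint is derived from the error class plus the normalized message. UUIDs are tried before bare numbers, since a digits-first pass would break a UUID apart before it could be recognized.

```ruby
require "digest"

# Hypothetical helper; names and patterns are illustrative only.
module FingerprintSketch
  # Alternatives are tried left to right at each position,
  # so UUIDs and hex addresses win over bare numbers.
  PATTERN = /
    (?<uuid>\h{8}-\h{4}-\h{4}-\h{4}-\h{12}) |
    (?<hex>0x\h+) |
    (?<tmp>\/tmp\/\S+) |
    (?<num>\d+)
  /x

  module_function

  # Replaces volatile parts of a message with stable placeholders in one pass.
  def normalize(message)
    message.to_s.gsub(PATTERN) do
      match = Regexp.last_match
      if match[:uuid] then "UUID"
      elsif match[:hex] then "0xHEX"
      elsif match[:tmp] then "/tmp/PATH"
      else "N"
      end
    end
  end

  # Digest algorithm is an assumption; any stable hash would do.
  def fingerprint(error_class, message)
    Digest::SHA256.hexdigest("#{error_class}:#{normalize(message)}")
  end
end
```

With this sketch, `"Couldn't find User 42"` and `"Couldn't find User 7"` raised by the same class share a fingerprint, while the same message raised by a different class does not.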

Safety Measures

  1. Defensive rescue blocks in capture service
  2. Async capture via Solid Queue (non-blocking)
  3. Job discard_on prevents retry loops
  4. Context sanitization removes sensitive keys
  5. Controller capture only in production mode
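
To make the sanitization and defensive-rescue points concrete, here is a minimal sketch. The key pattern, the shape of the truncated result, and the `ContextSanitizerSketch` name are assumptions; the PR only says that password/token keys are removed, that context is limited to 10KB, and that capture failures must not break the app.

```ruby
require "json"

# Hypothetical sanitizer; not the PR's actual implementation.
module ContextSanitizerSketch
  SENSITIVE_KEY = /password|token/i # assumption: matched against key names
  MAX_BYTES = 10 * 1024

  module_function

  # Recursively drops sensitive keys from hashes, including hashes inside arrays.
  def sanitize(value)
    case value
    when Hash
      value.each_with_object({}) do |(key, inner), out|
        next if key.to_s.match?(SENSITIVE_KEY)
        out[key.to_s] = sanitize(inner)
      end
    when Array
      value.map { |item| sanitize(item) }
    else
      value
    end
  end

  # Sanitizes, then enforces the size limit on the JSON form.
  # Any failure yields a small marker hash instead of raising.
  def call(context)
    clean = sanitize(context)
    json = JSON.generate(clean)
    return clean if json.bytesize <= MAX_BYTES

    # The preview holds at most MAX_BYTES bytes of the serialized context.
    { "truncated" => true, "preview" => json.byteslice(0, MAX_BYTES).scrub("") }
  rescue StandardError => e
    { "sanitization_error" => e.class.name }
  end
end
```

Returning a marker hash from the `rescue` keeps the error record itself storable even when its context cannot be serialized.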

Database Migration

Creates error_records table with:

  • Fingerprint (unique index for deduplication)
  • Occurrence tracking (count + timestamps)
  • Resolution workflow (resolved_at timestamp)
  • Indexes for query performance

Testing

Verified functionality:

  • ✅ Error capture and storage
  • ✅ Deduplication by fingerprint
  • ✅ Occurrence count increments
  • ✅ Async job processing
  • ✅ Rake cleanup task
  • ✅ Routes and UI paths

Routes

  • GET /errors - List errors with tab filtering
  • GET /errors/:id - Error detail view
  • POST /errors/:id/resolve - Mark as resolved
  • POST /errors/:id/unresolve - Mark as unresolved
  • DELETE /errors/:id - Delete single error
  • DELETE /errors/destroy_all - Bulk delete resolved errors

Future Enhancements

  • Email notifications on critical errors (occurrences > threshold)
  • Statistics badge showing unresolved count in navbar
  • Search/filtering by error_class or date range
  • CSV export for reporting

Implements comprehensive error tracking for application-level errors
with deduplication, aggregation, and admin UI workflow.

## Core Features

- **Error Capture Service**: Fingerprints errors by class + normalized
  message, deduplicates by fingerprint, increments occurrence counts
- **Async Background Job**: Non-blocking error capture via Solid Queue
- **Database Model**: Stores error class, message, backtrace, context,
  occurrence count, timestamps, and resolution status
- **Admin UI**: Three-tab view (Unresolved/Resolved/All) with pagination,
  detail views with backtrace highlighting, resolve/unresolve actions
- **Application Integration**: Captures errors from controllers (production
  only) and background jobs (excluding DispatchJob operational errors)
- **Maintenance**: Rake task for cleanup of resolved errors older than 30 days

## Implementation Details

- Fingerprinting normalizes IDs, UUIDs, hex addresses, temp paths
- Backtrace cleaned of gem paths, limited to 50 lines
- Context sanitized (removes passwords/tokens, truncates at 10KB)
- Defensive error handling prevents capture failures from breaking app
- Job discard_on prevents infinite retry loops
- Available at `/errors` path with HTTP Basic Auth
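
The backtrace cleaning mentioned above could look roughly like this. The `/gems/` path pattern and the `BacktraceCleanerSketch` name are assumptions; the commit only states that gem paths are removed and the result is limited to 50 lines.

```ruby
# Hypothetical cleaner; the path pattern is an assumption.
module BacktraceCleanerSketch
  MAX_LINES = 50
  GEM_PATH = %r{/gems/}

  module_function

  # Drops frames that originate in installed gems, then keeps the first 50.
  def clean(backtrace)
    Array(backtrace).reject { |line| line.match?(GEM_PATH) }.first(MAX_LINES)
  end
end
```

Filtering before truncating means the 50 retained lines are all application frames. `Array(nil)` returns an empty array, so errors without a backtrace are handled too.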

## Database

- Migration adds error_records table with indexes on fingerprint,
  resolved_at, and last_occurred_at for query performance
- JSON context field stores metadata (controller/action, job args, etc.)

Addresses application error visibility without duplicating existing
operational dispatch error tracking in Delivery model.

Simplifies naming to match the /errors route path:
- Admin::ErrorRecordsController → Admin::ErrorsController
- app/views/admin/error_records/ → app/views/admin/errors/
- Routes now point to admin/errors#action
- Layout checks controller_name == 'errors'

The model remains Admin::ErrorRecord (singular) as it represents
individual error records in the database.

Major changes:
- **Rails 8 Integration**: Replace manual rescue_from/around_perform with
  Rails.error.subscribe() using Admin::Errors::Subscriber
- **Model Namespace**: Move ErrorRecord out of Admin namespace to match
  other models (Webhook, Target, etc.)
- **Comprehensive Tests**: Add 73 specs covering models, services, jobs,
  and controllers with 94.3% line coverage
- **Documentation**: Add YARD docs to all public methods and classes
- **Bug Fixes**: Fix fingerprint normalization order (UUIDs before numbers)

## Rails 8 Error Subscriber Benefits
- Automatic capture from ALL Rails executions (controllers, jobs, console)
- Centralized error handling in one subscriber
- No need to manually instrument each error source
- Survives Rails upgrades
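
For readers unfamiliar with the error reporter, a subscriber is any object that responds to `report(error, handled:, severity:, context:, source: nil)` and is registered with `Rails.error.subscribe`. The sketch below is not the PR's `Admin::Errors::Subscriber`; the class name and the injected `sink` callable are hypothetical, used so the example runs without Rails. In the app, the sink would presumably enqueue the capture job.

```ruby
# Hypothetical subscriber; `sink` is any callable that receives the payload hash.
class ErrorSubscriberSketch
  def initialize(sink:)
    @sink = sink
  end

  # Method signature expected by Rails.error.subscribe.
  def report(error, handled:, severity:, context:, source: nil)
    payload = {
      error_class: error.class.name,
      message: error.message,
      backtrace: error.backtrace || [],
      context: context.merge(handled: handled, severity: severity, source: source)
    }
    @sink.call(payload)
  rescue StandardError
    # Reporting must never raise back into the code that failed.
    nil
  end
end

# Registration in a Rails initializer (not executed in this sketch):
#   Rails.error.subscribe(ErrorSubscriberSketch.new(sink: ->(payload) { ... }))
```

Because the subscriber receives every reported error regardless of origin, the per-controller and per-job hooks from the earlier commits are no longer needed.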

## Test Coverage
- ErrorRecord model: validations, scopes, methods
- Admin::Errors::Capture: deduplication, sanitization, fingerprinting
- Admin::Errors::Subscriber: Rails error reporting integration
- Admin::ErrorCaptureJob: async processing
- Admin::ErrorsController: CRUD + resolution workflow
- Request specs: authentication, tab filtering, bulk operations

## Code Quality
- All tests pass (256 examples, 0 failures)
- Rubocop clean (81 files, 0 offenses)
- YARD-lint compliant
- 94.3% line coverage, 86.11% branch coverage

@mensfeld merged commit 0258728 into master on Jan 28, 2026
5 checks passed
@mensfeld deleted the error-tracking-system branch on January 28, 2026 09:47