Conversation

@ghinks ghinks commented Jan 12, 2026

Summary

Implements statistical outlier detection to identify unusual PR reviews using z-scores calculated per repository on both raw metrics and engineered features.

This PR adds comprehensive outlier detection capabilities to analyze PR review quality and identify potential issues such as:

  • Rushed reviews (short review duration)
  • Oversized PRs (excessive code changes)
  • Under-reviewed code (low comment density)
  • Statistical anomalies across multiple dimensions

Key Features

1. Feature Engineering

  • Review duration: Time from PR creation to merge (detects rushed reviews)
  • Code churn: Total lines changed (additions + deletions)
  • Comment density: Comments per file and per line changed
  • Automatic computation and caching for all PRs
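As an illustration, the engineered features above can be sketched as a small pure function. The field names (created_at, merged_at, additions, and so on) are assumptions for this sketch, not the actual PR model used in the repo:

```python
from datetime import datetime

def compute_features(created_at: datetime, merged_at: datetime,
                     additions: int, deletions: int,
                     changed_files: int, comments: int) -> dict[str, float]:
    """Compute the engineered review-quality features for one merged PR."""
    lines_changed = additions + deletions
    return {
        # Hours from PR creation to merge; short values suggest rushed reviews.
        "review_duration": (merged_at - created_at).total_seconds() / 3600,
        # Total lines touched (additions + deletions).
        "code_churn": float(lines_changed),
        # Comment density per file and per line changed; guard against
        # division by zero for empty PRs.
        "comment_density_per_file": comments / changed_files if changed_files else 0.0,
        "comment_density_per_line": comments / lines_changed if lines_changed else 0.0,
    }
```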

2. Statistical Analysis

  • Per-repository mean and standard deviation calculation
  • Z-score computation for 9 features (5 raw + 4 engineered)
  • Configurable threshold (default: |z-score| > 2, roughly the two-sided 95% level under a normal distribution)
  • Minimum sample size requirement (default: 30 merged PRs)

3. Outlier Detection

  • Identifies PRs that are statistical outliers
  • Tracks which specific features triggered outlier status
  • Stores z-scores for all features in database
  • Provides maximum absolute z-score for prioritization
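Given a PR's per-feature z-scores, the outlier decision itself is simple; this sketch (function name and return shape are illustrative, not the project's API) flags the PR, records which features triggered, and reports the maximum |z| for prioritization:

```python
def flag_outlier(scores: dict[str, float],
                 threshold: float = 2.0) -> tuple[bool, list[str], float]:
    """Return (is_outlier, triggering features, max |z|) for one PR."""
    # A PR is an outlier when any feature's |z| exceeds the threshold.
    triggered = [name for name, z in scores.items() if abs(z) > threshold]
    max_abs = max((abs(z) for z in scores.values()), default=0.0)
    return bool(triggered), triggered, max_abs
```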

4. New CLI Command

review-classify detect-outliers REPOSITORY [OPTIONS]

Options:
  --threshold, -t FLOAT    Z-score threshold (default: 2.0)
  --min-samples INT        Minimum PRs required (default: 30)
  --format, -f TEXT        Output: table, json, csv (default: table)
  --verbose, -v            Enable verbose output
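The option surface can be mirrored with any argument parser; the project's CLI may well be built on click or typer, but an argparse sketch of the same options and defaults looks like this:

```python
import argparse

# Illustrative stand-in for the detect-outliers CLI; flags and defaults
# follow the documented options, the parser library is an assumption.
parser = argparse.ArgumentParser(prog="review-classify detect-outliers")
parser.add_argument("repository", help="owner/name, e.g. expressjs/express")
parser.add_argument("--threshold", "-t", type=float, default=2.0)
parser.add_argument("--min-samples", type=int, default=30)
parser.add_argument("--format", "-f", choices=["table", "json", "csv"], default="table")
parser.add_argument("--verbose", "-v", action="store_true")

args = parser.parse_args(["expressjs/express", "-t", "2.5", "-f", "json"])
```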

5. Multiple Output Formats

  • Table: Human-readable ASCII table with PR numbers and outlier features
  • JSON: Structured data with all z-scores for programmatic analysis
  • CSV: Simple export format for spreadsheets
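The JSON and CSV formats are straightforward serializations of the result rows. A sketch, assuming a row shape of (pr_number, max_abs_z, outlier_features), which is an illustration rather than the project's actual schema:

```python
import csv
import io
import json

FIELDS = ["pr_number", "max_abs_z", "outlier_features"]

def to_json(rows: list[dict]) -> str:
    # Structured output with all fields, for programmatic analysis.
    return json.dumps(rows, indent=2)

def to_csv(rows: list[dict]) -> str:
    # Flat export suitable for spreadsheets.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```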

Database Schema

New Tables

  • prfeatures: Stores computed features for each PR
  • proutlierscore: Stores z-scores and outlier flags
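In DDL terms, the two tables could look roughly like the following. Column names and types here are hypothetical; the project presumably defines these through its ORM models, and only the table names come from the PR:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prfeatures (
    pr_id INTEGER PRIMARY KEY,            -- one row per PR
    review_duration REAL,                 -- hours from creation to merge
    code_churn REAL,                      -- additions + deletions
    comment_density_per_file REAL,
    comment_density_per_line REAL
);
CREATE TABLE proutlierscore (
    pr_id INTEGER PRIMARY KEY,
    max_abs_z_score REAL,                 -- used to rank outliers
    is_outlier INTEGER,                   -- boolean flag
    z_scores_json TEXT                    -- z-scores for all features
);
""")
```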

Testing

  • ✅ 33 unit and integration tests, all passing
  • ✅ Type checking (mypy --strict): passing
  • ✅ Linting (ruff): passing
  • ✅ Tested with real expressjs/express data

Example Usage

# Detect outliers with default settings
review-classify detect-outliers expressjs/express --min-samples 5 --verbose

# Output:
# Analyzing outliers for expressjs/express...
# Computing features...
# Computed features for 44 PRs
# Detecting outliers...
# Found 2 outliers out of 7 PRs (28.6%)
#
# Outlier Pull Requests
# ====================================================================================================
# PR #       Max |Z|      Outlier Features
# ----------------------------------------------------------------------------------------------------
# #6236      2.27         additions, deletions, changed_files, code_churn
# #6211      2.10         comments, comment_density_per_file
# ----------------------------------------------------------------------------------------------------
# Total outliers: 2 out of 7 PRs (28.6%)

Technical Implementation

Modules Added

  • src/review_classification/features/engineering.py: Feature computation
  • src/review_classification/analysis/statistics.py: Statistical functions
  • src/review_classification/analysis/outlier_detector.py: Detection logic
  • src/review_classification/cli/output.py: Result formatting

Database Functions

  • save_pr_features(): Upsert computed features
  • get_pr_features(): Retrieve features for a PR
  • get_outlier_scores(): Query outlier results
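The upsert behavior of save_pr_features() can be sketched with raw SQL; the real function likely goes through the project's ORM, and the single-column table here is only for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prfeatures (pr_id INTEGER PRIMARY KEY, code_churn REAL)")

def save_pr_features(pr_id: int, code_churn: float) -> None:
    # INSERT ... ON CONFLICT keeps exactly one row per PR,
    # updated in place when features are recomputed.
    conn.execute(
        "INSERT INTO prfeatures (pr_id, code_churn) VALUES (?, ?) "
        "ON CONFLICT(pr_id) DO UPDATE SET code_churn = excluded.code_churn",
        (pr_id, code_churn),
    )

save_pr_features(6236, 150.0)
save_pr_features(6236, 175.0)  # recompute overwrites, no duplicate row
```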

Verification

After merging, users can:

  1. Run outlier detection on any repository with sufficient data
  2. Query results via MCP JDBC tools for analysis
  3. Export results in JSON/CSV for further processing
  4. Adjust threshold for different sensitivity levels

🤖 Generated with Claude Code

ghinks and others added 3 commits January 12, 2026 07:49
Implement statistical outlier detection to identify unusual PR reviews using
z-scores calculated per repository on both raw metrics and engineered features.

Key components:
- Feature engineering: review_duration, code_churn, comment_density
- Statistical analysis: per-repository mean/std calculation, z-score computation
- Outlier detection: flags PRs when |z-score| > 2 (configurable threshold)
- CLI command: detect-outliers with table/json/csv output formats
- Comprehensive test suite: 33 unit and integration tests, all passing

This enables data-driven identification of rushed reviews, oversized PRs,
under-reviewed code, and other review quality issues.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added type ignore comment for max_abs_z_score ordering and improved
type annotations in test fixtures for better type safety.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@ghinks ghinks left a comment

initial review good

@ghinks ghinks merged commit b0608a7 into main Jan 13, 2026
1 check passed
@ghinks ghinks deleted the claud/z-score-calculator branch January 13, 2026 11:54