feat: Strengthen comprehension pattern matching validation (fixes #44) #89

richardwooding · 2025-10-31T06:34:13Z

Summary

This PR significantly strengthens the CEL comprehension pattern matching implementation to address the brittleness identified in issue #44. While theoretically possible for custom hand-written accumulator expressions to be misidentified, the improvements make this significantly less likely and provide better observability when edge cases occur.

Problem Statement

From issue #44: The analyzeComprehensionPattern() function uses fragile AST pattern matching to identify CEL comprehension macros. Since CEL expands comprehensions (all, exists, exists_one, filter, map) into accumulator-based expressions at parse time with no metadata, we must infer the original macro from structural patterns. This could theoretically misidentify custom accumulator expressions.

Changes Made

1. Result Expression Validation ✅

Priority: High Impact

All comprehension types now validate their result expressions:

all(), exists(), map(), filter(): Result must be accu (identity)
exists_one(): Result must be accu == 1
Added isIdentityResult() helper function
Enhanced isEqualsOneResult() to validate exact structure
Logs warnings when result expressions don't match expected patterns

Example:

// Before: Only checked step pattern, not result
if con.isLogicalAndStep(...) { return ComprehensionAll }

// After: Validates both step AND result
if con.isLogicalAndStep(...) {
    if !con.isIdentityResult(result, accuVar) {
        // Log warning
    }
    return ComprehensionAll
}

2. Strengthened Conditional Validation ✅

Priority: High Impact

Made conditional pattern validation much more specific:

isConditionalCountStep() (for exists_one):

Before: Only checked operators.Conditional && len(args) == 3
After: Validates then-branch is exactly accu + 1 (not accu + 2 or other)
After: Validates else-branch is exactly accu

isConditionalAppendStep() (for map with filter):

Before: Checked for any single-element list append
After: Rejects if appending just [iterVar] (that's a filter, not map)
After: Only matches if appending a transform: [expression]

isConditionalFilterStep() (for filter):

Before: Only checked for conditional structure
After: Validates then-branch appends exactly [iterVar]
After: Validates else-branch is accu

3. Comprehensive Documentation ✅

Priority: High Value

Added detailed comments explaining CEL's macro expansion:

// IMPORTANT: This relies on CEL's stable macro expansion patterns:
// - all(x, pred)        → accuInit=true,  step=accu&&pred,              result=accu
// - exists(x, pred)     → accuInit=false, step=accu||pred,              result=accu
// - exists_one(x, pred) → accuInit=0,     step=?:(pred,accu+1,accu),    result=accu==1
// - map(x, t)           → accuInit=[],    step=accu+[t],                result=accu
// - map(x, t, filter)   → accuInit=[],    step=?:(filter,accu+[t],accu), result=accu
// - filter(x, pred)     → accuInit=[],    step=?:(pred,accu+[x],accu),  result=accu

Makes implicit knowledge explicit for future maintainers.

4. Enhanced Edge Case Tests ✅

Priority: Medium Impact

New file: comprehensions_edge_cases_test.go with comprehensive coverage:

TestComprehensionPatternDetectionOrder (5 tests)

Verifies correct type detection order
Tests map(x, x) vs filter(x, true) disambiguation
Tests multiple AND/OR conditions
Validates complex predicates

TestComprehensionWithComplexNestedExpressions (3 tests)

Map with nested ternary transforms
Filter with nested logical expressions
All with deep parenthesis nesting

TestComprehensionEdgeCasesWithEmptyLists (4 tests)

All comprehension types with potentially empty lists
Ensures correct SQL generation for edge cases

TestComprehensionWithChainedOperations (4 tests)

Chained comprehensions: filter().map()
Negated comprehensions: !exists(), !all()

TestComprehensionWithVariableNameEdgeCases (3 tests)

Single-letter variables
Long variable names
Underscores in variable names

TestComprehensionWithMapFilter (2 tests)

Distinguishes map() from filter()
Tests simulated map-with-filter pattern

Verification

All Tests Pass ✅

$ make test
PASS
coverage: 90.8% of statements
ok      github.com/spandigital/cel2sql/v3       22.916s
PASS
coverage: 92.1% of statements
ok      github.com/spandigital/cel2sql/v3/pg    13.509s

Linting Passes ✅

$ make lint
golangci-lint run
0 issues.

Code Formatted ✅

$ make fmt
go fmt ./...
goimports -w .

Impact Assessment

Likelihood of Misidentification:

Standard CEL usage: ~~Very Low~~ → Extremely Low (multi-layer validation)
Hand-written ASTs: ~~Low-Medium~~ → Low (result validation catches anomalies)
Future CEL versions: Low (documented assumptions, stable patterns)

Benefits:

✅ Early detection of malformed patterns via validation
✅ Warning logs for production monitoring
✅ Comprehensive edge case test coverage
✅ Clear documentation of assumptions
✅ Maintained backward compatibility (all existing tests pass)

Potential Issues:

⚠️ Slightly stricter validation could theoretically reject valid but unusual patterns
- Mitigation: Logs warnings instead of failing for result mismatches
- Mitigation: Comprehensive test coverage verifies standard patterns work

Related Issues

Fixes #44 - Comprehension Pattern Matching is Brittle

Checklist

Code follows project style guidelines (make fmt passes)
Linting passes (make lint with 0 issues)
All tests pass (make test)
New tests added for edge cases (6 test suites, 22 test cases)
Documentation updated (comprehensive comments added)
Backward compatible (all existing tests pass)
Issue Comprehension Pattern Matching is Brittle #44 updated with findings and PR link

🤖 Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com

This commit significantly improves the robustness of CEL comprehension pattern matching to address the brittleness identified in issue #44. ## Changes Made ### 1. Result Expression Validation - Added `isIdentityResult()` helper to validate result expressions - Enhanced `isEqualsOneResult()` to verify exact `accu == 1` pattern - All comprehension types now validate their result expressions match expected patterns (all, exists, map, filter, exists_one) - Logs warnings when result expressions don't match expected patterns ### 2. Strengthened Conditional Validation - `isConditionalCountStep()`: Now validates then-branch is exactly `accu + 1` and else-branch is `accu` - `isConditionalAppendStep()`: Now validates then-branch appends a TRANSFORM (not just iterVar), distinguishing map-with-filter from filter - `isConditionalFilterStep()`: Now validates then-branch appends exactly the iteration variable `[iterVar]` ### 3. Comprehensive Documentation - Added detailed comments explaining CEL's stable macro expansion patterns - Documents assumptions about CEL comprehension structure - Notes that these patterns are stable across CEL versions ### 4. Enhanced Edge Case Tests - New file: `comprehensions_edge_cases_test.go` with 6 test suites - Tests pattern detection order (map vs filter disambiguation) - Tests complex nested expressions (ternary, logical operations) - Tests edge cases with empty lists - Tests chained comprehensions - Tests various variable naming patterns - Tests map-with-filter vs regular filter/map distinction ## Impact - **Robustness**: Multi-layered validation catches malformed patterns early - **Observability**: Warning logs help identify uncertain pattern matches - **Maintainability**: Clear documentation of CEL expansion assumptions - **Test Coverage**: Comprehensive edge case coverage prevents regressions ## Related Issues Fixes #44 - Comprehension Pattern Matching is Brittle 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

richardwooding mentioned this pull request Oct 31, 2025

Comprehension Pattern Matching is Brittle #44

Closed

richardwooding merged commit cdd65df into main Oct 31, 2025
10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Strengthen comprehension pattern matching validation (fixes #44) #89

feat: Strengthen comprehension pattern matching validation (fixes #44) #89

Uh oh!

richardwooding commented Oct 31, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

feat: Strengthen comprehension pattern matching validation (fixes #44) #89

feat: Strengthen comprehension pattern matching validation (fixes #44) #89

Uh oh!

Conversation

richardwooding commented Oct 31, 2025

Summary

Problem Statement

Changes Made

1. Result Expression Validation ✅

2. Strengthened Conditional Validation ✅

3. Comprehensive Documentation ✅

4. Enhanced Edge Case Tests ✅

Verification

All Tests Pass ✅

Linting Passes ✅

Code Formatted ✅

Impact Assessment

Related Issues

Checklist

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant