Skip to content

Conversation

@richardwooding
Copy link
Contributor

Summary

This PR significantly strengthens the CEL comprehension pattern matching implementation to address the brittleness identified in issue #44. While theoretically possible for custom hand-written accumulator expressions to be misidentified, the improvements make this significantly less likely and provide better observability when edge cases occur.

Problem Statement

From issue #44: The analyzeComprehensionPattern() function uses fragile AST pattern matching to identify CEL comprehension macros. Since CEL expands comprehensions (all, exists, exists_one, filter, map) into accumulator-based expressions at parse time with no metadata, we must infer the original macro from structural patterns. This could theoretically misidentify custom accumulator expressions.

Changes Made

1. Result Expression Validation ✅

Priority: High Impact

All comprehension types now validate their result expressions:

  • all(), exists(), map(), filter(): Result must be accu (identity)
  • exists_one(): Result must be accu == 1
  • Added isIdentityResult() helper function
  • Enhanced isEqualsOneResult() to validate exact structure
  • Logs warnings when result expressions don't match expected patterns

Example:

// Before: Only checked step pattern, not result
if con.isLogicalAndStep(...) { return ComprehensionAll }

// After: Validates both step AND result
if con.isLogicalAndStep(...) {
    if !con.isIdentityResult(result, accuVar) {
        // Log warning
    }
    return ComprehensionAll
}

2. Strengthened Conditional Validation ✅

Priority: High Impact

Made conditional pattern validation much more specific:

isConditionalCountStep() (for exists_one):

  • Before: Only checked operators.Conditional && len(args) == 3
  • After: Validates then-branch is exactly accu + 1 (not accu + 2 or other)
  • After: Validates else-branch is exactly accu

isConditionalAppendStep() (for map with filter):

  • Before: Checked for any single-element list append
  • After: Rejects if appending just [iterVar] (that's a filter, not map)
  • After: Only matches if appending a transform: [expression]

isConditionalFilterStep() (for filter):

  • Before: Only checked for conditional structure
  • After: Validates then-branch appends exactly [iterVar]
  • After: Validates else-branch is accu

3. Comprehensive Documentation ✅

Priority: High Value

Added detailed comments explaining CEL's macro expansion:

// IMPORTANT: This relies on CEL's stable macro expansion patterns:
// - all(x, pred)        → accuInit=true,  step=accu&&pred,              result=accu
// - exists(x, pred)     → accuInit=false, step=accu||pred,              result=accu
// - exists_one(x, pred) → accuInit=0,     step=?:(pred,accu+1,accu),    result=accu==1
// - map(x, t)           → accuInit=[],    step=accu+[t],                result=accu
// - map(x, t, filter)   → accuInit=[],    step=?:(filter,accu+[t],accu), result=accu
// - filter(x, pred)     → accuInit=[],    step=?:(pred,accu+[x],accu),  result=accu

Makes implicit knowledge explicit for future maintainers.

4. Enhanced Edge Case Tests ✅

Priority: Medium Impact

New file: comprehensions_edge_cases_test.go with comprehensive coverage:

TestComprehensionPatternDetectionOrder (5 tests)

  • Verifies correct type detection order
  • Tests map(x, x) vs filter(x, true) disambiguation
  • Tests multiple AND/OR conditions
  • Validates complex predicates

TestComprehensionWithComplexNestedExpressions (3 tests)

  • Map with nested ternary transforms
  • Filter with nested logical expressions
  • All with deep parenthesis nesting

TestComprehensionEdgeCasesWithEmptyLists (4 tests)

  • All comprehension types with potentially empty lists
  • Ensures correct SQL generation for edge cases

TestComprehensionWithChainedOperations (4 tests)

  • Chained comprehensions: filter().map()
  • Negated comprehensions: !exists(), !all()

TestComprehensionWithVariableNameEdgeCases (3 tests)

  • Single-letter variables
  • Long variable names
  • Underscores in variable names

TestComprehensionWithMapFilter (2 tests)

  • Distinguishes map() from filter()
  • Tests simulated map-with-filter pattern

Verification

All Tests Pass ✅

$ make test
PASS
coverage: 90.8% of statements
ok      github.com/spandigital/cel2sql/v3       22.916s
PASS
coverage: 92.1% of statements
ok      github.com/spandigital/cel2sql/v3/pg    13.509s

Linting Passes ✅

$ make lint
golangci-lint run
0 issues.

Code Formatted ✅

$ make fmt
go fmt ./...
goimports -w .

Impact Assessment

Likelihood of Misidentification:

  • Standard CEL usage: Very LowExtremely Low (multi-layer validation)
  • Hand-written ASTs: Low-MediumLow (result validation catches anomalies)
  • Future CEL versions: Low (documented assumptions, stable patterns)

Benefits:

  • ✅ Early detection of malformed patterns via validation
  • ✅ Warning logs for production monitoring
  • ✅ Comprehensive edge case test coverage
  • ✅ Clear documentation of assumptions
  • ✅ Maintained backward compatibility (all existing tests pass)

Potential Issues:

  • ⚠️ Slightly stricter validation could theoretically reject valid but unusual patterns
    • Mitigation: Logs warnings instead of failing for result mismatches
    • Mitigation: Comprehensive test coverage verifies standard patterns work

Related Issues

Fixes #44 - Comprehension Pattern Matching is Brittle

Checklist

  • Code follows project style guidelines (make fmt passes)
  • Linting passes (make lint with 0 issues)
  • All tests pass (make test)
  • New tests added for edge cases (6 test suites, 22 test cases)
  • Documentation updated (comprehensive comments added)
  • Backward compatible (all existing tests pass)
  • Issue Comprehension Pattern Matching is Brittle #44 updated with findings and PR link

🤖 Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com

This commit significantly improves the robustness of CEL comprehension
pattern matching to address the brittleness identified in issue #44.

## Changes Made

### 1. Result Expression Validation
- Added `isIdentityResult()` helper to validate result expressions
- Enhanced `isEqualsOneResult()` to verify exact `accu == 1` pattern
- All comprehension types now validate their result expressions match
  expected patterns (all, exists, map, filter, exists_one)
- Logs warnings when result expressions don't match expected patterns

### 2. Strengthened Conditional Validation
- `isConditionalCountStep()`: Now validates then-branch is exactly
  `accu + 1` and else-branch is `accu`
- `isConditionalAppendStep()`: Now validates then-branch appends a
  TRANSFORM (not just iterVar), distinguishing map-with-filter from filter
- `isConditionalFilterStep()`: Now validates then-branch appends exactly
  the iteration variable `[iterVar]`

### 3. Comprehensive Documentation
- Added detailed comments explaining CEL's stable macro expansion patterns
- Documents assumptions about CEL comprehension structure
- Notes that these patterns are stable across CEL versions

### 4. Enhanced Edge Case Tests
- New file: `comprehensions_edge_cases_test.go` with 6 test suites
- Tests pattern detection order (map vs filter disambiguation)
- Tests complex nested expressions (ternary, logical operations)
- Tests edge cases with empty lists
- Tests chained comprehensions
- Tests various variable naming patterns
- Tests map-with-filter vs regular filter/map distinction

## Impact

- **Robustness**: Multi-layered validation catches malformed patterns early
- **Observability**: Warning logs help identify uncertain pattern matches
- **Maintainability**: Clear documentation of CEL expansion assumptions
- **Test Coverage**: Comprehensive edge case coverage prevents regressions

## Related Issues

Fixes #44 - Comprehension Pattern Matching is Brittle

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@richardwooding richardwooding merged commit cdd65df into main Oct 31, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Comprehension Pattern Matching is Brittle

1 participant