@LunaStev
Member

This PR significantly refactors the expression parsing logic to implement a standard C-style operator precedence hierarchy. It replaces the previous parsing structure with a dedicated recursive descent approach, ensuring that complex expressions are evaluated in the correct order. Additionally, it introduces support for binary literals and fixes several type-related issues in the LLVM code generation for logical and bitwise operations.

Key Changes

1. Expression Parsing & Precedence

  • 13-Level Precedence Hierarchy: Implemented a complete top-down parsing structure following C-style precedence, listed here from loosest to tightest binding (Assignment → Logical OR → Logical AND → Bitwise OR → Bitwise XOR → Bitwise AND → Equality → Relational → Shift → Additive → Multiplicative → Unary → Primary).
  • Unary Refactoring: Consolidated prefix operators (!, ~, &, deref) into a dedicated recursive function, allowing for nested unary expressions (e.g., !!x).
  • Grouped Expressions: Added Expression::Grouped to properly handle and preserve parenthesized sub-expressions.
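The layering described above can be sketched as one function per precedence level, where each level parses its operands by calling the next-tighter level. This is a minimal illustration covering only four operators, not the PR's actual parser: the `Token` and `Expr` types, the function names, and the `show` helper are all hypothetical.

```rust
use std::iter::Peekable;
use std::vec::IntoIter;

#[derive(Debug, PartialEq)]
enum Token { Num(i64), Plus, Star, Shl, Bang, LParen, RParen }

#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Unary(Token, Box<Expr>),
    Binary(Box<Expr>, Token, Box<Expr>),
    Grouped(Box<Expr>), // keeps the parentheses visible in the tree
}

type Tokens = Peekable<IntoIter<Token>>;

// Loosest level in this sketch. Operands come from the additive level,
// so `+` groups before `<<` does.
fn parse_shift(t: &mut Tokens) -> Option<Expr> {
    let mut lhs = parse_additive(t)?;
    while t.peek() == Some(&Token::Shl) {
        let op = t.next()?;
        let rhs = parse_additive(t)?;
        lhs = Expr::Binary(Box::new(lhs), op, Box::new(rhs));
    }
    Some(lhs)
}

fn parse_additive(t: &mut Tokens) -> Option<Expr> {
    let mut lhs = parse_multiplicative(t)?;
    while t.peek() == Some(&Token::Plus) {
        let op = t.next()?;
        let rhs = parse_multiplicative(t)?;
        lhs = Expr::Binary(Box::new(lhs), op, Box::new(rhs));
    }
    Some(lhs)
}

fn parse_multiplicative(t: &mut Tokens) -> Option<Expr> {
    let mut lhs = parse_unary(t)?;
    while t.peek() == Some(&Token::Star) {
        let op = t.next()?;
        let rhs = parse_unary(t)?;
        lhs = Expr::Binary(Box::new(lhs), op, Box::new(rhs));
    }
    Some(lhs)
}

// Prefix operators recurse into themselves, which is what permits `!!x`.
fn parse_unary(t: &mut Tokens) -> Option<Expr> {
    if t.peek() == Some(&Token::Bang) {
        let op = t.next()?;
        let operand = parse_unary(t)?;
        return Some(Expr::Unary(op, Box::new(operand)));
    }
    parse_primary(t)
}

fn parse_primary(t: &mut Tokens) -> Option<Expr> {
    match t.next()? {
        Token::Num(n) => Some(Expr::Num(n)),
        Token::LParen => {
            // Restart from the loosest level inside the parentheses.
            let inner = parse_shift(t)?;
            if t.next()? != Token::RParen {
                return None;
            }
            Some(Expr::Grouped(Box::new(inner)))
        }
        _ => None,
    }
}

fn parse(tokens: Vec<Token>) -> Option<Expr> {
    parse_shift(&mut tokens.into_iter().peekable())
}

// Renders the tree with explicit parentheses; grouped nodes use brackets.
fn show(e: &Expr) -> String {
    match e {
        Expr::Num(n) => n.to_string(),
        Expr::Unary(_, x) => format!("(!{})", show(x)),
        Expr::Binary(l, op, r) => {
            let s = match op { Token::Plus => "+", Token::Star => "*", _ => "<<" };
            format!("({} {} {})", show(l), s, show(r))
        }
        Expr::Grouped(x) => format!("[{}]", show(x)),
    }
}
```

Adding a new binary level in this style means inserting one more function between two existing ones and redirecting the operand call of the looser neighbor.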

2. Lexer Improvements

  • Binary Literals: Added support for the 0b prefix (e.g., 0b1010), using from_str_radix for accurate conversion to i64.
  • Shift Operator Parsing: Moved << and >> from keyword-based matching to character-level matching in the lexer. Fixed the collision where < was matched before <<.
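A rough sketch of both lexer changes, assuming a character-level loop over a peekable iterator. The `Tok` enum and the `lex` function are hypothetical names for illustration; only `i64::from_str_radix` is taken from the PR description. The `<` and `>` handlers look at the next character and test for the doubled form before `=`, falling back to the single-character token last.

```rust
#[derive(Debug, PartialEq)]
enum Tok { Int(i64), Lt, Le, Shl, Gt, Ge, Shr }

fn lex(src: &str) -> Result<Vec<Tok>, String> {
    let mut out = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            c if c.is_whitespace() => {}
            // Two-character operators are tried before the bare `<`.
            '<' => out.push(match chars.peek().copied() {
                Some('<') => { chars.next(); Tok::Shl }
                Some('=') => { chars.next(); Tok::Le }
                _ => Tok::Lt,
            }),
            '>' => out.push(match chars.peek().copied() {
                Some('>') => { chars.next(); Tok::Shr }
                Some('=') => { chars.next(); Tok::Ge }
                _ => Tok::Gt,
            }),
            // Binary literal: `0b` followed by a run of 0/1 digits.
            '0' if chars.peek() == Some(&'b') => {
                chars.next(); // consume 'b'
                let mut digits = String::new();
                while let Some(&d) = chars.peek() {
                    if d == '0' || d == '1' {
                        digits.push(d);
                        chars.next();
                    } else {
                        break;
                    }
                }
                // from_str_radix rejects an empty digit string and overflow.
                let n = i64::from_str_radix(&digits, 2)
                    .map_err(|e| format!("bad binary literal 0b{digits}: {e}"))?;
                out.push(Tok::Int(n));
            }
            // Plain decimal literal.
            d if d.is_ascii_digit() => {
                let mut digits = String::from(d);
                while let Some(&n) = chars.peek() {
                    if n.is_ascii_digit() {
                        digits.push(n);
                        chars.next();
                    } else {
                        break;
                    }
                }
                let n = digits.parse::<i64>().map_err(|e| e.to_string())?;
                out.push(Tok::Int(n));
            }
            other => return Err(format!("unexpected character {other:?}")),
        }
    }
    Ok(out)
}
```

With this sketch, `lex("0b1010 << 1")` yields `Int(10)`, `Shl`, `Int(1)`. Digit separators such as `_` are not handled here, since `from_str_radix` does not accept them.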

3. LLVM Code Generation & Type Safety

  • Logical Coercion: Added a to_bool() helper to handle integer-to-boolean coercion, ensuring logical AND/OR operations work correctly with multi-bit integers.
  • Bitwise Operations: Fully implemented BitwiseAnd, BitwiseOr, and BitwiseXor using native LLVM instructions.
  • Shift Type Casting: Fixed type mismatch errors by ensuring the shift amount is explicitly cast to match the type of the value being shifted.
  • Unary NOT Logic:
    • Enhanced LogicalNot (!) to correctly compare integers with zero.
    • Implemented BitwiseNot (~) as an LLVM not, which is an XOR with -1 (all bits set).
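To make the intended IR shapes concrete, here is a toy emitter that writes textual LLVM IR for the cases listed above. It is a sketch under stated assumptions, not the PR's code generator: `Val`, `Emitter`, and every method name are hypothetical, the logical AND is emitted as a plain `and i1` without short-circuiting for brevity, and widening the shift amount with `zext` is an assumption about how the cast should behave.

```rust
/// An SSA value: its name and its integer bit width (1 means i1).
#[derive(Debug, Clone, PartialEq)]
struct Val { name: String, bits: u32 }

#[derive(Default)]
struct Emitter { lines: Vec<String>, next: u32 }

impl Emitter {
    // Appends one instruction and returns the fresh value it defines.
    fn emit(&mut self, bits: u32, rhs: String) -> Val {
        let name = format!("%t{}", self.next);
        self.next += 1;
        self.lines.push(format!("{name} = {rhs}"));
        Val { name, bits }
    }

    // i1 passes through untouched; wider integers are compared with zero.
    fn to_bool(&mut self, v: &Val) -> Val {
        if v.bits == 1 {
            return v.clone();
        }
        self.emit(1, format!("icmp ne i{} {}, 0", v.bits, v.name))
    }

    // Both operands are coerced to i1 first, so the widths always agree.
    fn logical_and(&mut self, a: &Val, b: &Val) -> Val {
        let (a, b) = (self.to_bool(a), self.to_bool(b));
        self.emit(1, format!("and i1 {}, {}", a.name, b.name))
    }

    // `!x`: flip an i1, or test a wider integer for equality with zero.
    fn logical_not(&mut self, v: &Val) -> Val {
        if v.bits == 1 {
            self.emit(1, format!("xor i1 {}, true", v.name))
        } else {
            self.emit(1, format!("icmp eq i{} {}, 0", v.bits, v.name))
        }
    }

    // `~x`: XOR with all bits set.
    fn bitwise_not(&mut self, v: &Val) -> Val {
        self.emit(v.bits, format!("xor i{} {}, -1", v.bits, v.name))
    }

    // `shl` needs both operands at the same width, so cast the amount first.
    fn shl(&mut self, value: &Val, amount: &Val) -> Val {
        let amount = if amount.bits == value.bits {
            amount.clone()
        } else {
            let op = if amount.bits < value.bits { "zext" } else { "trunc" };
            self.emit(
                value.bits,
                format!("{op} i{} {} to i{}", amount.bits, amount.name, value.bits),
            )
        };
        self.emit(value.bits, format!("shl i{} {}, {}", value.bits, value.name, amount.name))
    }
}
```

Shifting an `i32` named `%x` by an `i8` named `%n` produces a `zext i8 %n to i32` line followed by a `shl i32` line, which is the kind of width agreement the shift fix is about.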

4. Code Quality & Consistency

  • AST Cleanup: Added Operator::Not for consistency across unary operations.
  • Signature Simplification: Cleaned up parser function signatures using std::iter::Peekable for better readability and performance.

Operator Precedence (Highest to Lowest)

  1. Primary: Literals, Identifiers, Grouped ()
  2. Unary: !, ~, &, deref
  3. Multiplicative: *, /, %
  4. Additive: +, -
  5. Shift: <<, >>
  6. Relational: <, <=, >, >=
  7. Equality: ==, !=
  8. Bitwise AND: &
  9. Bitwise XOR: ^
  10. Bitwise OR: |
  11. Logical AND: &&
  12. Logical OR: ||
  13. Assignment: =, +=, -=, *=, /=, %=

Examples of Improved Behavior

// Now parsed correctly as: a + (b * c)
var result = a + b * c; 

// Now parsed correctly as: a << (2 + 1), since + binds tighter than <<
var shift = a << 2 + 1; 

// Boolean coercion: converts 1 to true before &&
if (1 && true) { ... } 

// Binary literal support
var mask = 0b1100_1010;
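A quick arithmetic check of what the precedence table implies, written in Rust with explicit parentheses so the grouping does not depend on Rust's own rules. The function names are hypothetical, and the second function models a comparison result as the integer 0 or 1, C-style, which is an assumption about the language's typing.

```rust
/// Returns (value under the C-style grouping, value if the shift went first).
fn shift_example(a: i64) -> (i64, i64) {
    let c_style = a << (2 + 1);     // Additive (level 4) binds tighter than Shift (level 5)
    let shift_first = (a << 2) + 1; // what explicit parentheses would be needed for
    (c_style, shift_first)
}

/// `a & b == c` groups as `a & (b == c)`: Equality binds tighter than Bitwise AND.
fn bitand_example(a: i64, b: i64, c: i64) -> i64 {
    a & ((b == c) as i64)
}
```

For `a = 1`, the two groupings give 8 and 5 respectively, so the choice is observable in the result.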

Benefits

  • Compliance: Matches C/C++ operator precedence.
  • Maintainability: Clear separation of concerns within the parsing logic makes it easier to add new operators.
  • Robustness: Explicit type casting and boolean coercion prevent common LLVM IR validation errors.

Restructure expression parsing to follow standard C operator precedence
with dedicated functions for each precedence level.

Changes:
- Implement complete operator precedence hierarchy:
  1. Assignment (=, +=, -=, *=, /=)
  2. Logical OR (||)
  3. Logical AND (&&)
  4. Bitwise OR (|)
  5. Bitwise XOR (^)
  6. Bitwise AND (&)
  7. Equality (==, !=)
  8. Relational (<, <=, >, >=)
  9. Shift (<<, >>)
  10. Additive (+, -)
  11. Multiplicative (*, /, %)
  12. Unary (!, ~, &, deref)
  13. Primary (literals, identifiers, function calls)
- Add binary literal support in lexer:
  - Parse 0b prefix for binary numbers (0b1010, 0b0101)
  - Convert binary strings to i64 using from_str_radix
  - Format lexeme as "0b{binary_digits}"
- Move shift operators (<<, >>) to character-level parsing:
  - Parse << and >> directly in lexer char matching
  - Remove from identifier-based keyword matching
  - Check for << before <= in '<' handler
  - Check for >> before >= in '>' handler
- Refactor unary expression parsing:
  - Move all prefix operators to parse_unary_expression
  - Handle !, ~, &, deref in single dedicated function
  - Parse unary operators recursively (e.g., !!x, ~!x)
- Fix logical operator code generation:
  - Add to_bool() helper for boolean coercion
  - Convert integer values to i1 before logical AND/OR
  - Handle i1 types without unnecessary conversions
- Improve bitwise operator codegen:
  - Add missing BitwiseAnd, BitwiseOr, BitwiseXor implementations
  - Generate and, or, xor LLVM instructions
  - Properly handle operator in binary expression match
- Fix shift operation type casting:
  - Cast shift amount to match shifted value type
  - Prevent type mismatch errors in build_left_shift/build_right_shift
  - Use build_int_cast for explicit type conversion
- Enhance unary NOT operators:
  - LogicalNot (!): Compare with zero for multi-bit integers
  - BitwiseNot (~): Use build_not, which emits an XOR with -1
  - Handle i1 boolean types specially in logical NOT
- Add Operator::Not variant to AST for consistency
- Add Expression::Grouped for parenthesized expressions
- Simplify parser function signatures with std::iter::Peekable

Benefits:
- Correct operator precedence matching C/C++ standards
- Clear separation of concerns in parsing logic
- Easier to maintain and extend with new operators
- Proper type handling in all binary/unary operations

Example precedence:
  a + b * c      // * before +
  a << 2 + 1     // + before <<
  a & b == c     // == before &
  !a || b && c   // ! > && > ||

Signed-off-by: LunaStev <luna@lunastev.org>
@LunaStev LunaStev self-assigned this Dec 24, 2025
@LunaStev LunaStev merged commit c52c394 into wavefnd:master Dec 24, 2025
2 checks passed