
Conversation

@teerthsharma

Summary

This PR adds GeometricSparseAttention, a new modular layer that enables data-dependent sparse attention using geometric upper bounds.
Unlike static sparse patterns (e.g., Sliding Window, BigBird), this layer uses AETHER (Adaptive Event-driven Threshold Hybrid Entangled Rendering) logic to prune key blocks dynamically at runtime, based on an upper bound derived from the Cauchy-Schwarz inequality.

Mathematical Guarantee

The pruning is safe because it relies on the geometric upper bound
$$\max_{k \in B} (q \cdot k) \le q \cdot \mu_B + \lVert q \rVert \, r_B,$$
where $\mu_B$ is the centroid of the keys in block $B$ and $r_B = \max_{k \in B} \lVert k - \mu_B \rVert$ is the block radius. The bound follows from writing $q \cdot k = q \cdot \mu_B + q \cdot (k - \mu_B)$ and applying Cauchy-Schwarz to the second term. If this upper bound is below the threshold $\tau$, the entire block $B$ can be skipped with mathematical certainty that no high-scoring keys exist within it.
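The bound is easy to check numerically. The sketch below is plain NumPy, not the PR's implementation; it only demonstrates that the centroid-plus-radius bound dominates every exact query-key score in a random block:

```python
import numpy as np

rng = np.random.default_rng(0)
d, block = 64, 16

q = rng.normal(size=d)
keys = rng.normal(size=(block, d))           # one key block B

mu = keys.mean(axis=0)                       # block centroid mu_B
r = np.linalg.norm(keys - mu, axis=1).max()  # block radius r_B

exact_max = (keys @ q).max()                 # max_{k in B} q . k
bound = q @ mu + np.linalg.norm(q) * r       # q . mu_B + ||q|| r_B

assert exact_max <= bound + 1e-9
```

Because the bound holds for every key in the block, comparing it against $\tau$ once is enough to decide whether the whole block can be skipped.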

Key Features

  • Drop-in Replacement: Can be swapped into existing Gemma/LLaMA configs using pz.select().at_instances_of(pz.nn.Attention).apply(...).
  • Adaptive Threshold: Includes epsilon and phi state parameters that self-tune the sparsity level based on input entropy.
  • JAX/Penzai Native: Fully compatible with NamedArray and Treescope visualization.
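As an untested, illustrative pseudocode sketch of the drop-in swap (the `from_attention` constructor name is hypothetical, not the PR's exact API; `pz.select` and `at_instances_of` are Penzai's selector interface):

```python
# Illustrative sketch only -- the constructor shown here is hypothetical.
from penzai import pz

sparse_model = (
    pz.select(model)
    .at_instances_of(pz.nn.Attention)
    .apply(lambda attn: GeometricSparseAttention.from_attention(attn))
)
```

The selector visits every `pz.nn.Attention` instance in the model tree and replaces it in place, which is what makes the layer usable with existing Gemma/LLaMA configs.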

Verification

  • Added tests/nn/geometric_attention_test.py with 13 comprehensive tests.
  • Verified exact match with dense attention when threshold=0.
  • Confirmed JAX transformations (jit, vmap) work correctly.
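The dense-equivalence check can be reproduced outside Penzai with a minimal NumPy reference (not the PR's code; function names and the keep-if-bound-reaches-threshold convention are assumptions). With a threshold low enough that no block's upper bound falls below it, nothing is pruned and the output must equal dense attention exactly:

```python
import numpy as np

def dense_attention(q, K, V):
    # Standard single-query softmax attention.
    s = K @ q
    w = np.exp(s - s.max())
    w /= w.sum()
    return w @ V

def pruned_attention(q, K, V, tau, block=16):
    # Hypothetical reference pruning: keep a block only if its
    # Cauchy-Schwarz upper bound reaches the threshold tau.
    kept = []
    for i in range(0, len(K), block):
        Kb = K[i:i + block]
        mu = Kb.mean(axis=0)
        r = np.linalg.norm(Kb - mu, axis=1).max()
        if q @ mu + np.linalg.norm(q) * r >= tau:
            kept.append(np.arange(i, i + len(Kb)))
    idx = np.concatenate(kept)
    return dense_attention(q, K[idx], V[idx])

rng = np.random.default_rng(1)
q = rng.normal(size=32)
K = rng.normal(size=(128, 32))
V = rng.normal(size=(128, 8))

# With tau = -inf no block can be pruned, so the outputs agree exactly.
assert np.allclose(pruned_attention(q, K, V, tau=-np.inf),
                   dense_attention(q, K, V))
```

Raising `tau` trades accuracy for sparsity: pruned blocks are exactly those the geometric bound certifies to contain no score above the threshold.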

Implements GeometricSparseAttention layer with Cauchy-Schwarz block scoring for sub-linear attention complexity. Includes full named-axis support, adaptive thresholding, and comprehensive tests.
@google-cla

google-cla bot commented Jan 20, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

