feat: Add Geometric Sparse Attention (AETHER) #134
Summary
This PR adds `GeometricSparseAttention`, a new modular layer that enables data-dependent sparse attention using geometric upper bounds. Unlike static sparse patterns (e.g., Sliding Window, BigBird), this layer uses AETHER (Adaptive Event-driven Threshold Hybrid Entangled Rendering) logic to dynamically prune key blocks at runtime based on the Cauchy-Schwarz inequality.
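The pruning decision can be illustrated with a short, self-contained sketch (a hypothetical helper, not the layer's actual implementation): keys are grouped into blocks, each block is summarized by a centroid and radius, and a block is kept only when its geometric score bound clears a threshold.

```python
import jax.numpy as jnp

def block_prune_mask(q, keys, block_size, tau):
    """Boolean mask over key blocks: True = must attend, False = provably prunable.

    q:    (d,) query vector
    keys: (num_blocks * block_size, d) key matrix
    tau:  pruning threshold
    """
    d = keys.shape[-1]
    blocks = keys.reshape(-1, block_size, d)                             # (B, S, d)
    mu = blocks.mean(axis=1)                                             # centroids mu_B, (B, d)
    r = jnp.linalg.norm(blocks - mu[:, None, :], axis=-1).max(axis=1)    # radii r_B, (B,)
    # Cauchy-Schwarz upper bound: max_{k in B} q.k <= q.mu_B + ||q|| * r_B
    upper = mu @ q + jnp.linalg.norm(q) * r                              # (B,)
    return upper >= tau                                                  # keep blocks whose bound clears tau
```

Blocks whose bound falls below `tau` cannot contain any key scoring above the threshold, so skipping them cannot change which keys survive.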
Mathematical Guarantee
The pruning is safe because it relies on the geometric upper bound

$$\max_{k \in B} (q \cdot k) \le q \cdot \mu_B + \|q\| \cdot r_B$$

where $\mu_B$ and $r_B$ are the centroid and radius of key block $B$. If this upper bound is below the threshold $\tau$, the entire block $B$ can be skipped with mathematical certainty that no high-scoring keys exist within it.
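The bound follows from the Cauchy-Schwarz inequality: $q \cdot k = q \cdot \mu_B + q \cdot (k - \mu_B) \le q \cdot \mu_B + \|q\|\,\|k - \mu_B\| \le q \cdot \mu_B + \|q\| \cdot r_B$. A standalone numerical sanity check (not part of the PR's code):

```python
import jax
import jax.numpy as jnp

q = jax.random.normal(jax.random.PRNGKey(0), (64,))            # query
block = jax.random.normal(jax.random.PRNGKey(1), (128, 64))    # one key block B

mu = block.mean(axis=0)                                        # centroid mu_B
r = jnp.linalg.norm(block - mu, axis=-1).max()                 # radius r_B

exact = (block @ q).max()                                      # max_{k in B} q . k
bound = q @ mu + jnp.linalg.norm(q) * r                        # geometric upper bound

assert exact <= bound + 1e-4                                   # the bound always holds
```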
Key Features
- Drop-in integration: existing attention layers can be replaced via `pz.select().at_instances_of(pz.nn.Attention).apply(...)` (see the usage sketch after this list).
- Adaptive thresholding: `epsilon` and `phi` state parameters that self-tune the sparsity level based on input entropy.
- Full `NamedArray` support and Treescope visualization.
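A hedged sketch of the drop-in path from the first bullet. The import location and constructor arguments of `GeometricSparseAttention` are assumptions for illustration; the selection chain itself is the standard Penzai pattern named above.

```python
from penzai import pz

# Hypothetical import path; adjust to wherever this PR places the layer.
from penzai.nn.geometric_attention import GeometricSparseAttention

# `model` is any existing Penzai model containing pz.nn.Attention layers.
sparse_model = (
    pz.select(model)
    .at_instances_of(pz.nn.Attention)
    .apply(
        # Constructor arguments are illustrative, not the confirmed signature.
        lambda attn: GeometricSparseAttention(block_size=64)
    )
)
```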
Verification

- Added `tests/nn/geometric_attention_test.py` with 13 comprehensive tests.
- Confirmed that JAX transformations (`jit`, `vmap`) work correctly.
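For context, the kind of transform-compatibility check such a suite typically includes (an illustrative sketch, not one of the 13 tests): the bound computation is a pure function, so `jit` and `vmap` compose over it directly.

```python
import jax
import jax.numpy as jnp

def upper_bound(q, block):
    """Geometric upper bound on max_{k in B} q . k for one key block."""
    mu = block.mean(axis=0)
    r = jnp.linalg.norm(block - mu, axis=-1).max()
    return q @ mu + jnp.linalg.norm(q) * r

qs = jax.random.normal(jax.random.PRNGKey(0), (8, 64))       # batch of queries
block = jax.random.normal(jax.random.PRNGKey(1), (128, 64))  # one key block

# jit-compile a vmapped version over the query batch dimension.
batched = jax.jit(jax.vmap(upper_bound, in_axes=(0, None)))
bounds = batched(qs, block)
assert bounds.shape == (8,)
```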