A thread-safe Python caching decorator backed by a Rust extension. Through a series of optimizations (eliminating serialization, moving the call wrapper into Rust, applying link-time optimization, and using direct C API calls), we achieve 0.55-0.66x of lru_cache's single-threaded throughput while providing native thread safety that delivers 1.3-1.4x higher throughput under concurrent load and 18-24x the throughput of pure-Python cachetools.
- Drop-in replacement for `functools.lru_cache`: same decorator pattern and hashable-argument requirement, with added thread safety, TTL, eviction strategies, and async support
- Thread-safe out of the box (`parking_lot::RwLock` in Rust)
- Async support: works with `async def` functions, with zero overhead on the sync path (see the sketch after this list)
- Shared memory backend: cross-process caching via mmap
- Multiple eviction strategies: LRU, MRU, FIFO, LFU
- TTL support: optional time-to-live expiration
- Single FFI crossing: the entire cache lookup happens in Rust, with no Python wrapper overhead
- 12-18M ops/s single-threaded, 16M ops/s under concurrent load, 18-24x faster than `cachetools`
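As a taste of how these options combine, here is a minimal sketch. The `ttl` and `strategy` keyword names are assumptions for illustration (only `max_size` is confirmed by the migration example below); see the usage guide for the actual decorator parameters.

```python
import asyncio

from warp_cache import cache

# TTL plus a non-default eviction strategy on a sync function.
# NOTE: `ttl` and `strategy` are hypothetical keyword names used for
# illustration; check the usage guide for the real decorator signature.
@cache(max_size=256, ttl=30.0, strategy="lfu")
def lookup(key):
    return key.upper()

# The same decorator works on `async def` functions: the awaited result
# is cached, so subsequent calls return it without re-running the
# coroutine body.
@cache(max_size=256)
async def fetch(key):
    await asyncio.sleep(0)  # stand-in for real async I/O
    return key.upper()

print(lookup("hit"))              # computed once, then served from cache
print(asyncio.run(fetch("hit")))  # cached across awaits
```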
Prebuilt wheels are available for Linux (x86_64, aarch64), macOS (x86_64, arm64), and Windows (x86_64):
```bash
pip install warp_cache
```

If no wheel is available for your platform, pip will fall back to the source distribution (requires a Rust toolchain).
```python
from warp_cache import cache

@cache()
def expensive(x, y):
    return x + y

expensive(1, 2)  # computes and caches
expensive(1, 2)  # returns cached result
```

If you're already using `functools.lru_cache`, switching is a one-line change:
```diff
-from functools import lru_cache
+from warp_cache import cache

-@lru_cache(maxsize=128)
+@cache(max_size=128)
 def expensive(x, y):
     return x + y
```

Like `lru_cache`, all arguments must be hashable. See the usage guide for details.
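The failure mode for unhashable arguments is assumed to mirror `functools.lru_cache` (a `TypeError` at call time); the usual workaround is to pass an immutable equivalent:

```python
from warp_cache import cache

@cache()
def total(values):
    return sum(values)

total((1, 2, 3))  # OK: tuples are hashable, so they can form a cache key

try:
    total([1, 2, 3])          # lists are unhashable; assumed to raise
except TypeError:             # TypeError, matching lru_cache's behavior
    total(tuple([1, 2, 3]))   # convert to a hashable type first
```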
| Metric | warp_cache | cachetools | lru_cache |
|---|---|---|---|
| Single-threaded | 12-18M ops/s | 0.6-1.2M ops/s | 21-40M ops/s |
| Multi-threaded (8 threads) | 16M ops/s | 770K ops/s (with Lock) | 12M ops/s (with Lock) |
| Thread-safe | Yes (RwLock) | No (manual Lock) | No |
| Async support | Yes | No | No |
| Cross-process (shared) | ~7.8M ops/s (mmap) | No | No |
| TTL support | Yes | Yes | No |
| Eviction strategies | LRU, MRU, FIFO, LFU | LRU, LFU, FIFO, RR | LRU only |
| Implementation | Rust (PyO3) | Pure Python | C (CPython) |
Under concurrent load, warp_cache delivers 1.3-1.4x higher throughput than lru_cache + Lock and 18-24x higher than cachetools. See full benchmarks for details.
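For context, here is a minimal sketch of the kind of threaded micro-benchmark behind these numbers. It is illustrative only: absolute timings vary by machine, and it assumes the `@cache(max_size=...)` decorator shown above.

```python
import threading
import time
from functools import lru_cache

from warp_cache import cache

lock = threading.Lock()

@lru_cache(maxsize=1024)
def baseline(n):
    return n  # trivial body: the benchmark measures cache-hit overhead

def baseline_locked(n):
    # The manual-Lock pattern the table's lru_cache column refers to.
    with lock:
        return baseline(n)

@cache(max_size=1024)
def warp(n):
    return n

def hammer(fn, iters=100_000):
    for i in range(iters):
        fn(i % 512)  # mostly cache hits after warm-up

def run(fn, threads=8):
    workers = [threading.Thread(target=hammer, args=(fn,)) for _ in range(threads)]
    start = time.perf_counter()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.perf_counter() - start

print(f"lru_cache + Lock: {run(baseline_locked):.3f}s")
print(f"warp_cache:       {run(warp):.3f}s")
```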
- Usage guide: eviction strategies, async, TTL, shared memory, decorator parameters
- Performance: benchmarks, architecture deep-dive, optimization journey
- Alternatives: comparison with cachebox, moka-py, cachetools, lru_cache
- Examples: runnable scripts for every feature (`uv run examples/<name>.py`)
- llms.txt / llms-full.txt: project info for LLMs and AI agents (spec)
Contributions are welcome! See CONTRIBUTING.md for setup instructions, coding standards, and PR guidelines.
For security issues, please see SECURITY.md.
MIT