diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md index 4897a4b..270628b 100644 --- a/docs/ARCHITECTURE.md +++ b/docs/ARCHITECTURE.md @@ -535,21 +535,9 @@ User Code - **In-memory**: Monitor-based locking (minimal contention) - **Rails.cache**: Redis atomic operations (high concurrency) -## Future Enhancements +## See Also -See [docs/future-enhancements/](future-enhancements/) for detailed designs: - -- **Stale-While-Revalidate**: Background cache refresh for even lower latency -- **Cost Tracking**: Automatic LLM cost calculation -- **Automatic LLM Client Wrappers**: Zero-boilerplate tracing for OpenAI, Anthropic - -## Additional Resources - -- [Main README](../README.md) - Getting started guide -- [Caching Guide](CACHING.md) - Detailed caching documentation -- [Tracing Guide](TRACING.md) - LLM observability guide -- [Rails Integration](RAILS.md) - Rails-specific patterns - -## Questions? - -Open an issue on [GitHub](https://github.com/langfuse/langfuse-ruby/issues) if you have architecture questions. +- [Caching Guide](CACHING.md) - Cache backends, SWR, and stampede protection +- [Tracing Guide](TRACING.md) - LLM observability and nested spans +- [Rails Integration](RAILS.md) - Rails-specific patterns and testing +- [API Reference](API_REFERENCE.md) - Complete method reference diff --git a/docs/CACHING.md b/docs/CACHING.md index e0285e5..cab14d2 100644 --- a/docs/CACHING.md +++ b/docs/CACHING.md @@ -668,14 +668,9 @@ end 2. Verify Redis is accessible 3. Check `cache_lock_timeout` is sufficient -## Additional Resources +## See Also -- [Main README](../README.md) - SDK overview - [Configuration Reference](CONFIGURATION.md) - All config options including SWR -- [Rails Integration Guide](RAILS.md) - Rails-specific patterns -- [Tracing Guide](TRACING.md) - LLM observability -- [Architecture Guide](ARCHITECTURE.md) - Design decisions - -## Questions? - -Open an issue on [GitHub](https://github.com/langfuse/langfuse-ruby/issues) if you have questions or need help with caching. +- [Rails Integration](RAILS.md) - Rails-specific cache patterns +- [Architecture Guide](ARCHITECTURE.md) - Cache design decisions +- [Error Handling](ERROR_HANDLING.md) - Cache miss fallback behavior diff --git a/docs/GETTING_STARTED.md b/docs/GETTING_STARTED.md index c10d9e2..e194c59 100644 --- a/docs/GETTING_STARTED.md +++ b/docs/GETTING_STARTED.md @@ -302,14 +302,14 @@ end See [ERROR_HANDLING.md](ERROR_HANDLING.md) for complete error reference. 
-## Next Steps - -- **[PROMPTS.md](PROMPTS.md)** - Chat prompts, versioning, Mustache templating -- **[TRACING.md](TRACING.md)** - Nested observations, RAG patterns, OpenTelemetry -- **[SCORING.md](SCORING.md)** - Add quality scores to traces -- **[DATASETS.md](DATASETS.md)** - Create and manage evaluation datasets -- **[EXPERIMENTS.md](EXPERIMENTS.md)** - Run systematic evaluations with the experiment runner -- **[CACHING.md](CACHING.md)** - Optimize performance with caching -- **[RAILS.md](RAILS.md)** - Rails-specific patterns and testing -- **[CONFIGURATION.md](CONFIGURATION.md)** - All configuration options -- **[API_REFERENCE.md](API_REFERENCE.md)** - Complete method reference +## See Also + +- [Prompts Guide](PROMPTS.md) - Chat prompts, versioning, Mustache templating +- [Tracing Guide](TRACING.md) - Nested observations, RAG patterns, OpenTelemetry +- [Scoring Guide](SCORING.md) - Add quality scores to traces +- [Datasets Guide](DATASETS.md) - Create and manage evaluation datasets +- [Experiments Guide](EXPERIMENTS.md) - Run evaluations against datasets +- [Caching Guide](CACHING.md) - In-memory and Rails.cache backends, SWR +- [Configuration Reference](CONFIGURATION.md) - All configuration options +- [Rails Integration](RAILS.md) - Rails-specific patterns and testing +- [API Reference](API_REFERENCE.md) - Complete method reference diff --git a/docs/MIGRATION.md b/docs/MIGRATION.md index 9f43048..5ffe988 100644 --- a/docs/MIGRATION.md +++ b/docs/MIGRATION.md @@ -717,9 +717,10 @@ prompt.compile("name" => "Alice") 3. Use fallbacks for critical paths 4. Check network latency to Langfuse API -## Additional Resources +## See Also -- [Main README](../README.md) - SDK overview -- [Rails Integration Guide](RAILS.md) - Rails-specific patterns -- [Tracing Guide](TRACING.md) - LLM observability -- [Langfuse Documentation](https://langfuse.com/docs) - Official docs +- [Getting Started](GETTING_STARTED.md) - Installation and first trace +- [Rails Integration](RAILS.md) - Rails-specific patterns +- [Prompts Guide](PROMPTS.md) - Versioning and Mustache templating +- [Caching Guide](CACHING.md) - Cache backends and SWR +- [Langfuse Documentation](https://langfuse.com/docs) - Official Langfuse docs diff --git a/docs/RAILS.md b/docs/RAILS.md index 0914289..5fc6878 100644 --- a/docs/RAILS.md +++ b/docs/RAILS.md @@ -626,9 +626,10 @@ If experiencing high memory usage: 1. **Reduce cache_max_size**: Default is 1000, reduce if needed 2. **Enable cache cleanup**: Implement periodic cache cleanup in background job -## Additional Resources +## See Also -- [Main README](../README.md) - SDK overview and basic usage -- [Tracing Guide](TRACING.md) - Deep dive on LLM tracing +- [Getting Started](GETTING_STARTED.md) - Installation and first trace +- [Configuration Reference](CONFIGURATION.md) - All config options +- [Tracing Guide](TRACING.md) - Nested spans and OpenTelemetry - [Migration Guide](MIGRATION.md) - Migrating from hardcoded prompts - [Langfuse Documentation](https://langfuse.com/docs) - Official Langfuse docs diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000..2e4afd6 --- /dev/null +++ b/docs/README.md @@ -0,0 +1,37 @@ +# Langfuse Ruby SDK — Documentation + +## Foundations + +Core concepts you need before using any feature. 
+ +- **[Getting Started](GETTING_STARTED.md)** — Install the gem, configure credentials, send your first trace +- **[Configuration](CONFIGURATION.md)** — All `Langfuse.configure` options: keys, timeouts, cache backends, SWR + +## Core Features + +The three primitives of the SDK. + +- **[Prompts](PROMPTS.md)** — Fetch, compile, and version-manage text and chat prompts +- **[Tracing](TRACING.md)** — Nested spans, RAG patterns, OpenTelemetry integration +- **[Scoring](SCORING.md)** — Attach quality scores to traces and observations + +## Evaluation + +Systematic testing of LLM behavior. + +- **[Datasets](DATASETS.md)** — Create and manage evaluation datasets +- **[Experiments](EXPERIMENTS.md)** — Run evaluations against datasets with the experiment runner + +## Production + +Patterns for real-world deployments. + +- **[Caching](CACHING.md)** — In-memory and Rails.cache backends, SWR, stampede protection +- **[Error Handling](ERROR_HANDLING.md)** — Exception types, retry behavior, fallback strategies +- **[Rails Integration](RAILS.md)** — Initializers, controller tracing, testing helpers +- **[Migration Guide](MIGRATION.md)** — Move from hardcoded prompts to Langfuse-managed prompts + +## Reference + +- **[API Reference](API_REFERENCE.md)** — Complete method reference for every public class +- **[Architecture](ARCHITECTURE.md)** — Internal design: layers, threading, cache architecture diff --git a/docs/TRACING.md b/docs/TRACING.md index abecf2e..0446983 100644 --- a/docs/TRACING.md +++ b/docs/TRACING.md @@ -668,8 +668,12 @@ This allows traces to flow seamlessly across: - Go services - Any service that implements W3C Trace Context -## Resources +## See Also +- [Scoring Guide](SCORING.md) - Add quality scores to traces +- [Prompts Guide](PROMPTS.md) - Managed prompts with tracing integration +- [Rails Integration](RAILS.md) - Controller-level tracing patterns +- [API Reference](API_REFERENCE.md) - Complete method reference - [Langfuse Documentation](https://langfuse.com/docs) - [OpenTelemetry Ruby Documentation](https://opentelemetry.io/docs/instrumentation/ruby/) - [W3C Trace Context Specification](https://www.w3.org/TR/trace-context/) diff --git a/docs/design-history/TRACING_DESIGN.md b/docs/design-history/TRACING_DESIGN.md deleted file mode 100644 index f742fff..0000000 --- a/docs/design-history/TRACING_DESIGN.md +++ /dev/null @@ -1,1668 +0,0 @@ -# Langfuse Ruby SDK - Tracing & Observability Design (OpenTelemetry-Based) - -**Status:** Draft Design Document (Revised for OpenTelemetry) -**Created:** 2025-10-15 -**Revised:** 2025-10-15 -**Author:** Noah Fisher - ---- - -## Table of Contents - -1. [Overview](#overview) -2. [Why OpenTelemetry?](#why-opentelemetry) -3. [Design Principles](#design-principles) -4. [Architecture](#architecture) -5. [API Design](#api-design) -6. [Data Model (OTel + Langfuse)](#data-model-otel--langfuse) -7. [OpenTelemetry Integration](#opentelemetry-integration) -8. [Ingestion Architecture](#ingestion-architecture) -9. [Distributed Tracing](#distributed-tracing) -10. [Prompt-to-Trace Linking](#prompt-to-trace-linking) -11. [Cost & Token Tracking](#cost--token-tracking) -12. [APM Integration](#apm-integration) -13. [Error Handling & Resilience](#error-handling--resilience) -14. [Implementation Phases](#implementation-phases) - ---- - -## Overview - -This document defines the tracing and observability features for the Langfuse Ruby SDK. 
These features complement the existing prompt management functionality (Phases 0-5 complete) by providing comprehensive LLM observability built on **OpenTelemetry**, the CNCF standard for distributed tracing. - -### Goals - -1. **OpenTelemetry Foundation**: Build on industry-standard OTel SDK for tracing -2. **Rails-Friendly**: Seamless integration with Rails applications and ActiveJob -3. **Distributed Tracing**: Automatic context propagation across services -4. **APM Integration**: Traces appear in Datadog, New Relic, Honeycomb, etc. -5. **Ruby-First API**: Idiomatic blocks and patterns, despite OTel underneath -6. **Automatic Linking**: Connect prompts to traces automatically -7. **Production-Ready**: Batching, retries, circuit breakers, graceful degradation - -### Non-Goals (for v1.0) - -- Real-time streaming of traces (future enhancement) -- Client-side tracing (browser SDK) -- Custom OTel instrumentations (use existing gems) - ---- - -## Why OpenTelemetry? - -### Rationale - -After researching Langfuse's Python SDK, it became clear that **Langfuse is built on top of OpenTelemetry**, not as a separate system: - -> "Context Propagation: **OpenTelemetry automatically handles** the propagation of the current trace and span context." - Langfuse Python SDK docs - -**Benefits of OTel Foundation:** - -1. **Industry Standard** - CNCF standard, used by every major APM vendor -2. **Context Propagation** - Automatic distributed tracing via W3C Trace Context -3. **Ecosystem Integration** - Works with existing Ruby instrumentation (Rails, Sidekiq, HTTP) -4. **Less Code** - Use OTel's span lifecycle, we add Langfuse-specific attributes -5. **Consistency** - Matches Python/TypeScript SDK architecture -6. **APM Correlation** - Langfuse traces appear alongside infrastructure traces - -**Trade-offs:** - -- ✅ More robust, future-proof -- ✅ Automatic distributed tracing -- ✅ Industry ecosystem support -- ❌ Adds ~10 OTel gem dependencies -- ❌ Slightly more complex setup -- ⚖️ Basic usage stays simple for developers - ---- - -## Design Principles - -### 1. OpenTelemetry Foundation - -- **Build on OTel SDK** for span/trace management -- **Create custom Exporter** to convert OTel spans → Langfuse events -- **Use OTel Context** for propagation (not custom thread-local) -- **Add Langfuse extensions** as span attributes (model, tokens, prompts, costs) - -### 2. Consistency with Existing Architecture - -Follow the same patterns established in prompt management: -- **Flat API**: Methods on `Client`, not nested managers -- **Global Configuration**: `Langfuse.configure` pattern -- **Thread-Safe**: OTel handles this for us -- **Minimal Dependencies**: Only add what's necessary -- **Ruby Conventions**: snake_case, keyword arguments, blocks - -### 3. Ruby-First API (Hide OTel Complexity) - -```ruby -# ✅ GOOD - Ruby idioms (OTel underneath) -Langfuse.trace("user-query") do |trace| - trace.generation("llm-call", model: "gpt-4") do |gen| - gen.input = [{ role: "user", content: "Hello" }] - gen.output = call_openai(...) - end -end - -# ❌ AVOID - Exposing OTel internals -tracer = OpenTelemetry.tracer_provider.tracer("langfuse") -span = tracer.start_span("user-query") -``` - -### 4. Async by Default - -- Background processing via ActiveJob (works with Sidekiq, Resque, Delayed Job, GoodJob, etc.) -- Batching to reduce API calls -- Graceful degradation if ActiveJob is unavailable -- Sync mode for debugging/testing -- Configurable queue name - -### 5. 
Developer Experience - -- Simple for basic use cases (hide OTel) -- Powerful for advanced scenarios (expose OTel when needed) -- Clear error messages -- Automatic metadata capture -- Minimal boilerplate - ---- - -## Architecture - -### High-Level Architecture - -``` -┌─────────────────────────────────────────────────────────────┐ -│ Application Code │ -│ Langfuse.trace(...) { |t| t.generation(...) } │ -└────────────────────┬────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────┐ -│ Langfuse Ruby API (Block-based) │ -│ • Langfuse::Client │ -│ • Langfuse::Tracer (wrapper around OTel) │ -│ • Langfuse::Generation (adds model, tokens, prompts) │ -└────────────────────┬────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────┐ -│ OpenTelemetry SDK (Core Tracing) │ -│ • Tracer: Creates spans │ -│ • Context: Propagates trace/span context │ -│ • Span: Time-bounded operations │ -│ • Attributes: Key-value metadata │ -└────────────────────┬────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────┐ -│ Langfuse Exporter (OTel → Langfuse Events) │ -│ • Converts OTel spans → Langfuse ingestion format │ -│ • Adds Langfuse-specific fields (prompt, costs) │ -│ • Batches events for ingestion API │ -└────────────────────┬────────────────────────────────────────┘ - │ - ┌────────────┴───────────┐ - │ │ - ▼ ▼ -┌──────────────┐ ┌──────────────────────┐ -│ Sync Export │ │ Async Export │ -│ (Test/Debug) │ │ (ActiveJob) │ -└──────┬───────┘ └──────────┬───────────┘ - │ │ - └────────────┬────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────┐ -│ Langfuse Ingestion API │ -│ POST /api/public/ingestion (batch events) │ -└─────────────────────────────────────────────────────────────┘ -``` - -### Component Breakdown - -1. **Langfuse Ruby API** - Idiomatic Ruby blocks, hide OTel complexity -2. **OpenTelemetry SDK** - Handle span lifecycle, context propagation -3. **Langfuse Exporter** - Custom OTel exporter that converts spans to Langfuse events -4. **Ingestion Client** - HTTP client with batching, retries, circuit breaker -5. **ActiveJob Worker** - Async background processing (optional, works with any ActiveJob backend) - ---- - -## API Design - -### Core Concepts (Same as Before) - -**Traces** → The root container for an LLM interaction (OTel trace) -**Spans** → Time-bounded operations (OTel span) -**Generations** → LLM calls (OTel span with extra attributes) -**Events** → Point-in-time markers (OTel span events) -**Scores** → Evaluations (custom Langfuse concept, sent separately) - -### Hierarchy (OTel Spans) - -``` -OTel Trace -├── OTel Span (type: "span", name: "document-retrieval") -│ └── OTel Span (type: "generation", model: "text-embedding-ada-002") -├── OTel Span (type: "span", name: "llm-processing") -│ └── OTel Span (type: "generation", model: "gpt-4") -├── OTel Event (name: "user-feedback") -└── Custom Score (sent via Langfuse API) -``` - ---- - -## API Examples - -### Example 1: Basic Trace with Generation - -```ruby -# Simple block-based API (OTel underneath) -Langfuse.trace(name: "chat-completion", user_id: "user-123") do |trace| - trace.generation( - name: "openai-call", - model: "gpt-4", - input: [{ role: "user", content: "Hello!" }] - ) do |gen| - response = openai_client.chat(...) 
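      # NOTE (illustrative, not part of the design): `openai_client` above is
      # pseudo-code. The accessors below assume an object-style response
      # (response.choices.first.message.content); with the widely used
      # ruby-openai gem the response is a plain Hash, so the equivalent read is
      # response.dig("choices", 0, "message", "content").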
- - gen.output = response.choices.first.message.content - gen.usage = { - prompt_tokens: response.usage.prompt_tokens, - completion_tokens: response.usage.completion_tokens, - total_tokens: response.usage.total_tokens - } - end -end - -# Under the hood: -# 1. OTel creates a root span with name "chat-completion" -# 2. OTel creates child span with name "openai-call" -# 3. Langfuse adds attributes: model="gpt-4", type="generation", usage={...} -# 4. Langfuse Exporter converts to ingestion events -``` - -### Example 2: Nested Spans for RAG Pipeline - -```ruby -Langfuse.trace(name: "rag-query", session_id: "session-456") do |trace| - # Document retrieval span (OTel span) - docs = trace.span(name: "retrieval", input: { query: "ML basics" }) do |span| - # Embedding generation (OTel span with type="generation") - embedding = span.generation( - name: "embed-query", - model: "text-embedding-ada-002", - input: "ML basics" - ) do |gen| - result = openai_client.embeddings(...) - gen.output = result.embedding - gen.usage = { total_tokens: result.usage.total_tokens } - result.embedding - end - - # Retrieve from vector DB - vector_db.search(embedding, limit: 5) - end - - # LLM generation span - trace.span(name: "llm-generation") do |span| - prompt = build_rag_prompt(docs) - - span.generation( - name: "gpt4-completion", - model: "gpt-4", - input: prompt, - metadata: { num_docs: docs.size } - ) do |gen| - response = openai_client.chat(...) - gen.output = response.choices.first.message.content - gen.usage = { - prompt_tokens: response.usage.prompt_tokens, - completion_tokens: response.usage.completion_tokens - } - end - end - - # Track user feedback event (OTel span event) - trace.event( - name: "user-feedback", - input: { rating: "thumbs_up" } - ) - - # Add quality score (custom Langfuse concept) - trace.score(name: "helpfulness", value: 0.95) -end - -# OTel automatically handles: -# - Parent-child span relationships -# - Timestamps (start_time, end_time) -# - Context propagation within the trace -``` - -### Example 3: Distributed Tracing Across Services - -```ruby -# Service A (API Gateway) -def handle_request - Langfuse.trace(name: "api-request", user_id: "user-123") do |trace| - # Make HTTP request to Service B - # OTel automatically injects W3C Trace Context headers! - response = HTTParty.get( - "http://service-b/process", - headers: trace.inject_context # W3C traceparent header - ) - - trace.event(name: "downstream-call", output: response.code) - end -end - -# Service B (Processing Service) -def process_request - # Extract context from headers (W3C Trace Context) - context = Langfuse.extract_context(request.headers) - - # This trace is automatically linked to the parent trace in Service A! - Langfuse.trace(name: "process-data", context: context) do |trace| - trace.generation(name: "llm-call", model: "gpt-4") do |gen| - # ... LLM processing - end - end -end - -# Result: Single unified trace across both services! -# Service A → Service B (parent-child relationship preserved) -``` - -### Example 4: APM Integration (Datadog Example) - -```ruby -# When both Langfuse and Datadog are configured: - -Langfuse.trace(name: "user-query") do |trace| - trace.generation(name: "gpt4", model: "gpt-4") do |gen| - # Call external API - response = HTTParty.get("https://api.example.com/data") - # ... 
LLM processing - end -end - -# Result in Datadog APM: -# ┌─ Trace: user-query (Langfuse + Datadog) -# │ ├─ Span: gpt4 (Langfuse generation) -# │ │ └─ Attributes: model=gpt-4, tokens=225, cost=0.00525 -# │ └─ Span: http.request (Datadog automatic instrumentation) -# │ └─ URL: https://api.example.com/data -# └─ All correlated with the same trace_id! -``` - -### Example 5: Advanced - Direct OTel Access - -```ruby -# For advanced users who need OTel directly -Langfuse.trace(name: "complex-workflow") do |trace| - # Access underlying OTel span - otel_span = trace.current_span - - # Add custom OTel attributes - otel_span.set_attribute("custom.metric", 42) - - # Use OTel status - otel_span.status = OpenTelemetry::Trace::Status.error("Failed") - - # Still use Langfuse convenience methods - trace.generation(name: "gpt4", model: "gpt-4") do |gen| - # ... - end -end -``` - ---- - -## Data Model (OTel + Langfuse) - -### How OTel Spans Map to Langfuse Concepts - -| Langfuse Concept | OpenTelemetry Representation | Langfuse-Specific Attributes | -|------------------|------------------------------|------------------------------| -| **Trace** | OTel Trace (root span) | `user_id`, `session_id`, `tags`, `public` | -| **Span** | OTel Span | `langfuse.type="span"`, `input`, `output`, `level` | -| **Generation** | OTel Span | `langfuse.type="generation"`, `model`, `usage`, `prompt_name`, `prompt_version` | -| **Event** | OTel Span Event | `name`, `input`, `output` | -| **Score** | Custom (not OTel) | Sent separately via Langfuse API | - -### OTel Span Attributes for Langfuse - -**Common Attributes (all spans):** -```ruby -{ - "langfuse.type" => "span", # or "generation" - "langfuse.trace_id" => "trace-abc123", - "langfuse.user_id" => "user-456", - "langfuse.session_id" => "session-789", - "langfuse.metadata" => { ... }, # JSON - "langfuse.input" => { ... }, # JSON - "langfuse.output" => { ... }, # JSON - "langfuse.level" => "default" # debug, default, warning, error -} -``` - -**Generation-Specific Attributes:** -```ruby -{ - "langfuse.type" => "generation", - "langfuse.model" => "gpt-4", - "langfuse.model_parameters" => { temperature: 0.7 }, # JSON - "langfuse.usage.prompt_tokens" => 100, - "langfuse.usage.completion_tokens" => 50, - "langfuse.usage.total_tokens" => 150, - "langfuse.usage.total_cost" => 0.00525, # Auto-calculated - "langfuse.prompt_name" => "support-assistant", # Auto-linked - "langfuse.prompt_version" => 3, # Auto-linked - "langfuse.completion_start_time" => "2025-10-15T10:00:02.5Z" # Streaming -} -``` - -### OTel Event Format - -```ruby -# Span event (for user feedback, etc.) -span.add_event( - "user-feedback", - attributes: { - "langfuse.input" => { feedback_type: "thumbs_up" }.to_json, - "langfuse.level" => "default" - }, - timestamp: Time.now -) -``` - -### Score (Separate from OTel) - -Scores are sent as separate events to Langfuse API (not OTel): - -```ruby -{ - type: "score-create", - body: { - id: "score-xyz", - trace_id: "trace-abc", # Link to OTel trace - observation_id: "span-123", # Link to OTel span - name: "helpfulness", - value: 0.95, - comment: "Very helpful", - data_type: "numeric" - } -} -``` - ---- - -## OpenTelemetry Integration - -### OTel Components We'll Use - -1. **opentelemetry-sdk** - Core tracing SDK -2. **opentelemetry-api** - Public API -3. **opentelemetry-exporter-otlp** - (Optional) For OTel Collector -4. 
**opentelemetry-instrumentation-all** - (Optional) Auto-instrumentation - -### Initialization - -```ruby -require 'opentelemetry/sdk' -require 'langfuse/exporter' - -# Initialize OpenTelemetry with Langfuse exporter -OpenTelemetry::SDK.configure do |c| - c.service_name = 'my-rails-app' - c.service_version = ENV['APP_VERSION'] - - # Add Langfuse exporter - c.add_span_processor( - OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor.new( - Langfuse::Exporter.new( - public_key: ENV['LANGFUSE_PUBLIC_KEY'], - secret_key: ENV['LANGFUSE_SECRET_KEY'] - ) - ) - ) - - # Optionally add OTLP exporter for APM (Datadog, etc.) - if ENV['OTEL_EXPORTER_OTLP_ENDPOINT'] - c.add_span_processor( - OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor.new( - OpenTelemetry::Exporter::OTLP::Exporter.new - ) - ) - end -end -``` - -### Langfuse Wrapper Around OTel - -```ruby -module Langfuse - class Tracer - def initialize(otel_tracer) - @otel_tracer = otel_tracer - end - - def trace(name:, **attributes, &block) - # Create OTel span - @otel_tracer.in_span(name, attributes: otel_attributes(attributes)) do |span| - # Wrap in Langfuse::Trace for Ruby API - trace_obj = Trace.new(span, attributes) - yield(trace_obj) - end - end - - private - - def otel_attributes(attrs) - # Convert Langfuse attributes to OTel format - { - "langfuse.type" => "trace", - "langfuse.user_id" => attrs[:user_id], - "langfuse.session_id" => attrs[:session_id], - "langfuse.metadata" => attrs[:metadata].to_json - }.compact - end - end -end -``` - -### Langfuse::Trace Class - -```ruby -module Langfuse - class Trace - attr_reader :otel_span - - def initialize(otel_span, attributes = {}) - @otel_span = otel_span - @attributes = attributes - end - - def span(name:, **attrs, &block) - # Create child OTel span - tracer = OpenTelemetry.tracer_provider.tracer('langfuse') - tracer.in_span(name, attributes: otel_attributes(attrs, type: "span")) do |span| - span_obj = Span.new(span, attrs) - yield(span_obj) - end - end - - def generation(name:, model:, **attrs, &block) - tracer = OpenTelemetry.tracer_provider.tracer('langfuse') - tracer.in_span(name, attributes: otel_attributes(attrs, type: "generation", model: model)) do |span| - gen_obj = Generation.new(span, attrs.merge(model: model)) - yield(gen_obj) - end - end - - def event(name:, **attrs) - @otel_span.add_event(name, attributes: { - "langfuse.input" => attrs[:input].to_json, - "langfuse.level" => attrs[:level] || "default" - }.compact) - end - - def score(name:, value:, **attrs) - # Scores are sent separately (not OTel) - ScoreBuffer.push( - trace_id: @otel_span.context.trace_id.hex_id, - name: name, - value: value, - **attrs - ) - end - - def inject_context - # For distributed tracing - inject W3C headers - carrier = {} - OpenTelemetry.propagation.inject(carrier) - carrier - end - - def current_span - # For advanced users who need OTel directly - @otel_span - end - - private - - def otel_attributes(attrs, type:, model: nil) - { - "langfuse.type" => type, - "langfuse.model" => model, - "langfuse.input" => attrs[:input].to_json, - "langfuse.metadata" => attrs[:metadata].to_json - }.compact - end - end -end -``` - ---- - -## Ingestion Architecture - -### Langfuse Exporter (OTel Custom Exporter) - -```ruby -module Langfuse - class Exporter - def initialize(public_key:, secret_key:, **options) - @public_key = public_key - @secret_key = secret_key - @buffer = EventBuffer.new - @ingestion_client = IngestionClient.new(public_key, secret_key) - end - - # Called by OTel BatchSpanProcessor - def 
export(span_data_list, timeout: nil) - events = span_data_list.map { |span| convert_span_to_event(span) } - - # Buffer events - events.each { |event| @buffer.push(event) } - - # Trigger batch send if buffer is full - flush_if_needed - - OpenTelemetry::SDK::Trace::Export::SUCCESS - rescue StandardError => e - Rails.logger.error("Langfuse export failed: #{e.message}") - OpenTelemetry::SDK::Trace::Export::FAILURE - end - - def force_flush(timeout: nil) - events = @buffer.drain_all - return if events.empty? - - @ingestion_client.send_batch(events) - end - - def shutdown(timeout: nil) - force_flush(timeout: timeout) - end - - private - - def convert_span_to_event(span) - attrs = span.attributes || {} - type = attrs["langfuse.type"] || "span" - - case type - when "trace" - create_trace_event(span, attrs) - when "span" - create_span_event(span, attrs) - when "generation" - create_generation_event(span, attrs) - end - end - - def create_trace_event(span, attrs) - { - id: SecureRandom.uuid, - timestamp: span.start_timestamp, - type: "trace-create", - body: { - id: span.trace_id.hex_id, - name: span.name, - user_id: attrs["langfuse.user_id"], - session_id: attrs["langfuse.session_id"], - metadata: parse_json(attrs["langfuse.metadata"]), - tags: attrs["langfuse.tags"], - timestamp: span.start_timestamp - }.compact - } - end - - def create_generation_event(span, attrs) - { - id: SecureRandom.uuid, - timestamp: span.start_timestamp, - type: "generation-create", - body: { - id: span.span_id.hex_id, - trace_id: span.trace_id.hex_id, - parent_observation_id: span.parent_span_id&.hex_id, - name: span.name, - model: attrs["langfuse.model"], - input: parse_json(attrs["langfuse.input"]), - output: parse_json(attrs["langfuse.output"]), - model_parameters: parse_json(attrs["langfuse.model_parameters"]), - usage: extract_usage(attrs), - prompt_name: attrs["langfuse.prompt_name"], - prompt_version: attrs["langfuse.prompt_version"], - start_time: span.start_timestamp, - end_time: span.end_timestamp, - completion_start_time: attrs["langfuse.completion_start_time"], - level: attrs["langfuse.level"] || "default", - status_message: span.status&.description - }.compact - } - end - - def extract_usage(attrs) - return nil unless attrs["langfuse.usage.total_tokens"] - - { - prompt_tokens: attrs["langfuse.usage.prompt_tokens"], - completion_tokens: attrs["langfuse.usage.completion_tokens"], - total_tokens: attrs["langfuse.usage.total_tokens"], - total_cost: attrs["langfuse.usage.total_cost"] - }.compact - end - - def parse_json(json_string) - JSON.parse(json_string) if json_string - rescue JSON::ParserError - nil - end - - def flush_if_needed - return unless @buffer.size >= config.batch_size - - events = @buffer.drain(max: config.batch_size) - - # Use ActiveJob if available, otherwise sync - if async_enabled? - IngestionJob.perform_later(events: events) - else - @ingestion_client.send_batch(events) - end - end - - def async_enabled? 
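      # Send asynchronously only when ActiveJob is loaded and async tracing is
      # enabled; otherwise flush_if_needed falls back to synchronous export
      # (see Design Principle 4, "Async by Default").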
- defined?(ActiveJob) && config.tracing_async - end - end -end -``` - -### ActiveJob Worker - -```ruby -module Langfuse - class IngestionJob < ActiveJob::Base - queue_as { Langfuse.config.job_queue } - - retry_on StandardError, wait: :exponentially_longer, attempts: 3 - - def perform(events:) - client = IngestionClient.new( - Langfuse.config.public_key, - Langfuse.config.secret_key - ) - - client.send_batch(events) - rescue StandardError => e - Rails.logger.error("Langfuse ingestion failed: #{e.message}") - # Re-raise to trigger ActiveJob retry - raise - end - end -end -``` - -### Batch Request Format (Same as Before) - -```ruby -# POST /api/public/ingestion -{ - batch: [ - { - id: "event-123", - timestamp: "2025-10-15T10:00:00.000Z", - type: "trace-create", - body: { - id: "abc123def456", # OTel trace_id - name: "user-query", - user_id: "user-456", - metadata: { ... } - } - }, - { - id: "event-124", - timestamp: "2025-10-15T10:00:01.000Z", - type: "generation-create", - body: { - id: "789xyz", # OTel span_id - trace_id: "abc123def456", # OTel trace_id - parent_observation_id: "parent-span-id", - name: "openai-call", - model: "gpt-4", - input: [...], - output: "...", - usage: { ... } - } - } - ] -} -``` - ---- - -## Distributed Tracing - -### W3C Trace Context Propagation - -OpenTelemetry automatically handles distributed tracing via W3C Trace Context headers: - -**Header Format:** -``` -traceparent: 00--- -tracestate: langfuse= -``` - -### Automatic Propagation (HTTP Calls) - -```ruby -# With opentelemetry-instrumentation-http installed: - -Langfuse.trace(name: "api-request") do |trace| - # OTel automatically injects traceparent header! - response = HTTParty.get("http://service-b/api") - - # Downstream service sees: - # traceparent: 00-abc123def456-789xyz-01 -end -``` - -### Manual Context Injection/Extraction - -```ruby -# Service A - Inject context -Langfuse.trace(name: "parent") do |trace| - headers = trace.inject_context - # => { "traceparent" => "00-abc123...", "tracestate" => "..." } - - HTTParty.get(url, headers: headers) -end - -# Service B - Extract context -def handle_request - context = Langfuse.extract_context(request.headers) - - Langfuse.trace(name: "child", context: context) do |trace| - # Automatically linked to parent trace! 
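    # (What "linked" means here: Langfuse.extract_context returns an
    #  OpenTelemetry::Context, and the trace/span IDs carried in the incoming
    #  `traceparent` header become this trace's parent.)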
- end -end -``` - -### Implementation - -```ruby -module Langfuse - def self.extract_context(headers) - carrier = headers.to_h - OpenTelemetry.propagation.extract(carrier) - end - - def self.trace(name:, context: nil, **attrs, &block) - if context - # Use extracted context as parent - OpenTelemetry::Context.with_current(context) do - tracer.trace(name: name, **attrs, &block) - end - else - # Create new root trace - tracer.trace(name: name, **attrs, &block) - end - end -end -``` - ---- - -## Prompt-to-Trace Linking - -### Automatic Linking (Same as Before) - -When a prompt is used in a generation, automatically capture as OTel attributes: - -```ruby -prompt = Langfuse.client.get_prompt("support-assistant", version: 3) - -Langfuse.trace(name: "support-query") do |trace| - trace.generation( - name: "response", - model: "gpt-4", - prompt: prompt # ← Automatic linking - ) do |gen| - messages = prompt.compile(customer: "Alice") - response = call_llm(messages) - gen.output = response - end -end - -# OTel span attributes: -# { -# "langfuse.type": "generation", -# "langfuse.model": "gpt-4", -# "langfuse.prompt_name": "support-assistant", # Auto-captured -# "langfuse.prompt_version": 3, # Auto-captured -# "langfuse.input": "[{\"role\":\"system\"...}]" # Compiled prompt -# } -``` - -### Implementation - -```ruby -class Generation - def initialize(otel_span, attributes = {}) - @otel_span = otel_span - @attributes = attributes - - # Auto-detect prompt - if attributes[:prompt].is_a?(Langfuse::TextPromptClient) || - attributes[:prompt].is_a?(Langfuse::ChatPromptClient) - @otel_span.set_attribute("langfuse.prompt_name", attributes[:prompt].name) - @otel_span.set_attribute("langfuse.prompt_version", attributes[:prompt].version) - end - end -end -``` - ---- - -## Cost & Token Tracking - -### Automatic Cost Calculation (Same as Before) - -```ruby -# Model pricing database (built-in) -LANGFUSE_MODEL_PRICING = { - "gpt-4" => { - prompt_tokens: 0.03 / 1000, - completion_tokens: 0.06 / 1000 - }, - "gpt-4-turbo" => { - prompt_tokens: 0.01 / 1000, - completion_tokens: 0.03 / 1000 - } - # ... more models -} -``` - -### Usage in Generation - -```ruby -class Generation - def usage=(usage_hash) - model = @attributes[:model] - - # Set token counts as OTel attributes - @otel_span.set_attribute("langfuse.usage.prompt_tokens", usage_hash[:prompt_tokens]) - @otel_span.set_attribute("langfuse.usage.completion_tokens", usage_hash[:completion_tokens]) - @otel_span.set_attribute("langfuse.usage.total_tokens", usage_hash[:total_tokens]) - - # Auto-calculate cost if not provided - unless usage_hash[:total_cost] - cost = CostCalculator.calculate( - model: model, - prompt_tokens: usage_hash[:prompt_tokens], - completion_tokens: usage_hash[:completion_tokens] - ) - @otel_span.set_attribute("langfuse.usage.total_cost", cost) - end - end -end -``` - ---- - -## APM Integration - -### How It Works - -When multiple OTel exporters are configured, **the same trace appears in both Langfuse and your APM**: - -```ruby -# config/initializers/opentelemetry.rb -OpenTelemetry::SDK.configure do |c| - c.service_name = 'rails-app' - - # Langfuse exporter (LLM-specific details) - c.add_span_processor( - OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor.new( - Langfuse::Exporter.new(...) 
- ) - ) - - # Datadog exporter (infrastructure details) - c.add_span_processor( - OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor.new( - OpenTelemetry::Exporter::OTLP::Exporter.new( - endpoint: 'http://datadog-agent:4318' - ) - ) - ) -end -``` - -### Result in Datadog APM - -``` -Trace ID: abc123def456 -├─ Span: chat-completion (Langfuse trace) -│ ├─ Duration: 3.2s -│ ├─ Service: rails-app -│ ├─ Attributes: -│ │ ├─ langfuse.user_id: "user-123" -│ │ └─ langfuse.session_id: "session-456" -│ │ -│ ├─ Span: openai-call (Langfuse generation) -│ │ ├─ Duration: 2.8s -│ │ ├─ Attributes: -│ │ │ ├─ langfuse.model: "gpt-4" -│ │ │ ├─ langfuse.usage.total_tokens: 225 -│ │ │ └─ langfuse.usage.total_cost: 0.00675 -│ │ -│ └─ Span: http.request (Datadog auto-instrumentation) -│ ├─ Duration: 2.7s -│ ├─ URL: https://api.openai.com/v1/chat/completions -│ └─ Status: 200 -``` - -### Benefits - -1. **Unified View**: See LLM calls alongside database queries, HTTP requests -2. **Performance Analysis**: Identify slow LLM calls impacting response time -3. **Error Correlation**: Link LLM failures to infrastructure issues -4. **Cost Attribution**: Correlate costs with specific users/features - ---- - -## Error Handling & Resilience - -### 1. Circuit Breaker (Same Pattern) - -```ruby -class Langfuse::IngestionClient - def initialize - @circuit_breaker = Stoplight("langfuse-ingestion") - .with_threshold(5) - .with_timeout(30) - .with_cool_off_time(10) - .with_data_store(Stoplight::DataStore::Redis.new(Redis.current)) - end - - def send_batch(events) - @circuit_breaker.run do - connection.post("/api/public/ingestion", { batch: events }) - end - rescue Stoplight::Error::RedLight => e - Rails.logger.warn("Langfuse circuit open: #{e.message}") - # Drop events or store for retry - end -end -``` - -### 2. OTel Export Failures - -```ruby -# If Langfuse exporter fails, OTel continues normally -class Langfuse::Exporter - def export(span_data_list, timeout: nil) - # Try to export - events = convert_spans(span_data_list) - send_events(events) - - OpenTelemetry::SDK::Trace::Export::SUCCESS - rescue StandardError => e - # Log but don't crash app - Rails.logger.error("Langfuse export failed: #{e.message}") - - # Other exporters (Datadog) still work! - OpenTelemetry::SDK::Trace::Export::FAILURE - end -end -``` - -### 3. 
Graceful Degradation - -```ruby -# Master kill switch -Langfuse.configure do |config| - config.tracing_enabled = ENV.fetch("LANGFUSE_TRACING", "true") == "true" -end - -# Disable exporter if tracing is off -def initialize_otel - return unless Langfuse.config.tracing_enabled - - OpenTelemetry::SDK.configure do |c| - c.add_span_processor( - OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor.new( - Langfuse::Exporter.new - ) - ) - end -end -``` - ---- - -## Implementation Phases - -### Phase T0: OpenTelemetry Setup (Week 1, Days 1-2) - -**Goal:** Get OpenTelemetry working with basic tracing - -#### T0.1 OTel Dependencies -- [ ] Add `opentelemetry-sdk` gem -- [ ] Add `opentelemetry-api` gem -- [ ] Add `opentelemetry-instrumentation-all` (optional) -- [ ] Add `opentelemetry-exporter-otlp` (optional, for APM) -- [ ] Update Gemfile and bundle install - -#### T0.2 Basic OTel Configuration -- [ ] Create `config/initializers/opentelemetry.rb` -- [ ] Configure service name, version -- [ ] Add console exporter for testing -- [ ] Write basic trace/span test - -#### T0.3 Verify OTel Works -- [ ] Create simple trace in test -- [ ] Verify spans are exported -- [ ] Test context propagation -- [ ] Document OTel setup - -**Dependencies Added:** -- `opentelemetry-sdk ~> 1.4` -- `opentelemetry-api ~> 1.2` -- `opentelemetry-common ~> 0.21` -- `opentelemetry-exporter-otlp ~> 0.27` (optional) - -**Milestone:** OTel tracing works! - ---- - -### Phase T1: Langfuse Exporter (Week 1, Days 3-4) - -**Goal:** Custom OTel exporter that converts spans to Langfuse events - -#### T1.1 Exporter Skeleton -- [ ] Create `Langfuse::Exporter` class -- [ ] Implement `export(span_data_list)` method -- [ ] Implement `force_flush` and `shutdown` -- [ ] Register with OTel SpanProcessor - -#### T1.2 Span Conversion -- [ ] Implement `convert_span_to_event(span)` -- [ ] Extract Langfuse attributes from OTel span -- [ ] Handle trace-create, span-create, generation-create -- [ ] Write conversion tests - -#### T1.3 Ingestion Client (Sync) -- [ ] Create `Langfuse::IngestionClient` -- [ ] Implement `POST /api/public/ingestion` -- [ ] Add Basic Auth -- [ ] Add retry logic (Faraday) -- [ ] Write tests with WebMock - -**Milestone:** OTel spans → Langfuse API! - ---- - -### Phase T2: Ruby API Wrapper (Week 2) - -**Goal:** Idiomatic Ruby API that wraps OTel - -#### T2.1 Langfuse::Tracer -- [ ] Create wrapper around OTel tracer -- [ ] Implement `Langfuse.trace { |t| ... }` -- [ ] Map Ruby kwargs to OTel attributes -- [ ] Write tests - -#### T2.2 Trace/Span/Generation Classes -- [ ] Create `Langfuse::Trace` wrapper -- [ ] Create `Langfuse::Span` wrapper -- [ ] Create `Langfuse::Generation` class -- [ ] Handle input/output/metadata -- [ ] Write comprehensive tests - -#### T2.3 Global Configuration -- [ ] Add tracing config to `Langfuse::Config` -- [ ] Integrate with `Langfuse.configure` -- [ ] Auto-initialize OTel on configure -- [ ] Write tests - -**Milestone:** Ruby block API works! 
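Phase T2.2 calls for a `Langfuse::Span` wrapper that is not spelled out elsewhere in this document. A minimal sketch, assuming it follows the same attribute-writing pattern as the `Langfuse::Trace` and `Generation` classes shown earlier (class shape and method names here are illustrative, not a committed API):

```ruby
module Langfuse
  class Span
    attr_reader :otel_span

    def initialize(otel_span, attributes = {})
      @otel_span = otel_span
      @otel_span.set_attribute("langfuse.type", "span")
      @otel_span.set_attribute("langfuse.input", attributes[:input].to_json) if attributes[:input]
    end

    # Record the span's result as a Langfuse attribute on the underlying OTel span
    def output=(value)
      @otel_span.set_attribute("langfuse.output", value.to_json)
    end

    # Nested generations reuse the same pattern as Trace#generation
    def generation(name:, model:, **attrs)
      tracer = OpenTelemetry.tracer_provider.tracer("langfuse")
      otel_attrs = { "langfuse.type" => "generation", "langfuse.model" => model }
      tracer.in_span(name, attributes: otel_attrs) do |span|
        yield(Generation.new(span, attrs.merge(model: model)))
      end
    end
  end
end
```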
- ---- - -### Phase T3: Async Processing (Week 2-3) - -**Goal:** Background processing via Sidekiq - -#### T3.1 Event Buffer -- [ ] Create `Langfuse::EventBuffer` -- [ ] Implement thread-safe push/drain -- [ ] Add overflow handling -- [ ] Write concurrency tests - -#### T3.2 Batch Span Processor -- [ ] Configure OTel BatchSpanProcessor -- [ ] Set batch size (50 spans) -- [ ] Set flush interval (10s) -- [ ] Test batching behavior - -#### T3.3 Sidekiq Worker -- [ ] Create `Langfuse::IngestionWorker` -- [ ] Accept batch of events -- [ ] Send via IngestionClient -- [ ] Add error handling - -#### T3.4 Async Configuration -- [ ] Add `tracing_async` config option -- [ ] Toggle between sync/async export -- [ ] Auto-detect Sidekiq availability -- [ ] Write tests - -**Milestone:** Async batching works! - ---- - -### Phase T4: Prompt Linking (Week 3) - -**Goal:** Automatic prompt-to-trace linking - -#### T4.1 Prompt Detection -- [ ] Detect `prompt:` kwarg in generation -- [ ] Extract name and version from PromptClient -- [ ] Add as OTel attributes - -#### T4.2 OTel Attribute Mapping -- [ ] Add `langfuse.prompt_name` attribute -- [ ] Add `langfuse.prompt_version` attribute -- [ ] Include in exporter conversion - -#### T4.3 Integration Tests -- [ ] Test end-to-end prompt linking -- [ ] Test with TextPromptClient -- [ ] Test with ChatPromptClient - -**Milestone:** Automatic prompt linking! - ---- - -### Phase T5: Cost Tracking (Week 3) - -**Goal:** Automatic cost calculation - -#### T5.1 Model Pricing Database -- [ ] Create pricing hash (same as before) -- [ ] Add OpenAI, Anthropic models -- [ ] Add custom pricing support - -#### T5.2 Cost Calculator -- [ ] Create `Langfuse::CostCalculator` -- [ ] Calculate from tokens + model -- [ ] Handle unknown models - -#### T5.3 Usage Enhancement -- [ ] Auto-calculate costs in `Generation#usage=` -- [ ] Add cost as OTel attribute -- [ ] Include in exporter - -**Milestone:** Automatic cost calculation! - ---- - -### Phase T6: Distributed Tracing (Week 4) - -**Goal:** W3C Trace Context support - -#### T6.1 Context Injection -- [ ] Implement `trace.inject_context` -- [ ] Use OTel propagation API -- [ ] Return W3C headers hash - -#### T6.2 Context Extraction -- [ ] Implement `Langfuse.extract_context(headers)` -- [ ] Use OTel propagation API -- [ ] Link child traces to parent - -#### T6.3 HTTP Instrumentation -- [ ] Add `opentelemetry-instrumentation-http` -- [ ] Test automatic header injection -- [ ] Test cross-service tracing - -**Milestone:** Distributed tracing works! - ---- - -### Phase T7: APM Integration (Week 4) - -**Goal:** Multi-exporter configuration - -#### T7.1 Multiple Exporters -- [ ] Document multi-exporter setup -- [ ] Test with Datadog + Langfuse -- [ ] Test with OTLP + Langfuse -- [ ] Ensure independent failures - -#### T7.2 Correlation -- [ ] Verify trace IDs match across exporters -- [ ] Test unified traces in Datadog -- [ ] Document APM integration - -**Milestone:** APM integration! 
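Phase T5 above references a `Langfuse::CostCalculator` without showing its shape. A minimal sketch, assuming the per-token USD rates from the Cost & Token Tracking section are exposed as `Langfuse.config.model_pricing` (as in the configuration example later in this document) and that unpriced models simply return `nil`:

```ruby
module Langfuse
  class CostCalculator
    # Returns the total USD cost of a call, or nil when the model has no pricing entry
    def self.calculate(model:, prompt_tokens:, completion_tokens:)
      pricing = Langfuse.config.model_pricing[model]
      return nil unless pricing

      (prompt_tokens.to_i * pricing[:prompt_tokens]) +
        (completion_tokens.to_i * pricing[:completion_tokens])
    end
  end
end
```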
- ---- - -### Phase T8: Advanced Features (Week 5) - -**Goal:** Events, scores, manual API - -#### T8.1 Events -- [ ] Implement `trace.event(name, ...)` -- [ ] Use OTel span events -- [ ] Map to Langfuse event format -- [ ] Test event export - -#### T8.2 Scores -- [ ] Implement `trace.score(name, value)` -- [ ] Buffer scores separately -- [ ] Send as score-create events -- [ ] Test score export - -#### T8.3 Manual API -- [ ] Expose `trace.current_span` (OTel span) -- [ ] Support manual span start/end -- [ ] Document advanced usage - -**Milestone:** Full feature parity! - ---- - -### Phase T9: Rails Integration (Week 5) - -**Goal:** Automatic Rails tracing - -#### T9.1 Middleware -- [ ] Create `Langfuse::Middleware` -- [ ] Auto-wrap requests in traces -- [ ] Capture request metadata -- [ ] Use OTel Rack instrumentation - -#### T9.2 ActiveJob Integration -- [ ] Auto-wrap jobs in traces -- [ ] Link job traces to request traces -- [ ] Use existing OTel ActiveJob instrumentation - -**Milestone:** Automatic Rails tracing! - ---- - -### Phase T10: Documentation & Polish (Week 6) - -**Goal:** Production-ready release - -#### T10.1 Documentation -- [ ] Complete API documentation (YARD) -- [ ] Write comprehensive README section -- [ ] Document OTel integration -- [ ] Write APM integration guide -- [ ] Document distributed tracing - -#### T10.2 Performance Testing -- [ ] Benchmark OTel overhead -- [ ] Optimize exporter -- [ ] Memory profiling - -#### T10.3 Final Polish -- [ ] Ensure >90% test coverage -- [ ] Fix all Rubocop issues -- [ ] Review error messages -- [ ] Final security review - -**Milestone:** Tracing features ready for 1.0! 🚀 - ---- - -## Configuration Example - -```ruby -# config/initializers/langfuse.rb -Langfuse.configure do |config| - # Authentication - config.public_key = ENV["LANGFUSE_PUBLIC_KEY"] - config.secret_key = ENV["LANGFUSE_SECRET_KEY"] - - # Tracing - config.tracing_enabled = true - config.tracing_async = true - config.batch_size = 50 - config.flush_interval = 10 - config.job_queue = :default # ActiveJob queue name (default: :default) - - # Model pricing - config.model_pricing["custom-model"] = { - prompt_tokens: 0.005 / 1000, - completion_tokens: 0.01 / 1000 - } -end - -# config/initializers/opentelemetry.rb -OpenTelemetry::SDK.configure do |c| - c.service_name = 'rails-app' - c.service_version = ENV['APP_VERSION'] - - # Langfuse exporter (LLM observability) - c.add_span_processor( - OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor.new( - Langfuse::Exporter.new, - max_queue_size: 1000, - max_export_batch_size: 50, - schedule_delay: 10_000 # 10 seconds - ) - ) - - # Optional: Datadog exporter (APM) - if ENV['DD_AGENT_HOST'] - c.add_span_processor( - OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor.new( - OpenTelemetry::Exporter::OTLP::Exporter.new( - endpoint: "http://#{ENV['DD_AGENT_HOST']}:4318" - ) - ) - ) - end -end -``` - ---- - -## Dependencies - -### Core OTel Dependencies -- `opentelemetry-sdk ~> 1.4` - Core tracing SDK -- `opentelemetry-api ~> 1.2` - Public API -- `opentelemetry-common ~> 0.21` - Common utilities - -### Optional OTel Dependencies -- `opentelemetry-exporter-otlp ~> 0.27` - For APM integration -- `opentelemetry-instrumentation-http ~> 0.23` - Automatic HTTP tracing -- `opentelemetry-instrumentation-rails ~> 0.30` - Automatic Rails tracing -- `opentelemetry-instrumentation-active_job ~> 0.7` - ActiveJob tracing -- `opentelemetry-instrumentation-sidekiq ~> 0.25` - Sidekiq tracing - -### Existing Dependencies (from prompt management) -- 
`faraday ~> 2.0` - HTTP client -- `faraday-retry ~> 2.0` - Retry logic - -### New Dependencies (tracing-specific) -- `stoplight ~> 4.0` - Circuit breaker (if not added in prompt Phase 7) - ---- - -## Open Questions - -1. **OTel Instrumentation Scope**: Should we auto-install OTel instrumentations? - - **Recommendation**: Make them optional, document in README - -2. **Sampling Strategy**: Use OTel sampler or custom? - - **Recommendation**: Use OTel ParentBasedSampler with configurable rate - -3. **Score Timing**: When to send scores (immediate vs batched)? - - **Recommendation**: Batch with other events for efficiency - -4. **OTel Collector**: Support OTLP Collector as intermediary? - - **Recommendation**: Yes, document as option for high-volume deployments - -5. **Context Storage**: Use OTel Context API or custom? - - **Recommendation**: Use OTel Context API (thread-safe, distributed-ready) - ---- - -## Future Enhancements (Post-v1.0) - -These features are not required for v1.0 but could be added in future releases based on user feedback: - -### Automatic LLM Client Wrappers - -**Motivation:** TypeScript and Python SDKs provide automatic wrappers for popular LLM clients (OpenAI, Anthropic) that eliminate boilerplate by auto-capturing inputs, outputs, and usage. - -**TypeScript Example:** -```typescript -import { observeOpenAI } from '@langfuse/openai'; -const openai = observeOpenAI(new OpenAI()); - -// All OpenAI calls automatically traced! -const completion = await openai.chat.completions.create({ - model: "gpt-4", - messages: [{ role: "user", content: "Hello" }] -}); -``` - -**Python Example:** -```python -from langfuse.openai import openai # Wrapped client - -# All calls automatically traced! -response = openai.chat.completions.create( - model="gpt-4", - messages=[{"role": "user", "content": "Hello"}] -) -``` - -**Proposed Ruby Implementation:** - -```ruby -# gem install langfuse-openai (optional extension gem) -require 'langfuse/integrations/openai' - -# Wrap OpenAI client for automatic tracing -openai = Langfuse::OpenAI.wrap(OpenAI::Client.new) - -# Inside a Langfuse trace, OpenAI calls are auto-traced -Langfuse.trace(name: "user-query") do |trace| - # Automatic generation span created! - response = openai.chat( - parameters: { - model: "gpt-4", - messages: [{ role: "user", content: "Hello" }] - } - ) - - # Usage, tokens, cost automatically captured - # Input/output automatically logged - # Model name automatically detected -end -``` - -**Implementation Approach:** - -1. **Separate Extension Gem** (optional dependency) - - `langfuse-openai` gem for OpenAI integration - - `langfuse-anthropic` gem for Anthropic integration - - Keeps core `langfuse` gem lightweight - -2. **Monkey-patching with Module Prepend** - ```ruby - module Langfuse - module OpenAI - def self.wrap(client) - client.singleton_class.prepend(ClientExtensions) - client - end - - module ClientExtensions - def chat(parameters:) - # Extract trace context from OTel - current_trace = Langfuse.current_trace - - if current_trace - current_trace.generation( - name: "openai-chat", - model: parameters[:model], - input: parameters[:messages] - ) do |gen| - response = super(parameters: parameters) - - gen.output = response.choices.first.message.content - gen.usage = { - prompt_tokens: response.usage.prompt_tokens, - completion_tokens: response.usage.completion_tokens, - total_tokens: response.usage.total_tokens - } - - response - end - else - super(parameters: parameters) - end - end - end - end - end - ``` - -3. 
**OTel Context Detection** - - Check if inside a Langfuse trace (via OTel context) - - Only trace if within active trace - - Pass through normally if not tracing - -**Benefits:** -- ✅ Zero boilerplate for common use cases -- ✅ Matches TypeScript/Python SDK experience -- ✅ Automatic input/output/usage capture -- ✅ Optional (doesn't bloat core gem) - -**Trade-offs:** -- ⚠️ Requires additional gem dependencies -- ⚠️ Monkey-patching risks (mitigated by Module#prepend) -- ⚠️ Needs maintenance for each LLM provider -- ⚠️ May not cover all edge cases - -**Recommendation:** Implement as separate extension gems after v1.0, starting with `langfuse-openai` based on user demand. - ---- - -## Success Metrics - -After implementation, the SDK should achieve: - -1. **Performance**: <2ms OTel overhead per trace/span creation -2. **Reliability**: >99.9% event delivery (with retry) -3. **Throughput**: Handle 10,000+ traces/second (async mode) -4. **Test Coverage**: >90% code coverage -5. **Memory**: <15MB memory overhead (OTel + Langfuse) -6. **Developer Experience**: <10 lines of code for typical use case -7. **APM Compatibility**: Works alongside Datadog, New Relic, etc. - ---- - -## References - -- [Langfuse Tracing Docs](https://langfuse.com/docs/tracing) -- [Langfuse Python SDK](https://langfuse.com/docs/sdk/python/low-level-sdk) (uses OTel) -- [OpenTelemetry Ruby](https://opentelemetry.io/docs/instrumentation/ruby/) -- [OpenTelemetry Specification](https://opentelemetry.io/docs/specs/otel/) -- [W3C Trace Context](https://www.w3.org/TR/trace-context/) -- [Datadog OpenTelemetry](https://docs.datadoghq.com/tracing/setup_overview/open_standards/otel/) - ---- - -## Key Advantages of OTel-Based Design - -### vs Custom Implementation - -| Feature | OTel-Based | Custom Implementation | -|---------|------------|----------------------| -| **Context Propagation** | Automatic (W3C Trace Context) | Manual thread-local storage | -| **Distributed Tracing** | Built-in across services | Complex custom solution | -| **APM Integration** | Native support | Requires custom exporters | -| **Industry Adoption** | CNCF standard, widely used | SDK-specific | -| **Code Maintenance** | Less custom code | More code to maintain | -| **Learning Curve** | OTel patterns (well-documented) | Custom patterns | -| **Instrumentation** | Rich ecosystem (auto-instrument) | Manual instrumentation | -| **Future-Proof** | Industry direction | May diverge from standards | - -### Key Decision Points - -**Choose OTel-Based Design If:** -- ✅ You want distributed tracing across microservices -- ✅ You use APM tools (Datadog, New Relic, Honeycomb) -- ✅ You want automatic instrumentation (HTTP, Rails, Sidekiq) -- ✅ You value industry standards over custom solutions -- ✅ You want unified observability (infrastructure + LLM) - -**Avoid OTel If:** -- ❌ You need minimal dependencies (OTel adds ~10 gems) -- ❌ You only trace within a single app (no distributed tracing) -- ❌ You don't use APM tools -- ❌ You want complete control over internals - -**For SimplePractice (100 microservices):** OTel-based design is **strongly recommended** due to distributed architecture and existing APM tooling. 
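For reference, the automatic OpenAI wrapper sketched in the Future Enhancements section above depends on a `Langfuse.current_trace` helper (item 3, OTel Context Detection). One possible shape (an assumption for illustration, not a committed API) inspects the active OpenTelemetry span and only wraps it while a trace is recording:

```ruby
module Langfuse
  # Returns the active trace wrapper, or nil when called outside a Langfuse trace
  def self.current_trace
    span = OpenTelemetry::Trace.current_span
    return nil unless span.context.valid? && span.recording?

    Trace.new(span)
  end
end
```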
- ---- - -**END OF DESIGN DOCUMENT** diff --git a/docs/design-history/langfuse-ruby-prompt-management-design.md b/docs/design-history/langfuse-ruby-prompt-management-design.md deleted file mode 100644 index 6ea3d47..0000000 --- a/docs/design-history/langfuse-ruby-prompt-management-design.md +++ /dev/null @@ -1,3556 +0,0 @@ -# Langfuse Ruby SDK - Prompt Management Technical Design - -**Document Version:** 1.0 -**Date:** 2025-10-02 -**Author:** Technical Architecture Team -**Status:** Design Document - ---- - -## Table of Contents - -1. [Executive Summary](#executive-summary) -2. [Problem Statement](#problem-statement) -3. [Architecture Overview](#architecture-overview) -4. [Component Design](#component-design) -5. [API Design](#api-design) -6. [Caching Strategy](#caching-strategy) -7. [REST API Integration](#rest-api-integration) -8. [Variable Substitution](#variable-substitution) -9. [Implementation Phases](#implementation-phases) -10. [Testing Strategy](#testing-strategy) -11. [Dependencies](#dependencies) -12. [Code Examples](#code-examples) -13. [Migration Strategy](#migration-strategy) -14. [Trade-offs and Alternatives](#trade-offs-and-alternatives) -15. [Open Questions](#open-questions) - ---- - -## Executive Summary - -This document outlines the technical design for adding prompt management functionality to the `langfuse-ruby` gem, achieving feature parity with the JavaScript SDK's prompt management capabilities while adhering to Ruby idioms and best practices. - -### Key Objectives - -- **Feature Parity**: Match JavaScript SDK's prompt management functionality -- **Ruby Conventions**: Follow Ruby/Rails conventions (snake_case, blocks, Rails.cache integration) -- **Thread Safety**: Ensure concurrent request safety for Rails applications -- **Performance**: Implement intelligent caching with stale-while-revalidate pattern -- **Developer Experience**: Provide intuitive, well-documented API - -### Success Metrics - -- All JavaScript SDK prompt features available in Ruby -- Sub-100ms cache hits for prompt retrieval -- Zero breaking changes to existing langfuse-ruby API -- Comprehensive test coverage (>90%) -- Thread-safe for production Rails applications - ---- - -## Design Philosophy: LaunchDarkly-Inspired API - -### Why LaunchDarkly as a Model? - -The LaunchDarkly Ruby SDK is widely regarded as one of the best-designed Ruby gems, with exceptional developer ergonomics. This design incorporates several key patterns from LaunchDarkly: - -**1. Flat API Surface** -- LaunchDarkly: `client.variation('flag', user, default)` -- Langfuse: `client.get_prompt('name', fallback: "...")` -- Benefit: Minimal cognitive overhead, everything on the client - -**2. Required Defaults for Resilience** -- LaunchDarkly: Every call requires a default value, never throws -- Langfuse: Encourage fallbacks, gracefully degrade on errors -- Benefit: Production resilience built-in - -**3. Configuration Object Pattern** -- LaunchDarkly: `Config` class with block initialization -- Langfuse: `Langfuse::Config` with global configuration -- Benefit: Clean Rails initialization, centralized settings - -**4. Simple Return Values** -- LaunchDarkly: `variation` returns value, `variation_detail` adds metadata -- Langfuse: `get_prompt` returns client, `get_prompt_detail` adds metadata -- Benefit: Simple common case, detailed when needed - -**5. 
Global Singleton Pattern** -- LaunchDarkly: Initialize once, use everywhere -- Langfuse: `Langfuse.client` for Rails convenience -- Benefit: No prop-drilling, simpler service objects - -### API Comparison - -| Feature | LaunchDarkly | Langfuse (This Design) | -|---------|-------------|------------------------| -| Initialization | `LDClient.new(sdk_key, config)` | `Client.new(config)` | -| Global config | `Config.new { \|c\| ... }` | `Langfuse.configure { \|c\| ... }` | -| Global client | Manual singleton | `Langfuse.client` | -| Primary method | `variation(key, user, default)` | `get_prompt(name, fallback: ...)` | -| Detail variant | `variation_detail(key, user, default)` | `get_prompt_detail(name, ...)` | -| Error handling | Returns default, logs error | Returns fallback or raises | -| State check | `initialized?` | `initialized?` | - ---- - -## Problem Statement - -### Current State - -The `langfuse-ruby` gem (v0.1.4) provides: -- Tracing functionality (trace, span, generation, event, score) -- Basic configuration and authentication -- Async processing via Sidekiq integration - -**Missing capabilities:** -- Prompt retrieval and management -- Prompt creation and updates -- Variable substitution/compilation -- Intelligent caching -- Placeholder support for chat prompts -- LangChain integration helpers - -### Business Context - -Langfuse prompts enable: -- **Centralized Prompt Management**: Single source of truth for LLM prompts -- **Version Control**: Track prompt changes over time -- **A/B Testing**: Multiple prompt versions with labels -- **Rapid Iteration**: Update prompts without code deployment -- **Collaboration**: Product/non-technical teams can manage prompts - -### Target Users - -1. **Rails Developers**: Building LLM-powered features in production Rails apps -2. **Data Scientists**: Experimenting with prompt engineering in Ruby notebooks -3. **Platform Teams**: Managing LLM integrations across microservices - ---- - -## Architecture Overview - -### High-Level System Design - -``` -┌─────────────────────────────────────────────────────────────┐ -│ Langfuse Module (Global Config) │ -│ │ -│ • configure { |config| ... } │ -│ • client (singleton) │ -│ • reset! (testing) │ -└──────────────────────────┬───────────────────────────────────┘ - │ - │ creates - ▼ -┌─────────────────────────────────────────────────────────────┐ -│ Langfuse::Client │ -│ │ -│ Prompt Methods (Flattened API): │ -│ • get_prompt(name, **options) │ -│ • get_prompt_detail(name, **options) │ -│ • compile_prompt(name, variables:, placeholders:) │ -│ • create_prompt(**body) │ -│ • update_prompt(name:, version:, labels:) │ -│ • invalidate_cache(name) │ -│ • initialized? 
│ -│ │ -│ ┌─────────────────────────────────────────────────────┐ │ -│ │ Langfuse::PromptCache │ │ -│ │ │ │ -│ │ • get_including_expired(key) │ │ -│ │ • set(key, value, ttl) │ │ -│ │ • invalidate(prompt_name) │ │ -│ │ • trigger_background_refresh │ │ -│ │ • Thread-safe operations (Mutex) │ │ -│ └─────────────────────────────────────────────────────┘ │ -│ │ -│ ┌─────────────────────────────────────────────────────┐ │ -│ │ Langfuse::ApiClient │ │ -│ │ │ │ -│ │ • get_prompt(name, version:, label:) │ │ -│ │ • create_prompt(body) │ │ -│ │ • update_prompt_version(name, version, labels) │ │ -│ └─────────────────────────────────────────────────────┘ │ -└─────────────────────────────────────────────────────────────┘ - │ - │ HTTP (Basic Auth) - ▼ - ┌──────────────────────┐ - │ Langfuse API │ - │ (cloud.langfuse.com)│ - └──────────────────────┘ - - -┌─────────────────────────────────────────────────────────────┐ -│ Prompt Client Classes │ -├─────────────────────────────────────────────────────────────┤ -│ │ -│ ┌────────────────────────────────────────────────────┐ │ -│ │ Langfuse::TextPromptClient │ │ -│ │ │ │ -│ │ • compile(variables = {}) │ │ -│ │ • to_langchain │ │ -│ │ • name, version, config, labels, tags │ │ -│ └────────────────────────────────────────────────────┘ │ -│ │ -│ ┌────────────────────────────────────────────────────┐ │ -│ │ Langfuse::ChatPromptClient │ │ -│ │ │ │ -│ │ • compile(variables = {}, placeholders = {}) │ │ -│ │ • to_langchain(placeholders: {}) │ │ -│ │ • name, version, config, labels, tags │ │ -│ └────────────────────────────────────────────────────┘ │ -└─────────────────────────────────────────────────────────────┘ -``` - -### Component Responsibilities - -| Component | Responsibility | Thread-Safe? | -|-----------|---------------|--------------| -| `Langfuse` (module) | Global configuration, singleton client | N/A | -| `Langfuse::Config` | Configuration object (keys, cache, logger) | N/A | -| `Langfuse::Client` | Main API surface, prompt operations, caching logic | Yes | -| `Langfuse::PromptCache` | In-memory TTL cache, stale-while-revalidate | Yes | -| `Langfuse::ApiClient` | HTTP communication with Langfuse API | Yes | -| `Langfuse::TextPromptClient` | Text prompt manipulation, compilation | N/A (immutable) | -| `Langfuse::ChatPromptClient` | Chat prompt manipulation, placeholders | N/A (immutable) | - -### Integration with Existing Gem - -The prompt management system integrates seamlessly: - -```ruby -# Global configuration (Rails initializer) -Langfuse.configure do |config| - config.public_key = ENV['LANGFUSE_PUBLIC_KEY'] - config.secret_key = ENV['LANGFUSE_SECRET_KEY'] -end - -# NEW: Get global client -client = Langfuse.client - -# NEW: Prompt management (flattened API) -prompt = client.get_prompt("greeting") -compiled = prompt.compile(name: "Alice") - -# Or: One-step convenience method -text = client.compile_prompt("greeting", variables: { name: "Alice" }) - -# Existing tracing (unchanged) -client.trace(name: "my-trace") do |trace| - trace.generation(name: "llm-call", input: compiled) -end -``` - ---- - -## Component Design - -### 1. Langfuse Module (Global Configuration) - -**Purpose**: Provide global configuration and singleton client for Rails convenience. - -```ruby -module Langfuse - class << self - attr_writer :configuration - - # Global configuration - def configuration - @configuration ||= Config.new - end - - # Configure block (Rails initializer) - def configure - yield(configuration) - configuration.validate! 
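      # validate! raises Langfuse::ConfigurationError when public_key or
      # secret_key is missing, so misconfiguration fails fast in the initializer.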
- end - - # Global singleton client - def client - @client ||= Client.new(configuration) - end - - # Reset for testing - def reset! - @configuration = nil - @client = nil - end - end -end -``` - -### 2. Langfuse::Config - -**Purpose**: Configuration object for client initialization. - -```ruby -module Langfuse - class Config - attr_accessor :public_key, :secret_key, :base_url, :timeout, :logger - attr_accessor :cache_ttl, :cache_max_size, :cache_backend - - DEFAULT_BASE_URL = "https://cloud.langfuse.com" - DEFAULT_TIMEOUT = 5 - DEFAULT_CACHE_TTL = 60 - DEFAULT_CACHE_MAX_SIZE = 1000 - - def initialize - @public_key = ENV['LANGFUSE_PUBLIC_KEY'] - @secret_key = ENV['LANGFUSE_SECRET_KEY'] - @base_url = ENV['LANGFUSE_BASE_URL'] || DEFAULT_BASE_URL - @timeout = DEFAULT_TIMEOUT - @cache_ttl = DEFAULT_CACHE_TTL - @cache_max_size = DEFAULT_CACHE_MAX_SIZE - @cache_backend = :memory - @logger = defined?(Rails) ? Rails.logger : Logger.new($stdout) - - yield(self) if block_given? - end - - def validate! - raise ConfigurationError, "public_key is required" if public_key.nil? || public_key.empty? - raise ConfigurationError, "secret_key is required" if secret_key.nil? || secret_key.empty? - end - end -end -``` - -**Usage Examples**: - -```ruby -# Rails initializer -Langfuse.configure do |config| - config.public_key = ENV['LANGFUSE_PUBLIC_KEY'] - config.secret_key = ENV['LANGFUSE_SECRET_KEY'] - config.cache_ttl = 120 # 2 minutes - config.logger = Rails.logger -end - -# Anywhere in app -client = Langfuse.client # Uses global config - -# Or: Custom client for multi-tenant -config = Langfuse::Config.new do |c| - c.public_key = tenant.langfuse_key - c.secret_key = tenant.langfuse_secret -end -custom_client = Langfuse::Client.new(config) -``` - -### 3. Langfuse::Client (Flattened API) - -**Purpose**: Main entry point for all prompt operations with caching logic. - -```ruby -module Langfuse - class Client - attr_reader :config, :api_client, :cache, :logger - - # Initialize with Config object or inline options - # - # @param config_or_options [Config, Hash] Config object or hash of options - def initialize(config_or_options = nil) - if config_or_options.is_a?(Config) - @config = config_or_options - elsif config_or_options.is_a?(Hash) - # Backward compatibility: convert hash to config - @config = Config.new do |c| - config_or_options.each { |k, v| c.send("#{k}=", v) if c.respond_to?("#{k}=") } - end - else - @config = Langfuse.configuration - end - - @config.validate! - @logger = @config.logger - @api_client = ApiClient.new( - public_key: @config.public_key, - secret_key: @config.secret_key, - base_url: @config.base_url, - timeout: @config.timeout, - logger: @logger - ) - @cache = PromptCache.new( - max_size: @config.cache_max_size, - logger: @logger - ) - end - - # Check if client is initialized and ready - def initialized? - !@api_client.nil? && !@cache.nil? 
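      # Both are built eagerly in #initialize (after config validation), so this
      # mirrors LaunchDarkly's initialized? readiness check.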
- end - - # Get a prompt with caching and fallback support - # - # @param name [String] Prompt name - # @param options [Hash] Options hash - # @option options [Integer] :version Specific version - # @option options [String] :label Label filter (default: "production") - # @option options [Integer] :cache_ttl Cache TTL in seconds - # @option options [String, Array] :fallback Fallback content (RECOMMENDED) - # @option options [Symbol] :type Force type (:text or :chat) - # @option options [Integer] :timeout Request timeout in seconds - # - # @return [TextPromptClient, ChatPromptClient] - # @raise [ApiError, NotFoundError] if fetch fails and no fallback - # - # @example Get with fallback (recommended) - # prompt = client.get_prompt("greeting", - # fallback: "Hello {{name}}!", - # type: :text - # ) - def get_prompt(name, **options) - # Implementation: cache lookup -> API fetch -> fallback handling - # (see full implementation in detailed design below) - end - - # Get prompt with detailed metadata (for debugging/observability) - # - # @return [Hash] { prompt:, cached:, stale:, version:, fetch_time_ms:, source: } - def get_prompt_detail(name, **options) - # Implementation details - end - - # Convenience method: Get and compile in one step - # - # @param name [String] Prompt name - # @param variables [Hash] Variables for text prompts - # @param placeholders [Hash] Placeholders for chat prompts - # @param options [Hash] Same options as get_prompt - # - # @return [String, Array] Compiled result - # - # @example - # text = client.compile_prompt("greeting", - # variables: { name: "Alice" }, - # fallback: "Hello {{name}}!", - # type: :text - # ) - def compile_prompt(name, variables: {}, placeholders: {}, **options) - prompt = get_prompt(name, **options) - case prompt - when TextPromptClient - prompt.compile(variables) - when ChatPromptClient - prompt.compile(variables, placeholders) - end - end - - # Create a new prompt - # - # @param body [Hash] Prompt definition - # @return [TextPromptClient, ChatPromptClient] - def create_prompt(**body) - validate_create_body!(body) - response = api_client.create_prompt(body) - build_prompt_client(response) - end - - # Update prompt version labels - # - # @param name [String] Prompt name - # @param version [Integer] Version number - # @param labels [Array] New labels - # - # @return [Hash] Updated prompt metadata - def update_prompt(name:, version:, labels:) - validate_update_params!(name, version, labels) - response = api_client.update_prompt_version(name, version, labels) - cache.invalidate(name) # Only after successful update - response - end - - # Invalidate cache for a prompt - def invalidate_cache(name) - cache.invalidate(name) - logger.info("Langfuse: Invalidated cache for #{name}") - end - - private - - def validate_fallback_type!(fallback, type) - case type - when :text - unless fallback.is_a?(String) - raise ArgumentError, "Text prompt fallback must be a String, got #{fallback.class}" - end - when :chat - unless fallback.is_a?(Array) - raise ArgumentError, "Chat prompt fallback must be an Array, got #{fallback.class}" - end - fallback.each_with_index do |msg, i| - unless msg.is_a?(Hash) && (msg.key?(:role) || msg.key?(:type)) - raise ArgumentError, "Chat fallback message #{i} must have :role or :type" - end - end - end - end - - # ... additional private methods for fetch_and_cache, build_prompt_client, etc. - end -end -``` - -**Key Design Decisions:** - -1. **Flattened API**: All methods directly on `Client` (LaunchDarkly style) -2. 
**Keyword Arguments**: `**options` for flexibility and readability -3. **Smart Defaults**: `cache_ttl: 60`, `label: "production"` -4. **Graceful Degradation**: Fallback support encouraged, logs instead of raising when fallback provided -5. **Convenience Methods**: `compile_prompt` for common one-step use case -6. **Detail Variants**: `get_prompt_detail` for observability (LaunchDarkly pattern) -7. **Built-in Instrumentation**: ActiveSupport::Notifications for observability - -**Observability & Instrumentation:** - -```ruby -class Client - def get_prompt(name, **options) - start_time = Time.now - - # Fetch logic... - result = # ... - - # Emit instrumentation event - instrument('prompt.get', { - name: name, - cached: cache_hit?, - duration_ms: (Time.now - start_time) * 1000, - version: result.version, - fallback_used: result.is_fallback - }) - - result - end - - private - - def instrument(event, payload) - return unless defined?(ActiveSupport::Notifications) - ActiveSupport::Notifications.instrument("langfuse.#{event}", payload) - end -end - -# Subscribe to events for monitoring -ActiveSupport::Notifications.subscribe('langfuse.prompt.get') do |name, start, finish, id, payload| - # Log to stdout - Rails.logger.info("Langfuse prompt fetch", payload) - - # Send to StatsD/Datadog - StatsD.increment('langfuse.prompt.get') - StatsD.timing('langfuse.prompt.duration', payload[:duration_ms]) - StatsD.increment('langfuse.prompt.cache_hit') if payload[:cached] - StatsD.increment('langfuse.prompt.fallback') if payload[:fallback_used] -end -``` - -### 4. Langfuse::PromptCache - -**Purpose**: Thread-safe in-memory cache with TTL and stale-while-revalidate support. - -```ruby -module Langfuse - class PromptCache - DEFAULT_TTL_SECONDS = 60 - MAX_CACHE_SIZE = 1000 # Prevent unbounded memory growth - - class CacheItem - attr_reader :value, :expiry - - def initialize(value, ttl_seconds) - @value = value - @expiry = Time.now + ttl_seconds - end - - def expired? 
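        # Expired entries are deliberately kept in the store: get_including_expired
        # still returns them, so callers can serve stale data while a background
        # refresh runs (stale-while-revalidate).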
- Time.now > @expiry - end - end - - def initialize(max_size: MAX_CACHE_SIZE) - @cache = {} - @mutex = Mutex.new - @refreshing_keys = {} - @access_order = [] # Track access for LRU eviction - @max_size = max_size - end - - # Get item including expired entries (for stale-while-revalidate) - # Implements cache stampede protection - # - # @param key [String] Cache key - # @return [CacheItem, nil] - def get_including_expired(key) - @mutex.synchronize do - item = @cache[key] - - # Update access order for LRU - if item - @access_order.delete(key) - @access_order.push(key) - end - - item - end - end - - # Generate cache key from prompt parameters - # - # @param name [String] Prompt name - # @param version [Integer, nil] Version number - # @param label [String, nil] Label - # @return [String] Cache key - # - # @example - # create_key(name: "greeting", label: "production") - # # => "greeting-label:production" - def create_key(name:, version: nil, label: nil) - parts = [name] - if version - parts << "version:#{version}" - elsif label - parts << "label:#{label}" - else - parts << "label:production" - end - parts.join("-") - end - - # Store value in cache with TTL and LRU eviction - # - # @param key [String] Cache key - # @param value [TextPromptClient, ChatPromptClient] Prompt client - # @param ttl_seconds [Integer, nil] TTL (default: 60) - def set(key, value, ttl_seconds = nil) - ttl = ttl_seconds || DEFAULT_TTL_SECONDS - @mutex.synchronize do - # Evict LRU entry if at capacity - evict_lru if @cache.size >= @max_size && !@cache.key?(key) - - @cache[key] = CacheItem.new(value, ttl) - - # Update access order - @access_order.delete(key) - @access_order.push(key) - end - end - - # Track background refresh promise - # - # @param key [String] Cache key - # @param promise [Thread] Background refresh thread - def add_refreshing_promise(key, promise) - @mutex.synchronize { @refreshing_keys[key] = promise } - - # Non-blocking cleanup after thread completes - # This ensures stale-while-revalidate doesn't block the calling thread - Thread.new do - promise.join - @mutex.synchronize { @refreshing_keys.delete(key) } - end - end - - # Check if key is currently being refreshed - # - # @param key [String] Cache key - # @return [Boolean] - def refreshing?(key) - @mutex.synchronize { @refreshing_keys.key?(key) } - end - - # Invalidate all cache entries for a prompt - # - # @param prompt_name [String] Prompt name - def invalidate(prompt_name) - @mutex.synchronize do - @cache.keys.each do |key| - if key.start_with?(prompt_name) - @cache.delete(key) - @access_order.delete(key) - end - end - end - end - - private - - # Evict least recently used cache entry - def evict_lru - return if @access_order.empty? - - lru_key = @access_order.shift - @cache.delete(lru_key) - end - end -end -``` - -**Key Design Decisions:** - -1. **Mutex for Thread Safety**: All cache operations use mutex synchronization -2. **Stale-While-Revalidate**: Return expired cache while refreshing in background -3. **Simple Key Generation**: Deterministic cache keys based on name/version/label -4. 
**Rails.cache Integration**: Phase 2 will add optional Rails.cache backend - -**Alternative Considered: Rails.cache by Default** - -We could use `Rails.cache` immediately, but: -- **Pro**: Distributed caching across processes/servers -- **Con**: Requires Rails dependency, slower than in-memory -- **Decision**: Start with in-memory, add Rails.cache as opt-in in Phase 2 - -**Background Refresh with Thread Pool:** - -To prevent unbounded thread creation during cache refreshes, use a thread pool: - -```ruby -require 'concurrent-ruby' - -class PromptCache - def initialize(max_size: MAX_CACHE_SIZE) - @cache = {} - @mutex = Mutex.new - @refreshing_keys = {} - @access_order = [] - @max_size = max_size - # Thread pool for background refreshes (max 5 concurrent) - @thread_pool = Concurrent::FixedThreadPool.new(5) - end - - def trigger_background_refresh(key, &block) - return if refreshing?(key) - - @mutex.synchronize do - return if @refreshing_keys.key?(key) - - # Submit to thread pool instead of creating unbounded threads - future = Concurrent::Future.execute(executor: @thread_pool) do - block.call - end - - @refreshing_keys[key] = future - - # Clean up when done (non-blocking) - future.add_observer do |time, value, reason| - @mutex.synchronize { @refreshing_keys.delete(key) } - end - end - end -end -``` - -**Benefits:** -- Limits concurrent API calls to 5 (configurable) -- Prevents thread exhaustion under high load -- Graceful handling of refresh failures - -### 5. Langfuse::TextPromptClient - -**Purpose**: Represent and manipulate text-based prompts. - -```ruby -module Langfuse - class TextPromptClient - attr_reader :name, :version, :config, :labels, :tags, :prompt, :type, :is_fallback - - def initialize(response, is_fallback: false) - @name = response[:name] - @version = response[:version] - @config = response[:config] || {} - @labels = response[:labels] || [] - @tags = response[:tags] || [] - @prompt = response[:prompt] - @type = :text - @is_fallback = is_fallback - end - - # Compile prompt by substituting variables - # - # @param variables [Hash] Variable substitutions - # @return [String] Compiled prompt - # - # @example - # prompt.compile(name: "Alice", city: "NYC") - # # "Hello {{name}} from {{city}}!" => "Hello Alice from NYC!" - def compile(variables = {}) - Mustache.render(prompt, variables.transform_keys(&:to_s)) - end - - # Convert to LangChain PromptTemplate format - # - # @return [String] Prompt with {var} syntax - # - # @example - # prompt.to_langchain - # # "Hello {{name}}!" => "Hello {name}!" - def to_langchain - transform_to_langchain_variables(prompt) - end - - # Serialize to JSON - # - # @return [String] JSON representation - def to_json(*args) - { - name: name, - prompt: prompt, - version: version, - is_fallback: is_fallback, - tags: tags, - labels: labels, - type: type, - config: config - }.to_json(*args) - end - - private - - def transform_to_langchain_variables(content) - # Convert {{var}} to {var} - content.gsub(/\{\{(\w+)\}\}/, '{\1}') - end - end -end -``` - -**Key Design Decisions:** - -1. **Immutable**: All attributes are read-only (Ruby convention for value objects) -2. **Mustache Templating**: Use `mustache` gem for variable substitution -3. **Symbol Keys**: Return `:text` for type (Ruby convention) -4. **Simple Interface**: Focus on common use cases (compile, to_langchain) - -### 6. Langfuse::ChatPromptClient - -**Purpose**: Represent and manipulate chat-based prompts with placeholder support. 
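For orientation, the sketch below shows the two prompt item shapes this class accepts; the shapes are inferred from `normalize_prompt` and `compile` in the implementation that follows, and the field values are illustrative only.

```ruby
# Illustrative prompt payloads (shapes inferred from normalize_prompt/compile below)

# Legacy format: items without :type are tagged as "chatmessage" by normalize_prompt
legacy_prompt = [
  { role: "system", content: "You are a helpful assistant." }
]

# Current format: regular chat messages mixed with placeholder entries
current_prompt = [
  { type: "chatmessage", role: "system", content: "You are {{role}}." },
  { type: "placeholder", name: "examples" },
  { type: "chatmessage", role: "user", content: "{{question}}" }
]
```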
- -```ruby -module Langfuse - class ChatPromptClient - attr_reader :name, :version, :config, :labels, :tags, :prompt, :type, :is_fallback - - # Chat message types - MESSAGE_TYPE_CHAT = "chatmessage" - MESSAGE_TYPE_PLACEHOLDER = "placeholder" - - def initialize(response, is_fallback: false) - @name = response[:name] - @version = response[:version] - @config = response[:config] || {} - @labels = response[:labels] || [] - @tags = response[:tags] || [] - @prompt = normalize_prompt(response[:prompt]) - @type = :chat - @is_fallback = is_fallback - end - - # Compile prompt by substituting variables and resolving placeholders - # - # @param variables [Hash] Variable substitutions for Mustache templates - # @param placeholders [Hash] Placeholder resolutions (name => array of messages) - # @param required_placeholders [Array] List of required placeholder names - # @return [Array] Array of chat messages with resolved placeholders - # @raise [ArgumentError] if required placeholder is missing or invalid - # - # @example - # messages = prompt.compile( - # { user_name: "Alice" }, - # { examples: [ - # { role: "user", content: "Hi" }, - # { role: "assistant", content: "Hello!" } - # ]} - # ) - def compile(variables = {}, placeholders = {}, required_placeholders: []) - # Validate required placeholders are provided - required_placeholders.each do |name| - unless placeholders.key?(name) || placeholders.key?(name.to_sym) || placeholders.key?(name.to_s) - raise ArgumentError, "Required placeholder '#{name}' not provided" - end - end - - messages = [] - - prompt.each do |item| - if item[:type] == MESSAGE_TYPE_PLACEHOLDER - # Resolve placeholder - placeholder_value = placeholders[item[:name].to_sym] || placeholders[item[:name]] - - if placeholder_value.nil? - # Keep unresolved placeholder for debugging - messages << item - elsif placeholder_value.is_a?(Array) - # Handle empty arrays - skip them - next if placeholder_value.empty? - - # Validate all messages have proper structure - unless valid_chat_messages?(placeholder_value) - raise ArgumentError, "Placeholder '#{item[:name]}' must contain valid chat messages with :role and :content" - end - - messages.concat(placeholder_value) - else - # Invalid placeholder value - raise ArgumentError, "Placeholder '#{item[:name]}' must be an Array of messages, got #{placeholder_value.class}" - end - elsif item[:type] == MESSAGE_TYPE_CHAT - # Regular message: substitute variables - messages << { - role: item[:role], - content: Mustache.render(item[:content], variables.transform_keys(&:to_s)) - } - end - end - - messages - end - - # Convert to LangChain ChatPromptTemplate format - # - # @param placeholders [Hash] Placeholder resolutions - # @return [Array] Array of messages and MessagesPlaceholder objects - # - # @example - # langchain_messages = prompt.to_langchain( - # placeholders: { examples: [...] } - # ) - def to_langchain(placeholders: {}) - messages = [] - - prompt.each do |item| - if item[:type] == MESSAGE_TYPE_PLACEHOLDER - placeholder_value = placeholders[item[:name].to_sym] || placeholders[item[:name]] - - if placeholder_value.is_a?(Array) && !placeholder_value.empty? 
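              # Only non-empty arrays are inlined here; nil or empty values fall
              # through to the else branch and become a LangChain MessagesPlaceholder.
              # (Contrast with #compile above, which skips empty arrays entirely.)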
- # Resolved placeholder: add messages with transformed variables - placeholder_value.each do |msg| - messages << { - role: msg[:role], - content: transform_to_langchain_variables(msg[:content]) - } - end - else - # Unresolved: convert to LangChain MessagesPlaceholder - messages << ["placeholder", "{#{item[:name]}}"] - end - elsif item[:type] == MESSAGE_TYPE_CHAT - messages << { - role: item[:role], - content: transform_to_langchain_variables(item[:content]) - } - end - end - - messages - end - - def to_json(*args) - { - name: name, - prompt: prompt.map { |item| - if item[:type] == MESSAGE_TYPE_CHAT - item.except(:type) - else - item - end - }, - version: version, - is_fallback: is_fallback, - tags: tags, - labels: labels, - type: type, - config: config - }.to_json(*args) - end - - private - - def normalize_prompt(messages) - # Ensure all messages have a type field - messages.map do |item| - if item[:type] - item # Already has type - else - # Legacy format: add type - { type: MESSAGE_TYPE_CHAT }.merge(item) - end - end - end - - def transform_to_langchain_variables(content) - content.gsub(/\{\{(\w+)\}\}/, '{\1}') - end - - def valid_chat_messages?(messages) - messages.all? { |m| m.is_a?(Hash) && m.key?(:role) && m.key?(:content) } - end - end -end -``` - -**Key Design Decisions:** - -1. **Placeholder Support**: First-class support for dynamic message insertion -2. **Type Normalization**: Handle both legacy and new message formats -3. **Flexible Placeholders**: Accept symbol or string keys for Ruby ergonomics -4. **Array Flattening**: `compile` returns flat array of resolved messages - -### 7. Langfuse::ApiClient Extensions - -**Purpose**: Add HTTP endpoints for prompt operations. - -```ruby -module Langfuse - class ApiClient - # ... existing methods ... 
- - # Fetch a prompt by name - # - # @param name [String] Prompt name - # @param version [Integer, nil] Specific version - # @param label [String, nil] Label filter - # @param timeout_seconds [Integer, nil] Request timeout - # - # @return [Hash] Prompt response - # @note Retries are handled by Faraday retry middleware (max: 2, interval: 0.5s) - def get_prompt(name, version: nil, label: nil, timeout_seconds: nil) - params = {} - params[:version] = version if version - params[:label] = label if label - - response = connection(timeout: timeout_seconds).get("/api/public/v2/prompts/#{name}") do |req| - req.params = params - end - - handle_response(response) - rescue Faraday::Error => e - raise ApiError, "Failed to fetch prompt '#{name}': #{e.message}" - end - - # Create a new prompt - # - # @param body [Hash] Prompt definition - # @return [Hash] Created prompt response - def create_prompt(body) - response = connection.post("/api/public/v2/prompts") do |req| - req.headers['Content-Type'] = 'application/json' - req.body = body.to_json - end - - handle_response(response) - end - - # Update prompt version labels - # - # @param name [String] Prompt name - # @param version [Integer] Version number - # @param labels [Array] New labels - # - # @return [Hash] Updated prompt - def update_prompt_version(name, version, labels) - response = connection.patch("/api/public/v2/prompts/#{name}/#{version}") do |req| - req.headers['Content-Type'] = 'application/json' - req.body = { labels: labels }.to_json - end - - handle_response(response) - end - - private - - def connection(timeout: nil) - if timeout - # Create dedicated connection for custom timeout - # to avoid mutating shared connection - build_connection(timeout: timeout) - else - @connection ||= build_connection - end - end - - def build_connection(timeout: nil) - Faraday.new( - url: base_url, - headers: { - 'Authorization' => authorization_header, - 'User-Agent' => "langfuse-ruby/#{Langfuse::VERSION}" - } - ) do |conn| - conn.request :retry, max: 2, interval: 0.5 - conn.response :json, content_type: /\bjson$/ - conn.adapter Faraday.default_adapter - conn.options.timeout = timeout if timeout - end - end - - def authorization_header - # Basic Auth: base64(public_key:secret_key) - credentials = "#{@public_key}:#{@secret_key}" - "Basic #{Base64.strict_encode64(credentials)}" - end - - def handle_response(response) - case response.status - when 200..299 - symbolize_keys(response.body) - when 401 - raise UnauthorizedError, "Invalid API credentials" - when 404 - raise NotFoundError, "Prompt not found" - when 429 - raise RateLimitError, "Rate limit exceeded" - else - raise ApiError, "HTTP #{response.status}: #{response.body}" - end - end - - def symbolize_keys(hash) - # Recursively convert string keys to symbols - JSON.parse(hash.to_json, symbolize_names: true) - end - end -end -``` - -**Key Design Decisions:** - -1. **Faraday for HTTP**: Industry standard, flexible middleware -2. **Basic Auth**: Use public_key:secret_key as per Langfuse spec -3. **Automatic Retries**: Built-in exponential backoff for transient errors -4. **Symbol Keys**: Return hashes with symbol keys (Ruby convention) -5. 
**Custom Exceptions**: Specific errors for different failure modes - ---- - -## API Design - -### Client Initialization - -```ruby -# Option 1: Global configuration (recommended for Rails) -Langfuse.configure do |config| - config.public_key = ENV['LANGFUSE_PUBLIC_KEY'] - config.secret_key = ENV['LANGFUSE_SECRET_KEY'] - config.cache_ttl = 120 # 2 minutes -end - -# Use global singleton client -client = Langfuse.client - -# Option 2: Per-client configuration -config = Langfuse::Config.new do |c| - c.public_key = "pk_..." - c.secret_key = "sk_..." -end -client = Langfuse::Client.new(config) - -# Option 3: Inline hash (backward compatible) -client = Langfuse::Client.new( - public_key: "pk_...", - secret_key: "sk_..." -) -``` - -### Get Prompt (Flattened API) - -```ruby -# Get latest production version -prompt = client.get_prompt("greeting") - -# Get specific version -prompt = client.get_prompt("greeting", version: 2) - -# Get by label -prompt = client.get_prompt("greeting", label: "staging") - -# Disable caching for testing -prompt = client.get_prompt("greeting", cache_ttl: 0) - -# With fallback for resilience (RECOMMENDED) -prompt = client.get_prompt("greeting", - fallback: "Hello {{name}}!", - type: :text -) - -# Chat prompt with fallback -prompt = client.get_prompt("conversation", - type: :chat, - fallback: [ - { role: "system", content: "You are a helpful assistant." }, - { role: "user", content: "{{user_message}}" } - ] -) - -# Get with detailed metadata (debugging/observability) -detail = client.get_prompt_detail("greeting") -# => { -# prompt: TextPromptClient, -# cached: true, -# stale: false, -# version: 3, -# fetch_time_ms: 1.2, -# source: :cache -# } -``` - -### Convenience Method: Compile in One Step - -```ruby -# Get and compile in single call -text = client.compile_prompt("greeting", - variables: { name: "Alice", city: "SF" }, - fallback: "Hello {{name}}!", - type: :text -) -# => "Hello Alice from SF!" - -# Chat prompt compilation -messages = client.compile_prompt("conversation", - variables: { user_name: "Alice" }, - placeholders: { - examples: [ - { role: "user", content: "Hi" }, - { role: "assistant", content: "Hello!" } - ] - }, - type: :chat -) -``` - -### Create Prompt - -```ruby -# Create text prompt -text_prompt = client.create_prompt( - name: "greeting", - prompt: "Hello {{name}} from {{city}}!", - type: :text, - labels: ["production"], - tags: ["customer-facing"], - config: { temperature: 0.7 } -) - -# Create chat prompt -chat_prompt = client.create_prompt( - name: "conversation", - type: :chat, - prompt: [ - { role: "system", content: "You are a helpful assistant." }, - { role: "user", content: "{{user_message}}" } - ], - labels: ["staging"] -) - -# Create chat prompt with placeholders -chat_prompt = client.create_prompt( - name: "rag-pipeline", - type: :chat, - prompt: [ - { role: "system", content: "You are a helpful assistant." }, - { type: "placeholder", name: "examples" }, - { role: "user", content: "{{user_question}}" } - ] -) -``` - -### Update Prompt - -```ruby -# Promote version to production -client.update_prompt( - name: "greeting", - version: 3, - labels: ["production", "stable"] -) - -# Tag for A/B testing -client.update_prompt( - name: "greeting", - version: 4, - labels: ["experiment-a"] -) -``` - -### Two-Step: Get + Compile - -```ruby -# Text prompt compilation -text_prompt = client.get_prompt("greeting", type: :text) -compiled_text = text_prompt.compile( - name: "Alice", - city: "San Francisco" -) -# => "Hello Alice from San Francisco!" 
- -# Chat prompt compilation -chat_prompt = client.get_prompt("conversation", type: :chat) -compiled_messages = chat_prompt.compile( - { user_name: "Alice" }, - { - examples: [ - { role: "user", content: "What's the weather?" }, - { role: "assistant", content: "Let me check for you." } - ] - } -) -# => [ -# { role: "system", content: "You are a helpful assistant." }, -# { role: "user", content: "What's the weather?" }, -# { role: "assistant", content: "Let me check for you." }, -# { role: "user", content: "Alice's message" } -# ] -``` - -### LangChain Integration - -```ruby -# Text prompt to LangChain -text_prompt = client.prompt.get("greeting", type: :text) -langchain_template = text_prompt.to_langchain -# => "Hello {name} from {city}!" - -# Chat prompt to LangChain -chat_prompt = client.prompt.get("conversation", type: :chat) -langchain_messages = chat_prompt.to_langchain -# => [ -# { role: "system", content: "You are a helpful assistant." }, -# ["placeholder", "{examples}"], -# { role: "user", content: "{user_message}" } -# ] -``` - -### Ruby Idioms - -```ruby -# Graceful error handling with fallback (recommended) -prompt = client.get_prompt("greeting", - fallback: "Hello {{name}}!", - type: :text -) -# Always succeeds - returns fallback on error - -# Or: Traditional exception handling -begin - prompt = client.get_prompt("greeting") -rescue Langfuse::NotFoundError => e - Rails.logger.error("Prompt not found: #{e.message}") - # Handle error -end - -# Rails integration with global client -class AiService - def initialize - @langfuse = Langfuse.client # Global singleton - end - - def generate_greeting(user) - # One-step compile with fallback - text = @langfuse.compile_prompt("greeting", - variables: { name: user.name, city: user.city }, - fallback: "Hello {{name}} from {{city}}!", - type: :text - ) - - # Use with OpenAI, Anthropic, etc. 
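    # (Assumes the ruby-openai gem's OpenAI::Client interface; the compiled
    # text can be passed to any LLM client.)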
- OpenAI::Client.new.chat( - parameters: { - model: "gpt-4", - messages: [{ role: "user", content: text }] - } - ) - end -end -``` - ---- - -## Caching Strategy - -### Cache Behavior - -The caching system implements **stale-while-revalidate** pattern for optimal performance: - -``` -┌─────────────────────────────────────────────────────────────┐ -│ Cache State Transitions │ -└─────────────────────────────────────────────────────────────┘ - -[MISS] ──fetch──> [FRESH] ──60s──> [EXPIRED] - │ │ - │ │ - │ └──background refresh──> [FRESH] - │ │ - └──────return stale────────┘ - - -State: MISS -- No cache entry exists -- Fetch immediately from API -- Block until response received -- Store in cache with TTL - -State: FRESH -- Cache entry exists and not expired -- Return immediately from cache -- No API call made -- Best performance (<1ms) - -State: EXPIRED -- Cache entry exists but expired -- Return stale cache immediately -- Trigger background refresh (async) -- Next request will use fresh data -- Ensures fast response times -``` - -### Cache Key Generation - -```ruby -# Format: "name-{version|label}:value" - -# Latest production (default) -create_key(name: "greeting") -# => "greeting-label:production" - -# Specific version -create_key(name: "greeting", version: 2) -# => "greeting-version:2" - -# Specific label -create_key(name: "greeting", label: "staging") -# => "greeting-label:staging" -``` - -### TTL Configuration - -```ruby -# Default TTL: 60 seconds -prompt = client.prompt.get("greeting") - -# Custom TTL: 5 minutes -prompt = client.prompt.get("greeting", cache_ttl_seconds: 300) - -# Disable caching -prompt = client.prompt.get("greeting", cache_ttl_seconds: 0) - -# Very long TTL for stable prompts -prompt = client.prompt.get("greeting", cache_ttl_seconds: 3600) -``` - -### Thread Safety - -All cache operations are thread-safe using `Mutex`: - -```ruby -class PromptCache - def initialize - @cache = {} - @mutex = Mutex.new - @refreshing_keys = {} - end - - def get_including_expired(key) - @mutex.synchronize { @cache[key] } - end - - def set(key, value, ttl) - @mutex.synchronize do - @cache[key] = CacheItem.new(value, ttl) - end - end - - def refreshing?(key) - @mutex.synchronize { @refreshing_keys.key?(key) } - end -end -``` - -### Invalidation - -Cache invalidation happens automatically on updates: - -```ruby -# Update prompt labels -client.prompt.update(name: "greeting", version: 3, labels: ["production"]) - -# Cache automatically invalidated -# All keys starting with "greeting" are removed -# Next get("greeting") will fetch fresh data -``` - -### Rails.cache Integration (Phase 2) - -```ruby -# Opt-in to distributed caching -Langfuse.configure do |config| - config.cache_backend = :rails - config.cache_namespace = "langfuse_prompts" -end - -# Implementation -class PromptCache - def initialize(backend: :memory) - @backend = backend - @cache = backend == :rails ? 
Rails.cache : {} - @mutex = Mutex.new unless backend == :rails - end - - def get_including_expired(key) - if @backend == :rails - Rails.cache.read(cache_key(key)) - else - @mutex.synchronize { @cache[key] } - end - end - - private - - def cache_key(key) - "#{Langfuse.configuration.cache_namespace}:#{key}" - end -end -``` - -**Trade-off: In-memory vs Rails.cache** - -| Aspect | In-memory | Rails.cache | -|--------|-----------|-------------| -| Speed | 0.01ms | 1-10ms (Redis) | -| Shared across processes | No | Yes | -| Memory usage | Per-process | Shared | -| Ideal for | Single-server | Multi-server | -| Default | ✓ Phase 1 | Phase 2 option | - ---- - -## REST API Integration - -### Langfuse API Endpoints - -``` -Base URL: https://cloud.langfuse.com -Authentication: Basic Auth (public_key:secret_key) - -GET /api/public/v2/prompts/{name} - ?version={version} - &label={label} - -POST /api/public/v2/prompts - -PATCH /api/public/v2/prompts/{name}/{version} -``` - -### HTTP Client: Faraday - -**Why Faraday?** - -1. **Industry Standard**: Most popular Ruby HTTP client -2. **Middleware Support**: Easy to add logging, instrumentation -3. **Adapter Agnostic**: Works with Net::HTTP, Patron, HTTPClient -4. **Built-in Retry**: Exponential backoff for transient errors -5. **Already Used**: Likely in Gemfile for other integrations - -**Alternative Considered: HTTParty** - -- Simpler API, but less flexible -- No built-in retry middleware -- Harder to instrument for observability - -### Request/Response Handling - -```ruby -# GET Prompt -GET /api/public/v2/prompts/greeting?label=production - -Response 200 OK: -{ - "name": "greeting", - "version": 3, - "type": "text", - "prompt": "Hello {{name}}!", - "config": { "temperature": 0.7 }, - "labels": ["production"], - "tags": ["customer-facing"], - "createdAt": "2025-01-01T00:00:00Z", - "updatedAt": "2025-01-15T00:00:00Z" -} - -# CREATE Prompt -POST /api/public/v2/prompts -Content-Type: application/json - -{ - "name": "greeting", - "type": "text", - "prompt": "Hello {{name}}!", - "labels": ["staging"], - "config": {} -} - -Response 201 Created: -{ - "name": "greeting", - "version": 1, - ... -} - -# UPDATE Prompt -PATCH /api/public/v2/prompts/greeting/3 -Content-Type: application/json - -{ - "labels": ["production", "stable"] -} - -Response 200 OK: -{ - "name": "greeting", - "version": 3, - "labels": ["production", "stable"], - ... -} -``` - -### Error Handling - -```ruby -module Langfuse - class Error < StandardError; end - class ApiError < Error; end - class UnauthorizedError < ApiError; end - class NotFoundError < ApiError; end - class RateLimitError < ApiError; end - class TimeoutError < ApiError; end - - class ApiClient - def handle_response(response) - case response.status - when 200..299 - symbolize_keys(response.body) - when 401 - raise UnauthorizedError, "Invalid API credentials" - when 404 - raise NotFoundError, "Resource not found: #{response.body}" - when 429 - raise RateLimitError, "Rate limit exceeded. 
Retry after: #{response.headers['Retry-After']}" - when 500..599 - raise ApiError, "Server error: #{response.status}" - else - raise ApiError, "Unexpected response: #{response.status}" - end - end - end -end -``` - -### Retry Logic - -```ruby -# Faraday retry middleware configuration -connection = Faraday.new do |conn| - conn.request :retry, - max: 2, - interval: 0.5, - interval_randomness: 0.5, - backoff_factor: 2, - retry_statuses: [429, 500, 502, 503, 504], - retry_if: ->(env, exception) { - # Retry on network errors - exception.is_a?(Faraday::TimeoutError) || - exception.is_a?(Faraday::ConnectionFailed) - } -end - -# Exponential backoff: -# Attempt 1: immediate -# Attempt 2: 0.5s + random(0-0.25s) -# Attempt 3: 1.0s + random(0-0.5s) -``` - -### Authentication - -```ruby -# Basic Auth implementation -class ApiClient - def initialize(public_key:, secret_key:, base_url:) - @public_key = public_key - @secret_key = secret_key - @base_url = base_url - end - - private - - def authorization_header - credentials = "#{@public_key}:#{@secret_key}" - "Basic #{Base64.strict_encode64(credentials)}" - end - - def connection - @connection ||= Faraday.new( - url: @base_url, - headers: { - 'Authorization' => authorization_header, - 'User-Agent' => "langfuse-ruby/#{Langfuse::VERSION}", - 'Content-Type' => 'application/json' - } - ) - end -end -``` - -### Timeout Configuration - -```ruby -# Default timeout: 5 seconds -client = Langfuse::Client.new(timeout: 5) - -# Per-request timeout override -prompt = client.prompt.get( - "greeting", - fetch_timeout_ms: 2000 # 2 second timeout -) - -# Implementation -def get_prompt(name, timeout_seconds: nil, **options) - conn = connection.dup - conn.options.timeout = timeout_seconds if timeout_seconds - - response = conn.get("/api/public/v2/prompts/#{name}") do |req| - req.params = options - end - - handle_response(response) -rescue Faraday::TimeoutError => e - raise Langfuse::TimeoutError, "Request timed out after #{timeout_seconds}s" -end -``` - ---- - -## Variable Substitution - -### Templating Engine: Mustache - -**Why Mustache?** - -1. **Logic-less**: Simple, secure, no arbitrary code execution -2. **Cross-language**: Same syntax as JavaScript SDK (consistency) -3. **Ruby Gem**: Well-maintained `mustache` gem available -4. **Familiar**: Widely used in Rails ecosystem - -**Alternative Considered: ERB** - -- Pro: Built into Ruby stdlib -- Con: Allows Ruby code execution (security risk) -- Con: Different syntax than JS SDK (inconsistent) - -### Text Prompt Compilation - -```ruby -# Template -"Hello {{name}} from {{city}}!" - -# Variables -{ name: "Alice", city: "San Francisco" } - -# Compiled -"Hello Alice from San Francisco!" - -# Implementation -def compile(variables = {}) - # Mustache expects string keys - Mustache.render(prompt, variables.transform_keys(&:to_s)) -end -``` - -### Chat Prompt Compilation - -```ruby -# Template -[ - { role: "system", content: "You are helping {{user_name}}." }, - { type: "placeholder", name: "examples" }, - { role: "user", content: "{{user_question}}" } -] - -# Variables + Placeholders -variables = { user_name: "Alice", user_question: "What's the weather?" } -placeholders = { - examples: [ - { role: "user", content: "How are you?" }, - { role: "assistant", content: "I'm great!" } - ] -} - -# Compiled -[ - { role: "system", content: "You are helping Alice." }, - { role: "user", content: "How are you?" }, - { role: "assistant", content: "I'm great!" }, - { role: "user", content: "What's the weather?" 
} -] -``` - -### Placeholder Resolution - -```ruby -def compile(variables = {}, placeholders = {}) - messages = [] - - prompt.each do |item| - case item[:type] - when MESSAGE_TYPE_PLACEHOLDER - # Resolve placeholder - name = item[:name] - value = placeholders[name.to_sym] || placeholders[name] - - if valid_messages?(value) - # Flatten array of messages - messages.concat(value) - elsif value.nil? - # Keep unresolved for debugging - messages << item - else - # Invalid type: stringify - messages << { role: "system", content: value.to_s } - end - - when MESSAGE_TYPE_CHAT - # Regular message: apply Mustache - messages << { - role: item[:role], - content: Mustache.render(item[:content], variables.transform_keys(&:to_s)) - } - end - end - - messages -end - -def valid_messages?(value) - value.is_a?(Array) && - value.all? { |m| m.is_a?(Hash) && m.key?(:role) && m.key?(:content) } -end -``` - -### Escaping and Security - -```ruby -# Mustache escapes HTML by default -# Disable escaping for plain text -Mustache.escape = ->(text) { text } - -# Or use triple mustache for unescaped -"Hello {{{user_input}}}!" # No escaping - -# For chat prompts, always sanitize user input in variables -def compile(variables = {}, placeholders = {}) - # Sanitize string values to prevent injection and limit payload size - safe_variables = variables.transform_values do |v| - v.is_a?(String) ? sanitize(v) : v - end - - # ... compilation logic -end - -# Sanitize input to prevent control character injection and DoS attacks -# -# @param text [String] Input text to sanitize -# @param max_length [Integer] Maximum allowed length (default: 10,000) -# @return [String] Sanitized text -# -# Rationale for 10,000 char limit: -# - Most LLM prompts are <5K tokens (~20K chars) -# - Prevents memory exhaustion attacks -# - Large enough for legitimate use cases -# - Configurable via parameter if needed -def sanitize(text, max_length: 10_000) - # Remove control characters (null bytes, escape sequences, etc.) - sanitized = text.gsub(/[\x00-\x1F\x7F]/, '') - - # Truncate to prevent DoS - sanitized.length > max_length ? sanitized[0...max_length] : sanitized -end -``` - -### LangChain Variable Transformation - -```ruby -# Langfuse format: {{variable}} -# LangChain format: {variable} - -def transform_to_langchain_variables(content) - # Simple regex replacement - content.gsub(/\{\{(\w+)\}\}/, '{\1}') -end - -# Example -transform_to_langchain_variables("Hello {{name}} from {{city}}!") -# => "Hello {name} from {city}!" -``` - -### Edge Cases - -```ruby -# Empty variables -prompt.compile({}) -# => "Hello {{name}}!" (unchanged) - -# Missing variables -prompt.compile(name: "Alice") -# => "Hello Alice from {{city}}!" - -# Extra variables -prompt.compile(name: "Alice", city: "SF", unused: "value") -# => "Hello Alice from SF!" (unused ignored) - -# Nil values -prompt.compile(name: nil) -# => "Hello from {{city}}!" - -# Nested objects (not supported) -prompt.compile(user: { name: "Alice" }) -# => "Hello {{name}}!" 
(no nested access) -``` - ---- - -## Implementation Phases - -### Phase 1: Core Functionality (MVP) - -**Goal**: Basic prompt retrieval and caching - -**Scope**: -- `PromptManager#get` with caching -- `TextPromptClient` with `compile` -- `ChatPromptClient` with `compile` -- `PromptCache` (in-memory) -- `ApiClient` extensions for GET /prompts - -**Deliverables**: -- [ ] `Langfuse::PromptManager` class -- [ ] `Langfuse::TextPromptClient` class -- [ ] `Langfuse::ChatPromptClient` class -- [ ] `Langfuse::PromptCache` class -- [ ] `ApiClient#get_prompt` method -- [ ] Basic error handling -- [ ] Unit tests (>90% coverage) -- [ ] Integration tests with VCR -- [ ] Documentation and examples - -**Success Criteria**: -- Can fetch and cache prompts -- Can compile text and chat prompts -- Thread-safe for Rails apps -- <100ms cache hit latency - -**Estimated Effort**: 3-4 days (includes buffer for edge cases and thorough testing) - ---- - -### Phase 2: Advanced Features - -**Goal**: Prompt creation, updates, and advanced caching - -**Scope**: -- `PromptManager#create` -- `PromptManager#update` -- Placeholder support for chat prompts -- Rails.cache backend option -- Cache invalidation on updates - -**Deliverables**: -- [ ] `ApiClient#create_prompt` method -- [ ] `ApiClient#update_prompt_version` method -- [ ] Placeholder compilation in `ChatPromptClient` -- [ ] Rails.cache adapter -- [ ] Configuration for cache backend -- [ ] Additional tests for new features - -**Success Criteria**: -- Can create and update prompts -- Placeholders work correctly -- Rails.cache integration functional -- No breaking changes - -**Estimated Effort**: 2-3 days - ---- - -### Phase 3: LangChain Integration - -**Goal**: Seamless LangChain compatibility - -**Scope**: -- `TextPromptClient#to_langchain` -- `ChatPromptClient#to_langchain` -- LangChain MessagesPlaceholder format -- Variable syntax transformation - -**Deliverables**: -- [ ] LangChain format conversion methods -- [ ] Tests for LangChain compatibility -- [ ] Documentation with LangChain examples - -**Success Criteria**: -- Outputs work with langchain-ruby gem -- Variable syntax correctly transformed -- Placeholders converted to MessagesPlaceholder - -**Estimated Effort**: 1 day - ---- - -### Phase 4: Polish and Optimization - -**Goal**: Production-ready quality - -**Scope**: -- Performance optimization -- Enhanced error messages -- Observability hooks -- Comprehensive documentation - -**Deliverables**: -- [ ] Benchmarks and performance tests -- [ ] Instrumentation for monitoring (StatsD, Datadog) -- [ ] Detailed error messages with remediation hints -- [ ] Complete API documentation -- [ ] Migration guide from manual prompt management - -**Success Criteria**: -- <10ms p95 latency for cache hits -- Comprehensive error messages -- Full documentation coverage - -**Estimated Effort**: 2-3 days (comprehensive observability and documentation) - ---- - -### Total Implementation Timeline - -**Estimated Total**: 8-11 days (2-2.5 weeks with 30% contingency buffer) - -**Phases can be deployed incrementally:** -1. Phase 1 → Beta release for early adopters -2. Phase 2 → Feature-complete release -3. Phase 3 → LangChain integration (optional) -4. 
Phase 4 → Production-ready v1.0 - ---- - -## Testing Strategy - -### Unit Tests - -**Coverage Target**: >90% - -```ruby -# spec/langfuse/prompt_manager_spec.rb -RSpec.describe Langfuse::PromptManager do - let(:api_client) { instance_double(Langfuse::ApiClient) } - let(:manager) { described_class.new(api_client: api_client) } - - describe "#get" do - context "with cache miss" do - it "fetches from API and caches result" do - allow(api_client).to receive(:get_prompt).and_return(prompt_response) - - prompt = manager.get("greeting") - - expect(prompt).to be_a(Langfuse::TextPromptClient) - expect(prompt.name).to eq("greeting") - expect(api_client).to have_received(:get_prompt).once - end - end - - context "with cache hit" do - it "returns cached prompt without API call" do - manager.get("greeting") # Prime cache - - prompt = manager.get("greeting") - - expect(api_client).to have_received(:get_prompt).once # Only first call - end - end - - context "with expired cache" do - it "returns stale cache and refreshes in background" do - # Test stale-while-revalidate - end - end - - context "with fallback" do - it "returns fallback on API error" do - allow(api_client).to receive(:get_prompt).and_raise(Langfuse::ApiError) - - prompt = manager.get("greeting", fallback: "Hello!", type: :text) - - expect(prompt.is_fallback).to be true - expect(prompt.prompt).to eq("Hello!") - end - end - end - - describe "#create" do - it "creates text prompt and returns client" do - allow(api_client).to receive(:create_prompt).and_return(created_response) - - prompt = manager.create( - name: "greeting", - prompt: "Hello {{name}}!", - type: :text - ) - - expect(prompt).to be_a(Langfuse::TextPromptClient) - end - end - - describe "#update" do - it "updates labels and invalidates cache" do - manager.get("greeting") # Prime cache - - manager.update(name: "greeting", version: 1, labels: ["production"]) - - # Cache should be invalidated - expect(manager.cache.get_including_expired("greeting-label:production")).to be_nil - end - end -end - -# spec/langfuse/text_prompt_client_spec.rb -RSpec.describe Langfuse::TextPromptClient do - let(:response) do - { - name: "greeting", - version: 1, - type: "text", - prompt: "Hello {{name}} from {{city}}!", - config: {}, - labels: ["production"], - tags: [] - } - end - let(:client) { described_class.new(response) } - - describe "#compile" do - it "substitutes variables" do - result = client.compile(name: "Alice", city: "SF") - expect(result).to eq("Hello Alice from SF!") - end - - it "handles missing variables" do - result = client.compile(name: "Alice") - expect(result).to eq("Hello Alice from {{city}}!") - end - - it "accepts string keys" do - result = client.compile("name" => "Alice", "city" => "SF") - expect(result).to eq("Hello Alice from SF!") - end - end - - describe "#to_langchain" do - it "transforms mustache to langchain syntax" do - result = client.to_langchain - expect(result).to eq("Hello {name} from {city}!") - end - end - - describe "#to_json" do - it "serializes to JSON" do - json = JSON.parse(client.to_json) - expect(json["name"]).to eq("greeting") - expect(json["type"]).to eq("text") - end - end -end - -# spec/langfuse/chat_prompt_client_spec.rb -RSpec.describe Langfuse::ChatPromptClient do - let(:response) do - { - name: "conversation", - version: 1, - type: "chat", - prompt: [ - { type: "chatmessage", role: "system", content: "You are {{role}}." 
}, - { type: "placeholder", name: "examples" }, - { type: "chatmessage", role: "user", content: "{{question}}" } - ], - config: {}, - labels: [], - tags: [] - } - end - let(:client) { described_class.new(response) } - - describe "#compile" do - it "substitutes variables and resolves placeholders" do - result = client.compile( - { role: "a helper", question: "What?" }, - { - examples: [ - { role: "user", content: "Hi" }, - { role: "assistant", content: "Hello!" } - ] - } - ) - - expect(result).to eq([ - { role: "system", content: "You are a helper." }, - { role: "user", content: "Hi" }, - { role: "assistant", content: "Hello!" }, - { role: "user", content: "What?" } - ]) - end - - it "keeps unresolved placeholders" do - result = client.compile({ role: "a helper", question: "What?" }) - - expect(result[1]).to eq({ type: "placeholder", name: "examples" }) - end - end - - describe "#to_langchain" do - it "converts to langchain format" do - result = client.to_langchain - - expect(result).to eq([ - { role: "system", content: "You are {role}." }, - ["placeholder", "{examples}"], - { role: "user", content: "{question}" } - ]) - end - end -end - -# spec/langfuse/prompt_cache_spec.rb -RSpec.describe Langfuse::PromptCache do - let(:cache) { described_class.new } - let(:prompt) { instance_double(Langfuse::TextPromptClient) } - - describe "#set and #get_including_expired" do - it "stores and retrieves values" do - cache.set("key", prompt, 60) - - item = cache.get_including_expired("key") - - expect(item.value).to eq(prompt) - expect(item.expired?).to be false - end - - it "marks items as expired after TTL" do - cache.set("key", prompt, 0) # Immediate expiry - - sleep 0.01 - item = cache.get_including_expired("key") - - expect(item.expired?).to be true - expect(item.value).to eq(prompt) # Still returns value - end - end - - describe "#create_key" do - it "generates key with default label" do - key = cache.create_key(name: "greeting") - expect(key).to eq("greeting-label:production") - end - - it "generates key with version" do - key = cache.create_key(name: "greeting", version: 2) - expect(key).to eq("greeting-version:2") - end - - it "generates key with custom label" do - key = cache.create_key(name: "greeting", label: "staging") - expect(key).to eq("greeting-label:staging") - end - end - - describe "#invalidate" do - it "removes all keys for prompt name" do - cache.set("greeting-label:production", prompt, 60) - cache.set("greeting-version:2", prompt, 60) - cache.set("other-label:production", prompt, 60) - - cache.invalidate("greeting") - - expect(cache.get_including_expired("greeting-label:production")).to be_nil - expect(cache.get_including_expired("greeting-version:2")).to be_nil - expect(cache.get_including_expired("other-label:production")).not_to be_nil - end - end - - describe "thread safety" do - it "handles concurrent access" do - threads = 10.times.map do - Thread.new do - 100.times { |i| cache.set("key-#{i}", prompt, 60) } - end - end - - threads.each(&:join) - - # Should not raise or corrupt data - expect(cache.get_including_expired("key-0")).not_to be_nil - end - end - - describe "cache stampede protection" do - it "prevents duplicate background refreshes" do - # Simulate 100 threads hitting expired cache simultaneously - expired_key = "expired-prompt" - cache.set(expired_key, prompt, 0) # Immediately expired - sleep 0.01 - - refresh_count = Concurrent::AtomicFixnum.new(0) - allow(manager).to receive(:fetch_and_cache) do - refresh_count.increment - end - - threads = 100.times.map do - 
Thread.new { manager.get("expired-prompt") } - end - - threads.each(&:join) - - # Should only trigger 1 background refresh, not 100 - expect(refresh_count.value).to eq(1) - end - end - - describe "cache expiry edge cases" do - it "handles expiry exactly at read time" do - cache.set("key", prompt, 0.1) # 100ms TTL - sleep 0.1 # Expire exactly now - - item = cache.get_including_expired("key") - expect(item).not_to be_nil - expect(item.expired?).to be true - end - end - - describe "LRU eviction" do - it "evicts least recently used when at capacity" do - small_cache = described_class.new(max_size: 3) - - # Fill cache - small_cache.set("key1", prompt, 60) - small_cache.set("key2", prompt, 60) - small_cache.set("key3", prompt, 60) - - # Access key1 to make it recently used - small_cache.get_including_expired("key1") - - # Add key4 - should evict key2 (LRU) - small_cache.set("key4", prompt, 60) - - expect(small_cache.get_including_expired("key1")).not_to be_nil - expect(small_cache.get_including_expired("key2")).to be_nil - expect(small_cache.get_including_expired("key3")).not_to be_nil - expect(small_cache.get_including_expired("key4")).not_to be_nil - end - end -end -``` - -### Integration Tests with VCR - -```ruby -# spec/integration/prompt_manager_integration_spec.rb -RSpec.describe "Prompt Manager Integration", vcr: true do - let(:client) do - Langfuse::Client.new( - public_key: ENV["LANGFUSE_PUBLIC_KEY"], - secret_key: ENV["LANGFUSE_SECRET_KEY"], - base_url: "https://cloud.langfuse.com" - ) - end - - describe "fetching prompts" do - it "retrieves text prompt from API", vcr: { cassette_name: "get_text_prompt" } do - prompt = client.prompt.get("greeting") - - expect(prompt).to be_a(Langfuse::TextPromptClient) - expect(prompt.name).to eq("greeting") - expect(prompt.version).to be > 0 - end - - it "retrieves chat prompt from API", vcr: { cassette_name: "get_chat_prompt" } do - prompt = client.prompt.get("conversation", type: :chat) - - expect(prompt).to be_a(Langfuse::ChatPromptClient) - expect(prompt.prompt).to be_an(Array) - end - end - - describe "creating prompts" do - it "creates new text prompt", vcr: { cassette_name: "create_text_prompt" } do - prompt = client.prompt.create( - name: "test-#{SecureRandom.hex(4)}", - prompt: "Test {{variable}}", - type: :text - ) - - expect(prompt.version).to eq(1) - end - end - - describe "updating prompts" do - it "updates prompt labels", vcr: { cassette_name: "update_prompt" } do - result = client.prompt.update( - name: "greeting", - version: 1, - labels: ["test"] - ) - - expect(result[:labels]).to include("test") - end - end -end -``` - -### Performance Tests - -```ruby -# spec/performance/caching_performance_spec.rb -RSpec.describe "Caching Performance" do - let(:manager) { Langfuse::PromptManager.new(api_client: api_client) } - let(:api_client) { instance_double(Langfuse::ApiClient) } - - before do - allow(api_client).to receive(:get_prompt).and_return(prompt_response) - end - - it "cache hits are <1ms" do - manager.get("greeting") # Prime cache - - time = Benchmark.realtime do - 100.times { manager.get("greeting") } - end - - avg_time = (time / 100) * 1000 # Convert to ms - expect(avg_time).to be < 1 - end - - it "handles 1000 concurrent requests" do - threads = 1000.times.map do - Thread.new { manager.get("greeting") } - end - - expect { threads.each(&:join) }.not_to raise_error - end -end -``` - -### Test Coverage Requirements - -| Component | Coverage Target | -|-----------|----------------| -| PromptManager | >95% | -| TextPromptClient | >95% | 
-| ChatPromptClient | >95% | -| PromptCache | >95% | -| ApiClient | >90% | -| **Overall** | **>90%** | - ---- - -## Dependencies - -### Required Gems - -```ruby -# langfuse-ruby.gemspec -Gem::Specification.new do |spec| - spec.name = "langfuse-ruby" - spec.version = "0.2.0" - spec.authors = ["Langfuse"] - spec.summary = "Ruby SDK for Langfuse" - - # Runtime dependencies - spec.add_dependency "faraday", "~> 2.0" - spec.add_dependency "faraday-retry", "~> 2.0" - spec.add_dependency "mustache", "~> 1.1" - spec.add_dependency "concurrent-ruby", "~> 1.2" - - # Development dependencies - spec.add_development_dependency "rspec", "~> 3.12" - spec.add_development_dependency "vcr", "~> 6.1" - spec.add_development_dependency "webmock", "~> 3.18" - spec.add_development_dependency "rubocop", "~> 1.50" - spec.add_development_dependency "simplecov", "~> 0.22" -end -``` - -### Dependency Justification - -| Gem | Purpose | Why? | -|-----|---------|------| -| `faraday` | HTTP client | Industry standard, flexible, middleware support | -| `faraday-retry` | Retry logic | Exponential backoff, transient error handling | -| `mustache` | Templating | Logic-less, same as JS SDK, security | -| `concurrent-ruby` | Thread pool | Bounded concurrency for background refreshes, prevents thread exhaustion | -| `rspec` | Testing | Ruby standard, readable syntax | -| `vcr` | HTTP recording | Record real API responses for tests | -| `webmock` | HTTP stubbing | Mock HTTP for isolated tests | -| `rubocop` | Linting | Code quality, style enforcement | -| `simplecov` | Coverage | Track test coverage metrics | - -### Optional Dependencies - -```ruby -# Optional: Rails integration for testing -spec.add_development_dependency "rails", ">= 6.0" if ENV["RAILS_VERSION"] -``` - -### Version Constraints - -- **Ruby**: >= 2.7 (modern syntax, better performance) -- **Faraday**: ~> 2.0 (latest stable, HTTP/2 support) -- **Mustache**: ~> 1.1 (last updated 2016, but stable and widely used; logic-less design means few updates needed) -- **Concurrent-Ruby**: ~> 1.2 (actively maintained, production-ready thread primitives) - -### Gemfile.lock Considerations - -- Pin exact versions in CI for reproducibility -- Use pessimistic versioning (`~>`) for flexibility -- Test against multiple Ruby versions (2.7, 3.0, 3.1, 3.2) - ---- - -## Code Examples - -### Basic Usage - -```ruby -require "langfuse" - -# Configure globally -Langfuse.configure do |config| - config.public_key = ENV["LANGFUSE_PUBLIC_KEY"] - config.secret_key = ENV["LANGFUSE_SECRET_KEY"] -end - -# Get global client -client = Langfuse.client - -# Get a text prompt (two-step) -prompt = client.get_prompt("greeting") -compiled = prompt.compile(name: "Alice", city: "San Francisco") -puts compiled -# => "Hello Alice from San Francisco!" - -# Or: One-step convenience method -text = client.compile_prompt("greeting", - variables: { name: "Alice", city: "San Francisco" } -) -puts text - -# Get a chat prompt -chat_prompt = client.get_prompt("conversation", type: :chat) -messages = chat_prompt.compile( - { user_name: "Alice" }, - { - history: [ - { role: "user", content: "Hi!" }, - { role: "assistant", content: "Hello!" 
} - ] - } -) -``` - -### Rails Integration - -```ruby -# config/initializers/langfuse.rb -Langfuse.configure do |config| - config.public_key = ENV["LANGFUSE_PUBLIC_KEY"] - config.secret_key = ENV["LANGFUSE_SECRET_KEY"] - config.base_url = ENV.fetch("LANGFUSE_BASE_URL", "https://cloud.langfuse.com") - config.cache_ttl = 120 # 2 minutes - config.logger = Rails.logger -end - -# app/services/ai_greeting_service.rb -class AiGreetingService - def initialize - @langfuse = Langfuse.client # Global singleton - end - - def generate_greeting(user) - # Fetch and compile in one step with fallback - compiled = @langfuse.compile_prompt("user-greeting", - variables: { - name: user.name, - city: user.city, - subscription: user.subscription_tier - }, - fallback: "Hello {{name}}!", - type: :text - ) - - # Get prompt config for temperature - prompt = @langfuse.get_prompt("user-greeting") - temperature = prompt.config[:temperature] || 0.7 - - # Call OpenAI - response = openai_client.chat( - parameters: { - model: "gpt-4", - messages: [{ role: "user", content: compiled }], - temperature: temperature - } - ) - - # Trace with Langfuse - @langfuse.trace(name: "greeting-generation") do |trace| - trace.generation( - name: "openai-call", - input: compiled, - output: response.dig("choices", 0, "message", "content"), - model: "gpt-4", - metadata: { user_id: user.id } - ) - end - - response.dig("choices", 0, "message", "content") - end - - private - - def openai_client - @openai_client ||= OpenAI::Client.new(access_token: ENV["OPENAI_API_KEY"]) - end -end -``` - -### Chat Prompt with Placeholders - -```ruby -# Create a RAG prompt with placeholders -client.create_prompt( - name: "rag-qa", - type: :chat, - prompt: [ - { - role: "system", - content: "You are a helpful assistant. Use the context to answer questions." - }, - { - type: "placeholder", - name: "context_documents" - }, - { - role: "user", - content: "{{user_question}}" - } - ], - labels: ["production"] -) - -# Later: compile with dynamic context (two-step) -prompt = client.get_prompt("rag-qa", type: :chat) -messages = prompt.compile( - { user_question: "What is the capital of France?" }, - { - context_documents: [ - { role: "system", content: "Context: France is a country in Europe." }, - { role: "system", content: "Context: Paris is the capital of France." } - ] - } -) - -# Or: compile in one step -messages = client.compile_prompt("rag-qa", - variables: { user_question: "What is the capital of France?" }, - placeholders: { - context_documents: [ - { role: "system", content: "Context: France is a country in Europe." }, - { role: "system", content: "Context: Paris is the capital of France." } - ] - }, - type: :chat -) - -# Result: -# [ -# { role: "system", content: "You are a helpful assistant..." }, -# { role: "system", content: "Context: France is a country..." }, -# { role: "system", content: "Context: Paris is the capital..." }, -# { role: "user", content: "What is the capital of France?" 
} -# ] -``` - -### Error Handling - -```ruby -# Graceful degradation with fallback (RECOMMENDED) -# Never raises - returns fallback on any error -prompt = client.get_prompt("greeting", - fallback: "Hello {{name}}!", - type: :text -) - -# Traditional exception handling -begin - prompt = client.get_prompt("greeting") -rescue Langfuse::NotFoundError => e - Rails.logger.error("Prompt not found: #{e.message}") - # Fallback logic here -rescue Langfuse::ApiError => e - Rails.logger.error("Langfuse API error: #{e.message}") - # Handle error -end - -# Retry with exponential backoff -require "retryable" - -Retryable.retryable( - tries: 3, - on: [Langfuse::TimeoutError, Langfuse::RateLimitError], - sleep: ->(n) { 2**n } # 2s, 4s, 8s -) do - prompt = client.get_prompt("greeting") -end -``` - -### Testing with Mocks - -```ruby -# spec/services/ai_greeting_service_spec.rb -RSpec.describe AiGreetingService do - let(:langfuse_client) { instance_double(Langfuse::Client) } - let(:prompt) do - instance_double( - Langfuse::TextPromptClient, - compile: "Hello Alice from SF!", - config: { temperature: 0.7 } - ) - end - - before do - # Mock global client - allow(Langfuse).to receive(:client).and_return(langfuse_client) - - # Mock compile_prompt for one-step usage - allow(langfuse_client).to receive(:compile_prompt) - .and_return("Hello Alice from SF!") - - # Mock get_prompt for two-step usage - allow(langfuse_client).to receive(:get_prompt).and_return(prompt) - end - - it "generates personalized greeting" do - service = described_class.new - user = create(:user, name: "Alice", city: "SF") - - greeting = service.generate_greeting(user) - - expect(greeting).to be_present - expect(langfuse_client).to have_received(:compile_prompt) - .with("user-greeting", hash_including(variables: hash_including(name: "Alice"))) - end -end -``` - -### LangChain Integration - -```ruby -require "langchain" - -# Fetch prompt from Langfuse -prompt = client.prompt.get("greeting", type: :text) - -# Convert to LangChain format -langchain_template = prompt.to_langchain -# => "Hello {name} from {city}!" - -# Use with LangChain -llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"]) -prompt_template = Langchain::Prompt::PromptTemplate.new( - template: langchain_template, - input_variables: ["name", "city"] -) - -result = llm.complete( - prompt: prompt_template.format(name: "Alice", city: "SF") -) - -# Chat prompts with LangChain -chat_prompt = client.prompt.get("conversation", type: :chat) -langchain_messages = chat_prompt.to_langchain( - placeholders: { - history: [ - { role: "user", content: "Hi" }, - { role: "assistant", content: "Hello!" } - ] - } -) - -# Use with ChatOpenAI -chat_model = Langchain::LLM::OpenAIChat.new(api_key: ENV["OPENAI_API_KEY"]) -response = chat_model.chat(messages: langchain_messages) -``` - ---- - -## Migration Strategy - -### For Existing Langfuse Users - -**Before (Manual Prompt Management)**: - -```ruby -# Hardcoded prompts -def greeting_prompt(user) - "Hello #{user.name}! Welcome to our service." 
-end - -# Or: stored in database -class Prompt < ApplicationRecord - def compile(variables) - content.gsub(/\{\{(\w+)\}\}/) { variables[$1.to_sym] } - end -end -``` - -**After (Langfuse Prompt Management)**: - -```ruby -# Centralized in Langfuse -def greeting_prompt(user) - # Option 1: Two-step - prompt = @langfuse.get_prompt("user-greeting") - prompt.compile(name: user.name, tier: user.tier) - - # Option 2: One-step with fallback - @langfuse.compile_prompt("user-greeting", - variables: { name: user.name, tier: user.tier }, - fallback: "Hello {{name}}!", - type: :text - ) -end -``` - -### Migration Steps - -1. **Install Updated Gem**: - ```ruby - # Gemfile - gem "langfuse-ruby", "~> 0.2.0" - ``` - -2. **Add Global Configuration**: - ```ruby - # config/initializers/langfuse.rb - Langfuse.configure do |config| - config.public_key = ENV['LANGFUSE_PUBLIC_KEY'] - config.secret_key = ENV['LANGFUSE_SECRET_KEY'] - config.cache_ttl = 120 - config.logger = Rails.logger - end - ``` - -3. **Create Prompts in Langfuse**: - ```ruby - # scripts/migrate_prompts.rb - client = Langfuse.client - - # Migrate each hardcoded prompt - client.create_prompt( - name: "user-greeting", - prompt: "Hello {{name}}! Welcome to {{tier}} tier.", - type: :text, - labels: ["production"] - ) - ``` - -4. **Update Application Code**: - ```ruby - # Before - - greeting = "Hello #{user.name}!" - - # After (two-step) - + prompt = Langfuse.client.get_prompt("user-greeting") - + greeting = prompt.compile(name: user.name) - - # Or after (one-step with fallback) - + greeting = Langfuse.client.compile_prompt("user-greeting", - + variables: { name: user.name }, - + fallback: "Hello {{name}}!", - + type: :text - + ) - ``` - -5. **Test with Fallbacks**: - ```ruby - # Safe rollout with fallback - prompt = Langfuse.client.get_prompt("user-greeting", - fallback: "Hello {{name}}!", # Old hardcoded version - type: :text - ) - ``` - -6. **Monitor and Iterate**: - - Check Langfuse dashboard for prompt usage - - A/B test new prompt versions - - Update prompts without code deployment - -### Backward Compatibility - -**No breaking changes to existing API**: - -```ruby -# Existing tracing still works -client = Langfuse.client -client.trace(name: "my-trace") do |trace| - trace.generation(name: "llm-call", input: "test") -end - -# NEW prompt management is additive (flattened API) -client.get_prompt("greeting") -client.compile_prompt("greeting", variables: { name: "Alice" }) -``` - -### Rollback Plan - -If issues arise: - -1. **Use Fallbacks**: All prompts have fallback option -2. **Disable Caching**: Set `cache_ttl_seconds: 0` -3. **Revert to Hardcoded**: Fallback to old prompt logic -4. **Pin Old Version**: `gem "langfuse-ruby", "~> 0.1.4"` - ---- - -## Trade-offs and Alternatives - -### Design Decision Matrix - -| Decision | Chosen Approach | Alternative | Trade-off | -|----------|----------------|-------------|-----------| -| **Templating** | Mustache | ERB | Security vs. convenience | -| **HTTP Client** | Faraday | HTTParty | Flexibility vs. simplicity | -| **Caching** | In-memory | Rails.cache | Speed vs. distribution | -| **Thread Safety** | Mutex | Thread-local | Simplicity vs. performance | -| **API Style** | Keyword args | Positional | Readability vs. brevity | - -### 1. Templating: Mustache vs. 
ERB - -**Chosen: Mustache** - -- **Pro**: Logic-less, secure, cross-SDK consistency -- **Pro**: No arbitrary code execution risk -- **Con**: No conditionals or loops (must be done in code) - -**Alternative: ERB** - -- **Pro**: Built-in, no dependency -- **Pro**: Full Ruby power (conditionals, loops) -- **Con**: Security risk (code injection) -- **Con**: Different from JS SDK - -**Decision Rationale**: Security and consistency outweigh convenience. - -### 2. Caching: In-memory vs. Rails.cache - -**Chosen: In-memory (Phase 1), Rails.cache optional (Phase 2)** - -**In-memory**: -- **Pro**: Extremely fast (<1ms) -- **Pro**: No external dependencies -- **Con**: Not shared across processes -- **Con**: Higher memory per-process - -**Rails.cache (Redis)**: -- **Pro**: Shared across all processes/servers -- **Pro**: Centralized cache management -- **Con**: 10-100x slower than in-memory -- **Con**: Requires Redis dependency - -**Decision Rationale**: Start with simplest approach, add distribution as opt-in. - -### 3. Thread Safety: Mutex vs. Thread-local - -**Chosen: Mutex** - -- **Pro**: Simple, proven approach -- **Pro**: Shared cache across threads -- **Con**: Lock contention under high load - -**Alternative: Thread-local Storage** - -- **Pro**: No locking, faster -- **Con**: Duplicated cache per thread -- **Con**: Higher memory usage - -**Decision Rationale**: Rails apps typically have limited threads per process, mutex overhead acceptable. - -### 4. Async Refresh: Threads vs. Fibers - -**Chosen: Threads (Phase 1), consider Fibers (Phase 2)** - -**Threads**: -- **Pro**: Built-in, familiar -- **Con**: Heavier weight - -**Fibers**: -- **Pro**: Lightweight concurrency -- **Pro**: Better for high-concurrency scenarios -- **Con**: Requires Ruby 3.0+ -- **Con**: Less familiar to developers - -**Decision Rationale**: Threads are sufficient for MVP, evaluate Fibers based on real-world performance. - -### 5. API Style: Keyword Args vs. Options Hash - -**Chosen: Keyword Arguments** - -```ruby -# Keyword args (chosen) -prompt.get("name", version: 2, label: "production") - -# Options hash (alternative) -prompt.get("name", { version: 2, label: "production" }) -``` - -**Rationale**: Keyword args provide better IDE autocomplete and explicit API. - ---- - -## Open Questions - -### 1. Rails.cache Integration Priority - -**Question**: Should Rails.cache integration be Phase 1 or Phase 2? - -**Options**: -- **A**: Phase 1 - Implement both backends from start -- **B**: Phase 2 - Start simple with in-memory - -**Recommendation**: **Phase 2** -- Rationale: In-memory is sufficient for most use cases, easier to test, faster to ship MVP - -### 2. Async Background Refresh - -**Question**: How to implement background refresh in stale-while-revalidate? - -**Options**: -- **A**: Simple threads (`Thread.new { ... }`) -- **B**: Sidekiq jobs (requires Sidekiq dependency) -- **C**: Fibers (requires Ruby 3.0+) - -**Recommendation**: **Simple threads** -- Rationale: No additional dependencies, sufficient for prompt refresh use case - -### 3. Prompt Validation - -**Question**: Should we validate prompt structure before sending to API? - -**Options**: -- **A**: Client-side validation (check required fields) -- **B**: Rely on API validation (simpler) - -**Recommendation**: **API validation** -- Rationale: Avoid duplicating server logic, API is source of truth - -### 4. LangChain Dependency - -**Question**: Should we depend on `langchain-ruby` gem for `to_langchain` methods? 
- -**Options**: -- **A**: Hard dependency (import LangChain types) -- **B**: Soft dependency (return plain Ruby hashes) -- **C**: Optional dependency (only load if available) - -**Recommendation**: **Soft dependency** -- Rationale: Return plain hashes that work with LangChain without requiring the gem - -### 5. Observability Hooks - -**Question**: What observability should be built-in? - -**Options**: -- **A**: Logging only (simple) -- **B**: StatsD metrics (cache hits, API latency) -- **C**: OpenTelemetry traces (full observability) - -**Recommendation**: **Logging + StatsD hooks** -- Rationale: Logging is essential, StatsD is common in Rails, OpenTelemetry can be Phase 3 - -### 6. Configuration Pattern - -**Question**: Global config vs. per-client config? - -**Options**: -- **A**: Global: `Langfuse.configure { |c| ... }` -- **B**: Per-client: `Langfuse::Client.new(config)` -- **C**: Both (global defaults, per-client overrides) - -**Recommendation**: **Both** -- Rationale: Global config for Rails initializer, per-client for multi-tenant apps - -```ruby -# Global config -Langfuse.configure do |config| - config.public_key = ENV["LANGFUSE_PUBLIC_KEY"] - config.secret_key = ENV["LANGFUSE_SECRET_KEY"] - config.cache_backend = :rails -end - -# Per-client override -client = Langfuse::Client.new( - public_key: tenant.langfuse_key, - cache_backend: :memory -) -``` - -### 7. Prompt Versioning Strategy - -**Question**: How to handle version conflicts between cache and API? - -**Scenario**: Prompt version 1 is cached, version 2 is promoted to production - -**Options**: -- **A**: Cache by label (current approach - auto-updates) -- **B**: Cache by version (explicit, never changes) -- **C**: Configurable cache key strategy - -**Recommendation**: **Cache by label (default), support version caching** -- Rationale: Labels enable dynamic updates, versions for stability - ---- - -## Appendix: ASCII Diagrams - -### Caching Flow Diagram - -``` -┌─────────────────────────────────────────────────────────────┐ -│ get("greeting") │ -└──────────────────────┬──────────────────────────────────────┘ - │ - ▼ - ┌─────────────────────────┐ - │ Check cache for key │ - │ "greeting-label:prod" │ - └─────────┬───────────────┘ - │ - ┌─────────┴─────────┐ - │ │ - ▼ ▼ - ┌────────┐ ┌─────────┐ - │ MISS │ │ HIT │ - └────┬───┘ └────┬────┘ - │ │ - │ ┌─────┴──────┐ - │ │ │ - │ ▼ ▼ - │ ┌────────┐ ┌──────────┐ - │ │ FRESH │ │ EXPIRED │ - │ └───┬────┘ └────┬─────┘ - │ │ │ - │ ▼ ▼ - │ ┌──────────┐ ┌────────────────┐ - │ │ Return │ │ Return stale + │ - │ │ cached │ │ refresh async │ - │ └──────────┘ └────────────────┘ - │ - ▼ - ┌────────────────┐ - │ Fetch from API │ - └────────┬───────┘ - │ - ┌─────┴──────┐ - │ │ - ▼ ▼ - ┌─────────┐ ┌──────────┐ - │ SUCCESS │ │ ERROR │ - └────┬────┘ └────┬─────┘ - │ │ - │ ┌─────┴──────┐ - │ │ │ - │ ▼ ▼ - │ ┌─────────┐ ┌─────────┐ - │ │Fallback?│ │ Raise │ - │ └────┬────┘ └─────────┘ - │ │ - │ ▼ - │ ┌──────────────┐ - │ │Return fallback│ - │ └──────────────┘ - │ - ▼ - ┌──────────────┐ - │ Store cache │ - └──────┬───────┘ - │ - ▼ - ┌──────────────┐ - │Return prompt │ - └──────────────┘ -``` - -### Class Hierarchy - -``` -Langfuse::Client -│ -├── prompt: PromptManager -│ │ -│ ├── cache: PromptCache -│ │ ├── CacheItem (value, expiry) -│ │ └── Methods: get, set, invalidate -│ │ -│ ├── api_client: ApiClient -│ │ └── Methods: get_prompt, create_prompt, update_prompt -│ │ -│ └── Methods: get, create, update -│ -└── (existing methods: trace, generation, etc.) 
- - -Prompt Client Hierarchy: - -BasePromptClient (abstract) -│ -├── TextPromptClient -│ ├── compile(variables) -│ └── to_langchain() -│ -└── ChatPromptClient - ├── compile(variables, placeholders) - └── to_langchain(placeholders:) -``` - -### Request Flow - -``` -Application Code - │ - │ client.prompt.get("greeting") - ▼ -PromptManager - │ - │ 1. Check cache - ▼ -PromptCache - │ - ├──[MISS]──────────┐ - │ │ - │ ▼ - │ ApiClient - │ │ - │ │ 2. GET /api/v2/prompts/greeting - │ ▼ - │ Langfuse API - │ │ - │ │ 3. Response: { name, version, prompt, ... } - │ ▼ - │ ApiClient - │ │ - │ │ 4. Parse response - │ ▼ - │ PromptManager - │ │ - │ │ 5. Build client (Text/Chat) - │ ▼ - │ TextPromptClient / ChatPromptClient - │ │ - │ │ 6. Store in cache - │ ▼ - │ PromptCache - │ - │ [HIT: FRESH] - ├──────────────────┐ - │ │ - │ │ 7. Return from cache - │ ▼ - │ TextPromptClient / ChatPromptClient - │ - │ [HIT: EXPIRED] - └──────────────────┐ - │ - │ 8a. Return stale - ▼ - TextPromptClient / ChatPromptClient - │ - │ 8b. Background refresh - ▼ - (Async: steps 2-6) -``` - ---- - -## Design Revisions and Improvements - -This section documents critical fixes and improvements made to the design based on technical review. - -### Critical Fixes Applied - -1. **Non-blocking Background Refresh** (Lines 389-400) - - **Issue**: `promise.join` blocked calling thread, defeating stale-while-revalidate - - **Fix**: Cleanup happens in separate thread to maintain non-blocking behavior - - **Impact**: Ensures fast response times even with expired cache - -2. **Connection Singleton Bug** (Lines 816-831) - - **Issue**: Mutable timeout on shared connection affected all subsequent requests - - **Fix**: Create dedicated connection instances for custom timeouts - - **Impact**: Prevents timeout configuration from leaking between requests - -3. **Removed Double Retry Logic** (Lines 771-783) - - **Issue**: Manual retries + Faraday middleware = up to 4 retries instead of 2 - - **Fix**: Rely solely on Faraday retry middleware - - **Impact**: Predictable retry behavior, cleaner code - -4. **Cache Stampede Protection** (Lines 379-387) - - **Issue**: 1000 concurrent requests on expired cache = 1000 API calls - - **Fix**: Track refreshing keys, only first requester triggers refresh - - **Impact**: Prevents API rate limit exhaustion - -5. **Bounded Thread Pool** (Lines 514-557) - - **Issue**: Unbounded thread creation during cache refreshes - - **Fix**: Use `concurrent-ruby` FixedThreadPool (max 5 threads) - - **Impact**: Prevents thread exhaustion, controls concurrent API calls - -### Important Enhancements - -6. **LRU Cache Eviction** (Lines 362-378, 417-434) - - **Addition**: Max cache size (1000 entries) with LRU eviction policy - - **Benefit**: Prevents unbounded memory growth - - **Implementation**: Track access order, evict least recently used - -7. **Fallback Type Validation** (Lines 281-298) - - **Addition**: Validate fallback matches specified type (text/chat) - - **Benefit**: Prevent runtime errors from type mismatches - - **Example**: Reject text fallback when type is :chat - -8. **Cache Invalidation Safety** (Lines 268-276) - - **Issue**: Cache invalidated even on failed API updates - - **Fix**: Only invalidate after successful API response - - **Impact**: Prevents serving stale data after failed updates - -9. 
**Placeholder Validation** (Lines 588-630) - - **Addition**: Validate placeholder structure, handle empty arrays - - **Addition**: Support required_placeholders parameter - - **Benefit**: Better error messages, fail fast on invalid data - -10. **Security Documentation** (Lines 1527-1545) - - **Addition**: Document sanitization rationale (DoS prevention) - - **Addition**: Configurable max_length parameter - - **Benefit**: Clear security posture, flexible limits - -### Testing Enhancements - -11. **Edge Case Coverage** (Lines 2010-2074) - - Added: Cache stampede protection tests - - Added: Cache expiry edge case tests - - Added: LRU eviction tests - - Added: Concurrent access tests - - **Impact**: >95% confidence in production behavior - -### Observability Additions - -12. **Built-in Instrumentation** (Lines 318-357) - - **Addition**: ActiveSupport::Notifications integration - - **Metrics**: cache hits, duration, fallback usage - - **Integration**: Easy StatsD/Datadog hookup - - **Impact**: Production visibility without custom code - -### Timeline Adjustments - -13. **Realistic Estimates** (Lines 1619, 1648, 1698, 1704) - - Phase 1: 2-3 days → **3-4 days** (edge cases + thorough testing) - - Phase 2: 2 days → **2-3 days** - - Phase 4: 1-2 days → **2-3 days** (comprehensive observability) - - Total: 6-8 days → **8-11 days** (30% contingency buffer) - - **Rationale**: Account for code review, testing, documentation - -### Dependency Updates - -14. **New Required Dependency** (Line 2235) - - **Added**: `concurrent-ruby ~> 1.2` - - **Purpose**: Thread pool for bounded concurrency - - **Justification**: Production-grade thread primitives, prevents resource exhaustion - ---- - -## API Evolution: Original vs LaunchDarkly-Inspired Design - -This section documents the evolution from the initial nested API design to the final LaunchDarkly-inspired flattened API. 
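-
-As context for the comparison below, here is a minimal, hypothetical sketch of how the flattened one-step `compile_prompt` convenience could delegate to the two-step `get_prompt` + `compile` flow; this is illustrative only and may differ from the actual implementation.
-
-```ruby
-# Hypothetical sketch: compile_prompt as a thin wrapper over the two-step flow.
-# Argument names mirror the examples in this document.
-class Langfuse::Client
-  def compile_prompt(name, variables: {}, placeholders: {}, fallback: nil, type: :text)
-    prompt = get_prompt(name, fallback: fallback, type: type)
-
-    if type == :chat
-      # Chat prompts accept variables and placeholders as two positional hashes
-      prompt.compile(variables, placeholders)
-    else
-      # Text prompts accept variables as keyword arguments
-      prompt.compile(**variables)
-    end
-  end
-end
-```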
- -### Design Comparison - -| Aspect | Original Design | LaunchDarkly-Inspired (Final) | -|--------|----------------|-------------------------------| -| **API Structure** | Nested: `client.prompt.get()` | Flat: `client.get_prompt()` | -| **Configuration** | Inline only | Global config + per-client | -| **Global Client** | No | `Langfuse.client` singleton | -| **Fallback Pattern** | Optional, raises on error | Encouraged, returns fallback | -| **Convenience Methods** | No | `compile_prompt()` for one-step | -| **Detail Variants** | No | `get_prompt_detail()` for debugging | -| **Method Count** | 3 (get, create, update) | 6 (get_prompt, get_prompt_detail, compile_prompt, create_prompt, update_prompt, invalidate_cache) | -| **State Checking** | No | `initialized?` | - -### Code Comparison - -#### Initialization - -```ruby -# Original -client = Langfuse::Client.new( - public_key: ENV['LANGFUSE_PUBLIC_KEY'], - secret_key: ENV['LANGFUSE_SECRET_KEY'] -) - -# LaunchDarkly-Inspired -Langfuse.configure do |config| - config.public_key = ENV['LANGFUSE_PUBLIC_KEY'] - config.secret_key = ENV['LANGFUSE_SECRET_KEY'] - config.cache_ttl = 120 -end -client = Langfuse.client -``` - -#### Get Prompt - -```ruby -# Original (nested) -prompt = client.prompt.get("greeting") -compiled = prompt.compile(name: "Alice") - -# LaunchDarkly-Inspired (flattened, two options) -# Option 1: Two-step (same as original) -prompt = client.get_prompt("greeting") -compiled = prompt.compile(name: "Alice") - -# Option 2: One-step convenience -compiled = client.compile_prompt("greeting", variables: { name: "Alice" }) -``` - -#### With Fallback - -```ruby -# Original -begin - prompt = client.prompt.get("greeting") -rescue Langfuse::NotFoundError - prompt = client.prompt.get("greeting", fallback: "Hello!", type: :text) -end - -# LaunchDarkly-Inspired (graceful by default) -prompt = client.get_prompt("greeting", - fallback: "Hello {{name}}!", - type: :text -) -# Never raises - returns fallback on error -``` - -#### Debugging - -```ruby -# Original (no built-in support) -start = Time.now -prompt = client.prompt.get("greeting") -duration = Time.now - start -Rails.logger.info("Fetched in #{duration}s") - -# LaunchDarkly-Inspired (built-in) -detail = client.get_prompt_detail("greeting") -# => { -# prompt: ..., -# cached: true, -# version: 3, -# fetch_time_ms: 1.2, -# source: :cache -# } -``` - -### Why the Change? - -The LaunchDarkly-inspired design provides: - -1. **Simpler Mental Model**: Everything on `Client`, no nested managers -2. **Better Rails Integration**: Global config and singleton pattern -3. **More Resilient**: Fallbacks encouraged, graceful degradation -4. **Better DX**: Convenience methods reduce boilerplate -5. **Better Observability**: Detail variants for debugging -6. **Industry Pattern**: Familiar to developers using LaunchDarkly - -The additional methods and slight API surface increase are worth the improved developer experience and production reliability. - ---- - -## Summary and Next Steps - -### Summary - -This design document outlines a comprehensive plan to add prompt management functionality to the `langfuse-ruby` gem, achieving feature parity with the JavaScript SDK while incorporating LaunchDarkly's exceptional API design patterns. - -**Key Highlights**: - -1. **LaunchDarkly-Inspired API**: Flattened API surface, global configuration, singleton pattern -2. **Architecture**: Clean separation of concerns (Config, Client, Cache, Clients, API) -3. **Performance**: Sub-ms cache hits with stale-while-revalidate -4. 
**Thread Safety**: Mutex-based synchronization for Rails apps -5. **Developer Experience**: Intuitive API, convenience methods, fallback support, observability -6. **Incremental Rollout**: 4 phases from MVP to production-ready - -### Next Steps - -1. **Review and Feedback**: - - [ ] Architecture review with team - - [ ] API design feedback from early users - - [ ] Security review (authentication, input validation) - -2. **Phase 1 Implementation** (Week 1-2): - - [ ] Set up project structure - - [ ] Implement core classes (Manager, Cache, Clients) - - [ ] Add ApiClient extensions - - [ ] Write comprehensive tests - - [ ] Documentation and examples - -3. **Beta Release**: - - [ ] Publish `0.2.0.beta1` to RubyGems - - [ ] Gather feedback from early adopters - - [ ] Iterate based on real-world usage - -4. **Phase 2-4** (Week 3-4): - - [ ] Advanced features (create, update, placeholders) - - [ ] Rails.cache integration - - [ ] LangChain helpers - - [ ] Performance optimization - -5. **Production Release**: - - [ ] Final QA and testing - - [ ] Complete documentation - - [ ] Migration guides - - [ ] Publish `0.2.0` stable - -### Success Criteria - -**Technical**: -- [ ] All JavaScript SDK features implemented -- [ ] >90% test coverage -- [ ] Thread-safe for production Rails apps -- [ ] <100ms p95 latency for cached prompts - -**Documentation**: -- [ ] Complete API reference -- [ ] Migration guide from hardcoded prompts -- [ ] Integration examples (Rails, LangChain) -- [ ] Troubleshooting guide - -**Adoption**: -- [ ] 10+ beta users providing feedback -- [ ] Zero critical bugs in production -- [ ] Positive developer feedback - ---- - -**Document End** diff --git a/docs/future-enhancements/STALE_WHILE_REVALIDATE_DESIGN.md b/docs/future-enhancements/STALE_WHILE_REVALIDATE_DESIGN.md deleted file mode 100644 index 149a558..0000000 --- a/docs/future-enhancements/STALE_WHILE_REVALIDATE_DESIGN.md +++ /dev/null @@ -1,510 +0,0 @@ -# Stale-While-Revalidate (SWR) Design Document - -**Status:** Design Only - Not Implemented -**Phase:** 7.3 (Future Enhancement) -**Created:** 2025-10-16 - ---- - -## Problem Statement - -With current caching (Phases 7.1 + 7.2), the first request after cache expiry must wait for the Langfuse API call (~100ms). Even with stampede protection preventing 1,200 simultaneous API calls, one user still pays the latency cost. - -**Current Timeline:** -``` -Time: 10:00:00 - Prompt cached (TTL: 300s) -Time: 10:05:00 - Cache expires -Time: 10:05:00.001 - Request arrives - → Check cache: MISS (expired) - → Acquire lock: SUCCESS - → Call Langfuse API: 100ms ⏳ (user waits) - → Populate cache - → Return to user -Total latency: ~100ms for first user -``` - ---- - -## Solution: Stale-While-Revalidate - -Serve slightly outdated (stale) data immediately while refreshing in the background. Users get instant responses (~1ms) even after cache "expires". - -**With SWR Timeline:** -``` -Time: 10:00:00 - Prompt cached - - fresh_until: 10:05:00 (TTL: 5 minutes) - - stale_until: 10:10:00 (grace period: 5 more minutes) - -Time: 10:05:01 - Request arrives (cache expired but not stale) - → Return STALE data immediately (1ms latency) ✨ - → Trigger background refresh (doesn't block user) - → Background: Fetch from API, update cache -``` - ---- - -## Design Overview - -### Three Cache States - -1. **FRESH** (`Time.now < fresh_until`): Return immediately, no action needed -2. **REVALIDATE** (`fresh_until <= Time.now < stale_until`): Return stale data + trigger background refresh -3. 
**STALE** (`Time.now >= stale_until`): Must fetch fresh data synchronously - -### Cache Entry Structure - -**Current (Phase 7.1/7.2):** -```ruby -CacheEntry = Struct.new(:data, :expires_at) -``` - -**With SWR (Phase 7.3):** -```ruby -CacheEntry = Struct.new(:data, :fresh_until, :stale_until) do - def fresh? - Time.now < fresh_until - end - - def stale? - Time.now > stale_until - end - - def revalidate? - !fresh? && !stale? - end -end -``` - ---- - -## Implementation Approach - -### 1. Configuration - -```ruby -Langfuse.configure do |config| - config.cache_backend = :rails - config.cache_ttl = 300 # Fresh for 5 minutes - config.cache_stale_while_revalidate = true # Enable SWR (opt-in) - config.cache_stale_ttl = 300 # Serve stale for 5 more minutes - config.cache_refresh_threads = 5 # Thread pool size (see analysis below) -end -``` - -**New config options:** -- `cache_stale_while_revalidate` (Boolean, default: false) - Enable SWR -- `cache_stale_ttl` (Integer, default: same as cache_ttl) - Grace period duration -- `cache_refresh_threads` (Integer, default: 5) - Background thread pool size - -### 2. RailsCacheAdapter Enhancement - -```ruby -require 'concurrent' - -class RailsCacheAdapter - def initialize(ttl:, stale_ttl: nil, refresh_threads: 5, ...) - @ttl = ttl - @stale_ttl = stale_ttl || ttl - @thread_pool = Concurrent::CachedThreadPool.new( - max_threads: refresh_threads, - min_threads: 2, - max_queue: 50, - fallback_policy: :discard # Drop oldest if queue full - ) - end - - # New method for SWR - def fetch_with_stale_while_revalidate(key, &block) - entry = get_entry_with_metadata(key) - - if entry && entry[:fresh_until] > Time.now - # FRESH - return immediately - return entry[:data] - elsif entry && entry[:stale_until] > Time.now - # REVALIDATE - return stale + refresh in background - schedule_refresh(key, &block) - return entry[:data] # Instant response! ✨ - else - # STALE or MISS - must fetch synchronously - fetch_and_cache_with_metadata(key, &block) - end - end - - private - - def schedule_refresh(key, &block) - # Prevent duplicate refreshes - refresh_lock_key = "#{namespaced_key(key)}:refreshing" - return unless acquire_refresh_lock(refresh_lock_key) - - @thread_pool.post do - begin - value = block.call - set_with_metadata(key, value) - ensure - release_lock(refresh_lock_key) - end - end - end - - def get_entry_with_metadata(key) - # Fetch from Redis including timestamps - raw = Rails.cache.read("#{namespaced_key(key)}:metadata") - return nil unless raw - - JSON.parse(raw, symbolize_names: true) - end - - def set_with_metadata(key, value) - now = Time.now - entry = { - data: value, - fresh_until: now + @ttl, - stale_until: now + @ttl + @stale_ttl - } - - # Store both data and metadata - Rails.cache.write(namespaced_key(key), value, expires_in: @ttl + @stale_ttl) - Rails.cache.write("#{namespaced_key(key)}:metadata", entry.to_json, expires_in: @ttl + @stale_ttl) - - value - end - - def acquire_refresh_lock(lock_key) - # Short-lived lock (60s) to prevent duplicate background refreshes - Rails.cache.write(lock_key, true, unless_exist: true, expires_in: 60) - end -end -``` - -### 3. 
ApiClient Integration - -```ruby -# In ApiClient#get_prompt -def get_prompt(name, version: nil, label: nil) - raise ArgumentError, "Cannot specify both version and label" if version && label - - cache_key = PromptCache.build_key(name, version: version, label: label) - - # Use SWR if cache supports it and SWR is enabled - if cache&.respond_to?(:fetch_with_stale_while_revalidate) - cache.fetch_with_stale_while_revalidate(cache_key) do - fetch_prompt_from_api(name, version: version, label: label) - end - elsif cache&.respond_to?(:fetch_with_lock) - # Rails.cache with stampede protection (Phase 7.2) - cache.fetch_with_lock(cache_key) do - fetch_prompt_from_api(name, version: version, label: label) - end - elsif cache - # In-memory cache - simple get/set - cached_data = cache.get(cache_key) - return cached_data if cached_data - - prompt_data = fetch_prompt_from_api(name, version: version, label: label) - cache.set(cache_key, prompt_data) - prompt_data - else - # No cache - fetch_prompt_from_api(name, version: version, label: label) - end -end -``` - ---- - -## Thread Pool Sizing Analysis - -### Calculation - -``` -Threads = (Number of prompts × API latency) / Desired refresh time - -Example (SimplePractice): -- Prompts: 50 unique prompts -- API latency: 200ms -- Desired refresh time: 5 seconds (before users notice stale data) - -Threads = (50 × 0.2) / 5 = 2 threads minimum -Add 25% buffer: 2 × 1.25 = 2.5 → 3 threads - -With 100 prompts: -Threads = (100 × 0.2) / 5 = 4 threads minimum -Add 25% buffer: 4 × 1.25 = 5 threads ✅ -``` - -### Scenarios - -**Scenario 1: Steady State (Distributed Expiry)** -``` -TTL: 5 minutes = 300 seconds -Prompts: 50 total - -Expiry rate: 50 prompts / 300 seconds = 0.16 prompts/second - = ~1 prompt every 6 seconds - -Thread requirement: 1 thread sufficient -``` - -**Scenario 2: Post-Deploy (Worst Case - All Expire Together)** -``` -Prompts: 50 all cached at T=0 -At T=5min: All 50 hit "revalidate" state simultaneously - -With 2 threads: 50 ÷ 2 = 25 batches × 200ms = 5 seconds ⚠️ -With 5 threads: 50 ÷ 5 = 10 batches × 200ms = 2 seconds ✅ -With 10 threads: 50 ÷ 10 = 5 batches × 200ms = 1 second ✅ -``` - -### Recommendations - -**Option A: Fixed Pool (Simplest)** -```ruby -config.cache_refresh_threads = 5 # Default, configurable -@thread_pool = Concurrent::FixedThreadPool.new(5) -``` -- **Pros**: Simple, predictable, easy to reason about -- **Cons**: May be too few (large apps) or too many (small apps) - -**Option B: Auto-Sizing Pool (Recommended)** -```ruby -@thread_pool = Concurrent::CachedThreadPool.new( - max_threads: 10, # Cap at 10 - min_threads: 2, # Keep 2 warm - max_queue: 50, # Queue up to 50 refreshes - fallback_policy: :discard # Drop oldest if queue full -) -``` -- **Pros**: Self-adjusts to load, efficient resource usage -- **Cons**: Slightly more complex behavior - -**Option C: Calculated Based on Config** -```ruby -def default_refresh_threads - # Estimate: 1 thread per 25 prompts, min 2, max 10 - estimated_prompts = config.cache_estimated_prompts || 50 - threads = (estimated_prompts / 25.0).ceil - [[threads, 2].max, 10].min -end -``` -- **Pros**: Automatically sized based on expected load -- **Cons**: Requires estimating number of prompts - -**Recommendation**: Use **Option B (Auto-Sizing Pool)** - best balance of simplicity and efficiency. - ---- - -## Benefits - -### 1. Better User Experience -- Users almost never wait for API calls -- Consistent low latency (~1ms cache reads) -- Only "too stale" requests pay the 100ms cost - -### 2. 
Reduced Perceived Latency -``` -Without SWR: -- 99% of requests: 1ms (cached) -- 1% of requests: 100ms (first after expiry) -- P99 latency: 100ms - -With SWR: -- 99.9% of requests: 1ms (cached or stale) -- 0.1% of requests: 100ms (truly stale) -- P99 latency: 1ms ✨ -``` - -### 3. Graceful Degradation -- If Langfuse API is slow/down, users still get stale data -- Only after grace period do requests fail -- Gives time to fix issues without user impact - -### 4. Smoother Load Pattern -- Background refreshes happen asynchronously -- No thundering herd at expiry time -- API load is distributed over time - ---- - -## Trade-offs - -### Pros -✅ Near-instant response times (serve stale data) -✅ Background refresh doesn't block requests -✅ Dramatically reduces P99 latency -✅ More resilient to API slowdowns -✅ Smooth cache warming (no cold-start spikes) - -### Cons -❌ Users might get slightly outdated data -❌ More complex caching logic -❌ Requires background thread pool (~10-20MB memory) -❌ Stale data could be incorrect if prompts change frequently -❌ Adds dependency on concurrent-ruby gem - ---- - -## When to Use SWR - -**Good for:** -- ✅ Prompts that don't change often (production prompts are typically stable) -- ✅ High-traffic applications where latency matters -- ✅ Systems where eventual consistency is acceptable -- ✅ Apps with many processes (background refresh amortized) - -**Not ideal for:** -- ❌ Prompts that change frequently (users might see old versions) -- ❌ Critical data that must always be fresh -- ❌ Low-traffic apps (background refresh overhead not worth it) -- ❌ Apps sensitive to memory usage (thread pool overhead) - ---- - -## Example: SimplePractice Impact - -**Without SWR (current with Phase 7.2):** -``` -- 1,200 processes -- 50 prompts -- Cache expires every 5 minutes -- First request after expiry: 100ms latency -- Other 1,199 requests: 1ms (stampede protection) -``` - -**With SWR:** -``` -- ALL 1,200 requests: 1ms latency ✨ -- Background refresh happens without blocking -- Stale data served for up to 5 more minutes if refresh fails -- Same 50 API calls every 5 minutes (no extra API load) -``` - ---- - -## Testing Strategy - -### Unit Tests - -1. **Cache state transitions** - - Fresh → Revalidate → Stale - - Timestamps correctly set - -2. **Background refresh** - - Scheduled correctly - - Not duplicated (refresh lock) - - Executes asynchronously - -3. **Thread pool behavior** - - Queues refreshes - - Discards on overflow - - Scales up/down - -### Integration Tests - -1. **With ApiClient** - - Returns stale data immediately - - Background refresh completes - - Next request gets fresh data - -2. **Concurrency** - - Multiple processes hit revalidate state - - Only one background refresh happens - -3. **Error handling** - - Background refresh fails → keep serving stale - - Background refresh succeeds → cache updated - -### Load Tests - -1. **Post-deploy scenario** - - All prompts expire simultaneously - - Measure refresh time with different thread pool sizes - -2. 
**Steady state** - - Measure latency distribution (P50, P99, P999) - - Verify background refreshes don't impact user requests - ---- - -## Dependencies - -**New Gem:** -- `concurrent-ruby ~> 1.2` - Thread pool management - -**Existing:** -- Rails.cache (Redis) - Already required for Phase 7.1 - ---- - -## Estimated Effort - -**Lines of Code:** ~200-250 new lines -- RailsCacheAdapter: ~100 lines (fetch_with_stale_while_revalidate, metadata methods) -- Config: ~20 lines (new options, validation) -- ApiClient: ~20 lines (integration) -- Tests: ~60-100 lines - -**Complexity:** Medium -- Thread pool management (concurrent-ruby handles this) -- Metadata storage in Redis (straightforward) -- Background refresh scheduling (lock-based deduplication) - -**Testing Effort:** Medium-High -- Background/async behavior harder to test -- Need timing-based tests (sleep, wait for refresh) -- Concurrency edge cases - -**Time Estimate:** 4-6 hours -- 2 hours: Implementation -- 2 hours: Testing -- 1 hour: Documentation -- 1 hour: Buffer/debugging - ---- - -## Future Enhancements - -### Phase 7.3.1: Smart Refresh Scheduling -Instead of refreshing immediately on first stale request, schedule refreshes intelligently: -- Predict when prompts will expire based on usage patterns -- Pre-refresh popular prompts before they go stale -- Distribute refreshes to avoid spikes - -### Phase 7.3.2: Adaptive TTL -Automatically adjust TTL based on prompt change frequency: -- Track how often prompts change in Langfuse -- Increase TTL for stable prompts -- Decrease TTL for frequently updated prompts - -### Phase 7.3.3: Metrics & Observability -Add instrumentation for: -- Stale hit rate -- Background refresh success rate -- Time spent in each cache state -- Thread pool utilization - ---- - -## Decision: Not Implementing (Yet) - -**Rationale:** -- Phase 7.1 (Rails.cache adapter) + Phase 7.2 (stampede protection) already provide excellent performance -- Stampede protection ensures only 1 API call per cache miss (not 1,200) -- The 100ms latency hit happens very infrequently (once per TTL window) -- Added complexity (thread pool, metadata, concurrent-ruby dependency) may not be worth the marginal latency improvement -- Can revisit if P99 latency becomes a problem in production - -**When to Reconsider:** -- Users complain about latency spikes -- P99 latency metrics show cache expiry causing issues -- Langfuse API becomes slower (>500ms) -- Need to support very high traffic (10,000+ requests/sec) - ---- - -## References - -- **HTTP Stale-While-Revalidate**: [RFC 5861](https://datatracker.ietf.org/doc/html/rfc5861) -- **SWR Pattern**: [Vercel SWR Library](https://swr.vercel.app/) -- **concurrent-ruby**: [GitHub](https://github.com/ruby-concurrency/concurrent-ruby) -- **Thread Pool Sizing**: [Little's Law](https://en.wikipedia.org/wiki/Little%27s_law)