diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md index 4897a4b..270628b 100644 --- a/docs/ARCHITECTURE.md +++ b/docs/ARCHITECTURE.md @@ -535,21 +535,9 @@ User Code - **In-memory**: Monitor-based locking (minimal contention) - **Rails.cache**: Redis atomic operations (high concurrency) -## Future Enhancements +## See Also -See [docs/future-enhancements/](future-enhancements/) for detailed designs: - -- **Stale-While-Revalidate**: Background cache refresh for even lower latency -- **Cost Tracking**: Automatic LLM cost calculation -- **Automatic LLM Client Wrappers**: Zero-boilerplate tracing for OpenAI, Anthropic - -## Additional Resources - -- [Main README](../README.md) - Getting started guide -- [Caching Guide](CACHING.md) - Detailed caching documentation -- [Tracing Guide](TRACING.md) - LLM observability guide -- [Rails Integration](RAILS.md) - Rails-specific patterns - -## Questions? - -Open an issue on [GitHub](https://github.com/langfuse/langfuse-ruby/issues) if you have architecture questions. +- [Caching Guide](CACHING.md) - Cache backends, SWR, and stampede protection +- [Tracing Guide](TRACING.md) - LLM observability and nested spans +- [Rails Integration](RAILS.md) - Rails-specific patterns and testing +- [API Reference](API_REFERENCE.md) - Complete method reference diff --git a/docs/CACHING.md b/docs/CACHING.md index e0285e5..cab14d2 100644 --- a/docs/CACHING.md +++ b/docs/CACHING.md @@ -668,14 +668,9 @@ end 2. Verify Redis is accessible 3. Check `cache_lock_timeout` is sufficient -## Additional Resources +## See Also -- [Main README](../README.md) - SDK overview - [Configuration Reference](CONFIGURATION.md) - All config options including SWR -- [Rails Integration Guide](RAILS.md) - Rails-specific patterns -- [Tracing Guide](TRACING.md) - LLM observability -- [Architecture Guide](ARCHITECTURE.md) - Design decisions - -## Questions? - -Open an issue on [GitHub](https://github.com/langfuse/langfuse-ruby/issues) if you have questions or need help with caching. +- [Rails Integration](RAILS.md) - Rails-specific cache patterns +- [Architecture Guide](ARCHITECTURE.md) - Cache design decisions +- [Error Handling](ERROR_HANDLING.md) - Cache miss fallback behavior diff --git a/docs/GETTING_STARTED.md b/docs/GETTING_STARTED.md index c10d9e2..e194c59 100644 --- a/docs/GETTING_STARTED.md +++ b/docs/GETTING_STARTED.md @@ -302,14 +302,14 @@ end See [ERROR_HANDLING.md](ERROR_HANDLING.md) for complete error reference. 
-## Next Steps - -- **[PROMPTS.md](PROMPTS.md)** - Chat prompts, versioning, Mustache templating -- **[TRACING.md](TRACING.md)** - Nested observations, RAG patterns, OpenTelemetry -- **[SCORING.md](SCORING.md)** - Add quality scores to traces -- **[DATASETS.md](DATASETS.md)** - Create and manage evaluation datasets -- **[EXPERIMENTS.md](EXPERIMENTS.md)** - Run systematic evaluations with the experiment runner -- **[CACHING.md](CACHING.md)** - Optimize performance with caching -- **[RAILS.md](RAILS.md)** - Rails-specific patterns and testing -- **[CONFIGURATION.md](CONFIGURATION.md)** - All configuration options -- **[API_REFERENCE.md](API_REFERENCE.md)** - Complete method reference +## See Also + +- [Prompts Guide](PROMPTS.md) - Chat prompts, versioning, Mustache templating +- [Tracing Guide](TRACING.md) - Nested observations, RAG patterns, OpenTelemetry +- [Scoring Guide](SCORING.md) - Add quality scores to traces +- [Datasets Guide](DATASETS.md) - Create and manage evaluation datasets +- [Experiments Guide](EXPERIMENTS.md) - Run evaluations against datasets +- [Caching Guide](CACHING.md) - In-memory and Rails.cache backends, SWR +- [Configuration Reference](CONFIGURATION.md) - All configuration options +- [Rails Integration](RAILS.md) - Rails-specific patterns and testing +- [API Reference](API_REFERENCE.md) - Complete method reference diff --git a/docs/MIGRATION.md b/docs/MIGRATION.md index 9f43048..5ffe988 100644 --- a/docs/MIGRATION.md +++ b/docs/MIGRATION.md @@ -717,9 +717,10 @@ prompt.compile("name" => "Alice") 3. Use fallbacks for critical paths 4. Check network latency to Langfuse API -## Additional Resources +## See Also -- [Main README](../README.md) - SDK overview -- [Rails Integration Guide](RAILS.md) - Rails-specific patterns -- [Tracing Guide](TRACING.md) - LLM observability -- [Langfuse Documentation](https://langfuse.com/docs) - Official docs +- [Getting Started](GETTING_STARTED.md) - Installation and first trace +- [Rails Integration](RAILS.md) - Rails-specific patterns +- [Prompts Guide](PROMPTS.md) - Versioning and Mustache templating +- [Caching Guide](CACHING.md) - Cache backends and SWR +- [Langfuse Documentation](https://langfuse.com/docs) - Official Langfuse docs diff --git a/docs/RAILS.md b/docs/RAILS.md index 0914289..5fc6878 100644 --- a/docs/RAILS.md +++ b/docs/RAILS.md @@ -626,9 +626,10 @@ If experiencing high memory usage: 1. **Reduce cache_max_size**: Default is 1000, reduce if needed 2. **Enable cache cleanup**: Implement periodic cache cleanup in background job -## Additional Resources +## See Also -- [Main README](../README.md) - SDK overview and basic usage -- [Tracing Guide](TRACING.md) - Deep dive on LLM tracing +- [Getting Started](GETTING_STARTED.md) - Installation and first trace +- [Configuration Reference](CONFIGURATION.md) - All config options +- [Tracing Guide](TRACING.md) - Nested spans and OpenTelemetry - [Migration Guide](MIGRATION.md) - Migrating from hardcoded prompts - [Langfuse Documentation](https://langfuse.com/docs) - Official Langfuse docs diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000..2e4afd6 --- /dev/null +++ b/docs/README.md @@ -0,0 +1,37 @@ +# Langfuse Ruby SDK — Documentation + +## Foundations + +Core concepts you need before using any feature. 
+ +- **[Getting Started](GETTING_STARTED.md)** — Install the gem, configure credentials, send your first trace +- **[Configuration](CONFIGURATION.md)** — All `Langfuse.configure` options: keys, timeouts, cache backends, SWR + +## Core Features + +The three primitives of the SDK. + +- **[Prompts](PROMPTS.md)** — Fetch, compile, and version-manage text and chat prompts +- **[Tracing](TRACING.md)** — Nested spans, RAG patterns, OpenTelemetry integration +- **[Scoring](SCORING.md)** — Attach quality scores to traces and observations + +## Evaluation + +Systematic testing of LLM behavior. + +- **[Datasets](DATASETS.md)** — Create and manage evaluation datasets +- **[Experiments](EXPERIMENTS.md)** — Run evaluations against datasets with the experiment runner + +## Production + +Patterns for real-world deployments. + +- **[Caching](CACHING.md)** — In-memory and Rails.cache backends, SWR, stampede protection +- **[Error Handling](ERROR_HANDLING.md)** — Exception types, retry behavior, fallback strategies +- **[Rails Integration](RAILS.md)** — Initializers, controller tracing, testing helpers +- **[Migration Guide](MIGRATION.md)** — Move from hardcoded prompts to Langfuse-managed prompts + +## Reference + +- **[API Reference](API_REFERENCE.md)** — Complete method reference for every public class +- **[Architecture](ARCHITECTURE.md)** — Internal design: layers, threading, cache architecture diff --git a/docs/TRACING.md b/docs/TRACING.md index abecf2e..0446983 100644 --- a/docs/TRACING.md +++ b/docs/TRACING.md @@ -668,8 +668,12 @@ This allows traces to flow seamlessly across: - Go services - Any service that implements W3C Trace Context -## Resources +## See Also +- [Scoring Guide](SCORING.md) - Add quality scores to traces +- [Prompts Guide](PROMPTS.md) - Managed prompts with tracing integration +- [Rails Integration](RAILS.md) - Controller-level tracing patterns +- [API Reference](API_REFERENCE.md) - Complete method reference - [Langfuse Documentation](https://langfuse.com/docs) - [OpenTelemetry Ruby Documentation](https://opentelemetry.io/docs/instrumentation/ruby/) - [W3C Trace Context Specification](https://www.w3.org/TR/trace-context/) diff --git a/docs/design-history/TRACING_DESIGN.md b/docs/design-history/TRACING_DESIGN.md deleted file mode 100644 index f742fff..0000000 --- a/docs/design-history/TRACING_DESIGN.md +++ /dev/null @@ -1,1668 +0,0 @@ -# Langfuse Ruby SDK - Tracing & Observability Design (OpenTelemetry-Based) - -**Status:** Draft Design Document (Revised for OpenTelemetry) -**Created:** 2025-10-15 -**Revised:** 2025-10-15 -**Author:** Noah Fisher - ---- - -## Table of Contents - -1. [Overview](#overview) -2. [Why OpenTelemetry?](#why-opentelemetry) -3. [Design Principles](#design-principles) -4. [Architecture](#architecture) -5. [API Design](#api-design) -6. [Data Model (OTel + Langfuse)](#data-model-otel--langfuse) -7. [OpenTelemetry Integration](#opentelemetry-integration) -8. [Ingestion Architecture](#ingestion-architecture) -9. [Distributed Tracing](#distributed-tracing) -10. [Prompt-to-Trace Linking](#prompt-to-trace-linking) -11. [Cost & Token Tracking](#cost--token-tracking) -12. [APM Integration](#apm-integration) -13. [Error Handling & Resilience](#error-handling--resilience) -14. [Implementation Phases](#implementation-phases) - ---- - -## Overview - -This document defines the tracing and observability features for the Langfuse Ruby SDK. 
These features complement the existing prompt management functionality (Phases 0-5 complete) by providing comprehensive LLM observability built on **OpenTelemetry**, the CNCF standard for distributed tracing. - -### Goals - -1. **OpenTelemetry Foundation**: Build on industry-standard OTel SDK for tracing -2. **Rails-Friendly**: Seamless integration with Rails applications and ActiveJob -3. **Distributed Tracing**: Automatic context propagation across services -4. **APM Integration**: Traces appear in Datadog, New Relic, Honeycomb, etc. -5. **Ruby-First API**: Idiomatic blocks and patterns, despite OTel underneath -6. **Automatic Linking**: Connect prompts to traces automatically -7. **Production-Ready**: Batching, retries, circuit breakers, graceful degradation - -### Non-Goals (for v1.0) - -- Real-time streaming of traces (future enhancement) -- Client-side tracing (browser SDK) -- Custom OTel instrumentations (use existing gems) - ---- - -## Why OpenTelemetry? - -### Rationale - -After researching Langfuse's Python SDK, it became clear that **Langfuse is built on top of OpenTelemetry**, not as a separate system: - -> "Context Propagation: **OpenTelemetry automatically handles** the propagation of the current trace and span context." - Langfuse Python SDK docs - -**Benefits of OTel Foundation:** - -1. **Industry Standard** - CNCF standard, used by every major APM vendor -2. **Context Propagation** - Automatic distributed tracing via W3C Trace Context -3. **Ecosystem Integration** - Works with existing Ruby instrumentation (Rails, Sidekiq, HTTP) -4. **Less Code** - Use OTel's span lifecycle, we add Langfuse-specific attributes -5. **Consistency** - Matches Python/TypeScript SDK architecture -6. **APM Correlation** - Langfuse traces appear alongside infrastructure traces - -**Trade-offs:** - -- ✅ More robust, future-proof -- ✅ Automatic distributed tracing -- ✅ Industry ecosystem support -- ❌ Adds ~10 OTel gem dependencies -- ❌ Slightly more complex setup -- ⚖️ Basic usage stays simple for developers - ---- - -## Design Principles - -### 1. OpenTelemetry Foundation - -- **Build on OTel SDK** for span/trace management -- **Create custom Exporter** to convert OTel spans → Langfuse events -- **Use OTel Context** for propagation (not custom thread-local) -- **Add Langfuse extensions** as span attributes (model, tokens, prompts, costs) - -### 2. Consistency with Existing Architecture - -Follow the same patterns established in prompt management: -- **Flat API**: Methods on `Client`, not nested managers -- **Global Configuration**: `Langfuse.configure` pattern -- **Thread-Safe**: OTel handles this for us -- **Minimal Dependencies**: Only add what's necessary -- **Ruby Conventions**: snake_case, keyword arguments, blocks - -### 3. Ruby-First API (Hide OTel Complexity) - -```ruby -# ✅ GOOD - Ruby idioms (OTel underneath) -Langfuse.trace("user-query") do |trace| - trace.generation("llm-call", model: "gpt-4") do |gen| - gen.input = [{ role: "user", content: "Hello" }] - gen.output = call_openai(...) - end -end - -# ❌ AVOID - Exposing OTel internals -tracer = OpenTelemetry.tracer_provider.tracer("langfuse") -span = tracer.start_span("user-query") -``` - -### 4. Async by Default - -- Background processing via ActiveJob (works with Sidekiq, Resque, Delayed Job, GoodJob, etc.) -- Batching to reduce API calls -- Graceful degradation if ActiveJob is unavailable -- Sync mode for debugging/testing -- Configurable queue name - -### 5. 
Developer Experience - -- Simple for basic use cases (hide OTel) -- Powerful for advanced scenarios (expose OTel when needed) -- Clear error messages -- Automatic metadata capture -- Minimal boilerplate - ---- - -## Architecture - -### High-Level Architecture - -``` -┌─────────────────────────────────────────────────────────────┐ -│ Application Code │ -│ Langfuse.trace(...) { |t| t.generation(...) } │ -└────────────────────┬────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────┐ -│ Langfuse Ruby API (Block-based) │ -│ • Langfuse::Client │ -│ • Langfuse::Tracer (wrapper around OTel) │ -│ • Langfuse::Generation (adds model, tokens, prompts) │ -└────────────────────┬────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────┐ -│ OpenTelemetry SDK (Core Tracing) │ -│ • Tracer: Creates spans │ -│ • Context: Propagates trace/span context │ -│ • Span: Time-bounded operations │ -│ • Attributes: Key-value metadata │ -└────────────────────┬────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────┐ -│ Langfuse Exporter (OTel → Langfuse Events) │ -│ • Converts OTel spans → Langfuse ingestion format │ -│ • Adds Langfuse-specific fields (prompt, costs) │ -│ • Batches events for ingestion API │ -└────────────────────┬────────────────────────────────────────┘ - │ - ┌────────────┴───────────┐ - │ │ - ▼ ▼ -┌──────────────┐ ┌──────────────────────┐ -│ Sync Export │ │ Async Export │ -│ (Test/Debug) │ │ (ActiveJob) │ -└──────┬───────┘ └──────────┬───────────┘ - │ │ - └────────────┬────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────┐ -│ Langfuse Ingestion API │ -│ POST /api/public/ingestion (batch events) │ -└─────────────────────────────────────────────────────────────┘ -``` - -### Component Breakdown - -1. **Langfuse Ruby API** - Idiomatic Ruby blocks, hide OTel complexity -2. **OpenTelemetry SDK** - Handle span lifecycle, context propagation -3. **Langfuse Exporter** - Custom OTel exporter that converts spans to Langfuse events -4. **Ingestion Client** - HTTP client with batching, retries, circuit breaker -5. **ActiveJob Worker** - Async background processing (optional, works with any ActiveJob backend) - ---- - -## API Design - -### Core Concepts (Same as Before) - -**Traces** → The root container for an LLM interaction (OTel trace) -**Spans** → Time-bounded operations (OTel span) -**Generations** → LLM calls (OTel span with extra attributes) -**Events** → Point-in-time markers (OTel span events) -**Scores** → Evaluations (custom Langfuse concept, sent separately) - -### Hierarchy (OTel Spans) - -``` -OTel Trace -├── OTel Span (type: "span", name: "document-retrieval") -│ └── OTel Span (type: "generation", model: "text-embedding-ada-002") -├── OTel Span (type: "span", name: "llm-processing") -│ └── OTel Span (type: "generation", model: "gpt-4") -├── OTel Event (name: "user-feedback") -└── Custom Score (sent via Langfuse API) -``` - ---- - -## API Examples - -### Example 1: Basic Trace with Generation - -```ruby -# Simple block-based API (OTel underneath) -Langfuse.trace(name: "chat-completion", user_id: "user-123") do |trace| - trace.generation( - name: "openai-call", - model: "gpt-4", - input: [{ role: "user", content: "Hello!" }] - ) do |gen| - response = openai_client.chat(...) 
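      # NOTE (illustrative, not part of the design): `openai_client` above is
      # pseudo-code. The accessors below assume an object-style response
      # (response.choices.first.message.content); with the widely used
      # ruby-openai gem the response is a plain Hash, so the equivalent read is
      # response.dig("choices", 0, "message", "content").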
- - gen.output = response.choices.first.message.content - gen.usage = { - prompt_tokens: response.usage.prompt_tokens, - completion_tokens: response.usage.completion_tokens, - total_tokens: response.usage.total_tokens - } - end -end - -# Under the hood: -# 1. OTel creates a root span with name "chat-completion" -# 2. OTel creates child span with name "openai-call" -# 3. Langfuse adds attributes: model="gpt-4", type="generation", usage={...} -# 4. Langfuse Exporter converts to ingestion events -``` - -### Example 2: Nested Spans for RAG Pipeline - -```ruby -Langfuse.trace(name: "rag-query", session_id: "session-456") do |trace| - # Document retrieval span (OTel span) - docs = trace.span(name: "retrieval", input: { query: "ML basics" }) do |span| - # Embedding generation (OTel span with type="generation") - embedding = span.generation( - name: "embed-query", - model: "text-embedding-ada-002", - input: "ML basics" - ) do |gen| - result = openai_client.embeddings(...) - gen.output = result.embedding - gen.usage = { total_tokens: result.usage.total_tokens } - result.embedding - end - - # Retrieve from vector DB - vector_db.search(embedding, limit: 5) - end - - # LLM generation span - trace.span(name: "llm-generation") do |span| - prompt = build_rag_prompt(docs) - - span.generation( - name: "gpt4-completion", - model: "gpt-4", - input: prompt, - metadata: { num_docs: docs.size } - ) do |gen| - response = openai_client.chat(...) - gen.output = response.choices.first.message.content - gen.usage = { - prompt_tokens: response.usage.prompt_tokens, - completion_tokens: response.usage.completion_tokens - } - end - end - - # Track user feedback event (OTel span event) - trace.event( - name: "user-feedback", - input: { rating: "thumbs_up" } - ) - - # Add quality score (custom Langfuse concept) - trace.score(name: "helpfulness", value: 0.95) -end - -# OTel automatically handles: -# - Parent-child span relationships -# - Timestamps (start_time, end_time) -# - Context propagation within the trace -``` - -### Example 3: Distributed Tracing Across Services - -```ruby -# Service A (API Gateway) -def handle_request - Langfuse.trace(name: "api-request", user_id: "user-123") do |trace| - # Make HTTP request to Service B - # OTel automatically injects W3C Trace Context headers! - response = HTTParty.get( - "http://service-b/process", - headers: trace.inject_context # W3C traceparent header - ) - - trace.event(name: "downstream-call", output: response.code) - end -end - -# Service B (Processing Service) -def process_request - # Extract context from headers (W3C Trace Context) - context = Langfuse.extract_context(request.headers) - - # This trace is automatically linked to the parent trace in Service A! - Langfuse.trace(name: "process-data", context: context) do |trace| - trace.generation(name: "llm-call", model: "gpt-4") do |gen| - # ... LLM processing - end - end -end - -# Result: Single unified trace across both services! -# Service A → Service B (parent-child relationship preserved) -``` - -### Example 4: APM Integration (Datadog Example) - -```ruby -# When both Langfuse and Datadog are configured: - -Langfuse.trace(name: "user-query") do |trace| - trace.generation(name: "gpt4", model: "gpt-4") do |gen| - # Call external API - response = HTTParty.get("https://api.example.com/data") - # ... 
LLM processing - end -end - -# Result in Datadog APM: -# ┌─ Trace: user-query (Langfuse + Datadog) -# │ ├─ Span: gpt4 (Langfuse generation) -# │ │ └─ Attributes: model=gpt-4, tokens=225, cost=0.00525 -# │ └─ Span: http.request (Datadog automatic instrumentation) -# │ └─ URL: https://api.example.com/data -# └─ All correlated with the same trace_id! -``` - -### Example 5: Advanced - Direct OTel Access - -```ruby -# For advanced users who need OTel directly -Langfuse.trace(name: "complex-workflow") do |trace| - # Access underlying OTel span - otel_span = trace.current_span - - # Add custom OTel attributes - otel_span.set_attribute("custom.metric", 42) - - # Use OTel status - otel_span.status = OpenTelemetry::Trace::Status.error("Failed") - - # Still use Langfuse convenience methods - trace.generation(name: "gpt4", model: "gpt-4") do |gen| - # ... - end -end -``` - ---- - -## Data Model (OTel + Langfuse) - -### How OTel Spans Map to Langfuse Concepts - -| Langfuse Concept | OpenTelemetry Representation | Langfuse-Specific Attributes | -|------------------|------------------------------|------------------------------| -| **Trace** | OTel Trace (root span) | `user_id`, `session_id`, `tags`, `public` | -| **Span** | OTel Span | `langfuse.type="span"`, `input`, `output`, `level` | -| **Generation** | OTel Span | `langfuse.type="generation"`, `model`, `usage`, `prompt_name`, `prompt_version` | -| **Event** | OTel Span Event | `name`, `input`, `output` | -| **Score** | Custom (not OTel) | Sent separately via Langfuse API | - -### OTel Span Attributes for Langfuse - -**Common Attributes (all spans):** -```ruby -{ - "langfuse.type" => "span", # or "generation" - "langfuse.trace_id" => "trace-abc123", - "langfuse.user_id" => "user-456", - "langfuse.session_id" => "session-789", - "langfuse.metadata" => { ... }, # JSON - "langfuse.input" => { ... }, # JSON - "langfuse.output" => { ... }, # JSON - "langfuse.level" => "default" # debug, default, warning, error -} -``` - -**Generation-Specific Attributes:** -```ruby -{ - "langfuse.type" => "generation", - "langfuse.model" => "gpt-4", - "langfuse.model_parameters" => { temperature: 0.7 }, # JSON - "langfuse.usage.prompt_tokens" => 100, - "langfuse.usage.completion_tokens" => 50, - "langfuse.usage.total_tokens" => 150, - "langfuse.usage.total_cost" => 0.00525, # Auto-calculated - "langfuse.prompt_name" => "support-assistant", # Auto-linked - "langfuse.prompt_version" => 3, # Auto-linked - "langfuse.completion_start_time" => "2025-10-15T10:00:02.5Z" # Streaming -} -``` - -### OTel Event Format - -```ruby -# Span event (for user feedback, etc.) -span.add_event( - "user-feedback", - attributes: { - "langfuse.input" => { feedback_type: "thumbs_up" }.to_json, - "langfuse.level" => "default" - }, - timestamp: Time.now -) -``` - -### Score (Separate from OTel) - -Scores are sent as separate events to Langfuse API (not OTel): - -```ruby -{ - type: "score-create", - body: { - id: "score-xyz", - trace_id: "trace-abc", # Link to OTel trace - observation_id: "span-123", # Link to OTel span - name: "helpfulness", - value: 0.95, - comment: "Very helpful", - data_type: "numeric" - } -} -``` - ---- - -## OpenTelemetry Integration - -### OTel Components We'll Use - -1. **opentelemetry-sdk** - Core tracing SDK -2. **opentelemetry-api** - Public API -3. **opentelemetry-exporter-otlp** - (Optional) For OTel Collector -4. 
**opentelemetry-instrumentation-all** - (Optional) Auto-instrumentation - -### Initialization - -```ruby -require 'opentelemetry/sdk' -require 'langfuse/exporter' - -# Initialize OpenTelemetry with Langfuse exporter -OpenTelemetry::SDK.configure do |c| - c.service_name = 'my-rails-app' - c.service_version = ENV['APP_VERSION'] - - # Add Langfuse exporter - c.add_span_processor( - OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor.new( - Langfuse::Exporter.new( - public_key: ENV['LANGFUSE_PUBLIC_KEY'], - secret_key: ENV['LANGFUSE_SECRET_KEY'] - ) - ) - ) - - # Optionally add OTLP exporter for APM (Datadog, etc.) - if ENV['OTEL_EXPORTER_OTLP_ENDPOINT'] - c.add_span_processor( - OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor.new( - OpenTelemetry::Exporter::OTLP::Exporter.new - ) - ) - end -end -``` - -### Langfuse Wrapper Around OTel - -```ruby -module Langfuse - class Tracer - def initialize(otel_tracer) - @otel_tracer = otel_tracer - end - - def trace(name:, **attributes, &block) - # Create OTel span - @otel_tracer.in_span(name, attributes: otel_attributes(attributes)) do |span| - # Wrap in Langfuse::Trace for Ruby API - trace_obj = Trace.new(span, attributes) - yield(trace_obj) - end - end - - private - - def otel_attributes(attrs) - # Convert Langfuse attributes to OTel format - { - "langfuse.type" => "trace", - "langfuse.user_id" => attrs[:user_id], - "langfuse.session_id" => attrs[:session_id], - "langfuse.metadata" => attrs[:metadata].to_json - }.compact - end - end -end -``` - -### Langfuse::Trace Class - -```ruby -module Langfuse - class Trace - attr_reader :otel_span - - def initialize(otel_span, attributes = {}) - @otel_span = otel_span - @attributes = attributes - end - - def span(name:, **attrs, &block) - # Create child OTel span - tracer = OpenTelemetry.tracer_provider.tracer('langfuse') - tracer.in_span(name, attributes: otel_attributes(attrs, type: "span")) do |span| - span_obj = Span.new(span, attrs) - yield(span_obj) - end - end - - def generation(name:, model:, **attrs, &block) - tracer = OpenTelemetry.tracer_provider.tracer('langfuse') - tracer.in_span(name, attributes: otel_attributes(attrs, type: "generation", model: model)) do |span| - gen_obj = Generation.new(span, attrs.merge(model: model)) - yield(gen_obj) - end - end - - def event(name:, **attrs) - @otel_span.add_event(name, attributes: { - "langfuse.input" => attrs[:input].to_json, - "langfuse.level" => attrs[:level] || "default" - }.compact) - end - - def score(name:, value:, **attrs) - # Scores are sent separately (not OTel) - ScoreBuffer.push( - trace_id: @otel_span.context.trace_id.hex_id, - name: name, - value: value, - **attrs - ) - end - - def inject_context - # For distributed tracing - inject W3C headers - carrier = {} - OpenTelemetry.propagation.inject(carrier) - carrier - end - - def current_span - # For advanced users who need OTel directly - @otel_span - end - - private - - def otel_attributes(attrs, type:, model: nil) - { - "langfuse.type" => type, - "langfuse.model" => model, - "langfuse.input" => attrs[:input].to_json, - "langfuse.metadata" => attrs[:metadata].to_json - }.compact - end - end -end -``` - ---- - -## Ingestion Architecture - -### Langfuse Exporter (OTel Custom Exporter) - -```ruby -module Langfuse - class Exporter - def initialize(public_key:, secret_key:, **options) - @public_key = public_key - @secret_key = secret_key - @buffer = EventBuffer.new - @ingestion_client = IngestionClient.new(public_key, secret_key) - end - - # Called by OTel BatchSpanProcessor - def 
export(span_data_list, timeout: nil) - events = span_data_list.map { |span| convert_span_to_event(span) } - - # Buffer events - events.each { |event| @buffer.push(event) } - - # Trigger batch send if buffer is full - flush_if_needed - - OpenTelemetry::SDK::Trace::Export::SUCCESS - rescue StandardError => e - Rails.logger.error("Langfuse export failed: #{e.message}") - OpenTelemetry::SDK::Trace::Export::FAILURE - end - - def force_flush(timeout: nil) - events = @buffer.drain_all - return if events.empty? - - @ingestion_client.send_batch(events) - end - - def shutdown(timeout: nil) - force_flush(timeout: timeout) - end - - private - - def convert_span_to_event(span) - attrs = span.attributes || {} - type = attrs["langfuse.type"] || "span" - - case type - when "trace" - create_trace_event(span, attrs) - when "span" - create_span_event(span, attrs) - when "generation" - create_generation_event(span, attrs) - end - end - - def create_trace_event(span, attrs) - { - id: SecureRandom.uuid, - timestamp: span.start_timestamp, - type: "trace-create", - body: { - id: span.trace_id.hex_id, - name: span.name, - user_id: attrs["langfuse.user_id"], - session_id: attrs["langfuse.session_id"], - metadata: parse_json(attrs["langfuse.metadata"]), - tags: attrs["langfuse.tags"], - timestamp: span.start_timestamp - }.compact - } - end - - def create_generation_event(span, attrs) - { - id: SecureRandom.uuid, - timestamp: span.start_timestamp, - type: "generation-create", - body: { - id: span.span_id.hex_id, - trace_id: span.trace_id.hex_id, - parent_observation_id: span.parent_span_id&.hex_id, - name: span.name, - model: attrs["langfuse.model"], - input: parse_json(attrs["langfuse.input"]), - output: parse_json(attrs["langfuse.output"]), - model_parameters: parse_json(attrs["langfuse.model_parameters"]), - usage: extract_usage(attrs), - prompt_name: attrs["langfuse.prompt_name"], - prompt_version: attrs["langfuse.prompt_version"], - start_time: span.start_timestamp, - end_time: span.end_timestamp, - completion_start_time: attrs["langfuse.completion_start_time"], - level: attrs["langfuse.level"] || "default", - status_message: span.status&.description - }.compact - } - end - - def extract_usage(attrs) - return nil unless attrs["langfuse.usage.total_tokens"] - - { - prompt_tokens: attrs["langfuse.usage.prompt_tokens"], - completion_tokens: attrs["langfuse.usage.completion_tokens"], - total_tokens: attrs["langfuse.usage.total_tokens"], - total_cost: attrs["langfuse.usage.total_cost"] - }.compact - end - - def parse_json(json_string) - JSON.parse(json_string) if json_string - rescue JSON::ParserError - nil - end - - def flush_if_needed - return unless @buffer.size >= config.batch_size - - events = @buffer.drain(max: config.batch_size) - - # Use ActiveJob if available, otherwise sync - if async_enabled? - IngestionJob.perform_later(events: events) - else - @ingestion_client.send_batch(events) - end - end - - def async_enabled? 
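      # Send asynchronously only when ActiveJob is loaded and async tracing is
      # enabled; otherwise flush_if_needed falls back to synchronous export
      # (see Design Principle 4, "Async by Default").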
- defined?(ActiveJob) && config.tracing_async - end - end -end -``` - -### ActiveJob Worker - -```ruby -module Langfuse - class IngestionJob < ActiveJob::Base - queue_as { Langfuse.config.job_queue } - - retry_on StandardError, wait: :exponentially_longer, attempts: 3 - - def perform(events:) - client = IngestionClient.new( - Langfuse.config.public_key, - Langfuse.config.secret_key - ) - - client.send_batch(events) - rescue StandardError => e - Rails.logger.error("Langfuse ingestion failed: #{e.message}") - # Re-raise to trigger ActiveJob retry - raise - end - end -end -``` - -### Batch Request Format (Same as Before) - -```ruby -# POST /api/public/ingestion -{ - batch: [ - { - id: "event-123", - timestamp: "2025-10-15T10:00:00.000Z", - type: "trace-create", - body: { - id: "abc123def456", # OTel trace_id - name: "user-query", - user_id: "user-456", - metadata: { ... } - } - }, - { - id: "event-124", - timestamp: "2025-10-15T10:00:01.000Z", - type: "generation-create", - body: { - id: "789xyz", # OTel span_id - trace_id: "abc123def456", # OTel trace_id - parent_observation_id: "parent-span-id", - name: "openai-call", - model: "gpt-4", - input: [...], - output: "...", - usage: { ... } - } - } - ] -} -``` - ---- - -## Distributed Tracing - -### W3C Trace Context Propagation - -OpenTelemetry automatically handles distributed tracing via W3C Trace Context headers: - -**Header Format:** -``` -traceparent: 00--- -tracestate: langfuse= -``` - -### Automatic Propagation (HTTP Calls) - -```ruby -# With opentelemetry-instrumentation-http installed: - -Langfuse.trace(name: "api-request") do |trace| - # OTel automatically injects traceparent header! - response = HTTParty.get("http://service-b/api") - - # Downstream service sees: - # traceparent: 00-abc123def456-789xyz-01 -end -``` - -### Manual Context Injection/Extraction - -```ruby -# Service A - Inject context -Langfuse.trace(name: "parent") do |trace| - headers = trace.inject_context - # => { "traceparent" => "00-abc123...", "tracestate" => "..." } - - HTTParty.get(url, headers: headers) -end - -# Service B - Extract context -def handle_request - context = Langfuse.extract_context(request.headers) - - Langfuse.trace(name: "child", context: context) do |trace| - # Automatically linked to parent trace! 
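    # (What "linked" means here: Langfuse.extract_context returns an
    #  OpenTelemetry::Context, and the trace/span IDs carried in the incoming
    #  `traceparent` header become this trace's parent.)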
- end -end -``` - -### Implementation - -```ruby -module Langfuse - def self.extract_context(headers) - carrier = headers.to_h - OpenTelemetry.propagation.extract(carrier) - end - - def self.trace(name:, context: nil, **attrs, &block) - if context - # Use extracted context as parent - OpenTelemetry::Context.with_current(context) do - tracer.trace(name: name, **attrs, &block) - end - else - # Create new root trace - tracer.trace(name: name, **attrs, &block) - end - end -end -``` - ---- - -## Prompt-to-Trace Linking - -### Automatic Linking (Same as Before) - -When a prompt is used in a generation, automatically capture as OTel attributes: - -```ruby -prompt = Langfuse.client.get_prompt("support-assistant", version: 3) - -Langfuse.trace(name: "support-query") do |trace| - trace.generation( - name: "response", - model: "gpt-4", - prompt: prompt # ← Automatic linking - ) do |gen| - messages = prompt.compile(customer: "Alice") - response = call_llm(messages) - gen.output = response - end -end - -# OTel span attributes: -# { -# "langfuse.type": "generation", -# "langfuse.model": "gpt-4", -# "langfuse.prompt_name": "support-assistant", # Auto-captured -# "langfuse.prompt_version": 3, # Auto-captured -# "langfuse.input": "[{\"role\":\"system\"...}]" # Compiled prompt -# } -``` - -### Implementation - -```ruby -class Generation - def initialize(otel_span, attributes = {}) - @otel_span = otel_span - @attributes = attributes - - # Auto-detect prompt - if attributes[:prompt].is_a?(Langfuse::TextPromptClient) || - attributes[:prompt].is_a?(Langfuse::ChatPromptClient) - @otel_span.set_attribute("langfuse.prompt_name", attributes[:prompt].name) - @otel_span.set_attribute("langfuse.prompt_version", attributes[:prompt].version) - end - end -end -``` - ---- - -## Cost & Token Tracking - -### Automatic Cost Calculation (Same as Before) - -```ruby -# Model pricing database (built-in) -LANGFUSE_MODEL_PRICING = { - "gpt-4" => { - prompt_tokens: 0.03 / 1000, - completion_tokens: 0.06 / 1000 - }, - "gpt-4-turbo" => { - prompt_tokens: 0.01 / 1000, - completion_tokens: 0.03 / 1000 - } - # ... more models -} -``` - -### Usage in Generation - -```ruby -class Generation - def usage=(usage_hash) - model = @attributes[:model] - - # Set token counts as OTel attributes - @otel_span.set_attribute("langfuse.usage.prompt_tokens", usage_hash[:prompt_tokens]) - @otel_span.set_attribute("langfuse.usage.completion_tokens", usage_hash[:completion_tokens]) - @otel_span.set_attribute("langfuse.usage.total_tokens", usage_hash[:total_tokens]) - - # Auto-calculate cost if not provided - unless usage_hash[:total_cost] - cost = CostCalculator.calculate( - model: model, - prompt_tokens: usage_hash[:prompt_tokens], - completion_tokens: usage_hash[:completion_tokens] - ) - @otel_span.set_attribute("langfuse.usage.total_cost", cost) - end - end -end -``` - ---- - -## APM Integration - -### How It Works - -When multiple OTel exporters are configured, **the same trace appears in both Langfuse and your APM**: - -```ruby -# config/initializers/opentelemetry.rb -OpenTelemetry::SDK.configure do |c| - c.service_name = 'rails-app' - - # Langfuse exporter (LLM-specific details) - c.add_span_processor( - OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor.new( - Langfuse::Exporter.new(...) 
- ) - ) - - # Datadog exporter (infrastructure details) - c.add_span_processor( - OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor.new( - OpenTelemetry::Exporter::OTLP::Exporter.new( - endpoint: 'http://datadog-agent:4318' - ) - ) - ) -end -``` - -### Result in Datadog APM - -``` -Trace ID: abc123def456 -├─ Span: chat-completion (Langfuse trace) -│ ├─ Duration: 3.2s -│ ├─ Service: rails-app -│ ├─ Attributes: -│ │ ├─ langfuse.user_id: "user-123" -│ │ └─ langfuse.session_id: "session-456" -│ │ -│ ├─ Span: openai-call (Langfuse generation) -│ │ ├─ Duration: 2.8s -│ │ ├─ Attributes: -│ │ │ ├─ langfuse.model: "gpt-4" -│ │ │ ├─ langfuse.usage.total_tokens: 225 -│ │ │ └─ langfuse.usage.total_cost: 0.00675 -│ │ -│ └─ Span: http.request (Datadog auto-instrumentation) -│ ├─ Duration: 2.7s -│ ├─ URL: https://api.openai.com/v1/chat/completions -│ └─ Status: 200 -``` - -### Benefits - -1. **Unified View**: See LLM calls alongside database queries, HTTP requests -2. **Performance Analysis**: Identify slow LLM calls impacting response time -3. **Error Correlation**: Link LLM failures to infrastructure issues -4. **Cost Attribution**: Correlate costs with specific users/features - ---- - -## Error Handling & Resilience - -### 1. Circuit Breaker (Same Pattern) - -```ruby -class Langfuse::IngestionClient - def initialize - @circuit_breaker = Stoplight("langfuse-ingestion") - .with_threshold(5) - .with_timeout(30) - .with_cool_off_time(10) - .with_data_store(Stoplight::DataStore::Redis.new(Redis.current)) - end - - def send_batch(events) - @circuit_breaker.run do - connection.post("/api/public/ingestion", { batch: events }) - end - rescue Stoplight::Error::RedLight => e - Rails.logger.warn("Langfuse circuit open: #{e.message}") - # Drop events or store for retry - end -end -``` - -### 2. OTel Export Failures - -```ruby -# If Langfuse exporter fails, OTel continues normally -class Langfuse::Exporter - def export(span_data_list, timeout: nil) - # Try to export - events = convert_spans(span_data_list) - send_events(events) - - OpenTelemetry::SDK::Trace::Export::SUCCESS - rescue StandardError => e - # Log but don't crash app - Rails.logger.error("Langfuse export failed: #{e.message}") - - # Other exporters (Datadog) still work! - OpenTelemetry::SDK::Trace::Export::FAILURE - end -end -``` - -### 3. 
Graceful Degradation - -```ruby -# Master kill switch -Langfuse.configure do |config| - config.tracing_enabled = ENV.fetch("LANGFUSE_TRACING", "true") == "true" -end - -# Disable exporter if tracing is off -def initialize_otel - return unless Langfuse.config.tracing_enabled - - OpenTelemetry::SDK.configure do |c| - c.add_span_processor( - OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor.new( - Langfuse::Exporter.new - ) - ) - end -end -``` - ---- - -## Implementation Phases - -### Phase T0: OpenTelemetry Setup (Week 1, Days 1-2) - -**Goal:** Get OpenTelemetry working with basic tracing - -#### T0.1 OTel Dependencies -- [ ] Add `opentelemetry-sdk` gem -- [ ] Add `opentelemetry-api` gem -- [ ] Add `opentelemetry-instrumentation-all` (optional) -- [ ] Add `opentelemetry-exporter-otlp` (optional, for APM) -- [ ] Update Gemfile and bundle install - -#### T0.2 Basic OTel Configuration -- [ ] Create `config/initializers/opentelemetry.rb` -- [ ] Configure service name, version -- [ ] Add console exporter for testing -- [ ] Write basic trace/span test - -#### T0.3 Verify OTel Works -- [ ] Create simple trace in test -- [ ] Verify spans are exported -- [ ] Test context propagation -- [ ] Document OTel setup - -**Dependencies Added:** -- `opentelemetry-sdk ~> 1.4` -- `opentelemetry-api ~> 1.2` -- `opentelemetry-common ~> 0.21` -- `opentelemetry-exporter-otlp ~> 0.27` (optional) - -**Milestone:** OTel tracing works! - ---- - -### Phase T1: Langfuse Exporter (Week 1, Days 3-4) - -**Goal:** Custom OTel exporter that converts spans to Langfuse events - -#### T1.1 Exporter Skeleton -- [ ] Create `Langfuse::Exporter` class -- [ ] Implement `export(span_data_list)` method -- [ ] Implement `force_flush` and `shutdown` -- [ ] Register with OTel SpanProcessor - -#### T1.2 Span Conversion -- [ ] Implement `convert_span_to_event(span)` -- [ ] Extract Langfuse attributes from OTel span -- [ ] Handle trace-create, span-create, generation-create -- [ ] Write conversion tests - -#### T1.3 Ingestion Client (Sync) -- [ ] Create `Langfuse::IngestionClient` -- [ ] Implement `POST /api/public/ingestion` -- [ ] Add Basic Auth -- [ ] Add retry logic (Faraday) -- [ ] Write tests with WebMock - -**Milestone:** OTel spans → Langfuse API! - ---- - -### Phase T2: Ruby API Wrapper (Week 2) - -**Goal:** Idiomatic Ruby API that wraps OTel - -#### T2.1 Langfuse::Tracer -- [ ] Create wrapper around OTel tracer -- [ ] Implement `Langfuse.trace { |t| ... }` -- [ ] Map Ruby kwargs to OTel attributes -- [ ] Write tests - -#### T2.2 Trace/Span/Generation Classes -- [ ] Create `Langfuse::Trace` wrapper -- [ ] Create `Langfuse::Span` wrapper -- [ ] Create `Langfuse::Generation` class -- [ ] Handle input/output/metadata -- [ ] Write comprehensive tests - -#### T2.3 Global Configuration -- [ ] Add tracing config to `Langfuse::Config` -- [ ] Integrate with `Langfuse.configure` -- [ ] Auto-initialize OTel on configure -- [ ] Write tests - -**Milestone:** Ruby block API works! 
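Phase T2.2 calls for a `Langfuse::Span` wrapper that is not spelled out elsewhere in this document. A minimal sketch, assuming it follows the same attribute-writing pattern as the `Langfuse::Trace` and `Generation` classes shown earlier (class shape and method names here are illustrative, not a committed API):

```ruby
module Langfuse
  class Span
    attr_reader :otel_span

    def initialize(otel_span, attributes = {})
      @otel_span = otel_span
      @otel_span.set_attribute("langfuse.type", "span")
      @otel_span.set_attribute("langfuse.input", attributes[:input].to_json) if attributes[:input]
    end

    # Record the span's result as a Langfuse attribute on the underlying OTel span
    def output=(value)
      @otel_span.set_attribute("langfuse.output", value.to_json)
    end

    # Nested generations reuse the same pattern as Trace#generation
    def generation(name:, model:, **attrs)
      tracer = OpenTelemetry.tracer_provider.tracer("langfuse")
      otel_attrs = { "langfuse.type" => "generation", "langfuse.model" => model }
      tracer.in_span(name, attributes: otel_attrs) do |span|
        yield(Generation.new(span, attrs.merge(model: model)))
      end
    end
  end
end
```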
- ---- - -### Phase T3: Async Processing (Week 2-3) - -**Goal:** Background processing via Sidekiq - -#### T3.1 Event Buffer -- [ ] Create `Langfuse::EventBuffer` -- [ ] Implement thread-safe push/drain -- [ ] Add overflow handling -- [ ] Write concurrency tests - -#### T3.2 Batch Span Processor -- [ ] Configure OTel BatchSpanProcessor -- [ ] Set batch size (50 spans) -- [ ] Set flush interval (10s) -- [ ] Test batching behavior - -#### T3.3 Sidekiq Worker -- [ ] Create `Langfuse::IngestionWorker` -- [ ] Accept batch of events -- [ ] Send via IngestionClient -- [ ] Add error handling - -#### T3.4 Async Configuration -- [ ] Add `tracing_async` config option -- [ ] Toggle between sync/async export -- [ ] Auto-detect Sidekiq availability -- [ ] Write tests - -**Milestone:** Async batching works! - ---- - -### Phase T4: Prompt Linking (Week 3) - -**Goal:** Automatic prompt-to-trace linking - -#### T4.1 Prompt Detection -- [ ] Detect `prompt:` kwarg in generation -- [ ] Extract name and version from PromptClient -- [ ] Add as OTel attributes - -#### T4.2 OTel Attribute Mapping -- [ ] Add `langfuse.prompt_name` attribute -- [ ] Add `langfuse.prompt_version` attribute -- [ ] Include in exporter conversion - -#### T4.3 Integration Tests -- [ ] Test end-to-end prompt linking -- [ ] Test with TextPromptClient -- [ ] Test with ChatPromptClient - -**Milestone:** Automatic prompt linking! - ---- - -### Phase T5: Cost Tracking (Week 3) - -**Goal:** Automatic cost calculation - -#### T5.1 Model Pricing Database -- [ ] Create pricing hash (same as before) -- [ ] Add OpenAI, Anthropic models -- [ ] Add custom pricing support - -#### T5.2 Cost Calculator -- [ ] Create `Langfuse::CostCalculator` -- [ ] Calculate from tokens + model -- [ ] Handle unknown models - -#### T5.3 Usage Enhancement -- [ ] Auto-calculate costs in `Generation#usage=` -- [ ] Add cost as OTel attribute -- [ ] Include in exporter - -**Milestone:** Automatic cost calculation! - ---- - -### Phase T6: Distributed Tracing (Week 4) - -**Goal:** W3C Trace Context support - -#### T6.1 Context Injection -- [ ] Implement `trace.inject_context` -- [ ] Use OTel propagation API -- [ ] Return W3C headers hash - -#### T6.2 Context Extraction -- [ ] Implement `Langfuse.extract_context(headers)` -- [ ] Use OTel propagation API -- [ ] Link child traces to parent - -#### T6.3 HTTP Instrumentation -- [ ] Add `opentelemetry-instrumentation-http` -- [ ] Test automatic header injection -- [ ] Test cross-service tracing - -**Milestone:** Distributed tracing works! - ---- - -### Phase T7: APM Integration (Week 4) - -**Goal:** Multi-exporter configuration - -#### T7.1 Multiple Exporters -- [ ] Document multi-exporter setup -- [ ] Test with Datadog + Langfuse -- [ ] Test with OTLP + Langfuse -- [ ] Ensure independent failures - -#### T7.2 Correlation -- [ ] Verify trace IDs match across exporters -- [ ] Test unified traces in Datadog -- [ ] Document APM integration - -**Milestone:** APM integration! 
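Phase T5 above references a `Langfuse::CostCalculator` without showing its shape. A minimal sketch, assuming the per-token USD rates from the Cost & Token Tracking section are exposed as `Langfuse.config.model_pricing` (as in the configuration example later in this document) and that unpriced models simply return `nil`:

```ruby
module Langfuse
  class CostCalculator
    # Returns the total USD cost of a call, or nil when the model has no pricing entry
    def self.calculate(model:, prompt_tokens:, completion_tokens:)
      pricing = Langfuse.config.model_pricing[model]
      return nil unless pricing

      (prompt_tokens.to_i * pricing[:prompt_tokens]) +
        (completion_tokens.to_i * pricing[:completion_tokens])
    end
  end
end
```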
- ---- - -### Phase T8: Advanced Features (Week 5) - -**Goal:** Events, scores, manual API - -#### T8.1 Events -- [ ] Implement `trace.event(name, ...)` -- [ ] Use OTel span events -- [ ] Map to Langfuse event format -- [ ] Test event export - -#### T8.2 Scores -- [ ] Implement `trace.score(name, value)` -- [ ] Buffer scores separately -- [ ] Send as score-create events -- [ ] Test score export - -#### T8.3 Manual API -- [ ] Expose `trace.current_span` (OTel span) -- [ ] Support manual span start/end -- [ ] Document advanced usage - -**Milestone:** Full feature parity! - ---- - -### Phase T9: Rails Integration (Week 5) - -**Goal:** Automatic Rails tracing - -#### T9.1 Middleware -- [ ] Create `Langfuse::Middleware` -- [ ] Auto-wrap requests in traces -- [ ] Capture request metadata -- [ ] Use OTel Rack instrumentation - -#### T9.2 ActiveJob Integration -- [ ] Auto-wrap jobs in traces -- [ ] Link job traces to request traces -- [ ] Use existing OTel ActiveJob instrumentation - -**Milestone:** Automatic Rails tracing! - ---- - -### Phase T10: Documentation & Polish (Week 6) - -**Goal:** Production-ready release - -#### T10.1 Documentation -- [ ] Complete API documentation (YARD) -- [ ] Write comprehensive README section -- [ ] Document OTel integration -- [ ] Write APM integration guide -- [ ] Document distributed tracing - -#### T10.2 Performance Testing -- [ ] Benchmark OTel overhead -- [ ] Optimize exporter -- [ ] Memory profiling - -#### T10.3 Final Polish -- [ ] Ensure >90% test coverage -- [ ] Fix all Rubocop issues -- [ ] Review error messages -- [ ] Final security review - -**Milestone:** Tracing features ready for 1.0! 🚀 - ---- - -## Configuration Example - -```ruby -# config/initializers/langfuse.rb -Langfuse.configure do |config| - # Authentication - config.public_key = ENV["LANGFUSE_PUBLIC_KEY"] - config.secret_key = ENV["LANGFUSE_SECRET_KEY"] - - # Tracing - config.tracing_enabled = true - config.tracing_async = true - config.batch_size = 50 - config.flush_interval = 10 - config.job_queue = :default # ActiveJob queue name (default: :default) - - # Model pricing - config.model_pricing["custom-model"] = { - prompt_tokens: 0.005 / 1000, - completion_tokens: 0.01 / 1000 - } -end - -# config/initializers/opentelemetry.rb -OpenTelemetry::SDK.configure do |c| - c.service_name = 'rails-app' - c.service_version = ENV['APP_VERSION'] - - # Langfuse exporter (LLM observability) - c.add_span_processor( - OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor.new( - Langfuse::Exporter.new, - max_queue_size: 1000, - max_export_batch_size: 50, - schedule_delay: 10_000 # 10 seconds - ) - ) - - # Optional: Datadog exporter (APM) - if ENV['DD_AGENT_HOST'] - c.add_span_processor( - OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor.new( - OpenTelemetry::Exporter::OTLP::Exporter.new( - endpoint: "http://#{ENV['DD_AGENT_HOST']}:4318" - ) - ) - ) - end -end -``` - ---- - -## Dependencies - -### Core OTel Dependencies -- `opentelemetry-sdk ~> 1.4` - Core tracing SDK -- `opentelemetry-api ~> 1.2` - Public API -- `opentelemetry-common ~> 0.21` - Common utilities - -### Optional OTel Dependencies -- `opentelemetry-exporter-otlp ~> 0.27` - For APM integration -- `opentelemetry-instrumentation-http ~> 0.23` - Automatic HTTP tracing -- `opentelemetry-instrumentation-rails ~> 0.30` - Automatic Rails tracing -- `opentelemetry-instrumentation-active_job ~> 0.7` - ActiveJob tracing -- `opentelemetry-instrumentation-sidekiq ~> 0.25` - Sidekiq tracing - -### Existing Dependencies (from prompt management) -- 
`faraday ~> 2.0` - HTTP client -- `faraday-retry ~> 2.0` - Retry logic - -### New Dependencies (tracing-specific) -- `stoplight ~> 4.0` - Circuit breaker (if not added in prompt Phase 7) - ---- - -## Open Questions - -1. **OTel Instrumentation Scope**: Should we auto-install OTel instrumentations? - - **Recommendation**: Make them optional, document in README - -2. **Sampling Strategy**: Use OTel sampler or custom? - - **Recommendation**: Use OTel ParentBasedSampler with configurable rate - -3. **Score Timing**: When to send scores (immediate vs batched)? - - **Recommendation**: Batch with other events for efficiency - -4. **OTel Collector**: Support OTLP Collector as intermediary? - - **Recommendation**: Yes, document as option for high-volume deployments - -5. **Context Storage**: Use OTel Context API or custom? - - **Recommendation**: Use OTel Context API (thread-safe, distributed-ready) - ---- - -## Future Enhancements (Post-v1.0) - -These features are not required for v1.0 but could be added in future releases based on user feedback: - -### Automatic LLM Client Wrappers - -**Motivation:** TypeScript and Python SDKs provide automatic wrappers for popular LLM clients (OpenAI, Anthropic) that eliminate boilerplate by auto-capturing inputs, outputs, and usage. - -**TypeScript Example:** -```typescript -import { observeOpenAI } from '@langfuse/openai'; -const openai = observeOpenAI(new OpenAI()); - -// All OpenAI calls automatically traced! -const completion = await openai.chat.completions.create({ - model: "gpt-4", - messages: [{ role: "user", content: "Hello" }] -}); -``` - -**Python Example:** -```python -from langfuse.openai import openai # Wrapped client - -# All calls automatically traced! -response = openai.chat.completions.create( - model="gpt-4", - messages=[{"role": "user", "content": "Hello"}] -) -``` - -**Proposed Ruby Implementation:** - -```ruby -# gem install langfuse-openai (optional extension gem) -require 'langfuse/integrations/openai' - -# Wrap OpenAI client for automatic tracing -openai = Langfuse::OpenAI.wrap(OpenAI::Client.new) - -# Inside a Langfuse trace, OpenAI calls are auto-traced -Langfuse.trace(name: "user-query") do |trace| - # Automatic generation span created! - response = openai.chat( - parameters: { - model: "gpt-4", - messages: [{ role: "user", content: "Hello" }] - } - ) - - # Usage, tokens, cost automatically captured - # Input/output automatically logged - # Model name automatically detected -end -``` - -**Implementation Approach:** - -1. **Separate Extension Gem** (optional dependency) - - `langfuse-openai` gem for OpenAI integration - - `langfuse-anthropic` gem for Anthropic integration - - Keeps core `langfuse` gem lightweight - -2. **Monkey-patching with Module Prepend** - ```ruby - module Langfuse - module OpenAI - def self.wrap(client) - client.singleton_class.prepend(ClientExtensions) - client - end - - module ClientExtensions - def chat(parameters:) - # Extract trace context from OTel - current_trace = Langfuse.current_trace - - if current_trace - current_trace.generation( - name: "openai-chat", - model: parameters[:model], - input: parameters[:messages] - ) do |gen| - response = super(parameters: parameters) - - gen.output = response.choices.first.message.content - gen.usage = { - prompt_tokens: response.usage.prompt_tokens, - completion_tokens: response.usage.completion_tokens, - total_tokens: response.usage.total_tokens - } - - response - end - else - super(parameters: parameters) - end - end - end - end - end - ``` - -3. 
**OTel Context Detection** - - Check if inside a Langfuse trace (via OTel context) - - Only trace if within active trace - - Pass through normally if not tracing - -**Benefits:** -- ✅ Zero boilerplate for common use cases -- ✅ Matches TypeScript/Python SDK experience -- ✅ Automatic input/output/usage capture -- ✅ Optional (doesn't bloat core gem) - -**Trade-offs:** -- ⚠️ Requires additional gem dependencies -- ⚠️ Monkey-patching risks (mitigated by Module#prepend) -- ⚠️ Needs maintenance for each LLM provider -- ⚠️ May not cover all edge cases - -**Recommendation:** Implement as separate extension gems after v1.0, starting with `langfuse-openai` based on user demand. - ---- - -## Success Metrics - -After implementation, the SDK should achieve: - -1. **Performance**: <2ms OTel overhead per trace/span creation -2. **Reliability**: >99.9% event delivery (with retry) -3. **Throughput**: Handle 10,000+ traces/second (async mode) -4. **Test Coverage**: >90% code coverage -5. **Memory**: <15MB memory overhead (OTel + Langfuse) -6. **Developer Experience**: <10 lines of code for typical use case -7. **APM Compatibility**: Works alongside Datadog, New Relic, etc. - ---- - -## References - -- [Langfuse Tracing Docs](https://langfuse.com/docs/tracing) -- [Langfuse Python SDK](https://langfuse.com/docs/sdk/python/low-level-sdk) (uses OTel) -- [OpenTelemetry Ruby](https://opentelemetry.io/docs/instrumentation/ruby/) -- [OpenTelemetry Specification](https://opentelemetry.io/docs/specs/otel/) -- [W3C Trace Context](https://www.w3.org/TR/trace-context/) -- [Datadog OpenTelemetry](https://docs.datadoghq.com/tracing/setup_overview/open_standards/otel/) - ---- - -## Key Advantages of OTel-Based Design - -### vs Custom Implementation - -| Feature | OTel-Based | Custom Implementation | -|---------|------------|----------------------| -| **Context Propagation** | Automatic (W3C Trace Context) | Manual thread-local storage | -| **Distributed Tracing** | Built-in across services | Complex custom solution | -| **APM Integration** | Native support | Requires custom exporters | -| **Industry Adoption** | CNCF standard, widely used | SDK-specific | -| **Code Maintenance** | Less custom code | More code to maintain | -| **Learning Curve** | OTel patterns (well-documented) | Custom patterns | -| **Instrumentation** | Rich ecosystem (auto-instrument) | Manual instrumentation | -| **Future-Proof** | Industry direction | May diverge from standards | - -### Key Decision Points - -**Choose OTel-Based Design If:** -- ✅ You want distributed tracing across microservices -- ✅ You use APM tools (Datadog, New Relic, Honeycomb) -- ✅ You want automatic instrumentation (HTTP, Rails, Sidekiq) -- ✅ You value industry standards over custom solutions -- ✅ You want unified observability (infrastructure + LLM) - -**Avoid OTel If:** -- ❌ You need minimal dependencies (OTel adds ~10 gems) -- ❌ You only trace within a single app (no distributed tracing) -- ❌ You don't use APM tools -- ❌ You want complete control over internals - -**For SimplePractice (100 microservices):** OTel-based design is **strongly recommended** due to distributed architecture and existing APM tooling. 
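For reference, the automatic OpenAI wrapper sketched in the Future Enhancements section above depends on a `Langfuse.current_trace` helper (item 3, OTel Context Detection). One possible shape (an assumption for illustration, not a committed API) inspects the active OpenTelemetry span and only wraps it while a trace is recording:

```ruby
module Langfuse
  # Returns the active trace wrapper, or nil when called outside a Langfuse trace
  def self.current_trace
    span = OpenTelemetry::Trace.current_span
    return nil unless span.context.valid? && span.recording?

    Trace.new(span)
  end
end
```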
- ---- - -**END OF DESIGN DOCUMENT** diff --git a/docs/design-history/langfuse-ruby-prompt-management-design.md b/docs/design-history/langfuse-ruby-prompt-management-design.md deleted file mode 100644 index 6ea3d47..0000000 --- a/docs/design-history/langfuse-ruby-prompt-management-design.md +++ /dev/null @@ -1,3556 +0,0 @@ -# Langfuse Ruby SDK - Prompt Management Technical Design - -**Document Version:** 1.0 -**Date:** 2025-10-02 -**Author:** Technical Architecture Team -**Status:** Design Document - ---- - -## Table of Contents - -1. [Executive Summary](#executive-summary) -2. [Problem Statement](#problem-statement) -3. [Architecture Overview](#architecture-overview) -4. [Component Design](#component-design) -5. [API Design](#api-design) -6. [Caching Strategy](#caching-strategy) -7. [REST API Integration](#rest-api-integration) -8. [Variable Substitution](#variable-substitution) -9. [Implementation Phases](#implementation-phases) -10. [Testing Strategy](#testing-strategy) -11. [Dependencies](#dependencies) -12. [Code Examples](#code-examples) -13. [Migration Strategy](#migration-strategy) -14. [Trade-offs and Alternatives](#trade-offs-and-alternatives) -15. [Open Questions](#open-questions) - ---- - -## Executive Summary - -This document outlines the technical design for adding prompt management functionality to the `langfuse-ruby` gem, achieving feature parity with the JavaScript SDK's prompt management capabilities while adhering to Ruby idioms and best practices. - -### Key Objectives - -- **Feature Parity**: Match JavaScript SDK's prompt management functionality -- **Ruby Conventions**: Follow Ruby/Rails conventions (snake_case, blocks, Rails.cache integration) -- **Thread Safety**: Ensure concurrent request safety for Rails applications -- **Performance**: Implement intelligent caching with stale-while-revalidate pattern -- **Developer Experience**: Provide intuitive, well-documented API - -### Success Metrics - -- All JavaScript SDK prompt features available in Ruby -- Sub-100ms cache hits for prompt retrieval -- Zero breaking changes to existing langfuse-ruby API -- Comprehensive test coverage (>90%) -- Thread-safe for production Rails applications - ---- - -## Design Philosophy: LaunchDarkly-Inspired API - -### Why LaunchDarkly as a Model? - -The LaunchDarkly Ruby SDK is widely regarded as one of the best-designed Ruby gems, with exceptional developer ergonomics. This design incorporates several key patterns from LaunchDarkly: - -**1. Flat API Surface** -- LaunchDarkly: `client.variation('flag', user, default)` -- Langfuse: `client.get_prompt('name', fallback: "...")` -- Benefit: Minimal cognitive overhead, everything on the client - -**2. Required Defaults for Resilience** -- LaunchDarkly: Every call requires a default value, never throws -- Langfuse: Encourage fallbacks, gracefully degrade on errors -- Benefit: Production resilience built-in - -**3. Configuration Object Pattern** -- LaunchDarkly: `Config` class with block initialization -- Langfuse: `Langfuse::Config` with global configuration -- Benefit: Clean Rails initialization, centralized settings - -**4. Simple Return Values** -- LaunchDarkly: `variation` returns value, `variation_detail` adds metadata -- Langfuse: `get_prompt` returns client, `get_prompt_detail` adds metadata -- Benefit: Simple common case, detailed when needed - -**5. 
Global Singleton Pattern** -- LaunchDarkly: Initialize once, use everywhere -- Langfuse: `Langfuse.client` for Rails convenience -- Benefit: No prop-drilling, simpler service objects - -### API Comparison - -| Feature | LaunchDarkly | Langfuse (This Design) | -|---------|-------------|------------------------| -| Initialization | `LDClient.new(sdk_key, config)` | `Client.new(config)` | -| Global config | `Config.new { \|c\| ... }` | `Langfuse.configure { \|c\| ... }` | -| Global client | Manual singleton | `Langfuse.client` | -| Primary method | `variation(key, user, default)` | `get_prompt(name, fallback: ...)` | -| Detail variant | `variation_detail(key, user, default)` | `get_prompt_detail(name, ...)` | -| Error handling | Returns default, logs error | Returns fallback or raises | -| State check | `initialized?` | `initialized?` | - ---- - -## Problem Statement - -### Current State - -The `langfuse-ruby` gem (v0.1.4) provides: -- Tracing functionality (trace, span, generation, event, score) -- Basic configuration and authentication -- Async processing via Sidekiq integration - -**Missing capabilities:** -- Prompt retrieval and management -- Prompt creation and updates -- Variable substitution/compilation -- Intelligent caching -- Placeholder support for chat prompts -- LangChain integration helpers - -### Business Context - -Langfuse prompts enable: -- **Centralized Prompt Management**: Single source of truth for LLM prompts -- **Version Control**: Track prompt changes over time -- **A/B Testing**: Multiple prompt versions with labels -- **Rapid Iteration**: Update prompts without code deployment -- **Collaboration**: Product/non-technical teams can manage prompts - -### Target Users - -1. **Rails Developers**: Building LLM-powered features in production Rails apps -2. **Data Scientists**: Experimenting with prompt engineering in Ruby notebooks -3. **Platform Teams**: Managing LLM integrations across microservices - ---- - -## Architecture Overview - -### High-Level System Design - -``` -┌─────────────────────────────────────────────────────────────┐ -│ Langfuse Module (Global Config) │ -│ │ -│ • configure { |config| ... } │ -│ • client (singleton) │ -│ • reset! (testing) │ -└──────────────────────────┬───────────────────────────────────┘ - │ - │ creates - ▼ -┌─────────────────────────────────────────────────────────────┐ -│ Langfuse::Client │ -│ │ -│ Prompt Methods (Flattened API): │ -│ • get_prompt(name, **options) │ -│ • get_prompt_detail(name, **options) │ -│ • compile_prompt(name, variables:, placeholders:) │ -│ • create_prompt(**body) │ -│ • update_prompt(name:, version:, labels:) │ -│ • invalidate_cache(name) │ -│ • initialized? 
│ -│ │ -│ ┌─────────────────────────────────────────────────────┐ │ -│ │ Langfuse::PromptCache │ │ -│ │ │ │ -│ │ • get_including_expired(key) │ │ -│ │ • set(key, value, ttl) │ │ -│ │ • invalidate(prompt_name) │ │ -│ │ • trigger_background_refresh │ │ -│ │ • Thread-safe operations (Mutex) │ │ -│ └─────────────────────────────────────────────────────┘ │ -│ │ -│ ┌─────────────────────────────────────────────────────┐ │ -│ │ Langfuse::ApiClient │ │ -│ │ │ │ -│ │ • get_prompt(name, version:, label:) │ │ -│ │ • create_prompt(body) │ │ -│ │ • update_prompt_version(name, version, labels) │ │ -│ └─────────────────────────────────────────────────────┘ │ -└─────────────────────────────────────────────────────────────┘ - │ - │ HTTP (Basic Auth) - ▼ - ┌──────────────────────┐ - │ Langfuse API │ - │ (cloud.langfuse.com)│ - └──────────────────────┘ - - -┌─────────────────────────────────────────────────────────────┐ -│ Prompt Client Classes │ -├─────────────────────────────────────────────────────────────┤ -│ │ -│ ┌────────────────────────────────────────────────────┐ │ -│ │ Langfuse::TextPromptClient │ │ -│ │ │ │ -│ │ • compile(variables = {}) │ │ -│ │ • to_langchain │ │ -│ │ • name, version, config, labels, tags │ │ -│ └────────────────────────────────────────────────────┘ │ -│ │ -│ ┌────────────────────────────────────────────────────┐ │ -│ │ Langfuse::ChatPromptClient │ │ -│ │ │ │ -│ │ • compile(variables = {}, placeholders = {}) │ │ -│ │ • to_langchain(placeholders: {}) │ │ -│ │ • name, version, config, labels, tags │ │ -│ └────────────────────────────────────────────────────┘ │ -└─────────────────────────────────────────────────────────────┘ -``` - -### Component Responsibilities - -| Component | Responsibility | Thread-Safe? | -|-----------|---------------|--------------| -| `Langfuse` (module) | Global configuration, singleton client | N/A | -| `Langfuse::Config` | Configuration object (keys, cache, logger) | N/A | -| `Langfuse::Client` | Main API surface, prompt operations, caching logic | Yes | -| `Langfuse::PromptCache` | In-memory TTL cache, stale-while-revalidate | Yes | -| `Langfuse::ApiClient` | HTTP communication with Langfuse API | Yes | -| `Langfuse::TextPromptClient` | Text prompt manipulation, compilation | N/A (immutable) | -| `Langfuse::ChatPromptClient` | Chat prompt manipulation, placeholders | N/A (immutable) | - -### Integration with Existing Gem - -The prompt management system integrates seamlessly: - -```ruby -# Global configuration (Rails initializer) -Langfuse.configure do |config| - config.public_key = ENV['LANGFUSE_PUBLIC_KEY'] - config.secret_key = ENV['LANGFUSE_SECRET_KEY'] -end - -# NEW: Get global client -client = Langfuse.client - -# NEW: Prompt management (flattened API) -prompt = client.get_prompt("greeting") -compiled = prompt.compile(name: "Alice") - -# Or: One-step convenience method -text = client.compile_prompt("greeting", variables: { name: "Alice" }) - -# Existing tracing (unchanged) -client.trace(name: "my-trace") do |trace| - trace.generation(name: "llm-call", input: compiled) -end -``` - ---- - -## Component Design - -### 1. Langfuse Module (Global Configuration) - -**Purpose**: Provide global configuration and singleton client for Rails convenience. - -```ruby -module Langfuse - class << self - attr_writer :configuration - - # Global configuration - def configuration - @configuration ||= Config.new - end - - # Configure block (Rails initializer) - def configure - yield(configuration) - configuration.validate! 
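      # validate! raises Langfuse::ConfigurationError when public_key or
      # secret_key is missing, so misconfiguration fails fast in the initializer.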
- end - - # Global singleton client - def client - @client ||= Client.new(configuration) - end - - # Reset for testing - def reset! - @configuration = nil - @client = nil - end - end -end -``` - -### 2. Langfuse::Config - -**Purpose**: Configuration object for client initialization. - -```ruby -module Langfuse - class Config - attr_accessor :public_key, :secret_key, :base_url, :timeout, :logger - attr_accessor :cache_ttl, :cache_max_size, :cache_backend - - DEFAULT_BASE_URL = "https://cloud.langfuse.com" - DEFAULT_TIMEOUT = 5 - DEFAULT_CACHE_TTL = 60 - DEFAULT_CACHE_MAX_SIZE = 1000 - - def initialize - @public_key = ENV['LANGFUSE_PUBLIC_KEY'] - @secret_key = ENV['LANGFUSE_SECRET_KEY'] - @base_url = ENV['LANGFUSE_BASE_URL'] || DEFAULT_BASE_URL - @timeout = DEFAULT_TIMEOUT - @cache_ttl = DEFAULT_CACHE_TTL - @cache_max_size = DEFAULT_CACHE_MAX_SIZE - @cache_backend = :memory - @logger = defined?(Rails) ? Rails.logger : Logger.new($stdout) - - yield(self) if block_given? - end - - def validate! - raise ConfigurationError, "public_key is required" if public_key.nil? || public_key.empty? - raise ConfigurationError, "secret_key is required" if secret_key.nil? || secret_key.empty? - end - end -end -``` - -**Usage Examples**: - -```ruby -# Rails initializer -Langfuse.configure do |config| - config.public_key = ENV['LANGFUSE_PUBLIC_KEY'] - config.secret_key = ENV['LANGFUSE_SECRET_KEY'] - config.cache_ttl = 120 # 2 minutes - config.logger = Rails.logger -end - -# Anywhere in app -client = Langfuse.client # Uses global config - -# Or: Custom client for multi-tenant -config = Langfuse::Config.new do |c| - c.public_key = tenant.langfuse_key - c.secret_key = tenant.langfuse_secret -end -custom_client = Langfuse::Client.new(config) -``` - -### 3. Langfuse::Client (Flattened API) - -**Purpose**: Main entry point for all prompt operations with caching logic. - -```ruby -module Langfuse - class Client - attr_reader :config, :api_client, :cache, :logger - - # Initialize with Config object or inline options - # - # @param config_or_options [Config, Hash] Config object or hash of options - def initialize(config_or_options = nil) - if config_or_options.is_a?(Config) - @config = config_or_options - elsif config_or_options.is_a?(Hash) - # Backward compatibility: convert hash to config - @config = Config.new do |c| - config_or_options.each { |k, v| c.send("#{k}=", v) if c.respond_to?("#{k}=") } - end - else - @config = Langfuse.configuration - end - - @config.validate! - @logger = @config.logger - @api_client = ApiClient.new( - public_key: @config.public_key, - secret_key: @config.secret_key, - base_url: @config.base_url, - timeout: @config.timeout, - logger: @logger - ) - @cache = PromptCache.new( - max_size: @config.cache_max_size, - logger: @logger - ) - end - - # Check if client is initialized and ready - def initialized? - !@api_client.nil? && !@cache.nil? 
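      # Both are built eagerly in #initialize (after config validation), so this
      # mirrors LaunchDarkly's initialized? readiness check.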
- end - - # Get a prompt with caching and fallback support - # - # @param name [String] Prompt name - # @param options [Hash] Options hash - # @option options [Integer] :version Specific version - # @option options [String] :label Label filter (default: "production") - # @option options [Integer] :cache_ttl Cache TTL in seconds - # @option options [String, Array] :fallback Fallback content (RECOMMENDED) - # @option options [Symbol] :type Force type (:text or :chat) - # @option options [Integer] :timeout Request timeout in seconds - # - # @return [TextPromptClient, ChatPromptClient] - # @raise [ApiError, NotFoundError] if fetch fails and no fallback - # - # @example Get with fallback (recommended) - # prompt = client.get_prompt("greeting", - # fallback: "Hello {{name}}!", - # type: :text - # ) - def get_prompt(name, **options) - # Implementation: cache lookup -> API fetch -> fallback handling - # (see full implementation in detailed design below) - end - - # Get prompt with detailed metadata (for debugging/observability) - # - # @return [Hash] { prompt:, cached:, stale:, version:, fetch_time_ms:, source: } - def get_prompt_detail(name, **options) - # Implementation details - end - - # Convenience method: Get and compile in one step - # - # @param name [String] Prompt name - # @param variables [Hash] Variables for text prompts - # @param placeholders [Hash] Placeholders for chat prompts - # @param options [Hash] Same options as get_prompt - # - # @return [String, Array] Compiled result - # - # @example - # text = client.compile_prompt("greeting", - # variables: { name: "Alice" }, - # fallback: "Hello {{name}}!", - # type: :text - # ) - def compile_prompt(name, variables: {}, placeholders: {}, **options) - prompt = get_prompt(name, **options) - case prompt - when TextPromptClient - prompt.compile(variables) - when ChatPromptClient - prompt.compile(variables, placeholders) - end - end - - # Create a new prompt - # - # @param body [Hash] Prompt definition - # @return [TextPromptClient, ChatPromptClient] - def create_prompt(**body) - validate_create_body!(body) - response = api_client.create_prompt(body) - build_prompt_client(response) - end - - # Update prompt version labels - # - # @param name [String] Prompt name - # @param version [Integer] Version number - # @param labels [Array] New labels - # - # @return [Hash] Updated prompt metadata - def update_prompt(name:, version:, labels:) - validate_update_params!(name, version, labels) - response = api_client.update_prompt_version(name, version, labels) - cache.invalidate(name) # Only after successful update - response - end - - # Invalidate cache for a prompt - def invalidate_cache(name) - cache.invalidate(name) - logger.info("Langfuse: Invalidated cache for #{name}") - end - - private - - def validate_fallback_type!(fallback, type) - case type - when :text - unless fallback.is_a?(String) - raise ArgumentError, "Text prompt fallback must be a String, got #{fallback.class}" - end - when :chat - unless fallback.is_a?(Array) - raise ArgumentError, "Chat prompt fallback must be an Array, got #{fallback.class}" - end - fallback.each_with_index do |msg, i| - unless msg.is_a?(Hash) && (msg.key?(:role) || msg.key?(:type)) - raise ArgumentError, "Chat fallback message #{i} must have :role or :type" - end - end - end - end - - # ... additional private methods for fetch_and_cache, build_prompt_client, etc. - end -end -``` - -**Key Design Decisions:** - -1. **Flattened API**: All methods directly on `Client` (LaunchDarkly style) -2. 
**Keyword Arguments**: `**options` for flexibility and readability -3. **Smart Defaults**: `cache_ttl: 60`, `label: "production"` -4. **Graceful Degradation**: Fallback support encouraged, logs instead of raising when fallback provided -5. **Convenience Methods**: `compile_prompt` for common one-step use case -6. **Detail Variants**: `get_prompt_detail` for observability (LaunchDarkly pattern) -7. **Built-in Instrumentation**: ActiveSupport::Notifications for observability - -**Observability & Instrumentation:** - -```ruby -class Client - def get_prompt(name, **options) - start_time = Time.now - - # Fetch logic... - result = # ... - - # Emit instrumentation event - instrument('prompt.get', { - name: name, - cached: cache_hit?, - duration_ms: (Time.now - start_time) * 1000, - version: result.version, - fallback_used: result.is_fallback - }) - - result - end - - private - - def instrument(event, payload) - return unless defined?(ActiveSupport::Notifications) - ActiveSupport::Notifications.instrument("langfuse.#{event}", payload) - end -end - -# Subscribe to events for monitoring -ActiveSupport::Notifications.subscribe('langfuse.prompt.get') do |name, start, finish, id, payload| - # Log to stdout - Rails.logger.info("Langfuse prompt fetch", payload) - - # Send to StatsD/Datadog - StatsD.increment('langfuse.prompt.get') - StatsD.timing('langfuse.prompt.duration', payload[:duration_ms]) - StatsD.increment('langfuse.prompt.cache_hit') if payload[:cached] - StatsD.increment('langfuse.prompt.fallback') if payload[:fallback_used] -end -``` - -### 4. Langfuse::PromptCache - -**Purpose**: Thread-safe in-memory cache with TTL and stale-while-revalidate support. - -```ruby -module Langfuse - class PromptCache - DEFAULT_TTL_SECONDS = 60 - MAX_CACHE_SIZE = 1000 # Prevent unbounded memory growth - - class CacheItem - attr_reader :value, :expiry - - def initialize(value, ttl_seconds) - @value = value - @expiry = Time.now + ttl_seconds - end - - def expired? 
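        # Expired entries are deliberately kept in the store: get_including_expired
        # still returns them, so callers can serve stale data while a background
        # refresh runs (stale-while-revalidate).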
- Time.now > @expiry - end - end - - def initialize(max_size: MAX_CACHE_SIZE) - @cache = {} - @mutex = Mutex.new - @refreshing_keys = {} - @access_order = [] # Track access for LRU eviction - @max_size = max_size - end - - # Get item including expired entries (for stale-while-revalidate) - # Implements cache stampede protection - # - # @param key [String] Cache key - # @return [CacheItem, nil] - def get_including_expired(key) - @mutex.synchronize do - item = @cache[key] - - # Update access order for LRU - if item - @access_order.delete(key) - @access_order.push(key) - end - - item - end - end - - # Generate cache key from prompt parameters - # - # @param name [String] Prompt name - # @param version [Integer, nil] Version number - # @param label [String, nil] Label - # @return [String] Cache key - # - # @example - # create_key(name: "greeting", label: "production") - # # => "greeting-label:production" - def create_key(name:, version: nil, label: nil) - parts = [name] - if version - parts << "version:#{version}" - elsif label - parts << "label:#{label}" - else - parts << "label:production" - end - parts.join("-") - end - - # Store value in cache with TTL and LRU eviction - # - # @param key [String] Cache key - # @param value [TextPromptClient, ChatPromptClient] Prompt client - # @param ttl_seconds [Integer, nil] TTL (default: 60) - def set(key, value, ttl_seconds = nil) - ttl = ttl_seconds || DEFAULT_TTL_SECONDS - @mutex.synchronize do - # Evict LRU entry if at capacity - evict_lru if @cache.size >= @max_size && !@cache.key?(key) - - @cache[key] = CacheItem.new(value, ttl) - - # Update access order - @access_order.delete(key) - @access_order.push(key) - end - end - - # Track background refresh promise - # - # @param key [String] Cache key - # @param promise [Thread] Background refresh thread - def add_refreshing_promise(key, promise) - @mutex.synchronize { @refreshing_keys[key] = promise } - - # Non-blocking cleanup after thread completes - # This ensures stale-while-revalidate doesn't block the calling thread - Thread.new do - promise.join - @mutex.synchronize { @refreshing_keys.delete(key) } - end - end - - # Check if key is currently being refreshed - # - # @param key [String] Cache key - # @return [Boolean] - def refreshing?(key) - @mutex.synchronize { @refreshing_keys.key?(key) } - end - - # Invalidate all cache entries for a prompt - # - # @param prompt_name [String] Prompt name - def invalidate(prompt_name) - @mutex.synchronize do - @cache.keys.each do |key| - if key.start_with?(prompt_name) - @cache.delete(key) - @access_order.delete(key) - end - end - end - end - - private - - # Evict least recently used cache entry - def evict_lru - return if @access_order.empty? - - lru_key = @access_order.shift - @cache.delete(lru_key) - end - end -end -``` - -**Key Design Decisions:** - -1. **Mutex for Thread Safety**: All cache operations use mutex synchronization -2. **Stale-While-Revalidate**: Return expired cache while refreshing in background -3. **Simple Key Generation**: Deterministic cache keys based on name/version/label -4. 
**Rails.cache Integration**: Phase 2 will add optional Rails.cache backend - -**Alternative Considered: Rails.cache by Default** - -We could use `Rails.cache` immediately, but: -- **Pro**: Distributed caching across processes/servers -- **Con**: Requires Rails dependency, slower than in-memory -- **Decision**: Start with in-memory, add Rails.cache as opt-in in Phase 2 - -**Background Refresh with Thread Pool:** - -To prevent unbounded thread creation during cache refreshes, use a thread pool: - -```ruby -require 'concurrent-ruby' - -class PromptCache - def initialize(max_size: MAX_CACHE_SIZE) - @cache = {} - @mutex = Mutex.new - @refreshing_keys = {} - @access_order = [] - @max_size = max_size - # Thread pool for background refreshes (max 5 concurrent) - @thread_pool = Concurrent::FixedThreadPool.new(5) - end - - def trigger_background_refresh(key, &block) - return if refreshing?(key) - - @mutex.synchronize do - return if @refreshing_keys.key?(key) - - # Submit to thread pool instead of creating unbounded threads - future = Concurrent::Future.execute(executor: @thread_pool) do - block.call - end - - @refreshing_keys[key] = future - - # Clean up when done (non-blocking) - future.add_observer do |time, value, reason| - @mutex.synchronize { @refreshing_keys.delete(key) } - end - end - end -end -``` - -**Benefits:** -- Limits concurrent API calls to 5 (configurable) -- Prevents thread exhaustion under high load -- Graceful handling of refresh failures - -### 5. Langfuse::TextPromptClient - -**Purpose**: Represent and manipulate text-based prompts. - -```ruby -module Langfuse - class TextPromptClient - attr_reader :name, :version, :config, :labels, :tags, :prompt, :type, :is_fallback - - def initialize(response, is_fallback: false) - @name = response[:name] - @version = response[:version] - @config = response[:config] || {} - @labels = response[:labels] || [] - @tags = response[:tags] || [] - @prompt = response[:prompt] - @type = :text - @is_fallback = is_fallback - end - - # Compile prompt by substituting variables - # - # @param variables [Hash] Variable substitutions - # @return [String] Compiled prompt - # - # @example - # prompt.compile(name: "Alice", city: "NYC") - # # "Hello {{name}} from {{city}}!" => "Hello Alice from NYC!" - def compile(variables = {}) - Mustache.render(prompt, variables.transform_keys(&:to_s)) - end - - # Convert to LangChain PromptTemplate format - # - # @return [String] Prompt with {var} syntax - # - # @example - # prompt.to_langchain - # # "Hello {{name}}!" => "Hello {name}!" - def to_langchain - transform_to_langchain_variables(prompt) - end - - # Serialize to JSON - # - # @return [String] JSON representation - def to_json(*args) - { - name: name, - prompt: prompt, - version: version, - is_fallback: is_fallback, - tags: tags, - labels: labels, - type: type, - config: config - }.to_json(*args) - end - - private - - def transform_to_langchain_variables(content) - # Convert {{var}} to {var} - content.gsub(/\{\{(\w+)\}\}/, '{\1}') - end - end -end -``` - -**Key Design Decisions:** - -1. **Immutable**: All attributes are read-only (Ruby convention for value objects) -2. **Mustache Templating**: Use `mustache` gem for variable substitution -3. **Symbol Keys**: Return `:text` for type (Ruby convention) -4. **Simple Interface**: Focus on common use cases (compile, to_langchain) - -### 6. Langfuse::ChatPromptClient - -**Purpose**: Represent and manipulate chat-based prompts with placeholder support. 
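For orientation, the sketch below shows the two prompt item shapes this class accepts; the shapes are inferred from `normalize_prompt` and `compile` in the implementation that follows, and the field values are illustrative only.

```ruby
# Illustrative prompt payloads (shapes inferred from normalize_prompt/compile below)

# Legacy format: items without :type are tagged as "chatmessage" by normalize_prompt
legacy_prompt = [
  { role: "system", content: "You are a helpful assistant." }
]

# Current format: regular chat messages mixed with placeholder entries
current_prompt = [
  { type: "chatmessage", role: "system", content: "You are {{role}}." },
  { type: "placeholder", name: "examples" },
  { type: "chatmessage", role: "user", content: "{{question}}" }
]
```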
- -```ruby -module Langfuse - class ChatPromptClient - attr_reader :name, :version, :config, :labels, :tags, :prompt, :type, :is_fallback - - # Chat message types - MESSAGE_TYPE_CHAT = "chatmessage" - MESSAGE_TYPE_PLACEHOLDER = "placeholder" - - def initialize(response, is_fallback: false) - @name = response[:name] - @version = response[:version] - @config = response[:config] || {} - @labels = response[:labels] || [] - @tags = response[:tags] || [] - @prompt = normalize_prompt(response[:prompt]) - @type = :chat - @is_fallback = is_fallback - end - - # Compile prompt by substituting variables and resolving placeholders - # - # @param variables [Hash] Variable substitutions for Mustache templates - # @param placeholders [Hash] Placeholder resolutions (name => array of messages) - # @param required_placeholders [Array] List of required placeholder names - # @return [Array] Array of chat messages with resolved placeholders - # @raise [ArgumentError] if required placeholder is missing or invalid - # - # @example - # messages = prompt.compile( - # { user_name: "Alice" }, - # { examples: [ - # { role: "user", content: "Hi" }, - # { role: "assistant", content: "Hello!" } - # ]} - # ) - def compile(variables = {}, placeholders = {}, required_placeholders: []) - # Validate required placeholders are provided - required_placeholders.each do |name| - unless placeholders.key?(name) || placeholders.key?(name.to_sym) || placeholders.key?(name.to_s) - raise ArgumentError, "Required placeholder '#{name}' not provided" - end - end - - messages = [] - - prompt.each do |item| - if item[:type] == MESSAGE_TYPE_PLACEHOLDER - # Resolve placeholder - placeholder_value = placeholders[item[:name].to_sym] || placeholders[item[:name]] - - if placeholder_value.nil? - # Keep unresolved placeholder for debugging - messages << item - elsif placeholder_value.is_a?(Array) - # Handle empty arrays - skip them - next if placeholder_value.empty? - - # Validate all messages have proper structure - unless valid_chat_messages?(placeholder_value) - raise ArgumentError, "Placeholder '#{item[:name]}' must contain valid chat messages with :role and :content" - end - - messages.concat(placeholder_value) - else - # Invalid placeholder value - raise ArgumentError, "Placeholder '#{item[:name]}' must be an Array of messages, got #{placeholder_value.class}" - end - elsif item[:type] == MESSAGE_TYPE_CHAT - # Regular message: substitute variables - messages << { - role: item[:role], - content: Mustache.render(item[:content], variables.transform_keys(&:to_s)) - } - end - end - - messages - end - - # Convert to LangChain ChatPromptTemplate format - # - # @param placeholders [Hash] Placeholder resolutions - # @return [Array] Array of messages and MessagesPlaceholder objects - # - # @example - # langchain_messages = prompt.to_langchain( - # placeholders: { examples: [...] } - # ) - def to_langchain(placeholders: {}) - messages = [] - - prompt.each do |item| - if item[:type] == MESSAGE_TYPE_PLACEHOLDER - placeholder_value = placeholders[item[:name].to_sym] || placeholders[item[:name]] - - if placeholder_value.is_a?(Array) && !placeholder_value.empty? 
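              # Only non-empty arrays are inlined here; nil or empty values fall
              # through to the else branch and become a LangChain MessagesPlaceholder.
              # (Contrast with #compile above, which skips empty arrays entirely.)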
- # Resolved placeholder: add messages with transformed variables - placeholder_value.each do |msg| - messages << { - role: msg[:role], - content: transform_to_langchain_variables(msg[:content]) - } - end - else - # Unresolved: convert to LangChain MessagesPlaceholder - messages << ["placeholder", "{#{item[:name]}}"] - end - elsif item[:type] == MESSAGE_TYPE_CHAT - messages << { - role: item[:role], - content: transform_to_langchain_variables(item[:content]) - } - end - end - - messages - end - - def to_json(*args) - { - name: name, - prompt: prompt.map { |item| - if item[:type] == MESSAGE_TYPE_CHAT - item.except(:type) - else - item - end - }, - version: version, - is_fallback: is_fallback, - tags: tags, - labels: labels, - type: type, - config: config - }.to_json(*args) - end - - private - - def normalize_prompt(messages) - # Ensure all messages have a type field - messages.map do |item| - if item[:type] - item # Already has type - else - # Legacy format: add type - { type: MESSAGE_TYPE_CHAT }.merge(item) - end - end - end - - def transform_to_langchain_variables(content) - content.gsub(/\{\{(\w+)\}\}/, '{\1}') - end - - def valid_chat_messages?(messages) - messages.all? { |m| m.is_a?(Hash) && m.key?(:role) && m.key?(:content) } - end - end -end -``` - -**Key Design Decisions:** - -1. **Placeholder Support**: First-class support for dynamic message insertion -2. **Type Normalization**: Handle both legacy and new message formats -3. **Flexible Placeholders**: Accept symbol or string keys for Ruby ergonomics -4. **Array Flattening**: `compile` returns flat array of resolved messages - -### 7. Langfuse::ApiClient Extensions - -**Purpose**: Add HTTP endpoints for prompt operations. - -```ruby -module Langfuse - class ApiClient - # ... existing methods ... 
- - # Fetch a prompt by name - # - # @param name [String] Prompt name - # @param version [Integer, nil] Specific version - # @param label [String, nil] Label filter - # @param timeout_seconds [Integer, nil] Request timeout - # - # @return [Hash] Prompt response - # @note Retries are handled by Faraday retry middleware (max: 2, interval: 0.5s) - def get_prompt(name, version: nil, label: nil, timeout_seconds: nil) - params = {} - params[:version] = version if version - params[:label] = label if label - - response = connection(timeout: timeout_seconds).get("/api/public/v2/prompts/#{name}") do |req| - req.params = params - end - - handle_response(response) - rescue Faraday::Error => e - raise ApiError, "Failed to fetch prompt '#{name}': #{e.message}" - end - - # Create a new prompt - # - # @param body [Hash] Prompt definition - # @return [Hash] Created prompt response - def create_prompt(body) - response = connection.post("/api/public/v2/prompts") do |req| - req.headers['Content-Type'] = 'application/json' - req.body = body.to_json - end - - handle_response(response) - end - - # Update prompt version labels - # - # @param name [String] Prompt name - # @param version [Integer] Version number - # @param labels [Array] New labels - # - # @return [Hash] Updated prompt - def update_prompt_version(name, version, labels) - response = connection.patch("/api/public/v2/prompts/#{name}/#{version}") do |req| - req.headers['Content-Type'] = 'application/json' - req.body = { labels: labels }.to_json - end - - handle_response(response) - end - - private - - def connection(timeout: nil) - if timeout - # Create dedicated connection for custom timeout - # to avoid mutating shared connection - build_connection(timeout: timeout) - else - @connection ||= build_connection - end - end - - def build_connection(timeout: nil) - Faraday.new( - url: base_url, - headers: { - 'Authorization' => authorization_header, - 'User-Agent' => "langfuse-ruby/#{Langfuse::VERSION}" - } - ) do |conn| - conn.request :retry, max: 2, interval: 0.5 - conn.response :json, content_type: /\bjson$/ - conn.adapter Faraday.default_adapter - conn.options.timeout = timeout if timeout - end - end - - def authorization_header - # Basic Auth: base64(public_key:secret_key) - credentials = "#{@public_key}:#{@secret_key}" - "Basic #{Base64.strict_encode64(credentials)}" - end - - def handle_response(response) - case response.status - when 200..299 - symbolize_keys(response.body) - when 401 - raise UnauthorizedError, "Invalid API credentials" - when 404 - raise NotFoundError, "Prompt not found" - when 429 - raise RateLimitError, "Rate limit exceeded" - else - raise ApiError, "HTTP #{response.status}: #{response.body}" - end - end - - def symbolize_keys(hash) - # Recursively convert string keys to symbols - JSON.parse(hash.to_json, symbolize_names: true) - end - end -end -``` - -**Key Design Decisions:** - -1. **Faraday for HTTP**: Industry standard, flexible middleware -2. **Basic Auth**: Use public_key:secret_key as per Langfuse spec -3. **Automatic Retries**: Built-in exponential backoff for transient errors -4. **Symbol Keys**: Return hashes with symbol keys (Ruby convention) -5. 
**Custom Exceptions**: Specific errors for different failure modes - ---- - -## API Design - -### Client Initialization - -```ruby -# Option 1: Global configuration (recommended for Rails) -Langfuse.configure do |config| - config.public_key = ENV['LANGFUSE_PUBLIC_KEY'] - config.secret_key = ENV['LANGFUSE_SECRET_KEY'] - config.cache_ttl = 120 # 2 minutes -end - -# Use global singleton client -client = Langfuse.client - -# Option 2: Per-client configuration -config = Langfuse::Config.new do |c| - c.public_key = "pk_..." - c.secret_key = "sk_..." -end -client = Langfuse::Client.new(config) - -# Option 3: Inline hash (backward compatible) -client = Langfuse::Client.new( - public_key: "pk_...", - secret_key: "sk_..." -) -``` - -### Get Prompt (Flattened API) - -```ruby -# Get latest production version -prompt = client.get_prompt("greeting") - -# Get specific version -prompt = client.get_prompt("greeting", version: 2) - -# Get by label -prompt = client.get_prompt("greeting", label: "staging") - -# Disable caching for testing -prompt = client.get_prompt("greeting", cache_ttl: 0) - -# With fallback for resilience (RECOMMENDED) -prompt = client.get_prompt("greeting", - fallback: "Hello {{name}}!", - type: :text -) - -# Chat prompt with fallback -prompt = client.get_prompt("conversation", - type: :chat, - fallback: [ - { role: "system", content: "You are a helpful assistant." }, - { role: "user", content: "{{user_message}}" } - ] -) - -# Get with detailed metadata (debugging/observability) -detail = client.get_prompt_detail("greeting") -# => { -# prompt: TextPromptClient, -# cached: true, -# stale: false, -# version: 3, -# fetch_time_ms: 1.2, -# source: :cache -# } -``` - -### Convenience Method: Compile in One Step - -```ruby -# Get and compile in single call -text = client.compile_prompt("greeting", - variables: { name: "Alice", city: "SF" }, - fallback: "Hello {{name}}!", - type: :text -) -# => "Hello Alice from SF!" - -# Chat prompt compilation -messages = client.compile_prompt("conversation", - variables: { user_name: "Alice" }, - placeholders: { - examples: [ - { role: "user", content: "Hi" }, - { role: "assistant", content: "Hello!" } - ] - }, - type: :chat -) -``` - -### Create Prompt - -```ruby -# Create text prompt -text_prompt = client.create_prompt( - name: "greeting", - prompt: "Hello {{name}} from {{city}}!", - type: :text, - labels: ["production"], - tags: ["customer-facing"], - config: { temperature: 0.7 } -) - -# Create chat prompt -chat_prompt = client.create_prompt( - name: "conversation", - type: :chat, - prompt: [ - { role: "system", content: "You are a helpful assistant." }, - { role: "user", content: "{{user_message}}" } - ], - labels: ["staging"] -) - -# Create chat prompt with placeholders -chat_prompt = client.create_prompt( - name: "rag-pipeline", - type: :chat, - prompt: [ - { role: "system", content: "You are a helpful assistant." }, - { type: "placeholder", name: "examples" }, - { role: "user", content: "{{user_question}}" } - ] -) -``` - -### Update Prompt - -```ruby -# Promote version to production -client.update_prompt( - name: "greeting", - version: 3, - labels: ["production", "stable"] -) - -# Tag for A/B testing -client.update_prompt( - name: "greeting", - version: 4, - labels: ["experiment-a"] -) -``` - -### Two-Step: Get + Compile - -```ruby -# Text prompt compilation -text_prompt = client.get_prompt("greeting", type: :text) -compiled_text = text_prompt.compile( - name: "Alice", - city: "San Francisco" -) -# => "Hello Alice from San Francisco!" 
- -# Chat prompt compilation -chat_prompt = client.get_prompt("conversation", type: :chat) -compiled_messages = chat_prompt.compile( - { user_name: "Alice" }, - { - examples: [ - { role: "user", content: "What's the weather?" }, - { role: "assistant", content: "Let me check for you." } - ] - } -) -# => [ -# { role: "system", content: "You are a helpful assistant." }, -# { role: "user", content: "What's the weather?" }, -# { role: "assistant", content: "Let me check for you." }, -# { role: "user", content: "Alice's message" } -# ] -``` - -### LangChain Integration - -```ruby -# Text prompt to LangChain -text_prompt = client.prompt.get("greeting", type: :text) -langchain_template = text_prompt.to_langchain -# => "Hello {name} from {city}!" - -# Chat prompt to LangChain -chat_prompt = client.prompt.get("conversation", type: :chat) -langchain_messages = chat_prompt.to_langchain -# => [ -# { role: "system", content: "You are a helpful assistant." }, -# ["placeholder", "{examples}"], -# { role: "user", content: "{user_message}" } -# ] -``` - -### Ruby Idioms - -```ruby -# Graceful error handling with fallback (recommended) -prompt = client.get_prompt("greeting", - fallback: "Hello {{name}}!", - type: :text -) -# Always succeeds - returns fallback on error - -# Or: Traditional exception handling -begin - prompt = client.get_prompt("greeting") -rescue Langfuse::NotFoundError => e - Rails.logger.error("Prompt not found: #{e.message}") - # Handle error -end - -# Rails integration with global client -class AiService - def initialize - @langfuse = Langfuse.client # Global singleton - end - - def generate_greeting(user) - # One-step compile with fallback - text = @langfuse.compile_prompt("greeting", - variables: { name: user.name, city: user.city }, - fallback: "Hello {{name}} from {{city}}!", - type: :text - ) - - # Use with OpenAI, Anthropic, etc. 
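    # (Assumes the ruby-openai gem's OpenAI::Client interface; the compiled
    # text can be passed to any LLM client.)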
- OpenAI::Client.new.chat( - parameters: { - model: "gpt-4", - messages: [{ role: "user", content: text }] - } - ) - end -end -``` - ---- - -## Caching Strategy - -### Cache Behavior - -The caching system implements **stale-while-revalidate** pattern for optimal performance: - -``` -┌─────────────────────────────────────────────────────────────┐ -│ Cache State Transitions │ -└─────────────────────────────────────────────────────────────┘ - -[MISS] ──fetch──> [FRESH] ──60s──> [EXPIRED] - │ │ - │ │ - │ └──background refresh──> [FRESH] - │ │ - └──────return stale────────┘ - - -State: MISS -- No cache entry exists -- Fetch immediately from API -- Block until response received -- Store in cache with TTL - -State: FRESH -- Cache entry exists and not expired -- Return immediately from cache -- No API call made -- Best performance (<1ms) - -State: EXPIRED -- Cache entry exists but expired -- Return stale cache immediately -- Trigger background refresh (async) -- Next request will use fresh data -- Ensures fast response times -``` - -### Cache Key Generation - -```ruby -# Format: "name-{version|label}:value" - -# Latest production (default) -create_key(name: "greeting") -# => "greeting-label:production" - -# Specific version -create_key(name: "greeting", version: 2) -# => "greeting-version:2" - -# Specific label -create_key(name: "greeting", label: "staging") -# => "greeting-label:staging" -``` - -### TTL Configuration - -```ruby -# Default TTL: 60 seconds -prompt = client.prompt.get("greeting") - -# Custom TTL: 5 minutes -prompt = client.prompt.get("greeting", cache_ttl_seconds: 300) - -# Disable caching -prompt = client.prompt.get("greeting", cache_ttl_seconds: 0) - -# Very long TTL for stable prompts -prompt = client.prompt.get("greeting", cache_ttl_seconds: 3600) -``` - -### Thread Safety - -All cache operations are thread-safe using `Mutex`: - -```ruby -class PromptCache - def initialize - @cache = {} - @mutex = Mutex.new - @refreshing_keys = {} - end - - def get_including_expired(key) - @mutex.synchronize { @cache[key] } - end - - def set(key, value, ttl) - @mutex.synchronize do - @cache[key] = CacheItem.new(value, ttl) - end - end - - def refreshing?(key) - @mutex.synchronize { @refreshing_keys.key?(key) } - end -end -``` - -### Invalidation - -Cache invalidation happens automatically on updates: - -```ruby -# Update prompt labels -client.prompt.update(name: "greeting", version: 3, labels: ["production"]) - -# Cache automatically invalidated -# All keys starting with "greeting" are removed -# Next get("greeting") will fetch fresh data -``` - -### Rails.cache Integration (Phase 2) - -```ruby -# Opt-in to distributed caching -Langfuse.configure do |config| - config.cache_backend = :rails - config.cache_namespace = "langfuse_prompts" -end - -# Implementation -class PromptCache - def initialize(backend: :memory) - @backend = backend - @cache = backend == :rails ? 
Rails.cache : {} - @mutex = Mutex.new unless backend == :rails - end - - def get_including_expired(key) - if @backend == :rails - Rails.cache.read(cache_key(key)) - else - @mutex.synchronize { @cache[key] } - end - end - - private - - def cache_key(key) - "#{Langfuse.configuration.cache_namespace}:#{key}" - end -end -``` - -**Trade-off: In-memory vs Rails.cache** - -| Aspect | In-memory | Rails.cache | -|--------|-----------|-------------| -| Speed | 0.01ms | 1-10ms (Redis) | -| Shared across processes | No | Yes | -| Memory usage | Per-process | Shared | -| Ideal for | Single-server | Multi-server | -| Default | ✓ Phase 1 | Phase 2 option | - ---- - -## REST API Integration - -### Langfuse API Endpoints - -``` -Base URL: https://cloud.langfuse.com -Authentication: Basic Auth (public_key:secret_key) - -GET /api/public/v2/prompts/{name} - ?version={version} - &label={label} - -POST /api/public/v2/prompts - -PATCH /api/public/v2/prompts/{name}/{version} -``` - -### HTTP Client: Faraday - -**Why Faraday?** - -1. **Industry Standard**: Most popular Ruby HTTP client -2. **Middleware Support**: Easy to add logging, instrumentation -3. **Adapter Agnostic**: Works with Net::HTTP, Patron, HTTPClient -4. **Built-in Retry**: Exponential backoff for transient errors -5. **Already Used**: Likely in Gemfile for other integrations - -**Alternative Considered: HTTParty** - -- Simpler API, but less flexible -- No built-in retry middleware -- Harder to instrument for observability - -### Request/Response Handling - -```ruby -# GET Prompt -GET /api/public/v2/prompts/greeting?label=production - -Response 200 OK: -{ - "name": "greeting", - "version": 3, - "type": "text", - "prompt": "Hello {{name}}!", - "config": { "temperature": 0.7 }, - "labels": ["production"], - "tags": ["customer-facing"], - "createdAt": "2025-01-01T00:00:00Z", - "updatedAt": "2025-01-15T00:00:00Z" -} - -# CREATE Prompt -POST /api/public/v2/prompts -Content-Type: application/json - -{ - "name": "greeting", - "type": "text", - "prompt": "Hello {{name}}!", - "labels": ["staging"], - "config": {} -} - -Response 201 Created: -{ - "name": "greeting", - "version": 1, - ... -} - -# UPDATE Prompt -PATCH /api/public/v2/prompts/greeting/3 -Content-Type: application/json - -{ - "labels": ["production", "stable"] -} - -Response 200 OK: -{ - "name": "greeting", - "version": 3, - "labels": ["production", "stable"], - ... -} -``` - -### Error Handling - -```ruby -module Langfuse - class Error < StandardError; end - class ApiError < Error; end - class UnauthorizedError < ApiError; end - class NotFoundError < ApiError; end - class RateLimitError < ApiError; end - class TimeoutError < ApiError; end - - class ApiClient - def handle_response(response) - case response.status - when 200..299 - symbolize_keys(response.body) - when 401 - raise UnauthorizedError, "Invalid API credentials" - when 404 - raise NotFoundError, "Resource not found: #{response.body}" - when 429 - raise RateLimitError, "Rate limit exceeded. 
Retry after: #{response.headers['Retry-After']}" - when 500..599 - raise ApiError, "Server error: #{response.status}" - else - raise ApiError, "Unexpected response: #{response.status}" - end - end - end -end -``` - -### Retry Logic - -```ruby -# Faraday retry middleware configuration -connection = Faraday.new do |conn| - conn.request :retry, - max: 2, - interval: 0.5, - interval_randomness: 0.5, - backoff_factor: 2, - retry_statuses: [429, 500, 502, 503, 504], - retry_if: ->(env, exception) { - # Retry on network errors - exception.is_a?(Faraday::TimeoutError) || - exception.is_a?(Faraday::ConnectionFailed) - } -end - -# Exponential backoff: -# Attempt 1: immediate -# Attempt 2: 0.5s + random(0-0.25s) -# Attempt 3: 1.0s + random(0-0.5s) -``` - -### Authentication - -```ruby -# Basic Auth implementation -class ApiClient - def initialize(public_key:, secret_key:, base_url:) - @public_key = public_key - @secret_key = secret_key - @base_url = base_url - end - - private - - def authorization_header - credentials = "#{@public_key}:#{@secret_key}" - "Basic #{Base64.strict_encode64(credentials)}" - end - - def connection - @connection ||= Faraday.new( - url: @base_url, - headers: { - 'Authorization' => authorization_header, - 'User-Agent' => "langfuse-ruby/#{Langfuse::VERSION}", - 'Content-Type' => 'application/json' - } - ) - end -end -``` - -### Timeout Configuration - -```ruby -# Default timeout: 5 seconds -client = Langfuse::Client.new(timeout: 5) - -# Per-request timeout override -prompt = client.prompt.get( - "greeting", - fetch_timeout_ms: 2000 # 2 second timeout -) - -# Implementation -def get_prompt(name, timeout_seconds: nil, **options) - conn = connection.dup - conn.options.timeout = timeout_seconds if timeout_seconds - - response = conn.get("/api/public/v2/prompts/#{name}") do |req| - req.params = options - end - - handle_response(response) -rescue Faraday::TimeoutError => e - raise Langfuse::TimeoutError, "Request timed out after #{timeout_seconds}s" -end -``` - ---- - -## Variable Substitution - -### Templating Engine: Mustache - -**Why Mustache?** - -1. **Logic-less**: Simple, secure, no arbitrary code execution -2. **Cross-language**: Same syntax as JavaScript SDK (consistency) -3. **Ruby Gem**: Well-maintained `mustache` gem available -4. **Familiar**: Widely used in Rails ecosystem - -**Alternative Considered: ERB** - -- Pro: Built into Ruby stdlib -- Con: Allows Ruby code execution (security risk) -- Con: Different syntax than JS SDK (inconsistent) - -### Text Prompt Compilation - -```ruby -# Template -"Hello {{name}} from {{city}}!" - -# Variables -{ name: "Alice", city: "San Francisco" } - -# Compiled -"Hello Alice from San Francisco!" - -# Implementation -def compile(variables = {}) - # Mustache expects string keys - Mustache.render(prompt, variables.transform_keys(&:to_s)) -end -``` - -### Chat Prompt Compilation - -```ruby -# Template -[ - { role: "system", content: "You are helping {{user_name}}." }, - { type: "placeholder", name: "examples" }, - { role: "user", content: "{{user_question}}" } -] - -# Variables + Placeholders -variables = { user_name: "Alice", user_question: "What's the weather?" } -placeholders = { - examples: [ - { role: "user", content: "How are you?" }, - { role: "assistant", content: "I'm great!" } - ] -} - -# Compiled -[ - { role: "system", content: "You are helping Alice." }, - { role: "user", content: "How are you?" }, - { role: "assistant", content: "I'm great!" }, - { role: "user", content: "What's the weather?" 
} -] -``` - -### Placeholder Resolution - -```ruby -def compile(variables = {}, placeholders = {}) - messages = [] - - prompt.each do |item| - case item[:type] - when MESSAGE_TYPE_PLACEHOLDER - # Resolve placeholder - name = item[:name] - value = placeholders[name.to_sym] || placeholders[name] - - if valid_messages?(value) - # Flatten array of messages - messages.concat(value) - elsif value.nil? - # Keep unresolved for debugging - messages << item - else - # Invalid type: stringify - messages << { role: "system", content: value.to_s } - end - - when MESSAGE_TYPE_CHAT - # Regular message: apply Mustache - messages << { - role: item[:role], - content: Mustache.render(item[:content], variables.transform_keys(&:to_s)) - } - end - end - - messages -end - -def valid_messages?(value) - value.is_a?(Array) && - value.all? { |m| m.is_a?(Hash) && m.key?(:role) && m.key?(:content) } -end -``` - -### Escaping and Security - -```ruby -# Mustache escapes HTML by default -# Disable escaping for plain text -Mustache.escape = ->(text) { text } - -# Or use triple mustache for unescaped -"Hello {{{user_input}}}!" # No escaping - -# For chat prompts, always sanitize user input in variables -def compile(variables = {}, placeholders = {}) - # Sanitize string values to prevent injection and limit payload size - safe_variables = variables.transform_values do |v| - v.is_a?(String) ? sanitize(v) : v - end - - # ... compilation logic -end - -# Sanitize input to prevent control character injection and DoS attacks -# -# @param text [String] Input text to sanitize -# @param max_length [Integer] Maximum allowed length (default: 10,000) -# @return [String] Sanitized text -# -# Rationale for 10,000 char limit: -# - Most LLM prompts are <5K tokens (~20K chars) -# - Prevents memory exhaustion attacks -# - Large enough for legitimate use cases -# - Configurable via parameter if needed -def sanitize(text, max_length: 10_000) - # Remove control characters (null bytes, escape sequences, etc.) - sanitized = text.gsub(/[\x00-\x1F\x7F]/, '') - - # Truncate to prevent DoS - sanitized.length > max_length ? sanitized[0...max_length] : sanitized -end -``` - -### LangChain Variable Transformation - -```ruby -# Langfuse format: {{variable}} -# LangChain format: {variable} - -def transform_to_langchain_variables(content) - # Simple regex replacement - content.gsub(/\{\{(\w+)\}\}/, '{\1}') -end - -# Example -transform_to_langchain_variables("Hello {{name}} from {{city}}!") -# => "Hello {name} from {city}!" -``` - -### Edge Cases - -```ruby -# Empty variables -prompt.compile({}) -# => "Hello {{name}}!" (unchanged) - -# Missing variables -prompt.compile(name: "Alice") -# => "Hello Alice from {{city}}!" - -# Extra variables -prompt.compile(name: "Alice", city: "SF", unused: "value") -# => "Hello Alice from SF!" (unused ignored) - -# Nil values -prompt.compile(name: nil) -# => "Hello from {{city}}!" - -# Nested objects (not supported) -prompt.compile(user: { name: "Alice" }) -# => "Hello {{name}}!" 
(no nested access) -``` - ---- - -## Implementation Phases - -### Phase 1: Core Functionality (MVP) - -**Goal**: Basic prompt retrieval and caching - -**Scope**: -- `PromptManager#get` with caching -- `TextPromptClient` with `compile` -- `ChatPromptClient` with `compile` -- `PromptCache` (in-memory) -- `ApiClient` extensions for GET /prompts - -**Deliverables**: -- [ ] `Langfuse::PromptManager` class -- [ ] `Langfuse::TextPromptClient` class -- [ ] `Langfuse::ChatPromptClient` class -- [ ] `Langfuse::PromptCache` class -- [ ] `ApiClient#get_prompt` method -- [ ] Basic error handling -- [ ] Unit tests (>90% coverage) -- [ ] Integration tests with VCR -- [ ] Documentation and examples - -**Success Criteria**: -- Can fetch and cache prompts -- Can compile text and chat prompts -- Thread-safe for Rails apps -- <100ms cache hit latency - -**Estimated Effort**: 3-4 days (includes buffer for edge cases and thorough testing) - ---- - -### Phase 2: Advanced Features - -**Goal**: Prompt creation, updates, and advanced caching - -**Scope**: -- `PromptManager#create` -- `PromptManager#update` -- Placeholder support for chat prompts -- Rails.cache backend option -- Cache invalidation on updates - -**Deliverables**: -- [ ] `ApiClient#create_prompt` method -- [ ] `ApiClient#update_prompt_version` method -- [ ] Placeholder compilation in `ChatPromptClient` -- [ ] Rails.cache adapter -- [ ] Configuration for cache backend -- [ ] Additional tests for new features - -**Success Criteria**: -- Can create and update prompts -- Placeholders work correctly -- Rails.cache integration functional -- No breaking changes - -**Estimated Effort**: 2-3 days - ---- - -### Phase 3: LangChain Integration - -**Goal**: Seamless LangChain compatibility - -**Scope**: -- `TextPromptClient#to_langchain` -- `ChatPromptClient#to_langchain` -- LangChain MessagesPlaceholder format -- Variable syntax transformation - -**Deliverables**: -- [ ] LangChain format conversion methods -- [ ] Tests for LangChain compatibility -- [ ] Documentation with LangChain examples - -**Success Criteria**: -- Outputs work with langchain-ruby gem -- Variable syntax correctly transformed -- Placeholders converted to MessagesPlaceholder - -**Estimated Effort**: 1 day - ---- - -### Phase 4: Polish and Optimization - -**Goal**: Production-ready quality - -**Scope**: -- Performance optimization -- Enhanced error messages -- Observability hooks -- Comprehensive documentation - -**Deliverables**: -- [ ] Benchmarks and performance tests -- [ ] Instrumentation for monitoring (StatsD, Datadog) -- [ ] Detailed error messages with remediation hints -- [ ] Complete API documentation -- [ ] Migration guide from manual prompt management - -**Success Criteria**: -- <10ms p95 latency for cache hits -- Comprehensive error messages -- Full documentation coverage - -**Estimated Effort**: 2-3 days (comprehensive observability and documentation) - ---- - -### Total Implementation Timeline - -**Estimated Total**: 8-11 days (2-2.5 weeks with 30% contingency buffer) - -**Phases can be deployed incrementally:** -1. Phase 1 → Beta release for early adopters -2. Phase 2 → Feature-complete release -3. Phase 3 → LangChain integration (optional) -4. 
Phase 4 → Production-ready v1.0 - ---- - -## Testing Strategy - -### Unit Tests - -**Coverage Target**: >90% - -```ruby -# spec/langfuse/prompt_manager_spec.rb -RSpec.describe Langfuse::PromptManager do - let(:api_client) { instance_double(Langfuse::ApiClient) } - let(:manager) { described_class.new(api_client: api_client) } - - describe "#get" do - context "with cache miss" do - it "fetches from API and caches result" do - allow(api_client).to receive(:get_prompt).and_return(prompt_response) - - prompt = manager.get("greeting") - - expect(prompt).to be_a(Langfuse::TextPromptClient) - expect(prompt.name).to eq("greeting") - expect(api_client).to have_received(:get_prompt).once - end - end - - context "with cache hit" do - it "returns cached prompt without API call" do - manager.get("greeting") # Prime cache - - prompt = manager.get("greeting") - - expect(api_client).to have_received(:get_prompt).once # Only first call - end - end - - context "with expired cache" do - it "returns stale cache and refreshes in background" do - # Test stale-while-revalidate - end - end - - context "with fallback" do - it "returns fallback on API error" do - allow(api_client).to receive(:get_prompt).and_raise(Langfuse::ApiError) - - prompt = manager.get("greeting", fallback: "Hello!", type: :text) - - expect(prompt.is_fallback).to be true - expect(prompt.prompt).to eq("Hello!") - end - end - end - - describe "#create" do - it "creates text prompt and returns client" do - allow(api_client).to receive(:create_prompt).and_return(created_response) - - prompt = manager.create( - name: "greeting", - prompt: "Hello {{name}}!", - type: :text - ) - - expect(prompt).to be_a(Langfuse::TextPromptClient) - end - end - - describe "#update" do - it "updates labels and invalidates cache" do - manager.get("greeting") # Prime cache - - manager.update(name: "greeting", version: 1, labels: ["production"]) - - # Cache should be invalidated - expect(manager.cache.get_including_expired("greeting-label:production")).to be_nil - end - end -end - -# spec/langfuse/text_prompt_client_spec.rb -RSpec.describe Langfuse::TextPromptClient do - let(:response) do - { - name: "greeting", - version: 1, - type: "text", - prompt: "Hello {{name}} from {{city}}!", - config: {}, - labels: ["production"], - tags: [] - } - end - let(:client) { described_class.new(response) } - - describe "#compile" do - it "substitutes variables" do - result = client.compile(name: "Alice", city: "SF") - expect(result).to eq("Hello Alice from SF!") - end - - it "handles missing variables" do - result = client.compile(name: "Alice") - expect(result).to eq("Hello Alice from {{city}}!") - end - - it "accepts string keys" do - result = client.compile("name" => "Alice", "city" => "SF") - expect(result).to eq("Hello Alice from SF!") - end - end - - describe "#to_langchain" do - it "transforms mustache to langchain syntax" do - result = client.to_langchain - expect(result).to eq("Hello {name} from {city}!") - end - end - - describe "#to_json" do - it "serializes to JSON" do - json = JSON.parse(client.to_json) - expect(json["name"]).to eq("greeting") - expect(json["type"]).to eq("text") - end - end -end - -# spec/langfuse/chat_prompt_client_spec.rb -RSpec.describe Langfuse::ChatPromptClient do - let(:response) do - { - name: "conversation", - version: 1, - type: "chat", - prompt: [ - { type: "chatmessage", role: "system", content: "You are {{role}}." 
}, - { type: "placeholder", name: "examples" }, - { type: "chatmessage", role: "user", content: "{{question}}" } - ], - config: {}, - labels: [], - tags: [] - } - end - let(:client) { described_class.new(response) } - - describe "#compile" do - it "substitutes variables and resolves placeholders" do - result = client.compile( - { role: "a helper", question: "What?" }, - { - examples: [ - { role: "user", content: "Hi" }, - { role: "assistant", content: "Hello!" } - ] - } - ) - - expect(result).to eq([ - { role: "system", content: "You are a helper." }, - { role: "user", content: "Hi" }, - { role: "assistant", content: "Hello!" }, - { role: "user", content: "What?" } - ]) - end - - it "keeps unresolved placeholders" do - result = client.compile({ role: "a helper", question: "What?" }) - - expect(result[1]).to eq({ type: "placeholder", name: "examples" }) - end - end - - describe "#to_langchain" do - it "converts to langchain format" do - result = client.to_langchain - - expect(result).to eq([ - { role: "system", content: "You are {role}." }, - ["placeholder", "{examples}"], - { role: "user", content: "{question}" } - ]) - end - end -end - -# spec/langfuse/prompt_cache_spec.rb -RSpec.describe Langfuse::PromptCache do - let(:cache) { described_class.new } - let(:prompt) { instance_double(Langfuse::TextPromptClient) } - - describe "#set and #get_including_expired" do - it "stores and retrieves values" do - cache.set("key", prompt, 60) - - item = cache.get_including_expired("key") - - expect(item.value).to eq(prompt) - expect(item.expired?).to be false - end - - it "marks items as expired after TTL" do - cache.set("key", prompt, 0) # Immediate expiry - - sleep 0.01 - item = cache.get_including_expired("key") - - expect(item.expired?).to be true - expect(item.value).to eq(prompt) # Still returns value - end - end - - describe "#create_key" do - it "generates key with default label" do - key = cache.create_key(name: "greeting") - expect(key).to eq("greeting-label:production") - end - - it "generates key with version" do - key = cache.create_key(name: "greeting", version: 2) - expect(key).to eq("greeting-version:2") - end - - it "generates key with custom label" do - key = cache.create_key(name: "greeting", label: "staging") - expect(key).to eq("greeting-label:staging") - end - end - - describe "#invalidate" do - it "removes all keys for prompt name" do - cache.set("greeting-label:production", prompt, 60) - cache.set("greeting-version:2", prompt, 60) - cache.set("other-label:production", prompt, 60) - - cache.invalidate("greeting") - - expect(cache.get_including_expired("greeting-label:production")).to be_nil - expect(cache.get_including_expired("greeting-version:2")).to be_nil - expect(cache.get_including_expired("other-label:production")).not_to be_nil - end - end - - describe "thread safety" do - it "handles concurrent access" do - threads = 10.times.map do - Thread.new do - 100.times { |i| cache.set("key-#{i}", prompt, 60) } - end - end - - threads.each(&:join) - - # Should not raise or corrupt data - expect(cache.get_including_expired("key-0")).not_to be_nil - end - end - - describe "cache stampede protection" do - it "prevents duplicate background refreshes" do - # Simulate 100 threads hitting expired cache simultaneously - expired_key = "expired-prompt" - cache.set(expired_key, prompt, 0) # Immediately expired - sleep 0.01 - - refresh_count = Concurrent::AtomicFixnum.new(0) - allow(manager).to receive(:fetch_and_cache) do - refresh_count.increment - end - - threads = 100.times.map do - 
Thread.new { manager.get("expired-prompt") } - end - - threads.each(&:join) - - # Should only trigger 1 background refresh, not 100 - expect(refresh_count.value).to eq(1) - end - end - - describe "cache expiry edge cases" do - it "handles expiry exactly at read time" do - cache.set("key", prompt, 0.1) # 100ms TTL - sleep 0.1 # Expire exactly now - - item = cache.get_including_expired("key") - expect(item).not_to be_nil - expect(item.expired?).to be true - end - end - - describe "LRU eviction" do - it "evicts least recently used when at capacity" do - small_cache = described_class.new(max_size: 3) - - # Fill cache - small_cache.set("key1", prompt, 60) - small_cache.set("key2", prompt, 60) - small_cache.set("key3", prompt, 60) - - # Access key1 to make it recently used - small_cache.get_including_expired("key1") - - # Add key4 - should evict key2 (LRU) - small_cache.set("key4", prompt, 60) - - expect(small_cache.get_including_expired("key1")).not_to be_nil - expect(small_cache.get_including_expired("key2")).to be_nil - expect(small_cache.get_including_expired("key3")).not_to be_nil - expect(small_cache.get_including_expired("key4")).not_to be_nil - end - end -end -``` - -### Integration Tests with VCR - -```ruby -# spec/integration/prompt_manager_integration_spec.rb -RSpec.describe "Prompt Manager Integration", vcr: true do - let(:client) do - Langfuse::Client.new( - public_key: ENV["LANGFUSE_PUBLIC_KEY"], - secret_key: ENV["LANGFUSE_SECRET_KEY"], - base_url: "https://cloud.langfuse.com" - ) - end - - describe "fetching prompts" do - it "retrieves text prompt from API", vcr: { cassette_name: "get_text_prompt" } do - prompt = client.prompt.get("greeting") - - expect(prompt).to be_a(Langfuse::TextPromptClient) - expect(prompt.name).to eq("greeting") - expect(prompt.version).to be > 0 - end - - it "retrieves chat prompt from API", vcr: { cassette_name: "get_chat_prompt" } do - prompt = client.prompt.get("conversation", type: :chat) - - expect(prompt).to be_a(Langfuse::ChatPromptClient) - expect(prompt.prompt).to be_an(Array) - end - end - - describe "creating prompts" do - it "creates new text prompt", vcr: { cassette_name: "create_text_prompt" } do - prompt = client.prompt.create( - name: "test-#{SecureRandom.hex(4)}", - prompt: "Test {{variable}}", - type: :text - ) - - expect(prompt.version).to eq(1) - end - end - - describe "updating prompts" do - it "updates prompt labels", vcr: { cassette_name: "update_prompt" } do - result = client.prompt.update( - name: "greeting", - version: 1, - labels: ["test"] - ) - - expect(result[:labels]).to include("test") - end - end -end -``` - -### Performance Tests - -```ruby -# spec/performance/caching_performance_spec.rb -RSpec.describe "Caching Performance" do - let(:manager) { Langfuse::PromptManager.new(api_client: api_client) } - let(:api_client) { instance_double(Langfuse::ApiClient) } - - before do - allow(api_client).to receive(:get_prompt).and_return(prompt_response) - end - - it "cache hits are <1ms" do - manager.get("greeting") # Prime cache - - time = Benchmark.realtime do - 100.times { manager.get("greeting") } - end - - avg_time = (time / 100) * 1000 # Convert to ms - expect(avg_time).to be < 1 - end - - it "handles 1000 concurrent requests" do - threads = 1000.times.map do - Thread.new { manager.get("greeting") } - end - - expect { threads.each(&:join) }.not_to raise_error - end -end -``` - -### Test Coverage Requirements - -| Component | Coverage Target | -|-----------|----------------| -| PromptManager | >95% | -| TextPromptClient | >95% | 
-| ChatPromptClient | >95% | -| PromptCache | >95% | -| ApiClient | >90% | -| **Overall** | **>90%** | - ---- - -## Dependencies - -### Required Gems - -```ruby -# langfuse-ruby.gemspec -Gem::Specification.new do |spec| - spec.name = "langfuse-ruby" - spec.version = "0.2.0" - spec.authors = ["Langfuse"] - spec.summary = "Ruby SDK for Langfuse" - - # Runtime dependencies - spec.add_dependency "faraday", "~> 2.0" - spec.add_dependency "faraday-retry", "~> 2.0" - spec.add_dependency "mustache", "~> 1.1" - spec.add_dependency "concurrent-ruby", "~> 1.2" - - # Development dependencies - spec.add_development_dependency "rspec", "~> 3.12" - spec.add_development_dependency "vcr", "~> 6.1" - spec.add_development_dependency "webmock", "~> 3.18" - spec.add_development_dependency "rubocop", "~> 1.50" - spec.add_development_dependency "simplecov", "~> 0.22" -end -``` - -### Dependency Justification - -| Gem | Purpose | Why? | -|-----|---------|------| -| `faraday` | HTTP client | Industry standard, flexible, middleware support | -| `faraday-retry` | Retry logic | Exponential backoff, transient error handling | -| `mustache` | Templating | Logic-less, same as JS SDK, security | -| `concurrent-ruby` | Thread pool | Bounded concurrency for background refreshes, prevents thread exhaustion | -| `rspec` | Testing | Ruby standard, readable syntax | -| `vcr` | HTTP recording | Record real API responses for tests | -| `webmock` | HTTP stubbing | Mock HTTP for isolated tests | -| `rubocop` | Linting | Code quality, style enforcement | -| `simplecov` | Coverage | Track test coverage metrics | - -### Optional Dependencies - -```ruby -# Optional: Rails integration for testing -spec.add_development_dependency "rails", ">= 6.0" if ENV["RAILS_VERSION"] -``` - -### Version Constraints - -- **Ruby**: >= 2.7 (modern syntax, better performance) -- **Faraday**: ~> 2.0 (latest stable, HTTP/2 support) -- **Mustache**: ~> 1.1 (last updated 2016, but stable and widely used; logic-less design means few updates needed) -- **Concurrent-Ruby**: ~> 1.2 (actively maintained, production-ready thread primitives) - -### Gemfile.lock Considerations - -- Pin exact versions in CI for reproducibility -- Use pessimistic versioning (`~>`) for flexibility -- Test against multiple Ruby versions (2.7, 3.0, 3.1, 3.2) - ---- - -## Code Examples - -### Basic Usage - -```ruby -require "langfuse" - -# Configure globally -Langfuse.configure do |config| - config.public_key = ENV["LANGFUSE_PUBLIC_KEY"] - config.secret_key = ENV["LANGFUSE_SECRET_KEY"] -end - -# Get global client -client = Langfuse.client - -# Get a text prompt (two-step) -prompt = client.get_prompt("greeting") -compiled = prompt.compile(name: "Alice", city: "San Francisco") -puts compiled -# => "Hello Alice from San Francisco!" - -# Or: One-step convenience method -text = client.compile_prompt("greeting", - variables: { name: "Alice", city: "San Francisco" } -) -puts text - -# Get a chat prompt -chat_prompt = client.get_prompt("conversation", type: :chat) -messages = chat_prompt.compile( - { user_name: "Alice" }, - { - history: [ - { role: "user", content: "Hi!" }, - { role: "assistant", content: "Hello!" 
} - ] - } -) -``` - -### Rails Integration - -```ruby -# config/initializers/langfuse.rb -Langfuse.configure do |config| - config.public_key = ENV["LANGFUSE_PUBLIC_KEY"] - config.secret_key = ENV["LANGFUSE_SECRET_KEY"] - config.base_url = ENV.fetch("LANGFUSE_BASE_URL", "https://cloud.langfuse.com") - config.cache_ttl = 120 # 2 minutes - config.logger = Rails.logger -end - -# app/services/ai_greeting_service.rb -class AiGreetingService - def initialize - @langfuse = Langfuse.client # Global singleton - end - - def generate_greeting(user) - # Fetch and compile in one step with fallback - compiled = @langfuse.compile_prompt("user-greeting", - variables: { - name: user.name, - city: user.city, - subscription: user.subscription_tier - }, - fallback: "Hello {{name}}!", - type: :text - ) - - # Get prompt config for temperature - prompt = @langfuse.get_prompt("user-greeting") - temperature = prompt.config[:temperature] || 0.7 - - # Call OpenAI - response = openai_client.chat( - parameters: { - model: "gpt-4", - messages: [{ role: "user", content: compiled }], - temperature: temperature - } - ) - - # Trace with Langfuse - @langfuse.trace(name: "greeting-generation") do |trace| - trace.generation( - name: "openai-call", - input: compiled, - output: response.dig("choices", 0, "message", "content"), - model: "gpt-4", - metadata: { user_id: user.id } - ) - end - - response.dig("choices", 0, "message", "content") - end - - private - - def openai_client - @openai_client ||= OpenAI::Client.new(access_token: ENV["OPENAI_API_KEY"]) - end -end -``` - -### Chat Prompt with Placeholders - -```ruby -# Create a RAG prompt with placeholders -client.create_prompt( - name: "rag-qa", - type: :chat, - prompt: [ - { - role: "system", - content: "You are a helpful assistant. Use the context to answer questions." - }, - { - type: "placeholder", - name: "context_documents" - }, - { - role: "user", - content: "{{user_question}}" - } - ], - labels: ["production"] -) - -# Later: compile with dynamic context (two-step) -prompt = client.get_prompt("rag-qa", type: :chat) -messages = prompt.compile( - { user_question: "What is the capital of France?" }, - { - context_documents: [ - { role: "system", content: "Context: France is a country in Europe." }, - { role: "system", content: "Context: Paris is the capital of France." } - ] - } -) - -# Or: compile in one step -messages = client.compile_prompt("rag-qa", - variables: { user_question: "What is the capital of France?" }, - placeholders: { - context_documents: [ - { role: "system", content: "Context: France is a country in Europe." }, - { role: "system", content: "Context: Paris is the capital of France." } - ] - }, - type: :chat -) - -# Result: -# [ -# { role: "system", content: "You are a helpful assistant..." }, -# { role: "system", content: "Context: France is a country..." }, -# { role: "system", content: "Context: Paris is the capital..." }, -# { role: "user", content: "What is the capital of France?" 
} -# ] -``` - -### Error Handling - -```ruby -# Graceful degradation with fallback (RECOMMENDED) -# Never raises - returns fallback on any error -prompt = client.get_prompt("greeting", - fallback: "Hello {{name}}!", - type: :text -) - -# Traditional exception handling -begin - prompt = client.get_prompt("greeting") -rescue Langfuse::NotFoundError => e - Rails.logger.error("Prompt not found: #{e.message}") - # Fallback logic here -rescue Langfuse::ApiError => e - Rails.logger.error("Langfuse API error: #{e.message}") - # Handle error -end - -# Retry with exponential backoff -require "retryable" - -Retryable.retryable( - tries: 3, - on: [Langfuse::TimeoutError, Langfuse::RateLimitError], - sleep: ->(n) { 2**n } # 2s, 4s, 8s -) do - prompt = client.get_prompt("greeting") -end -``` - -### Testing with Mocks - -```ruby -# spec/services/ai_greeting_service_spec.rb -RSpec.describe AiGreetingService do - let(:langfuse_client) { instance_double(Langfuse::Client) } - let(:prompt) do - instance_double( - Langfuse::TextPromptClient, - compile: "Hello Alice from SF!", - config: { temperature: 0.7 } - ) - end - - before do - # Mock global client - allow(Langfuse).to receive(:client).and_return(langfuse_client) - - # Mock compile_prompt for one-step usage - allow(langfuse_client).to receive(:compile_prompt) - .and_return("Hello Alice from SF!") - - # Mock get_prompt for two-step usage - allow(langfuse_client).to receive(:get_prompt).and_return(prompt) - end - - it "generates personalized greeting" do - service = described_class.new - user = create(:user, name: "Alice", city: "SF") - - greeting = service.generate_greeting(user) - - expect(greeting).to be_present - expect(langfuse_client).to have_received(:compile_prompt) - .with("user-greeting", hash_including(variables: hash_including(name: "Alice"))) - end -end -``` - -### LangChain Integration - -```ruby -require "langchain" - -# Fetch prompt from Langfuse -prompt = client.prompt.get("greeting", type: :text) - -# Convert to LangChain format -langchain_template = prompt.to_langchain -# => "Hello {name} from {city}!" - -# Use with LangChain -llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"]) -prompt_template = Langchain::Prompt::PromptTemplate.new( - template: langchain_template, - input_variables: ["name", "city"] -) - -result = llm.complete( - prompt: prompt_template.format(name: "Alice", city: "SF") -) - -# Chat prompts with LangChain -chat_prompt = client.prompt.get("conversation", type: :chat) -langchain_messages = chat_prompt.to_langchain( - placeholders: { - history: [ - { role: "user", content: "Hi" }, - { role: "assistant", content: "Hello!" } - ] - } -) - -# Use with ChatOpenAI -chat_model = Langchain::LLM::OpenAIChat.new(api_key: ENV["OPENAI_API_KEY"]) -response = chat_model.chat(messages: langchain_messages) -``` - ---- - -## Migration Strategy - -### For Existing Langfuse Users - -**Before (Manual Prompt Management)**: - -```ruby -# Hardcoded prompts -def greeting_prompt(user) - "Hello #{user.name}! Welcome to our service." 
-end - -# Or: stored in database -class Prompt < ApplicationRecord - def compile(variables) - content.gsub(/\{\{(\w+)\}\}/) { variables[$1.to_sym] } - end -end -``` - -**After (Langfuse Prompt Management)**: - -```ruby -# Centralized in Langfuse -def greeting_prompt(user) - # Option 1: Two-step - prompt = @langfuse.get_prompt("user-greeting") - prompt.compile(name: user.name, tier: user.tier) - - # Option 2: One-step with fallback - @langfuse.compile_prompt("user-greeting", - variables: { name: user.name, tier: user.tier }, - fallback: "Hello {{name}}!", - type: :text - ) -end -``` - -### Migration Steps - -1. **Install Updated Gem**: - ```ruby - # Gemfile - gem "langfuse-ruby", "~> 0.2.0" - ``` - -2. **Add Global Configuration**: - ```ruby - # config/initializers/langfuse.rb - Langfuse.configure do |config| - config.public_key = ENV['LANGFUSE_PUBLIC_KEY'] - config.secret_key = ENV['LANGFUSE_SECRET_KEY'] - config.cache_ttl = 120 - config.logger = Rails.logger - end - ``` - -3. **Create Prompts in Langfuse**: - ```ruby - # scripts/migrate_prompts.rb - client = Langfuse.client - - # Migrate each hardcoded prompt - client.create_prompt( - name: "user-greeting", - prompt: "Hello {{name}}! Welcome to {{tier}} tier.", - type: :text, - labels: ["production"] - ) - ``` - -4. **Update Application Code**: - ```ruby - # Before - - greeting = "Hello #{user.name}!" - - # After (two-step) - + prompt = Langfuse.client.get_prompt("user-greeting") - + greeting = prompt.compile(name: user.name) - - # Or after (one-step with fallback) - + greeting = Langfuse.client.compile_prompt("user-greeting", - + variables: { name: user.name }, - + fallback: "Hello {{name}}!", - + type: :text - + ) - ``` - -5. **Test with Fallbacks**: - ```ruby - # Safe rollout with fallback - prompt = Langfuse.client.get_prompt("user-greeting", - fallback: "Hello {{name}}!", # Old hardcoded version - type: :text - ) - ``` - -6. **Monitor and Iterate**: - - Check Langfuse dashboard for prompt usage - - A/B test new prompt versions - - Update prompts without code deployment - -### Backward Compatibility - -**No breaking changes to existing API**: - -```ruby -# Existing tracing still works -client = Langfuse.client -client.trace(name: "my-trace") do |trace| - trace.generation(name: "llm-call", input: "test") -end - -# NEW prompt management is additive (flattened API) -client.get_prompt("greeting") -client.compile_prompt("greeting", variables: { name: "Alice" }) -``` - -### Rollback Plan - -If issues arise: - -1. **Use Fallbacks**: All prompts have fallback option -2. **Disable Caching**: Set `cache_ttl_seconds: 0` -3. **Revert to Hardcoded**: Fallback to old prompt logic -4. **Pin Old Version**: `gem "langfuse-ruby", "~> 0.1.4"` - ---- - -## Trade-offs and Alternatives - -### Design Decision Matrix - -| Decision | Chosen Approach | Alternative | Trade-off | -|----------|----------------|-------------|-----------| -| **Templating** | Mustache | ERB | Security vs. convenience | -| **HTTP Client** | Faraday | HTTParty | Flexibility vs. simplicity | -| **Caching** | In-memory | Rails.cache | Speed vs. distribution | -| **Thread Safety** | Mutex | Thread-local | Simplicity vs. performance | -| **API Style** | Keyword args | Positional | Readability vs. brevity | - -### 1. Templating: Mustache vs. 
ERB - -**Chosen: Mustache** - -- **Pro**: Logic-less, secure, cross-SDK consistency -- **Pro**: No arbitrary code execution risk -- **Con**: No conditionals or loops (must be done in code) - -**Alternative: ERB** - -- **Pro**: Built-in, no dependency -- **Pro**: Full Ruby power (conditionals, loops) -- **Con**: Security risk (code injection) -- **Con**: Different from JS SDK - -**Decision Rationale**: Security and consistency outweigh convenience. - -### 2. Caching: In-memory vs. Rails.cache - -**Chosen: In-memory (Phase 1), Rails.cache optional (Phase 2)** - -**In-memory**: -- **Pro**: Extremely fast (<1ms) -- **Pro**: No external dependencies -- **Con**: Not shared across processes -- **Con**: Higher memory per-process - -**Rails.cache (Redis)**: -- **Pro**: Shared across all processes/servers -- **Pro**: Centralized cache management -- **Con**: 10-100x slower than in-memory -- **Con**: Requires Redis dependency - -**Decision Rationale**: Start with simplest approach, add distribution as opt-in. - -### 3. Thread Safety: Mutex vs. Thread-local - -**Chosen: Mutex** - -- **Pro**: Simple, proven approach -- **Pro**: Shared cache across threads -- **Con**: Lock contention under high load - -**Alternative: Thread-local Storage** - -- **Pro**: No locking, faster -- **Con**: Duplicated cache per thread -- **Con**: Higher memory usage - -**Decision Rationale**: Rails apps typically have limited threads per process, mutex overhead acceptable. - -### 4. Async Refresh: Threads vs. Fibers - -**Chosen: Threads (Phase 1), consider Fibers (Phase 2)** - -**Threads**: -- **Pro**: Built-in, familiar -- **Con**: Heavier weight - -**Fibers**: -- **Pro**: Lightweight concurrency -- **Pro**: Better for high-concurrency scenarios -- **Con**: Requires Ruby 3.0+ -- **Con**: Less familiar to developers - -**Decision Rationale**: Threads are sufficient for MVP, evaluate Fibers based on real-world performance. - -### 5. API Style: Keyword Args vs. Options Hash - -**Chosen: Keyword Arguments** - -```ruby -# Keyword args (chosen) -prompt.get("name", version: 2, label: "production") - -# Options hash (alternative) -prompt.get("name", { version: 2, label: "production" }) -``` - -**Rationale**: Keyword args provide better IDE autocomplete and explicit API. - ---- - -## Open Questions - -### 1. Rails.cache Integration Priority - -**Question**: Should Rails.cache integration be Phase 1 or Phase 2? - -**Options**: -- **A**: Phase 1 - Implement both backends from start -- **B**: Phase 2 - Start simple with in-memory - -**Recommendation**: **Phase 2** -- Rationale: In-memory is sufficient for most use cases, easier to test, faster to ship MVP - -### 2. Async Background Refresh - -**Question**: How to implement background refresh in stale-while-revalidate? - -**Options**: -- **A**: Simple threads (`Thread.new { ... }`) -- **B**: Sidekiq jobs (requires Sidekiq dependency) -- **C**: Fibers (requires Ruby 3.0+) - -**Recommendation**: **Simple threads** -- Rationale: No additional dependencies, sufficient for prompt refresh use case - -### 3. Prompt Validation - -**Question**: Should we validate prompt structure before sending to API? - -**Options**: -- **A**: Client-side validation (check required fields) -- **B**: Rely on API validation (simpler) - -**Recommendation**: **API validation** -- Rationale: Avoid duplicating server logic, API is source of truth - -### 4. LangChain Dependency - -**Question**: Should we depend on `langchain-ruby` gem for `to_langchain` methods? 
- -**Options**: -- **A**: Hard dependency (import LangChain types) -- **B**: Soft dependency (return plain Ruby hashes) -- **C**: Optional dependency (only load if available) - -**Recommendation**: **Soft dependency** -- Rationale: Return plain hashes that work with LangChain without requiring the gem - -### 5. Observability Hooks - -**Question**: What observability should be built-in? - -**Options**: -- **A**: Logging only (simple) -- **B**: StatsD metrics (cache hits, API latency) -- **C**: OpenTelemetry traces (full observability) - -**Recommendation**: **Logging + StatsD hooks** -- Rationale: Logging is essential, StatsD is common in Rails, OpenTelemetry can be Phase 3 - -### 6. Configuration Pattern - -**Question**: Global config vs. per-client config? - -**Options**: -- **A**: Global: `Langfuse.configure { |c| ... }` -- **B**: Per-client: `Langfuse::Client.new(config)` -- **C**: Both (global defaults, per-client overrides) - -**Recommendation**: **Both** -- Rationale: Global config for Rails initializer, per-client for multi-tenant apps - -```ruby -# Global config -Langfuse.configure do |config| - config.public_key = ENV["LANGFUSE_PUBLIC_KEY"] - config.secret_key = ENV["LANGFUSE_SECRET_KEY"] - config.cache_backend = :rails -end - -# Per-client override -client = Langfuse::Client.new( - public_key: tenant.langfuse_key, - cache_backend: :memory -) -``` - -### 7. Prompt Versioning Strategy - -**Question**: How to handle version conflicts between cache and API? - -**Scenario**: Prompt version 1 is cached, version 2 is promoted to production - -**Options**: -- **A**: Cache by label (current approach - auto-updates) -- **B**: Cache by version (explicit, never changes) -- **C**: Configurable cache key strategy - -**Recommendation**: **Cache by label (default), support version caching** -- Rationale: Labels enable dynamic updates, versions for stability - ---- - -## Appendix: ASCII Diagrams - -### Caching Flow Diagram - -``` -┌─────────────────────────────────────────────────────────────┐ -│ get("greeting") │ -└──────────────────────┬──────────────────────────────────────┘ - │ - ▼ - ┌─────────────────────────┐ - │ Check cache for key │ - │ "greeting-label:prod" │ - └─────────┬───────────────┘ - │ - ┌─────────┴─────────┐ - │ │ - ▼ ▼ - ┌────────┐ ┌─────────┐ - │ MISS │ │ HIT │ - └────┬───┘ └────┬────┘ - │ │ - │ ┌─────┴──────┐ - │ │ │ - │ ▼ ▼ - │ ┌────────┐ ┌──────────┐ - │ │ FRESH │ │ EXPIRED │ - │ └───┬────┘ └────┬─────┘ - │ │ │ - │ ▼ ▼ - │ ┌──────────┐ ┌────────────────┐ - │ │ Return │ │ Return stale + │ - │ │ cached │ │ refresh async │ - │ └──────────┘ └────────────────┘ - │ - ▼ - ┌────────────────┐ - │ Fetch from API │ - └────────┬───────┘ - │ - ┌─────┴──────┐ - │ │ - ▼ ▼ - ┌─────────┐ ┌──────────┐ - │ SUCCESS │ │ ERROR │ - └────┬────┘ └────┬─────┘ - │ │ - │ ┌─────┴──────┐ - │ │ │ - │ ▼ ▼ - │ ┌─────────┐ ┌─────────┐ - │ │Fallback?│ │ Raise │ - │ └────┬────┘ └─────────┘ - │ │ - │ ▼ - │ ┌──────────────┐ - │ │Return fallback│ - │ └──────────────┘ - │ - ▼ - ┌──────────────┐ - │ Store cache │ - └──────┬───────┘ - │ - ▼ - ┌──────────────┐ - │Return prompt │ - └──────────────┘ -``` - -### Class Hierarchy - -``` -Langfuse::Client -│ -├── prompt: PromptManager -│ │ -│ ├── cache: PromptCache -│ │ ├── CacheItem (value, expiry) -│ │ └── Methods: get, set, invalidate -│ │ -│ ├── api_client: ApiClient -│ │ └── Methods: get_prompt, create_prompt, update_prompt -│ │ -│ └── Methods: get, create, update -│ -└── (existing methods: trace, generation, etc.) 
- - -Prompt Client Hierarchy: - -BasePromptClient (abstract) -│ -├── TextPromptClient -│ ├── compile(variables) -│ └── to_langchain() -│ -└── ChatPromptClient - ├── compile(variables, placeholders) - └── to_langchain(placeholders:) -``` - -### Request Flow - -``` -Application Code - │ - │ client.prompt.get("greeting") - ▼ -PromptManager - │ - │ 1. Check cache - ▼ -PromptCache - │ - ├──[MISS]──────────┐ - │ │ - │ ▼ - │ ApiClient - │ │ - │ │ 2. GET /api/v2/prompts/greeting - │ ▼ - │ Langfuse API - │ │ - │ │ 3. Response: { name, version, prompt, ... } - │ ▼ - │ ApiClient - │ │ - │ │ 4. Parse response - │ ▼ - │ PromptManager - │ │ - │ │ 5. Build client (Text/Chat) - │ ▼ - │ TextPromptClient / ChatPromptClient - │ │ - │ │ 6. Store in cache - │ ▼ - │ PromptCache - │ - │ [HIT: FRESH] - ├──────────────────┐ - │ │ - │ │ 7. Return from cache - │ ▼ - │ TextPromptClient / ChatPromptClient - │ - │ [HIT: EXPIRED] - └──────────────────┐ - │ - │ 8a. Return stale - ▼ - TextPromptClient / ChatPromptClient - │ - │ 8b. Background refresh - ▼ - (Async: steps 2-6) -``` - ---- - -## Design Revisions and Improvements - -This section documents critical fixes and improvements made to the design based on technical review. - -### Critical Fixes Applied - -1. **Non-blocking Background Refresh** (Lines 389-400) - - **Issue**: `promise.join` blocked calling thread, defeating stale-while-revalidate - - **Fix**: Cleanup happens in separate thread to maintain non-blocking behavior - - **Impact**: Ensures fast response times even with expired cache - -2. **Connection Singleton Bug** (Lines 816-831) - - **Issue**: Mutable timeout on shared connection affected all subsequent requests - - **Fix**: Create dedicated connection instances for custom timeouts - - **Impact**: Prevents timeout configuration from leaking between requests - -3. **Removed Double Retry Logic** (Lines 771-783) - - **Issue**: Manual retries + Faraday middleware = up to 4 retries instead of 2 - - **Fix**: Rely solely on Faraday retry middleware - - **Impact**: Predictable retry behavior, cleaner code - -4. **Cache Stampede Protection** (Lines 379-387) - - **Issue**: 1000 concurrent requests on expired cache = 1000 API calls - - **Fix**: Track refreshing keys, only first requester triggers refresh - - **Impact**: Prevents API rate limit exhaustion - -5. **Bounded Thread Pool** (Lines 514-557) - - **Issue**: Unbounded thread creation during cache refreshes - - **Fix**: Use `concurrent-ruby` FixedThreadPool (max 5 threads) - - **Impact**: Prevents thread exhaustion, controls concurrent API calls - -### Important Enhancements - -6. **LRU Cache Eviction** (Lines 362-378, 417-434) - - **Addition**: Max cache size (1000 entries) with LRU eviction policy - - **Benefit**: Prevents unbounded memory growth - - **Implementation**: Track access order, evict least recently used - -7. **Fallback Type Validation** (Lines 281-298) - - **Addition**: Validate fallback matches specified type (text/chat) - - **Benefit**: Prevent runtime errors from type mismatches - - **Example**: Reject text fallback when type is :chat - -8. **Cache Invalidation Safety** (Lines 268-276) - - **Issue**: Cache invalidated even on failed API updates - - **Fix**: Only invalidate after successful API response - - **Impact**: Prevents serving stale data after failed updates - -9. 
**Placeholder Validation** (Lines 588-630) - - **Addition**: Validate placeholder structure, handle empty arrays - - **Addition**: Support required_placeholders parameter - - **Benefit**: Better error messages, fail fast on invalid data - -10. **Security Documentation** (Lines 1527-1545) - - **Addition**: Document sanitization rationale (DoS prevention) - - **Addition**: Configurable max_length parameter - - **Benefit**: Clear security posture, flexible limits - -### Testing Enhancements - -11. **Edge Case Coverage** (Lines 2010-2074) - - Added: Cache stampede protection tests - - Added: Cache expiry edge case tests - - Added: LRU eviction tests - - Added: Concurrent access tests - - **Impact**: >95% confidence in production behavior - -### Observability Additions - -12. **Built-in Instrumentation** (Lines 318-357) - - **Addition**: ActiveSupport::Notifications integration - - **Metrics**: cache hits, duration, fallback usage - - **Integration**: Easy StatsD/Datadog hookup - - **Impact**: Production visibility without custom code - -### Timeline Adjustments - -13. **Realistic Estimates** (Lines 1619, 1648, 1698, 1704) - - Phase 1: 2-3 days → **3-4 days** (edge cases + thorough testing) - - Phase 2: 2 days → **2-3 days** - - Phase 4: 1-2 days → **2-3 days** (comprehensive observability) - - Total: 6-8 days → **8-11 days** (30% contingency buffer) - - **Rationale**: Account for code review, testing, documentation - -### Dependency Updates - -14. **New Required Dependency** (Line 2235) - - **Added**: `concurrent-ruby ~> 1.2` - - **Purpose**: Thread pool for bounded concurrency - - **Justification**: Production-grade thread primitives, prevents resource exhaustion - ---- - -## API Evolution: Original vs LaunchDarkly-Inspired Design - -This section documents the evolution from the initial nested API design to the final LaunchDarkly-inspired flattened API. 
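-
-As context for the comparison below, here is a minimal, hypothetical sketch of how the flattened one-step `compile_prompt` convenience could delegate to the two-step `get_prompt` + `compile` flow; this is illustrative only and may differ from the actual implementation.
-
-```ruby
-# Hypothetical sketch: compile_prompt as a thin wrapper over the two-step flow.
-# Argument names mirror the examples in this document.
-class Langfuse::Client
-  def compile_prompt(name, variables: {}, placeholders: {}, fallback: nil, type: :text)
-    prompt = get_prompt(name, fallback: fallback, type: type)
-
-    if type == :chat
-      # Chat prompts accept variables and placeholders as two positional hashes
-      prompt.compile(variables, placeholders)
-    else
-      # Text prompts accept variables as keyword arguments
-      prompt.compile(**variables)
-    end
-  end
-end
-```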
- -### Design Comparison - -| Aspect | Original Design | LaunchDarkly-Inspired (Final) | -|--------|----------------|-------------------------------| -| **API Structure** | Nested: `client.prompt.get()` | Flat: `client.get_prompt()` | -| **Configuration** | Inline only | Global config + per-client | -| **Global Client** | No | `Langfuse.client` singleton | -| **Fallback Pattern** | Optional, raises on error | Encouraged, returns fallback | -| **Convenience Methods** | No | `compile_prompt()` for one-step | -| **Detail Variants** | No | `get_prompt_detail()` for debugging | -| **Method Count** | 3 (get, create, update) | 6 (get_prompt, get_prompt_detail, compile_prompt, create_prompt, update_prompt, invalidate_cache) | -| **State Checking** | No | `initialized?` | - -### Code Comparison - -#### Initialization - -```ruby -# Original -client = Langfuse::Client.new( - public_key: ENV['LANGFUSE_PUBLIC_KEY'], - secret_key: ENV['LANGFUSE_SECRET_KEY'] -) - -# LaunchDarkly-Inspired -Langfuse.configure do |config| - config.public_key = ENV['LANGFUSE_PUBLIC_KEY'] - config.secret_key = ENV['LANGFUSE_SECRET_KEY'] - config.cache_ttl = 120 -end -client = Langfuse.client -``` - -#### Get Prompt - -```ruby -# Original (nested) -prompt = client.prompt.get("greeting") -compiled = prompt.compile(name: "Alice") - -# LaunchDarkly-Inspired (flattened, two options) -# Option 1: Two-step (same as original) -prompt = client.get_prompt("greeting") -compiled = prompt.compile(name: "Alice") - -# Option 2: One-step convenience -compiled = client.compile_prompt("greeting", variables: { name: "Alice" }) -``` - -#### With Fallback - -```ruby -# Original -begin - prompt = client.prompt.get("greeting") -rescue Langfuse::NotFoundError - prompt = client.prompt.get("greeting", fallback: "Hello!", type: :text) -end - -# LaunchDarkly-Inspired (graceful by default) -prompt = client.get_prompt("greeting", - fallback: "Hello {{name}}!", - type: :text -) -# Never raises - returns fallback on error -``` - -#### Debugging - -```ruby -# Original (no built-in support) -start = Time.now -prompt = client.prompt.get("greeting") -duration = Time.now - start -Rails.logger.info("Fetched in #{duration}s") - -# LaunchDarkly-Inspired (built-in) -detail = client.get_prompt_detail("greeting") -# => { -# prompt: ..., -# cached: true, -# version: 3, -# fetch_time_ms: 1.2, -# source: :cache -# } -``` - -### Why the Change? - -The LaunchDarkly-inspired design provides: - -1. **Simpler Mental Model**: Everything on `Client`, no nested managers -2. **Better Rails Integration**: Global config and singleton pattern -3. **More Resilient**: Fallbacks encouraged, graceful degradation -4. **Better DX**: Convenience methods reduce boilerplate -5. **Better Observability**: Detail variants for debugging -6. **Industry Pattern**: Familiar to developers using LaunchDarkly - -The additional methods and slight API surface increase are worth the improved developer experience and production reliability. - ---- - -## Summary and Next Steps - -### Summary - -This design document outlines a comprehensive plan to add prompt management functionality to the `langfuse-ruby` gem, achieving feature parity with the JavaScript SDK while incorporating LaunchDarkly's exceptional API design patterns. - -**Key Highlights**: - -1. **LaunchDarkly-Inspired API**: Flattened API surface, global configuration, singleton pattern -2. **Architecture**: Clean separation of concerns (Config, Client, Cache, Clients, API) -3. **Performance**: Sub-ms cache hits with stale-while-revalidate -4. 
**Thread Safety**: Mutex-based synchronization for Rails apps -5. **Developer Experience**: Intuitive API, convenience methods, fallback support, observability -6. **Incremental Rollout**: 4 phases from MVP to production-ready - -### Next Steps - -1. **Review and Feedback**: - - [ ] Architecture review with team - - [ ] API design feedback from early users - - [ ] Security review (authentication, input validation) - -2. **Phase 1 Implementation** (Week 1-2): - - [ ] Set up project structure - - [ ] Implement core classes (Manager, Cache, Clients) - - [ ] Add ApiClient extensions - - [ ] Write comprehensive tests - - [ ] Documentation and examples - -3. **Beta Release**: - - [ ] Publish `0.2.0.beta1` to RubyGems - - [ ] Gather feedback from early adopters - - [ ] Iterate based on real-world usage - -4. **Phase 2-4** (Week 3-4): - - [ ] Advanced features (create, update, placeholders) - - [ ] Rails.cache integration - - [ ] LangChain helpers - - [ ] Performance optimization - -5. **Production Release**: - - [ ] Final QA and testing - - [ ] Complete documentation - - [ ] Migration guides - - [ ] Publish `0.2.0` stable - -### Success Criteria - -**Technical**: -- [ ] All JavaScript SDK features implemented -- [ ] >90% test coverage -- [ ] Thread-safe for production Rails apps -- [ ] <100ms p95 latency for cached prompts - -**Documentation**: -- [ ] Complete API reference -- [ ] Migration guide from hardcoded prompts -- [ ] Integration examples (Rails, LangChain) -- [ ] Troubleshooting guide - -**Adoption**: -- [ ] 10+ beta users providing feedback -- [ ] Zero critical bugs in production -- [ ] Positive developer feedback - ---- - -**Document End** diff --git a/docs/future-enhancements/STALE_WHILE_REVALIDATE_DESIGN.md b/docs/future-enhancements/STALE_WHILE_REVALIDATE_DESIGN.md deleted file mode 100644 index 149a558..0000000 --- a/docs/future-enhancements/STALE_WHILE_REVALIDATE_DESIGN.md +++ /dev/null @@ -1,510 +0,0 @@ -# Stale-While-Revalidate (SWR) Design Document - -**Status:** Design Only - Not Implemented -**Phase:** 7.3 (Future Enhancement) -**Created:** 2025-10-16 - ---- - -## Problem Statement - -With current caching (Phases 7.1 + 7.2), the first request after cache expiry must wait for the Langfuse API call (~100ms). Even with stampede protection preventing 1,200 simultaneous API calls, one user still pays the latency cost. - -**Current Timeline:** -``` -Time: 10:00:00 - Prompt cached (TTL: 300s) -Time: 10:05:00 - Cache expires -Time: 10:05:00.001 - Request arrives - → Check cache: MISS (expired) - → Acquire lock: SUCCESS - → Call Langfuse API: 100ms ⏳ (user waits) - → Populate cache - → Return to user -Total latency: ~100ms for first user -``` - ---- - -## Solution: Stale-While-Revalidate - -Serve slightly outdated (stale) data immediately while refreshing in the background. Users get instant responses (~1ms) even after cache "expires". - -**With SWR Timeline:** -``` -Time: 10:00:00 - Prompt cached - - fresh_until: 10:05:00 (TTL: 5 minutes) - - stale_until: 10:10:00 (grace period: 5 more minutes) - -Time: 10:05:01 - Request arrives (cache expired but not stale) - → Return STALE data immediately (1ms latency) ✨ - → Trigger background refresh (doesn't block user) - → Background: Fetch from API, update cache -``` - ---- - -## Design Overview - -### Three Cache States - -1. **FRESH** (`Time.now < fresh_until`): Return immediately, no action needed -2. **REVALIDATE** (`fresh_until <= Time.now < stale_until`): Return stale data + trigger background refresh -3. 
**STALE** (`Time.now >= stale_until`): Must fetch fresh data synchronously - -### Cache Entry Structure - -**Current (Phase 7.1/7.2):** -```ruby -CacheEntry = Struct.new(:data, :expires_at) -``` - -**With SWR (Phase 7.3):** -```ruby -CacheEntry = Struct.new(:data, :fresh_until, :stale_until) do - def fresh? - Time.now < fresh_until - end - - def stale? - Time.now > stale_until - end - - def revalidate? - !fresh? && !stale? - end -end -``` - ---- - -## Implementation Approach - -### 1. Configuration - -```ruby -Langfuse.configure do |config| - config.cache_backend = :rails - config.cache_ttl = 300 # Fresh for 5 minutes - config.cache_stale_while_revalidate = true # Enable SWR (opt-in) - config.cache_stale_ttl = 300 # Serve stale for 5 more minutes - config.cache_refresh_threads = 5 # Thread pool size (see analysis below) -end -``` - -**New config options:** -- `cache_stale_while_revalidate` (Boolean, default: false) - Enable SWR -- `cache_stale_ttl` (Integer, default: same as cache_ttl) - Grace period duration -- `cache_refresh_threads` (Integer, default: 5) - Background thread pool size - -### 2. RailsCacheAdapter Enhancement - -```ruby -require 'concurrent' - -class RailsCacheAdapter - def initialize(ttl:, stale_ttl: nil, refresh_threads: 5, ...) - @ttl = ttl - @stale_ttl = stale_ttl || ttl - @thread_pool = Concurrent::CachedThreadPool.new( - max_threads: refresh_threads, - min_threads: 2, - max_queue: 50, - fallback_policy: :discard # Drop oldest if queue full - ) - end - - # New method for SWR - def fetch_with_stale_while_revalidate(key, &block) - entry = get_entry_with_metadata(key) - - if entry && entry[:fresh_until] > Time.now - # FRESH - return immediately - return entry[:data] - elsif entry && entry[:stale_until] > Time.now - # REVALIDATE - return stale + refresh in background - schedule_refresh(key, &block) - return entry[:data] # Instant response! ✨ - else - # STALE or MISS - must fetch synchronously - fetch_and_cache_with_metadata(key, &block) - end - end - - private - - def schedule_refresh(key, &block) - # Prevent duplicate refreshes - refresh_lock_key = "#{namespaced_key(key)}:refreshing" - return unless acquire_refresh_lock(refresh_lock_key) - - @thread_pool.post do - begin - value = block.call - set_with_metadata(key, value) - ensure - release_lock(refresh_lock_key) - end - end - end - - def get_entry_with_metadata(key) - # Fetch from Redis including timestamps - raw = Rails.cache.read("#{namespaced_key(key)}:metadata") - return nil unless raw - - JSON.parse(raw, symbolize_names: true) - end - - def set_with_metadata(key, value) - now = Time.now - entry = { - data: value, - fresh_until: now + @ttl, - stale_until: now + @ttl + @stale_ttl - } - - # Store both data and metadata - Rails.cache.write(namespaced_key(key), value, expires_in: @ttl + @stale_ttl) - Rails.cache.write("#{namespaced_key(key)}:metadata", entry.to_json, expires_in: @ttl + @stale_ttl) - - value - end - - def acquire_refresh_lock(lock_key) - # Short-lived lock (60s) to prevent duplicate background refreshes - Rails.cache.write(lock_key, true, unless_exist: true, expires_in: 60) - end -end -``` - -### 3. 
ApiClient Integration - -```ruby -# In ApiClient#get_prompt -def get_prompt(name, version: nil, label: nil) - raise ArgumentError, "Cannot specify both version and label" if version && label - - cache_key = PromptCache.build_key(name, version: version, label: label) - - # Use SWR if cache supports it and SWR is enabled - if cache&.respond_to?(:fetch_with_stale_while_revalidate) - cache.fetch_with_stale_while_revalidate(cache_key) do - fetch_prompt_from_api(name, version: version, label: label) - end - elsif cache&.respond_to?(:fetch_with_lock) - # Rails.cache with stampede protection (Phase 7.2) - cache.fetch_with_lock(cache_key) do - fetch_prompt_from_api(name, version: version, label: label) - end - elsif cache - # In-memory cache - simple get/set - cached_data = cache.get(cache_key) - return cached_data if cached_data - - prompt_data = fetch_prompt_from_api(name, version: version, label: label) - cache.set(cache_key, prompt_data) - prompt_data - else - # No cache - fetch_prompt_from_api(name, version: version, label: label) - end -end -``` - ---- - -## Thread Pool Sizing Analysis - -### Calculation - -``` -Threads = (Number of prompts × API latency) / Desired refresh time - -Example (SimplePractice): -- Prompts: 50 unique prompts -- API latency: 200ms -- Desired refresh time: 5 seconds (before users notice stale data) - -Threads = (50 × 0.2) / 5 = 2 threads minimum -Add 25% buffer: 2 × 1.25 = 2.5 → 3 threads - -With 100 prompts: -Threads = (100 × 0.2) / 5 = 4 threads minimum -Add 25% buffer: 4 × 1.25 = 5 threads ✅ -``` - -### Scenarios - -**Scenario 1: Steady State (Distributed Expiry)** -``` -TTL: 5 minutes = 300 seconds -Prompts: 50 total - -Expiry rate: 50 prompts / 300 seconds = 0.16 prompts/second - = ~1 prompt every 6 seconds - -Thread requirement: 1 thread sufficient -``` - -**Scenario 2: Post-Deploy (Worst Case - All Expire Together)** -``` -Prompts: 50 all cached at T=0 -At T=5min: All 50 hit "revalidate" state simultaneously - -With 2 threads: 50 ÷ 2 = 25 batches × 200ms = 5 seconds ⚠️ -With 5 threads: 50 ÷ 5 = 10 batches × 200ms = 2 seconds ✅ -With 10 threads: 50 ÷ 10 = 5 batches × 200ms = 1 second ✅ -``` - -### Recommendations - -**Option A: Fixed Pool (Simplest)** -```ruby -config.cache_refresh_threads = 5 # Default, configurable -@thread_pool = Concurrent::FixedThreadPool.new(5) -``` -- **Pros**: Simple, predictable, easy to reason about -- **Cons**: May be too few (large apps) or too many (small apps) - -**Option B: Auto-Sizing Pool (Recommended)** -```ruby -@thread_pool = Concurrent::CachedThreadPool.new( - max_threads: 10, # Cap at 10 - min_threads: 2, # Keep 2 warm - max_queue: 50, # Queue up to 50 refreshes - fallback_policy: :discard # Drop oldest if queue full -) -``` -- **Pros**: Self-adjusts to load, efficient resource usage -- **Cons**: Slightly more complex behavior - -**Option C: Calculated Based on Config** -```ruby -def default_refresh_threads - # Estimate: 1 thread per 25 prompts, min 2, max 10 - estimated_prompts = config.cache_estimated_prompts || 50 - threads = (estimated_prompts / 25.0).ceil - [[threads, 2].max, 10].min -end -``` -- **Pros**: Automatically sized based on expected load -- **Cons**: Requires estimating number of prompts - -**Recommendation**: Use **Option B (Auto-Sizing Pool)** - best balance of simplicity and efficiency. - ---- - -## Benefits - -### 1. Better User Experience -- Users almost never wait for API calls -- Consistent low latency (~1ms cache reads) -- Only "too stale" requests pay the 100ms cost - -### 2. 
Reduced Perceived Latency -``` -Without SWR: -- 99% of requests: 1ms (cached) -- 1% of requests: 100ms (first after expiry) -- P99 latency: 100ms - -With SWR: -- 99.9% of requests: 1ms (cached or stale) -- 0.1% of requests: 100ms (truly stale) -- P99 latency: 1ms ✨ -``` - -### 3. Graceful Degradation -- If Langfuse API is slow/down, users still get stale data -- Only after grace period do requests fail -- Gives time to fix issues without user impact - -### 4. Smoother Load Pattern -- Background refreshes happen asynchronously -- No thundering herd at expiry time -- API load is distributed over time - ---- - -## Trade-offs - -### Pros -✅ Near-instant response times (serve stale data) -✅ Background refresh doesn't block requests -✅ Dramatically reduces P99 latency -✅ More resilient to API slowdowns -✅ Smooth cache warming (no cold-start spikes) - -### Cons -❌ Users might get slightly outdated data -❌ More complex caching logic -❌ Requires background thread pool (~10-20MB memory) -❌ Stale data could be incorrect if prompts change frequently -❌ Adds dependency on concurrent-ruby gem - ---- - -## When to Use SWR - -**Good for:** -- ✅ Prompts that don't change often (production prompts are typically stable) -- ✅ High-traffic applications where latency matters -- ✅ Systems where eventual consistency is acceptable -- ✅ Apps with many processes (background refresh amortized) - -**Not ideal for:** -- ❌ Prompts that change frequently (users might see old versions) -- ❌ Critical data that must always be fresh -- ❌ Low-traffic apps (background refresh overhead not worth it) -- ❌ Apps sensitive to memory usage (thread pool overhead) - ---- - -## Example: SimplePractice Impact - -**Without SWR (current with Phase 7.2):** -``` -- 1,200 processes -- 50 prompts -- Cache expires every 5 minutes -- First request after expiry: 100ms latency -- Other 1,199 requests: 1ms (stampede protection) -``` - -**With SWR:** -``` -- ALL 1,200 requests: 1ms latency ✨ -- Background refresh happens without blocking -- Stale data served for up to 5 more minutes if refresh fails -- Same 50 API calls every 5 minutes (no extra API load) -``` - ---- - -## Testing Strategy - -### Unit Tests - -1. **Cache state transitions** - - Fresh → Revalidate → Stale - - Timestamps correctly set - -2. **Background refresh** - - Scheduled correctly - - Not duplicated (refresh lock) - - Executes asynchronously - -3. **Thread pool behavior** - - Queues refreshes - - Discards on overflow - - Scales up/down - -### Integration Tests - -1. **With ApiClient** - - Returns stale data immediately - - Background refresh completes - - Next request gets fresh data - -2. **Concurrency** - - Multiple processes hit revalidate state - - Only one background refresh happens - -3. **Error handling** - - Background refresh fails → keep serving stale - - Background refresh succeeds → cache updated - -### Load Tests - -1. **Post-deploy scenario** - - All prompts expire simultaneously - - Measure refresh time with different thread pool sizes - -2. 
**Steady state** - - Measure latency distribution (P50, P99, P999) - - Verify background refreshes don't impact user requests - ---- - -## Dependencies - -**New Gem:** -- `concurrent-ruby ~> 1.2` - Thread pool management - -**Existing:** -- Rails.cache (Redis) - Already required for Phase 7.1 - ---- - -## Estimated Effort - -**Lines of Code:** ~200-250 new lines -- RailsCacheAdapter: ~100 lines (fetch_with_stale_while_revalidate, metadata methods) -- Config: ~20 lines (new options, validation) -- ApiClient: ~20 lines (integration) -- Tests: ~60-100 lines - -**Complexity:** Medium -- Thread pool management (concurrent-ruby handles this) -- Metadata storage in Redis (straightforward) -- Background refresh scheduling (lock-based deduplication) - -**Testing Effort:** Medium-High -- Background/async behavior harder to test -- Need timing-based tests (sleep, wait for refresh) -- Concurrency edge cases - -**Time Estimate:** 4-6 hours -- 2 hours: Implementation -- 2 hours: Testing -- 1 hour: Documentation -- 1 hour: Buffer/debugging - ---- - -## Future Enhancements - -### Phase 7.3.1: Smart Refresh Scheduling -Instead of refreshing immediately on first stale request, schedule refreshes intelligently: -- Predict when prompts will expire based on usage patterns -- Pre-refresh popular prompts before they go stale -- Distribute refreshes to avoid spikes - -### Phase 7.3.2: Adaptive TTL -Automatically adjust TTL based on prompt change frequency: -- Track how often prompts change in Langfuse -- Increase TTL for stable prompts -- Decrease TTL for frequently updated prompts - -### Phase 7.3.3: Metrics & Observability -Add instrumentation for: -- Stale hit rate -- Background refresh success rate -- Time spent in each cache state -- Thread pool utilization - ---- - -## Decision: Not Implementing (Yet) - -**Rationale:** -- Phase 7.1 (Rails.cache adapter) + Phase 7.2 (stampede protection) already provide excellent performance -- Stampede protection ensures only 1 API call per cache miss (not 1,200) -- The 100ms latency hit happens very infrequently (once per TTL window) -- Added complexity (thread pool, metadata, concurrent-ruby dependency) may not be worth the marginal latency improvement -- Can revisit if P99 latency becomes a problem in production - -**When to Reconsider:** -- Users complain about latency spikes -- P99 latency metrics show cache expiry causing issues -- Langfuse API becomes slower (>500ms) -- Need to support very high traffic (10,000+ requests/sec) - ---- - -## References - -- **HTTP Stale-While-Revalidate**: [RFC 5861](https://datatracker.ietf.org/doc/html/rfc5861) -- **SWR Pattern**: [Vercel SWR Library](https://swr.vercel.app/) -- **concurrent-ruby**: [GitHub](https://github.com/ruby-concurrency/concurrent-ruby) -- **Thread Pool Sizing**: [Little's Law](https://en.wikipedia.org/wiki/Little%27s_law)