
Feature/interrupt handler sh4shv4t#497

Open
sh4shv4t wants to merge 2 commits into Dark-Sys-Jenkins:main from sh4shv4t:feature/interrupt-handler-shashvat

sh4shv4t commented Feb 2, 2026

Intelligent Interruption Handling with Fuzzy Matching by Shashvat Singh

Summary

Implements intelligent interruption detection using configurable fuzzy string matching to distinguish between backchannel responses (e.g., "yeah", "okay", "hmm") and genuine interruptions (e.g., "stop", "wait"). Enables natural conversation flow by allowing users to acknowledge they're listening without disrupting the agent.

Demo

🎥 Video walkthrough & technical deep-dive: Video Link

# Quick test
uv run pytest tests/test_intelligent_interruption.py -v

Problem

Traditional voice agents treat all user speech uniformly, causing poor UX:

  • Users say "yeah" or "hmm" while the agent speaks → agent incorrectly stops
  • STT variations like "yeahh" aren't recognized as backchannel
  • No configurability for different use cases (customer service vs accessibility)
  • Edge cases (empty strings, unicode) can crash interruption logic

Solution

Fuzzy string matching with rapidfuzz for intelligent classification:

  • Recognizes 16 default backchannel words with configurable threshold (default 80%)
  • Handles STT typos automatically ("yeahh" → "yeah" at 88% similarity)
  • Sub-millisecond performance using process.extractOne
  • State-aware: only filters when agent is speaking
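The matching step in the bullets above can be illustrated without the rapidfuzz dependency. The sketch below is not the PR's code: `indel_ratio` reimplements the normalized Indel similarity that rapidfuzz documents for `fuzz.ratio` (twice the longest-common-subsequence length divided by the combined length, scaled to 0 to 100), and the hypothetical `best_backchannel_match` loosely imitates `process.extractOne(..., score_cutoff=...)` by returning the best `(word, score)` pair or `None`. The word list is an illustrative subset, not the PR's 16 defaults.

```python
from typing import Optional, Sequence, Tuple


def indel_ratio(a: str, b: str) -> float:
    """Normalized Indel similarity in [0, 100]: 200 * LCS(a, b) / (len(a) + len(b))."""
    if not a and not b:
        return 100.0  # treat two empty strings as identical
    # Longest common subsequence length via a rolling 1-D dynamic-programming row.
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return 200.0 * prev[-1] / (len(a) + len(b))


def best_backchannel_match(
    token: str, choices: Sequence[str], score_cutoff: float = 80.0
) -> Optional[Tuple[str, float]]:
    """Single pass over choices; return the best (choice, score) at or above the cutoff."""
    best: Optional[Tuple[str, float]] = None
    for choice in choices:
        score = indel_ratio(token, choice)
        if score >= score_cutoff and (best is None or score > best[1]):
            best = (choice, score)
    return best


BACKCHANNEL = ["yeah", "okay", "hmm", "uh-huh", "right"]  # illustrative subset

print(best_backchannel_match("yeahh", BACKCHANNEL))  # best match 'yeah', score about 88.9
print(best_backchannel_match("stop", BACKCHANNEL))   # None
```

With this definition, "yeahh" scores 800/9 ≈ 88.9 against "yeah", consistent with the 88% figure quoted above.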

Technical decision: Chose fuzzy matching over semantic embeddings due to latency requirements (<1ms vs 50-200ms). Real-time voice demands immediate responses; embeddings would add user-perceivable delay.

| Agent State | User Says | Result |
|---|---|---|
| Speaking | "yeah", "hmm" | Continues (backchannel) |
| Speaking | "yeahh", "okayy" | Continues (fuzzy match) |
| Speaking | "wait", "stop" | Stops (real interruption) |
| Speaking | "yeah but wait" | Stops (mixed input) |
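The table implies two rules: input is ignored only while the agent is speaking, and only if every token is a backchannel word. The sketch below illustrates that logic under stated assumptions. The function names echo the PR's `_is_soft_input()` and `_should_ignore_interruption()`, but the bodies are hypothetical, the word list is illustrative, and stdlib `difflib.get_close_matches` stands in for rapidfuzz (its `SequenceMatcher` ratio is a similar, but not identical, 0-to-1 score).

```python
import difflib
import re
from typing import Sequence

IGNORED_WORDS = ["yeah", "okay", "ok", "hmm", "uh-huh", "right"]  # illustrative subset


def is_soft_input(transcript: str, ignored: Sequence[str] = IGNORED_WORDS,
                  threshold: int = 80) -> bool:
    """True only if the transcript is non-empty and every token fuzzily matches an ignored word."""
    tokens = re.findall(r"[\w'-]+", transcript.lower())
    if not tokens:
        return False  # empty or whitespace-only input is never treated as backchannel
    cutoff = threshold / 100  # difflib expects a 0-1 cutoff
    # A single non-backchannel token ("wait") makes the whole utterance a real interruption.
    return all(
        difflib.get_close_matches(tok, ignored, n=1, cutoff=cutoff) for tok in tokens
    )


def should_ignore_interruption(agent_speaking: bool, transcript: str) -> bool:
    """State-aware: backchannel filtering applies only while the agent is speaking."""
    return agent_speaking and is_soft_input(transcript)


cases = [(True, "yeah"), (True, "okayy"), (True, "wait"), (True, "yeah but wait"), (False, "yeah")]
for speaking, text in cases:
    action = "ignore" if should_ignore_interruption(speaking, text) else "handle"
    print(f"speaking={speaking!s:<5} {text!r:<16} -> {action}")
```

Requiring every token to match is what makes mixed input such as "yeah but wait" stop the agent; matching any single token would let the leading "yeah" mask the real request.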

Changes

livekit-agents/livekit/agents/voice/agent_activity.py

  • _is_soft_input(): Fuzzy matching with configurable threshold, error handling, debug logging
  • _should_ignore_interruption(): State-aware interruption logic
  • Performance: O(n) with process.extractOne vs O(n²) nested loops

livekit-agents/livekit/agents/voice/agent_session.py

  • DEFAULT_IGNORED_WORDS: 16 common backchannel words
  • AgentSessionOptions.fuzzy_match_threshold: Configurable 0-100% (default 80)
  • Environment variable: LIVEKIT_AGENT_IGNORED_WORDS
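The PR does not show how `LIVEKIT_AGENT_IGNORED_WORDS` is parsed. The sketch below assumes a comma-separated list that replaces the defaults when set, and it validates the documented 0-100 threshold range. `DEFAULT_IGNORED_WORDS` here is a placeholder rather than the PR's 16-word list, and `load_ignored_words` and `validate_threshold` are hypothetical helpers.

```python
import os
from typing import List, Mapping, Optional

# Placeholder list; the PR's actual 16 defaults are not reproduced here.
DEFAULT_IGNORED_WORDS: List[str] = ["yeah", "okay", "ok", "hmm", "uh-huh", "right"]
ENV_VAR = "LIVEKIT_AGENT_IGNORED_WORDS"


def load_ignored_words(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Assumed format: comma-separated words; a missing or blank value falls back to defaults."""
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR, "")
    words = [w.strip().lower() for w in raw.split(",") if w.strip()]
    return words or list(DEFAULT_IGNORED_WORDS)


def validate_threshold(value: float) -> float:
    """Reject thresholds outside the documented 0-100 range."""
    if not 0 <= value <= 100:
        raise ValueError(f"fuzzy_match_threshold must be in [0, 100], got {value}")
    return float(value)


print(load_ignored_words({ENV_VAR: "yeah, Sure ,mhm,"}))  # ['yeah', 'sure', 'mhm']
print(load_ignored_words({}))                              # falls back to the defaults
print(validate_threshold(80))                              # 80.0
```

Passing the mapping explicitly keeps the helper testable without mutating the real process environment.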

tests/test_intelligent_interruption.py

  • 24 comprehensive tests across 5 classes
  • Coverage: exact matching, fuzzy matching, edge cases, configurable thresholds
  • All tests passing ✅

examples/voice_agents/intelligent_interruption_demo.py

  • Full demo agent with configuration examples
  • Displays ignored words and threshold at startup

Documentation

  • README.md: Enhanced feature description
  • demonstration_walkthrough.py: Interactive demo script

Acceptance Criteria ✅

  • ✅ Agent continues over "yeah/okay/hmm" while speaking
  • ✅ Fuzzy matching handles typos ("yeahh" → "yeah")
  • ✅ Real interruptions work ("stop", "wait" → agent stops)
  • ✅ Mixed input interrupts ("yeah but wait" → stops)
  • ✅ Edge cases handled (empty, whitespace, unicode)
  • ✅ Configurable thresholds (50%, 80%, 90%, 100%)
  • ✅ Error resilient (doesn't crash on corrupt input)
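One way to meet the edge-case criterion (empty, whitespace, unicode) is to normalize transcripts before matching. The hypothetical `normalize_transcript` below is an assumption, since the PR's own normalization steps are not shown: it applies Unicode NFKC normalization (which maps full-width letters to their ASCII forms) and `str.casefold()`, turns punctuation into spaces, and collapses whitespace.

```python
import unicodedata


def normalize_transcript(text: str) -> str:
    """NFKC-normalize, casefold, replace punctuation with spaces, and collapse whitespace."""
    if not text:
        return ""
    text = unicodedata.normalize("NFKC", text).casefold()
    # Keep letters, digits, apostrophes, and hyphens; every other character becomes a space.
    cleaned = "".join(ch if ch.isalnum() or ch in "'-" else " " for ch in text)
    return " ".join(cleaned.split())


for raw in ["", "   ", "Yeah!!", "ＹＥＡＨ", "uh-huh...  okay"]:
    print(repr(raw), "->", repr(normalize_transcript(raw)))
```

After normalization, an empty result can be short-circuited as "not backchannel" before any fuzzy matching runs.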

Testing

Automated

# All 24 tests
uv run pytest tests/test_intelligent_interruption.py -v

# With coverage
uv run pytest tests/test_intelligent_interruption.py --cov=livekit.agents.voice

Manual

# Setup
cp examples/.env.example examples/.env  # Add your API keys
uv run examples/voice_agents/intelligent_interruption_demo.py download-files

# Test in console mode
uv run examples/voice_agents/intelligent_interruption_demo.py console

# Test with real voice in LiveKit playground
uv run examples/voice_agents/intelligent_interruption_demo.py dev
python3 generate_token.py  # Get connection URL

Configuration Examples

from livekit.agents import AgentSession

# ignored_words and fuzzy_match_threshold are the AgentSession options added by this PR

# Default (80% threshold, balanced)
session = AgentSession()

# Lenient (70% - noisy environments, accents)
session = AgentSession(fuzzy_match_threshold=70)

# Strict (90% - formal contexts, clear audio)
session = AgentSession(fuzzy_match_threshold=90)

# Exact match only (100% - testing/debugging)
session = AgentSession(fuzzy_match_threshold=100)

# Custom ignored words
session = AgentSession(
    ignored_words=["yeah", "ok", "hmm", "sure", "right"],
    fuzzy_match_threshold=80,
)
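To see how the threshold presets above behave, the sketch below scores a few STT variants against the word they were meant to be and lists which presets would still accept them. It uses stdlib `difflib.SequenceMatcher` as a stand-in for rapidfuzz, so scores may differ from rapidfuzz on some inputs; the word pairs and the `accepted_at` helper are illustrative assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Sequence

THRESHOLDS = (70, 80, 90, 100)  # the presets listed above


def similarity(heard: str, intended: str) -> float:
    """0-100 similarity; difflib stands in for rapidfuzz and can differ on some inputs."""
    return 100.0 * SequenceMatcher(None, heard, intended).ratio()


def accepted_at(heard: str, intended: str, thresholds: Sequence[int] = THRESHOLDS) -> List[int]:
    """Thresholds at which `heard` would still be treated as `intended`."""
    score = similarity(heard, intended)
    return [t for t in thresholds if score >= t]


pairs = [("yeah", "yeah"), ("exactlyy", "exactly"), ("yeahh", "yeah"), ("okey", "okay"), ("ya", "yeah")]
for heard, intended in pairs:
    print(f"{heard:>9} -> {intended:<8} {similarity(heard, intended):6.2f}  "
          f"accepted at {accepted_at(heard, intended)}")
```

With these inputs, the default of 80 accepts a doubled trailing letter ("yeahh", about 88.9) but not a vowel swap in a four-letter word ("okey", 75.0), which only the lenient 70 preset accepts; 100 accepts exact matches only.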

Key Features

  • Performance: Sub-millisecond fuzzy matching, no user-perceivable latency
  • Robustness: Try-except with safe fallback, detailed error logging
  • Flexibility: Threshold 0-100%, custom word lists, env var config
  • Observability: Debug logs with full transcript context
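The "try-except with safe fallback" bullet can be read as: if classification fails, do not suppress the interruption. The wrapper below is a hypothetical illustration of that policy, since the PR's exact fallback choice is not shown; `safe_should_ignore` and `flaky_classifier` are illustrative names. It logs the error together with the transcript and returns `False`, so the agent stops rather than talking over the user.

```python
import logging
from typing import Callable

logger = logging.getLogger("interruption")


def safe_should_ignore(classifier: Callable[[str], bool], transcript: object) -> bool:
    """Run the classifier; on any error, log it and fall back to 'do not ignore'."""
    try:
        if not isinstance(transcript, str):
            raise TypeError(f"expected str transcript, got {type(transcript).__name__}")
        return bool(classifier(transcript))
    except Exception:
        # Fail toward a real interruption: stopping is safer than ignoring the user.
        logger.exception("interruption classification failed; transcript=%r", transcript)
        return False


def flaky_classifier(text: str) -> bool:
    raise RuntimeError("simulated matcher failure")


logging.basicConfig(level=logging.DEBUG)
print(safe_should_ignore(lambda t: t == "yeah", "yeah"))  # True
print(safe_should_ignore(flaky_classifier, "yeah"))       # False (error logged)
print(safe_should_ignore(lambda t: True, None))           # False (error logged)
```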

Dependencies

dependencies = ["rapidfuzz>=3.0.0"]

Breaking Changes

None. This is a backward-compatible enhancement; existing agents work without changes.


Author: Shashvat Singh
Date: February 2, 2026

Implement context-aware interruption filtering that distinguishes between
passive acknowledgements ("yeah", "ok", "hmm") and intentional interruptions
("stop", "wait") based on agent speaking state.

Key changes:
- Add configurable `ignored_words` option to AgentSession with sensible defaults
- Add `_is_soft_input()` and `_should_ignore_interruption()` helpers in AgentActivity
- Pass STT transcript to interruption handler for semantic filtering
- Support LIVEKIT_AGENT_IGNORED_WORDS env var for runtime configuration

When agent is speaking, backchannel words are ignored seamlessly without
pause or stutter. When agent is silent, all input is processed normally.

Includes demo script and unit tests (11 tests passing).
…uzzy matching

Add context-aware interruption detection to distinguish between backchannel
responses ("yeah", "okay", "hmm") and genuine interruptions ("stop", "wait").
This enables natural conversation flow where users can acknowledge they're
listening without disrupting the agent.

Key Features:
- Configurable fuzzy string matching using rapidfuzz (default 80% threshold)
- Handles STT typos and variations automatically ("yeahh" → "yeah" @ 88%)
- Sub-millisecond performance with process.extractOne optimization
- State-aware: only filters interruptions when agent is speaking
- Robust error handling with safe fallback behavior
- 16 default backchannel words (configurable via param or env var)
- Comprehensive debug logging for production troubleshooting

Technical Implementation:
- agent_activity.py: Add _is_soft_input() and _should_ignore_interruption()
  with fuzzy matching, error handling, and performance optimizations
- agent_session.py: Add DEFAULT_IGNORED_WORDS, fuzzy_match_threshold param,
  and environment variable support (LIVEKIT_AGENT_IGNORED_WORDS)
- Chose fuzzy matching over semantic embeddings due to latency (<1ms vs 50-200ms)

Testing & Documentation:
- 24 comprehensive tests covering exact/fuzzy matching, edge cases, thresholds
- Demo application with usage examples and configuration display
- Complete technical specification in PLAN.md with 8-minute video script
- Interactive demonstration_walkthrough.py script with mock scenarios
- Enhanced README.md with detailed feature description
- PR_MESSAGE.md with comprehensive implementation details
- Token generation utility (generate_token.py) for LiveKit playground

Behavior Matrix:
- "yeah/okay/hmm" while speaking → agent continues (backchannel)
- "yeahh/okayy" while speaking → agent continues (fuzzy match)
- "wait/stop/no" while speaking → agent stops (real interruption)
- "yeah but wait" while speaking → agent stops (mixed input)
- Any input when silent β†’ processed normally

Configuration:
- Default: fuzzy_match_threshold=80 (balanced)
- Lenient: fuzzy_match_threshold=70 (noisy/accents)
- Strict: fuzzy_match_threshold=90 (formal/clear audio)
- Exact: fuzzy_match_threshold=100 (testing/debugging)

Breaking Changes: None (backward compatible)

Dependencies: Added rapidfuzz>=3.0.0 for fuzzy string matching

Closes: Intelligent interruption handling implementation