
Implement new foundation with three-phase preprocessing and mini-pipelines#1

Open
odilitime wants to merge 13 commits into 1.x from v2

Conversation


@odilitime odilitime commented Dec 8, 2025

Summary by CodeRabbit

  • New Features
    • NLP utilities for text analysis including phonetic matching, entity extraction, multilingual keyword analysis, and semantic similarity
    • Multi-phase pipeline architecture (Phase 0–3) with intelligent preprocessing and autonomous thinking loops
    • Autonomy modes system enabling flexible agent behavior control (autonomous, cooperative, reactive, assistant, observer)
    • Action and provider intelligent matching registry with semantic indexing
    • Enhanced engagement policies framework for customizable agent participation
    • Improved token budgeting and resource tracking with per-agent cost attribution
    • Language detection with automatic translation support
    • Advanced memory management with monitoring and cleanup utilities



Note

Low Risk
Documentation-only changes: large reorg/rewrite of architecture and developer docs, with no runtime behavior changes. Main risk is misinformation or drift vs actual implementation.

Overview
Documentation overhaul for Plugin-Autonomous v2. Rewrites ARCHITECTURE.md to describe a new three-phase (Phase 0 mini-pipelines → Phase 1 batch review → Phase 2 conditional response) model, including diagrams and detailed module layout.

Adds new reference/guide docs (CHANGELOG.md, DESIGN.md, DEVELOPER_GUIDE.md, FUTURE_ROADMAP.md, NLP_UTILITIES.md) and removes a large set of older, issue-specific markdown notes (e.g., batching/addressee/runtime lookup fix writeups and various summaries).

Written by Cursor Bugbot for commit 4833d50.


coderabbitai bot commented Dec 8, 2025

Walkthrough

Major architectural v2 refactor introducing a three-phase pipeline (preprocessing, batch planning, response), nine mini-pipelines for LLM-free analysis, NLP utilities for phonetic/similarity matching and entity extraction, engagement policy system, Phase 3 autonomous thinking loop, action/provider registry, and comprehensive supporting infrastructure with extensive test coverage and documentation.

Changes

Cohort / File(s) Summary
Documentation: Architecture & Design
README.md, ARCHITECTURE.md, DESIGN.md, DEVELOPER_GUIDE.md, FUTURE_ROADMAP.md, PIPELINE.md, NLP_UTILITIES.md
Comprehensive refactor of v2 architecture narrative, three-phase model, design philosophy, cost pyramids, implementation guide, roadmap, and API reference docs; replaces time-slice approach with phase-driven orchestration.
Documentation: Removed
ADDRESSEE_FIX.md, AGENT_RUNTIME_LOOKUP_FIX.md, ALWAYS_ON_BATCHING.md, BATCHING_COMPARISON.md, DESIGN_DECISIONS.md, CHAT_IMPROVEMENTS.md, COMPLETE_SUMMARY.md, DEAD_CODE_ANALYSIS.md, DECISION_LOGGING.md, EXAMPLE_LOGS.md, LLM_CALL_ANALYSIS.md, MULTI_AGENT_BATCHING_FIX.md, NAME_VARIATION_REGISTRY.md, NLP_IMPLEMENTATION.md, PROMPT_IMPROVEMENTS.md, PROMPT_REFACTORING.md, REPLY_THREAD_DETECTION.md, TEST_COVERAGE_GAP.md
Removal of v1 design notes and interim documentation; consolidated into v2 structure.
NLP Module
src/nlp/phonetic.ts, src/nlp/similarity.ts, src/nlp/keywords.ts, src/nlp/entities.ts, src/nlp/index.ts
New LLM-free NLP utilities: Double Metaphone phonetic matching, Jaro-Winkler/Levenshtein similarity, multilingual keyword extraction with stopwords, and regex-based entity extraction (URLs, emails, mentions, addresses, UUIDs, code blocks, names).
NLP Tests
src/__tests__/nlp/phonetic.test.ts, src/__tests__/nlp/similarity.test.ts, src/__tests__/nlp/keywords.test.ts, src/__tests__/nlp/entities.test.ts
Comprehensive test coverage for NLP utilities including phonetic matching, string similarity algorithms, keyword extraction with frequency analysis, and entity extraction with limits.
Pipeline Module: Core
src/pipeline/types.ts, src/pipeline/context.ts, src/pipeline/profiler.ts, src/pipeline/preprocessor.ts, src/pipeline/index.ts
Pipeline type system, context management with cross-agent coordination, profiling/bottleneck detection, Phase 0 orchestration, and barrel exports for pipeline infrastructure.
Pipeline Module: Mini-Pipelines
src/pipeline/mini/language.ts, src/pipeline/mini/classify.ts, src/pipeline/mini/embedding.ts, src/pipeline/mini/topics.ts, src/pipeline/mini/agent-match.ts, src/pipeline/mini/complexity.ts, src/pipeline/mini/entity.ts, src/pipeline/mini/action-provider.ts, src/pipeline/mini/character.ts
Nine mini-pipelines (MP1–MP9): language detection, action classification, embedding queueing, keyword extraction, agent matching, complexity estimation, entity extraction, action/provider matching, and character context assembly.
Pipeline Tests
src/__tests__/pipeline/context.test.ts, src/__tests__/pipeline/mp1-language.test.ts, src/__tests__/pipeline/mp2-classify.test.ts, src/__tests__/pipeline/mp7-entity.test.ts, src/__tests__/pipeline/phase0-integration.test.ts
Unit and integration tests for pipeline context, MPs, and Phase 0 orchestration covering language detection, classification, entity extraction, and cross-MP coordination.
Engagement Policy System
src/engagement/types.ts, src/engagement/engine.ts, src/engagement/policies/core.ts, src/engagement/policies/budget.ts, src/engagement/policies/none.ts, src/engagement/index.ts
Pluggable engagement policy system with willingness modifiers, effort levels, budget tracking per agent, and configurable policy chain composition.
Phase 3 Autonomous Loop
src/phase3/types.ts, src/phase3/service.ts, src/phase3/prompt.ts, src/phase3/parser.ts, src/phase3/executor.ts, src/phase3/action-context.ts, src/phase3/action-memory.ts, src/phase3/budget.ts, src/phase3/catchup.ts, src/phase3/index.ts
Phase 3 autonomous thinking loop with cycle orchestration, thought/qualia/task/action management, token budgeting, room attention tracking, constraint-based action execution, and procedural memory.
Registry System
src/registry/types.ts, src/registry/action-provider-registry.ts, src/registry/index.ts
Action/provider compilation and caching at startup with keyword indexing, stability monitoring, complexity estimation, and parameter hint extraction.
Core Services & Data
src/services/autonomous-message-service.ts, src/services/resource-tracker.ts, src/name-variation-registry.ts, src/topic-registry.ts, src/planning-engine.ts
Enhanced services: proxy fix for useModel, per-agent token tracking, per-agent cost breakdown, name variation batching with phonetic fallback, keyword relevance calculation, index-based UUID mapping in planning.
Utilities & Data
src/data/default-vocabulary.ts, src/debug/memory-stats.ts, src/banner.ts, src/constants.ts
Default emoji vocabulary with fallback logic, memory stats collection/monitoring, banner UI refactor, and extensive new constants for budgets, autonomy modes, QoS profiles, Phase 3 settings, and engagement defaults.
Documentation (New)
src/docs/action-intent-detection.md, src/docs/deferred-intent-tracking.md, src/docs/dual-pressure-scoring.md, src/docs/entity-language-provider.md, src/docs/react-vocabulary.md
Technical specifications for action intent detection, deferred processing, dual pressure scoring, entity language persistence, and emoji vocabulary system.
Main Plugin
src/index.ts
Major expansion of public API surface: engagement engine, memory stats, Phase 3 service registration, room stats/context, monitoring endpoints, motivation/goal integration, providers array, event handlers, and initialization enhancements.
Tests: Updated
src/__tests__/execution-flow.test.ts, src/__tests__/planning-engine.test.ts, src/__tests__/test-helpers.ts, src/__tests__/benchmark-nlp.test.ts
Test infrastructure updates for new batch structure (agentMessages), index-based UUID mapping, expanded mock methods, and NLP performance benchmarks.
Configuration
package.json, CHANGELOG.md
Dependencies added: @echogarden/fasttext-wasm, compromise, double-metaphone, franc, tinyld, stopword; changelog documenting v2 features, cost reductions, autonomy modes, NLP integration, smart routing, resource tracking, and design decisions.

Sequence Diagram(s)

sequenceDiagram
    participant Msg as Message
    participant P0 as Phase 0<br/>Preprocessing
    participant MPs as Mini-Pipelines<br/>(MP1-MP9)
    participant CTX as Pipeline Context
    participant P1 as Phase 1<br/>Planning
    participant P2 as Phase 2<br/>Response
    participant P3 as Phase 3<br/>Thinking Loop

    Msg->>P0: executePhase0(message)
    P0->>CTX: createPipelineContext()
    P0->>MPs: Execute in parallel by dependency
    MPs->>MPs: MP1: Language Detection
    MPs->>MPs: MP4: Topics from keywords
    MPs->>MPs: MP5: Agent matching (depends MP4)
    MPs->>MPs: MP7: Entity extraction
    MPs->>MPs: MP8: Action/Provider matching (depends MP7)
    MPs->>CTX: storeMPResult(name, result)
    MPs-->>P0: Results with profiles
    P0->>P0: identifyBottlenecks()
    P0-->>P1: Phase0Result
    P1->>P1: Batch LLM planning
    P1-->>P2: Phase1Result (with decisions)
    P2->>P2: Per-agent LLM generation
    P2-->>P3: Phase2Result (draft responses)
    P3->>P3: buildPrompt(CycleContext)
    P3->>P3: LLM cycle: thoughts/tasks/actions
    P3->>P3: parseCycleOutput()
    P3->>P3: executeCycleOutput()
    P3-->>Msg: Autonomous actions/messages
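The "Execute in parallel by dependency" step in the diagram above (MP5 waiting on MP4, MP8 waiting on MP7) can be sketched as a map of promises, where each mini-pipeline awaits its declared dependencies before running. The names and shapes here are illustrative, not the plugin's actual `executePhase0` signature.

```typescript
// Hypothetical sketch of dependency-ordered parallel mini-pipeline execution.
// Types and function names are assumptions, not the plugin's real API.
type MiniPipeline = {
  name: string;
  deps: string[];
  run: (results: Map<string, unknown>) => Promise<unknown>;
};

async function runMiniPipelines(
  pipelines: MiniPipeline[]
): Promise<Map<string, unknown>> {
  const byName = new Map(pipelines.map((mp) => [mp.name, mp] as [string, MiniPipeline]));
  const promises = new Map<string, Promise<unknown>>();
  const results = new Map<string, unknown>();

  const start = (name: string): Promise<unknown> => {
    const existing = promises.get(name);
    if (existing) return existing;
    const mp = byName.get(name);
    if (!mp) throw new Error(`Unknown mini-pipeline: ${name}`);
    // Defer the body to a microtask so the map entry exists before deps start
    const p = Promise.resolve().then(async () => {
      await Promise.all(mp.deps.map(start)); // e.g. MP5 waits on MP4
      const result = await mp.run(results);
      results.set(name, result);
      return result;
    });
    promises.set(name, p);
    return p;
  };

  pipelines.forEach((mp) => start(mp.name)); // launch everything eagerly
  await Promise.all(promises.values());
  return results;
}
```

Independent pipelines (MP1, MP4, MP7) run concurrently while dependents are gated only by the promises they actually await, which is what keeps total Phase 0 latency close to the longest dependency chain rather than the sum of all nine stages.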
sequenceDiagram
    participant RT as Runtime
    participant EP as Engagement<br/>Policy Engine
    participant CP as Core Policy
    participant BP as Budget Policy
    participant CTX as EngagementContext

    RT->>EP: evaluate(runtime, ctx)
    EP->>EP: getPolicyChain(runtime)
    EP->>CP: getWillingnessModifier(ctx)
    CP-->>EP: { modifier: +40, effort: 'full' }
    EP->>BP: getWillingnessModifier(ctx)
    BP-->>EP: { modifier: -30, effort: 'normal' }
    EP->>EP: Aggregate modifiers (40-30=10)
    EP->>EP: Compare to threshold
    EP-->>RT: { engage: true, willingness: 60 }
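The aggregation shown in this diagram (a +40 core-policy modifier and a -30 budget-policy modifier netting to +10, then compared against a threshold) can be sketched roughly as follows. The policy vote shape, the base willingness of 50, and the threshold are assumptions chosen to reproduce the diagram's numbers, not the plugin's actual engagement API.

```typescript
// Hypothetical engagement-policy aggregation; all shapes and constants here
// are assumptions for illustration (base 50 and threshold 50 are guessed so
// the diagram's +40/-30 example yields willingness 60 and engage=true).
interface PolicyVote {
  modifier: number;                      // e.g. +40 from core, -30 from budget
  effort: 'full' | 'normal' | 'minimal'; // effort selection omitted in this sketch
}

interface EngagementDecision {
  engage: boolean;
  willingness: number;
}

const BASE_WILLINGNESS = 50; // assumed neutral starting point
const ENGAGE_THRESHOLD = 50; // assumed engagement cutoff

function evaluate(votes: PolicyVote[]): EngagementDecision {
  // Sum modifiers from every policy in the chain: 40 + (-30) = 10
  const net = votes.reduce((sum, v) => sum + v.modifier, 0);
  // Clamp the final willingness to [0, 100]
  const willingness = Math.max(0, Math.min(100, BASE_WILLINGNESS + net));
  return { engage: willingness >= ENGAGE_THRESHOLD, willingness };
}

const decision = evaluate([
  { modifier: 40, effort: 'full' },   // core policy vote
  { modifier: -30, effort: 'normal' } // budget policy vote
]);
// With these assumed constants: { engage: true, willingness: 60 }
```

Because each policy contributes an additive modifier, new policies (e.g., the `none` policy or custom ones) can be appended to the chain without changing the aggregation logic.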

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

Whiskers twitch with joy, 🐰
V2 pipelines flow with grace,
Phases dance in sync—
NLP sparkles bright,
Autonomy takes flight!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check — ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed. The title clearly and concisely describes the main architectural change: a new three-phase preprocessing foundation with mini-pipelines, the primary focus of this large changeset.
  • Docstring Coverage — ✅ Passed. Docstring coverage is 96.67%, above the required threshold of 80.00%.





@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

🧹 Nitpick comments (35)
V2_FINAL_SUMMARY.md (1)

63-96: Consider adding language specifier to fenced code blocks.

The ASCII diagram code blocks (lines 63, 171, 300) lack a language specifier. While this is purely cosmetic for diagrams, adding `text` or `plaintext` as the language identifier would satisfy markdown linting rules.

-```
+```text
 ┌─────────────────────────────────────────────────────────┐
 │  PHASE 0: Parallel Mini-Pipelines (No LLM, <30ms)      │
src/name-variation-registry.ts (1)

205-225: Placeholder method with clear future roadmap.

The stub implementation with underscore-prefixed unused parameters follows TypeScript conventions. The comments clearly document the planned NLP integration for Sprint 5.

Consider adding a TODO comment or JSDoc @todo tag to make this more discoverable in IDE task lists:

   /**
    * Find phonetically similar agent names
    * Enhanced with NLP utilities for better matching
    * 
+   * @todo Implement in Sprint 5 using phoneticSimilarity from nlp/phonetic.ts
    * @param _targetName - Name to find matches for (placeholder for future enhancement)
    * @param _threshold - Similarity threshold (0-1), default 0.75 (placeholder)
    * @returns Array of agent IDs with similar names
    */
src/services/resource-tracker.ts (1)

745-791: Per-agent cost breakdown enhances observability.

The implementation follows the same ANSI formatting pattern as the existing efficiency report. The top-10 limit with overflow message is a sensible approach to prevent output flooding.

One minor note: the agentId.substring(0, 10) truncation (line 766) may make it difficult to identify agents if multiple agents share the same ID prefix. Consider using a lookup to display agent names if available, or truncating from the end of the UUID instead.

-      const agentIdShort = cost.agentId.substring(0, 10).padEnd(12);
+      // Use last 10 chars of UUID as they're more likely to be unique
+      const agentIdShort = cost.agentId.slice(-10).padEnd(12);
V2_TESTS_COMPLETE.md (1)

5-12: Consider adding languages to fenced code blocks for markdownlint compatibility

The status code block currently uses bare triple backticks; markdownlint (MD040) prefers specifying a language (e.g., `text` or `bash`). Same applies to other fences later in this file if you care about clean lint output.

src/planning-engine.ts (1)

176-180: Per‑agent token tracking looks correct; consider also wiring it for time‑slice planning

Passing runtime.agentId into resourceTracker.trackTokens in both planSingle and planBatch cleanly attributes TEXT_SMALL usage to the calling agent without touching control flow. This is a good observability improvement.

For consistency, you may eventually want to mirror the same per‑agent tracking in planTimeSlice (currently it only reports aggregate tokensUsed in the return value) so all planning modes share the same accounting model.

Also applies to: 359-366

DEVELOPER_GUIDE.md (1)

10-13: Add languages to fenced code blocks to satisfy markdownlint and improve highlighting

Several code fences (e.g., the v1/v2 flow examples around Lines 10–13, the Phase 1 batch example, and the debugging snippets later) use bare backticks without a language. If you care about markdownlint (MD040) and syntax highlighting, consider annotating them (e.g., `text`, `typescript`, or `bash` as appropriate).

Also applies to: 82-88, 315-332

src/__tests__/pipeline/phase0-integration.test.ts (1)

65-145: Integration tests are appropriately skipped; plan for proper mocking before enabling

Marking these Phase 0 integration tests with test.skip and documenting the CLD3/fastText dependency is reasonable for now. Before unskipping, you’ll want a deterministic detectLanguage() mock and likely a more complete IAgentRuntime mock (beyond logger/queueEmbeddingGeneration) so executePhase0 and all MPs can run without external services.

If you want, I can help sketch a minimal mock detectLanguage()/runtime setup that will let you safely convert a subset of these tests from skip to live integration coverage.

PIPELINE.md (1)

153-159: Specify languages on example code fences for consistency and lint cleanliness

The scoring and phase2Hint examples are in bare fenced blocks. Adding a language (e.g., `text` or `typescript`) would align with the rest of the docs and clear the MD040 warnings from markdownlint.

Also applies to: 188-195

src/__tests__/pipeline/mp7-entity.test.ts (1)

134-140: Performance threshold may be flaky in CI environments.

The 50ms threshold is quite aggressive and could cause intermittent failures on slower CI runners or under load. Consider either increasing the threshold (e.g., 100-200ms) or marking this as a benchmark test that doesn't fail the build.

-    expect(result.profile.durationMs).toBeLessThan(50); // Should be very fast
+    expect(result.profile.durationMs).toBeLessThan(200); // Allow headroom for CI variability
TEST_SUMMARY.md (1)

5-12: Add language specifier to fenced code blocks.

The linter flags these code blocks as missing a language specifier. While these contain status output rather than code, adding a language helps with consistent rendering.

-```
+```text
 ✅ 245 tests passing
 ⏭️  15 tests skipped (require language detector setup)
 ⚠️  7 tests failing (pre-existing from earlier work)

 Total: 267 tests across 17 files
 Time: 434ms

src/services/autonomous-message-service.ts (1)

1172-1174: Consider cleaning up originalUseModels on agent unregistration.

When an agent is unregistered, the `originalUseModels` entry persists, which could cause a minor memory leak if agents are frequently registered/unregistered. Consider adding cleanup.



   unregisterAgent(agentId: UUID): void {
     this.topicRegistry.unregisterAgent(agentId);
+    this.originalUseModels.delete(agentId);
   }
src/__tests__/pipeline/mp1-language.test.ts (1)

88-103: Tighten fallback test assertions for MP1 error behavior

Right now the fallback test only checks result.data when result.success is false, so it won’t catch a case where empty text is treated as a “successful” detection but still returns English defaults. If the contract is “on error, always provide English fallback,” consider asserting the expected success value explicitly, or moving the fallback checks outside the if (!result.success) guard so the test validates behavior in all code paths.

V2_COMPLETION_SUMMARY.md (1)

9-41: Consider minor markdown cleanup for lint compatibility

The content reads well, but markdownlint is flagging a couple of style points (MD036, MD040 in static analysis hints):

  • Several bolded lines (e.g., **1. NLP Utilities (Sprint 1)**, **2. Mini-Pipeline Framework (Sprints 2-3)**) are treated as “emphasis used instead of a heading.” If you want a clean lint run, you could convert these to actual headings like `### 1. NLP Utilities (Sprint 1)`.
  • The large file tree/code block doesn’t declare a language; adding `text` (or similar) after the opening backticks will address MD040 without changing rendering.

Purely cosmetic, but may help keep docs lint‑clean.

As per static_analysis_hints.

Also applies to: 86-128

src/pipeline/mini/topics.ts (1)

33-52: Use LanguageResult instead of any for MP1 await to improve type safety

The execution flow for MP4 looks correct, but ctx.await<any>('mp1-language') loses useful type checking even though LanguageResult is defined in pipeline/types.ts and already used in tests.

You could tighten this without changing behavior:

-import type { MiniPipelineResult, TopicsResult, PipelineContext } from '../types.ts';
+import type { MiniPipelineResult, TopicsResult, PipelineContext, LanguageResult } from '../types.ts';
-        const languageResult = await ctx.await<any>('mp1-language');
+        const languageResult = await ctx.await<LanguageResult>('mp1-language');

This will catch mismatches between MP1’s output and MP4’s expectations at compile time while keeping the runtime logic the same.

FUTURE_ROADMAP.md (1)

101-128: Optional: add fence languages and tweak minor grammar for markdown linting

For a cleaner markdownlint run (per static_analysis_hints):

  • The fenced blocks that show diagrams/flows (e.g., under “Agent Energy/Fatigue Model” and the v1/v2 flow diagrams) don’t declare a language. Adding `text` to those will address MD040 without altering rendering.
  • A couple of phrases like “2 second gaps” and “decision making” could be hyphenated (“2‑second gaps”, “decision‑making”) if you want to appease grammar tools, but that’s purely stylistic.

Content is solid as-is; this is just for tooling friendliness.

As per static_analysis_hints.

Also applies to: 356-371

DESIGN_PHILOSOPHY.md (1)

5-29: Minor markdown style nits (headings vs emphasis, fence languages)

If you care about keeping markdownlint quiet (see static_analysis_hints):

  • Several bolded lines (e.g., "Use the cheapest tool that gets the job done.") are flagged as “emphasis used instead of a heading” (MD036). Converting some of these to proper headings (###) would clear that, but it’s optional.
  • A few code/diagram blocks (like the cost pyramid and test pyramid ASCII) don’t specify a language; adding `text` will satisfy MD040.

No content changes are needed; this is purely about markdown style and tooling.

As per static_analysis_hints.

Also applies to: 331-336

src/pipeline/mini/language.ts (1)

127-128: Misleading fallback detector name.

When language detection fails entirely and falls back to assuming English, the detectorUsed is set to 'plugin-translate'. This is misleading since plugin-translate wasn't actually used in the fallback path.

-        detectorUsed: 'plugin-translate',
+        detectorUsed: 'fallback-english',
src/__tests__/pipeline/context.test.ts (1)

138-164: Consider adding a timeout guard for the polling test.

This test relies on the ctx.await() polling mechanism completing within reasonable time after a 50ms delay. While this works in most cases, it could become flaky in slow CI environments if the polling interval is long.

Consider wrapping the await with a test timeout or increasing the delay margin to ensure robustness:

+    test('should wait for result to become available', async () => {
       const ctx = createPipelineContext(mockMessage, mockRuntime);
       
       const awaitPromise = ctx.await('mp-test');
       
-      setTimeout(() => {
+      setTimeout(() => {
         const mpResult: MiniPipelineResult<any> = {
           // ...
         };
         storeMPResult(ctx, mpResult);
-      }, 50);
+      }, 50);
       
       const result = await awaitPromise;
       expect(result).toEqual({ value: 'late' });
-    });
+    }, 5000); // Explicit test timeout
src/pipeline/mini/entity.ts (1)

39-46: Consider >= for limitHit detection to be defensive.

The current check uses === to detect when limits are hit. While this works correctly based on the current extractAllEntities implementation, using >= would be more defensive against future changes to the extraction logic.

         const limitHit =
-          entities.urls.length === ENTITY_LIMITS.maxUrls ||
-          entities.emails.length === ENTITY_LIMITS.maxEmails ||
-          entities.mentions.length === ENTITY_LIMITS.maxMentions ||
-          entities.addresses.length === ENTITY_LIMITS.maxAddresses ||
-          entities.uuids.length === ENTITY_LIMITS.maxUuids ||
-          entities.codeBlocks.length === ENTITY_LIMITS.maxCodeBlocks ||
-          entities.names.length === ENTITY_LIMITS.maxNames;
+          entities.urls.length >= ENTITY_LIMITS.maxUrls ||
+          entities.emails.length >= ENTITY_LIMITS.maxEmails ||
+          entities.mentions.length >= ENTITY_LIMITS.maxMentions ||
+          entities.addresses.length >= ENTITY_LIMITS.maxAddresses ||
+          entities.uuids.length >= ENTITY_LIMITS.maxUuids ||
+          entities.codeBlocks.length >= ENTITY_LIMITS.maxCodeBlocks ||
+          entities.names.length >= ENTITY_LIMITS.maxNames;
src/pipeline/mini/action-provider.ts (1)

56-97: Document that this is placeholder matching logic.

The comment on line 57 mentions "simple keyword matching - will be enhanced with registry", but this could be clearer. Consider adding a TODO or more explicit documentation about the expected future behavior.

         // Match entities to actions/providers
-        // For now, simple keyword matching - will be enhanced with registry
+        // TODO: Replace hardcoded rules with registry-based matching
+        // Current implementation is a bootstrap placeholder
         const matchedActions: ActionProviderResult['matchedActions'] = [];
src/pipeline/mini/agent-match.ts (2)

59-60: Unused MP4 result.

The code awaits mp4-topics but doesn't use the returned result. If topic keywords from MP4 are needed for relevance calculation, capture and use them. Otherwise, consider documenting why the await is still necessary (e.g., for ordering/synchronization).

-        // Get keywords from MP4 (not used in current implementation but available for future)
-        await ctx.await<any>('mp4-topics');
+        // Ensure MP4 completes before scoring (topic data used by topicRegistry)
+        const topicResult = await ctx.await<any>('mp4-topics');
+        // topicResult contains extracted keywords that could enhance matching

47-51: Apply stronger typing for registry parameters using actual registry interfaces.

The registries topicRegistry and nameRegistry are correctly typed as TopicRegistry and NameVariationRegistry in the codebase. Both types are properly exported and contain the required methods (getAllAgents, calculateTopicRelevance, and getVariations). Replacing any with these concrete types will improve type safety.

 export async function executeMP5(
   ctx: PipelineContext,
-  topicRegistry: any,
-  nameRegistry: any
+  topicRegistry: TopicRegistry,
+  nameRegistry: NameVariationRegistry
 ): Promise<MiniPipelineResult<AgentMatchResult>> {
src/pipeline/mini/complexity.ts (2)

30-38: Consider documenting the undeclared MP7 dependency.

MP6_DEPENDENCIES only declares 'mp4-topics', but executeMP6 also awaits 'mp7-entity' (line 58). While the code handles MP7 being unavailable gracefully, the declared dependencies don't reflect this relationship. If MP7 is truly optional, consider adding a comment explaining why it's not in the dependency array.

 /**
- * MP6 dependencies: topics (for depth analysis)
+ * MP6 dependencies: topics (for depth analysis)
+ * Note: MP7 (entity) is also used if available, but is optional - not declared as a dependency
  */
 export const MP6_DEPENDENCIES = ['mp4-topics'];

54-66: Consider adding type annotations for awaited MP results.

Using any for the awaited results (lines 54, 58) sacrifices type safety. Additionally, line 66 accesses entities?.entities.codeBlocks which suggests a nested structure—verify this matches the actual EntityResult shape from MP7.

If the MP result types are defined, consider:

-        const topicsResult = await ctx.await<any>('mp4-topics');
+        const topicsResult = await ctx.await<TopicsResult>('mp4-topics');
         const keywords = topicsResult?.keywords || [];
         
-        const entityResult = await ctx.await<any>('mp7-entity');
+        const entityResult = await ctx.await<EntityResult>('mp7-entity');
         const entities = entityResult?.entities;
src/pipeline/preprocessor.ts (1)

47-55: Consider adding proper types for registry dependencies.

The dependencies parameter uses any for all registry types. If registry interfaces exist (e.g., TopicRegistry, NameRegistry, ActionProviderRegistry), consider using them for better type safety.

   dependencies: {
-    topicRegistry: any;
-    nameRegistry: any;
-    actionProviderRegistry?: any;
+    topicRegistry: TopicRegistry;
+    nameRegistry: NameRegistry;
+    actionProviderRegistry?: ActionProviderRegistry;
   }
README.md (3)

51-80: Consider adding language identifiers to fenced code blocks.

Multiple code blocks in this README lack language identifiers (e.g., lines 51, 68, 136, 162, etc.). Adding language hints like `text` or `plaintext` improves syntax highlighting and satisfies markdown linting rules.

Example fix for the code block at line 51:

-```
+```text
 1 message arrives

116-127: Redundant horizontal rules.

Lines 116-127 contain multiple consecutive --- separators with empty sections between them. This creates visual noise without adding structure.

 ---
 
----
-
 ## 📊 Concrete Examples
 
 ---
-
-## Real-World Scenarios (Detailed)
-
----
-
----

913-917: Test statistics may become stale.

The README states "267 tests (245 passing)" which implies 22 failing tests. Hardcoding test counts in documentation can quickly become outdated. Consider either removing specific counts or noting these are approximate/as-of-date figures.

src/pipeline/profiler.ts (1)

115-124: Unused parameter _thresholdMs.

The _thresholdMs parameter is defined but never used. The function relies on prof.bottleneck which was computed using the config threshold during profiling. Either remove this parameter or use it to re-filter profiles.

 export function identifyBottlenecks(
   profiles: Record<string, MiniPipelineProfile>,
-  _thresholdMs = 100,
   excludeNames: string[] = ['translation'] // Exclude known slow operations
 ): string[] {
src/nlp/entities.ts (1)

100-111: The RegExp construction from constant is safe but inefficient.

The static analysis flags new RegExp(PATTERNS.mention) as potentially dangerous. This is a false positive since PATTERNS.mention is a module-level constant, not user input. However, creating a new RegExp object on each call is unnecessary overhead.

Consider using matchAll with the original regex to avoid constructing new objects:

 export function extractMentions(text: string, limit = ENTITY_LIMITS.maxMentions): string[] {
-  const matches: string[] = [];
-  let match;
-  
-  const pattern = new RegExp(PATTERNS.mention);
-  while ((match = pattern.exec(text)) !== null) {
-    if (matches.length >= limit) break;
-    matches.push(match[1]); // Extract just the username without @
-  }
-  
-  return [...new Set(matches)];
+  const allMatches = [...text.matchAll(PATTERNS.mention)];
+  const usernames = allMatches.map((match) => match[1]);
+  return [...new Set(usernames)].slice(0, limit);
 }
src/registry/action-provider-registry.ts (1)

104-152: Consider resetting cached actions/providers on re‑compile

compile appends compiled actions/providers into this.state.actions and this.state.providers without clearing existing entries. If you ever re-run compile after plugins are removed or action/provider sets change, stale entries will linger.

If re-compilation is expected (e.g., after fileHash invalidation), consider clearing or rebuilding the maps:

  async compile(runtime: IAgentRuntime): Promise<void> {
@@
-    try {
+    try {
+      // Rebuild from scratch to avoid stale entries
+      this.state.actions.clear();
+      this.state.providers.clear();

This keeps the registry consistent with the current runtime snapshot.

src/pipeline/mini/character.ts (1)

44-147: MP9 character context assembly is consistent with dependencies and types

The flow from MP4/MP5 results → per‑agent filtering → current‑agent deep extraction (topics, knowledge, style, adjectives, traits) → skeletons for other agents matches the documented design and CharacterContext shape.

The only behavioral nuance is non‑determinism from Math.random() when shuffling adjectives; if tests ever become flaky, consider injecting an overridable RNG, but it’s fine for now.

src/pipeline/mini/classify.ts (1)

48-137: Heuristic classification is coherent and matches the current MessageAction contract

The pattern set and decision tree are consistent with the documented goal of cheap, high‑coverage heuristics. Mapping “stop/goodbye” intents to ignore (given there’s no stop action in MessageAction) is explicitly called out and implemented correctly.

If you later decide that “urgent” semantics should override @mention handling, you may want to move the PATTERNS.urgent branch above the hasAddressedName check, but that’s a tuning question rather than a bug.

src/nlp/similarity.ts (1)

118-223: Levenshtein and combined similarity utilities are straightforward

levenshteinDistance/levenshteinSimilarity, calculateSimilarity, and findSimilarStrings are implemented in a clear, idiomatic way and match their documented contracts. For typical (short) strings, the O(n²) matrix is fine, and weighting logic is easy to tune via the weights parameter.

No changes required.
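For reference, the O(n²) matrix recurrence being described is the standard one; a self-contained sketch (not the plugin's exact code):

```typescript
// Minimal Levenshtein distance via the classic dynamic-programming matrix:
// d[i][j] = edit distance between a[0..i) and b[0..j).
function levenshtein(a: string, b: string): number {
  const rows = a.length + 1;
  const cols = b.length + 1;
  const d: number[][] = Array.from({ length: rows }, () => new Array(cols).fill(0));
  for (let i = 0; i < rows; i++) d[i][0] = i; // i deletions
  for (let j = 0; j < cols; j++) d[0][j] = j; // j insertions
  for (let i = 1; i < rows; i++) {
    for (let j = 1; j < cols; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1; // substitution cost
      d[i][j] = Math.min(
        d[i - 1][j] + 1,     // deletion
        d[i][j - 1] + 1,     // insertion
        d[i - 1][j - 1] + cost // substitution or match
      );
    }
  }
  return d[a.length][b.length];
}
```

For the short strings the plugin compares (names, keywords), the full matrix is perfectly adequate.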

src/pipeline/context.ts (1)

132-213: Multi‑message context manager behaves correctly but only cleans up completed messages

MultiMessageContextManager correctly coalesces multiple agent events for the same message via getMessageKey, tracks expectedAgentCount and receivedAgentIds, and deletes contexts once all expected agents have checked in.

Currently, cleanup(_maxAgeMs) ignores age and only removes “complete” contexts. If there’s any chance expectedAgentCount is set too high or some agents never process, you may eventually want to timestamp contexts and honor _maxAgeMs to prevent leaks, but the current behavior matches the inline docs.
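A sketch of the timestamp-based eviction suggested here (names are illustrative, not the plugin's actual API):

```typescript
// Contexts record createdAt, and cleanup evicts both completed and stale
// entries, so agents that never check in cannot leak memory.
interface TrackedContext {
  createdAt: number;
  expectedAgentCount: number;
  receivedAgentIds: Set<string>;
}

function cleanup(
  contexts: Map<string, TrackedContext>,
  maxAgeMs: number,
  now: number = Date.now()
): number {
  let removed = 0;
  for (const [key, ctx] of contexts) {
    const complete = ctx.receivedAgentIds.size >= ctx.expectedAgentCount;
    const stale = now - ctx.createdAt > maxAgeMs;
    if (complete || stale) {
      contexts.delete(key);
      removed++;
    }
  }
  return removed;
}

const m = new Map<string, TrackedContext>([
  ["old", { createdAt: 0, expectedAgentCount: 3, receivedAgentIds: new Set(["a"]) }],
  ["fresh", { createdAt: 10_000, expectedAgentCount: 3, receivedAgentIds: new Set() }],
]);
const removed = cleanup(m, 5_000, 11_000); // "old" is stale, "fresh" is not
```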

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 54a07a9 and db813ba.

📒 Files selected for processing (50)
  • ARCHITECTURE_V2.md (1 hunks)
  • DESIGN_PHILOSOPHY.md (1 hunks)
  • DEVELOPER_GUIDE.md (1 hunks)
  • FUTURE_ROADMAP.md (1 hunks)
  • NLP_UTILITIES.md (1 hunks)
  • PIPELINE.md (1 hunks)
  • README.md (3 hunks)
  • TEST_SUMMARY.md (1 hunks)
  • V2_COMPLETION_SUMMARY.md (1 hunks)
  • V2_FINAL_SUMMARY.md (1 hunks)
  • V2_IMPLEMENTATION_STATUS.md (1 hunks)
  • V2_TESTS_COMPLETE.md (1 hunks)
  • WHY_DOCUMENTATION.md (1 hunks)
  • package.json (1 hunks)
  • src/__tests__/nlp/entities.test.ts (1 hunks)
  • src/__tests__/nlp/keywords.test.ts (1 hunks)
  • src/__tests__/nlp/phonetic.test.ts (1 hunks)
  • src/__tests__/nlp/similarity.test.ts (1 hunks)
  • src/__tests__/pipeline/context.test.ts (1 hunks)
  • src/__tests__/pipeline/mp1-language.test.ts (1 hunks)
  • src/__tests__/pipeline/mp2-classify.test.ts (1 hunks)
  • src/__tests__/pipeline/mp7-entity.test.ts (1 hunks)
  • src/__tests__/pipeline/phase0-integration.test.ts (1 hunks)
  • src/name-variation-registry.ts (1 hunks)
  • src/nlp/entities.ts (1 hunks)
  • src/nlp/index.ts (1 hunks)
  • src/nlp/keywords.ts (1 hunks)
  • src/nlp/phonetic.ts (1 hunks)
  • src/nlp/similarity.ts (1 hunks)
  • src/pipeline/context.ts (1 hunks)
  • src/pipeline/index.ts (1 hunks)
  • src/pipeline/mini/action-provider.ts (1 hunks)
  • src/pipeline/mini/agent-match.ts (1 hunks)
  • src/pipeline/mini/character.ts (1 hunks)
  • src/pipeline/mini/classify.ts (1 hunks)
  • src/pipeline/mini/complexity.ts (1 hunks)
  • src/pipeline/mini/embedding.ts (1 hunks)
  • src/pipeline/mini/entity.ts (1 hunks)
  • src/pipeline/mini/language.ts (1 hunks)
  • src/pipeline/mini/topics.ts (1 hunks)
  • src/pipeline/preprocessor.ts (1 hunks)
  • src/pipeline/profiler.ts (1 hunks)
  • src/pipeline/types.ts (1 hunks)
  • src/planning-engine.ts (2 hunks)
  • src/registry/action-provider-registry.ts (1 hunks)
  • src/registry/index.ts (1 hunks)
  • src/registry/types.ts (1 hunks)
  • src/services/autonomous-message-service.ts (4 hunks)
  • src/services/resource-tracker.ts (2 hunks)
  • src/topic-registry.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (16)
src/pipeline/mini/agent-match.ts (4)
src/pipeline/types.ts (4)
  • PipelineContext (41-55)
  • MiniPipelineResult (29-36)
  • AgentMatchResult (130-134)
  • AgentScore (136-144)
src/pipeline/profiler.ts (1)
  • profile (37-94)
src/nlp/phonetic.ts (1)
  • phoneticSimilarity (67-105)
src/nlp/similarity.ts (1)
  • calculateSimilarity (183-204)
src/pipeline/mini/embedding.ts (3)
src/pipeline/types.ts (2)
  • MiniPipelineResult (29-36)
  • EmbeddingResult (110-114)
src/pipeline/profiler.ts (1)
  • profile (37-94)
src/priority.ts (1)
  • calculatePriority (28-106)
src/__tests__/pipeline/mp2-classify.test.ts (1)
src/pipeline/types.ts (2)
  • PipelineContext (41-55)
  • LanguageResult (77-88)
src/pipeline/mini/action-provider.ts (3)
src/pipeline/index.ts (2)
  • executeMP8 (27-27)
  • profile (17-17)
src/pipeline/types.ts (3)
  • PipelineContext (41-55)
  • MiniPipelineResult (29-36)
  • ActionProviderResult (178-189)
src/pipeline/profiler.ts (1)
  • profile (37-94)
src/__tests__/pipeline/mp7-entity.test.ts (1)
src/pipeline/types.ts (1)
  • PipelineContext (41-55)
src/pipeline/mini/complexity.ts (3)
src/pipeline/types.ts (3)
  • PipelineContext (41-55)
  • MiniPipelineResult (29-36)
  • ComplexityResult (149-164)
src/pipeline/profiler.ts (1)
  • profile (37-94)
src/types.ts (1)
  • PipelineTier (12-12)
src/__tests__/pipeline/context.test.ts (2)
src/pipeline/types.ts (1)
  • MiniPipelineResult (29-36)
src/pipeline/context.ts (1)
  • getMPResult (93-98)
src/__tests__/pipeline/mp1-language.test.ts (1)
src/pipeline/types.ts (1)
  • PipelineContext (41-55)
src/pipeline/mini/topics.ts (5)
src/pipeline/index.ts (2)
  • executeMP4 (23-23)
  • profile (17-17)
src/pipeline/types.ts (3)
  • PipelineContext (41-55)
  • MiniPipelineResult (29-36)
  • TopicsResult (119-125)
src/pipeline/profiler.ts (1)
  • profile (37-94)
src/nlp/index.ts (2)
  • extractKeywords (28-28)
  • extractTopKeywords (30-30)
src/nlp/keywords.ts (2)
  • extractKeywords (81-117)
  • extractTopKeywords (160-168)
src/nlp/keywords.ts (1)
src/nlp/index.ts (7)
  • KeywordAnalysis (33-33)
  • extractKeywords (28-28)
  • KeywordFrequency (34-34)
  • extractKeywordsWithFrequency (29-29)
  • extractTopKeywords (30-30)
  • calculateKeywordOverlap (31-31)
  • containsKeywords (32-32)
src/pipeline/mini/classify.ts (3)
src/pipeline/types.ts (3)
  • PipelineContext (41-55)
  • MiniPipelineResult (29-36)
  • ClassifyResult (93-105)
src/pipeline/profiler.ts (1)
  • profile (37-94)
src/types.ts (1)
  • MessageAction (13-13)
src/pipeline/mini/entity.ts (3)
src/pipeline/types.ts (2)
  • MiniPipelineResult (29-36)
  • EntityResult (169-173)
src/pipeline/profiler.ts (1)
  • profile (37-94)
src/nlp/entities.ts (2)
  • extractAllEntities (189-199)
  • ENTITY_LIMITS (28-37)
src/registry/action-provider-registry.ts (2)
src/registry/types.ts (4)
  • RegistryState (44-50)
  • CompiledAction (21-28)
  • ParameterHint (10-16)
  • CompiledProvider (33-39)
src/nlp/keywords.ts (1)
  • extractKeywords (81-117)
src/pipeline/profiler.ts (2)
src/types.ts (1)
  • IResourceTracker (251-297)
src/pipeline/types.ts (1)
  • MiniPipelineProfile (18-24)
src/pipeline/types.ts (3)
src/types.ts (2)
  • MessageAction (13-13)
  • PipelineTier (12-12)
src/nlp/similarity.ts (1)
  • SimilarityScore (170-175)
src/nlp/entities.ts (1)
  • ExtractedEntities (42-50)
src/pipeline/context.ts (1)
src/pipeline/types.ts (3)
  • SharedTextAnalysis (61-68)
  • PipelineContext (41-55)
  • MiniPipelineResult (29-36)
🪛 ast-grep (0.40.0)
src/nlp/entities.ts

[warning] 103-103: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(PATTERNS.mention)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🪛 Biome (2.1.2)
src/nlp/phonetic.ts

[error] 24-24: Unexpected control character in a regular expression.

Control characters are unusual and potentially incorrect inputs, so they are disallowed.

(lint/suspicious/noControlCharactersInRegex)

src/pipeline/profiler.ts

[error] 89-90: Unsafe usage of 'return'.

'return' in 'finally' overwrites the control flow statements inside 'try' and 'catch'.

(lint/correctness/noUnsafeFinally)


[error] 93-95: This code will never be reached ...

... because either this statement will throw an exception ...

... or this statement will return from the function beforehand

(lint/correctness/noUnreachable)

🪛 LanguageTool
PIPELINE.md

[grammar] ~54-~54: Ensure spelling is correct
Context: ...Performance:* - Language detection: 0-5ms (very fast) - Translation: 500-2000ms (...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

DEVELOPER_GUIDE.md

[grammar] ~171-~171: Ensure spelling is correct
Context: ...mise.all - MP1 (no deps) completes in 0-5ms - MP2 (depends on MP1) waits ~5ms, then pr...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~469-~469: Ensure spelling is correct
Context: ... 4 MPs with no deps run simultaneously (0ms overhead) - 5 MPs with deps wait for th...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

FUTURE_ROADMAP.md

[grammar] ~124-~124: Use a hyphen to join words.
Context: ...s should reply: - Stagger responses (2 second gaps) - Make them reference eac...

(QB_NEW_EN_HYPHEN)


[grammar] ~240-~240: Use a hyphen to join words.
Context: ...determines action - Distributed decision making ### Room-Specific Personas Same...

(QB_NEW_EN_HYPHEN)

README.md

[grammar] ~880-~880: Ensure spelling is correct
Context: ...ynamic slice duration based on load (50-500ms adaptive) - Priority express lane for c...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

V2_COMPLETION_SUMMARY.md

[grammar] ~192-~192: Ensure spelling is correct
Context: ...y Bottleneck Detection: - Any MP > 100ms flagged - Translation excluded from bot...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~192-~192: Ensure spelling is correct
Context: ...ck Detection:** - Any MP > 100ms flagged - Translation excluded from bottleneck cal...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~214-~214: This adverb was used twice in the sentence. Consider removing one of them or replacing them with a synonym.
Context: ...ssing data gracefully - System degrades gracefully under failure --- ## Next Developer T...

(ADVERB_REPETITION_PREMIUM)

WHY_DOCUMENTATION.md

[grammar] ~71-~71: Use a hyphen to join words.
Context: ...n optimal for name matching Why max 4 character prefix? - Research shows 4 c...

(QB_NEW_EN_HYPHEN)


[grammar] ~158-~158: Use a hyphen to join words.
Context: ...- Multi-factor more accurate than single factor - "babylon.market" → Cassandra (t...

(QB_NEW_EN_HYPHEN)


[grammar] ~266-~266: Ensure spelling is correct
Context: ...gets clear Why bottleneck threshold (100ms)? - Most MPs are <5ms - >100ms is out...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~412-~412: Ensure spelling is correct
Context: ...%: Refine heuristics in MP2 3. If MP5 > 100ms frequently: Pre-compute phonetic encodi...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

V2_FINAL_SUMMARY.md

[style] ~47-~47: The words ‘explained’ and ‘explanations’ are quite similar. Consider replacing ‘explained’ with a different word.
Context: ...- WHY behind every decision, trade-offs explained 6. WHY_DOCUMENTATION.md - Index of ...

(VERB_NOUN_SENT_LEVEL_REP)

🪛 markdownlint-cli2 (0.18.1)
PIPELINE.md

153-153: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


188-188: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

DEVELOPER_GUIDE.md

10-10: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


18-18: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


82-82: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


315-315: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


332-332: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


348-348: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


454-454: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


474-474: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

FUTURE_ROADMAP.md

101-101: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


251-251: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

README.md

51-51: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


68-68: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


80-80: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


136-136: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


162-162: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


190-190: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


217-217: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


285-285: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


319-319: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


391-391: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


415-415: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


444-444: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


470-470: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


496-496: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


515-515: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


538-538: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


849-849: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


855-855: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


861-861: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


868-868: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

V2_COMPLETION_SUMMARY.md

9-9: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


16-16: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


23-23: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


29-29: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


35-35: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


86-86: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

V2_TESTS_COMPLETE.md

5-5: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


250-250: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

TEST_SUMMARY.md

5-5: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


20-20: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


41-41: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


76-76: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

DESIGN_PHILOSOPHY.md

9-9: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


16-16: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


23-23: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


29-29: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


35-35: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


86-86: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

V2_FINAL_SUMMARY.md

34-34: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


63-63: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


171-171: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


300-300: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


520-520: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🔇 Additional comments (62)
V2_FINAL_SUMMARY.md (1)

1-11: Well-structured implementation summary.

The documentation provides excellent coverage of the v2 architecture, including clear success criteria, rollout strategy, and developer guidance. The claims about dependencies (double-metaphone, stopword) align with package.json changes.

src/topic-registry.ts (1)

207-233: Verify intentional difference in empty-input handling.

The method returns 0 when keywords.length === 0 (line 214), but returns 50 when the agent has no topics defined (line 210). This differs from calculateTopicRelevance, which returns 50 for both empty cases.

If intentional (no keywords = no relevance, no topics = unknown/neutral), consider adding a brief comment explaining this design choice for clarity.

src/services/resource-tracker.ts (1)

394-400: Formatting improvement for readability.

The line break changes improve code readability without altering behavior.

package.json (1)

27-28: New NLP dependencies are appropriate and well-maintained.

The double-metaphone (2.0.1, pinned to latest) and stopword (3.1.5, actively maintained as of June 2025) packages are stable, established libraries for phonetic encoding and keyword extraction. Both show strong health metrics with no reported security vulnerabilities. While double-metaphone was last updated in November 2022, it remains stable and suitable for this use case.

src/__tests__/pipeline/mp7-entity.test.ts (3)

9-33: Well-structured mock context setup.

The mock context properly implements the PipelineContext interface requirements with necessary properties. The use of as any for the runtime mock is acceptable for test isolation purposes.


35-93: Good coverage of entity extraction types.

The tests cover URLs, mentions, blockchain addresses, emails, code blocks, and names extraction with appropriate assertions. Each test validates both success status and extracted content.


106-116: Limit handling test is well-designed.

The test correctly verifies that when 25 URLs are provided (exceeding the 20-item limit), the limitHit flag is set and the returned URLs are capped at 20. This validates the entity extraction safety limits.
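The cap-and-flag behavior under test reduces to a small generic helper; a sketch (names are illustrative):

```typescript
// Keep at most `limit` items and report whether anything was dropped,
// mirroring the limitHit semantics the test asserts.
function capWithFlag<T>(items: T[], limit: number): { items: T[]; limitHit: boolean } {
  const limitHit = items.length > limit;
  return { items: limitHit ? items.slice(0, limit) : items, limitHit };
}

const urls = Array.from({ length: 25 }, (_, i) => `https://example.com/${i}`);
const capped = capWithFlag(urls, 20); // 25 inputs → 20 kept, limitHit set
```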

src/__tests__/nlp/phonetic.test.ts (3)

10-28: Solid encoding tests with edge case coverage.

The tests properly validate basic encoding, empty string handling, and non-alphabetic character cleaning. The O'Brien/OBrien test is a good real-world edge case.


47-51: Good documentation of Double Metaphone limitations.

The test explicitly documents that Double Metaphone doesn't handle non-ASCII characters (returning 0.0 for José/Jose and 田中/Tanaka comparisons). This is valuable for API consumers to understand the limitations.


67-92: Comprehensive findPhoneticMatches tests.

Tests cover finding matches, empty results, and sort order verification. The sorting test at lines 84-91 correctly validates descending similarity order.

TEST_SUMMARY.md (2)

88-96: Clear documentation of pre-existing test failures.

The summary correctly identifies that the 7 failing tests are from earlier work and explains their original purpose. The recommendation to update or remove them is appropriate.


116-135: Good visibility into test coverage gaps.

The document transparently lists MPs without tests (MP3-MP6, MP8-MP9) and integration test gaps. This provides a clear roadmap for future test work.

src/services/autonomous-message-service.ts (4)

54-59: Good defensive approach to prevent nested proxy wrapping.

The inline comments clearly explain the problem being solved: storing the original useModel once at registration prevents nested proxies when withMaxTokens() is called multiple times. The Map-based per-agent storage is appropriate.


1154-1158: Original useModel correctly stored at registration time.

Using bind(runtime) ensures the stored function maintains the correct this context. The conditional check prevents overwriting if already registered.


1219-1230: Defensive fallback handles unregistered agents gracefully.

The fallback path logs a warning and executes without proxying if the original isn't stored. This prevents failures while surfacing the unexpected state for debugging.


1336-1339: Restoration to stored original prevents proxy accumulation.

The finally block correctly restores to the stored original rather than a captured "current" value, ensuring consistent state even with nested or overlapping calls.
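The store-once / restore-in-`finally` pattern described across these comments can be sketched in isolation (types and names are simplified stand-ins for the real `useModel` wiring):

```typescript
// The original function is captured once at registration, so repeated
// wrapping cannot nest proxies, and finally always restores the true
// original rather than a captured "current" value.
type ModelFn = (tokens: number) => string;

const originals = new Map<string, ModelFn>();

function withCap(
  target: { useModel: ModelFn },
  agentId: string,
  cap: number,
  run: () => string
): string {
  if (!originals.has(agentId)) {
    originals.set(agentId, target.useModel); // store once at registration
  }
  const original = originals.get(agentId)!;
  target.useModel = (tokens) => original(Math.min(tokens, cap)); // proxy
  try {
    return run();
  } finally {
    target.useModel = original; // restore the stored original, not the proxy
  }
}

const rt = { useModel: (tokens: number) => `used:${tokens}` };
const out = withCap(rt, "agent-1", 100, () => rt.useModel(500));
const restored = rt.useModel(500); // back to the unproxied behavior
```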

src/__tests__/nlp/similarity.test.ts (4)

11-44: Comprehensive Jaro-Winkler similarity tests.

Good coverage of identical strings, empty string handling, similar strings, common prefix bonus, and case sensitivity. The test at line 17 correctly expects 1.0 for two empty strings (they are identical).


46-66: Levenshtein distance tests cover key edit operations.

Tests validate substitution, insertion, and deletion operations correctly. The classic examples (kitten→sitting, saturday→sunday) are good test cases.


89-116: calculateSimilarity tests validate weighting behavior.

The tests verify default weights, custom weights, and that different weight configurations produce different results. This is important for ensuring the weighting logic works correctly.


118-151: findSimilarStrings tests validate filtering and sorting.

Tests verify threshold filtering, sorting by weighted score, and empty results when nothing matches. The sorting verification loop at lines 136-140 is a robust way to check ordering.

src/__tests__/pipeline/mp2-classify.test.ts (1)

38-196: Solid coverage of MP2 classification behaviors

This suite exercises the key MP2 paths (greeting/question/command/thanks/mention/urgent/unclear, short-message and reasoning heuristics, plus MP1 dependency and profiling), and the mock await('mp1-language') usage aligns with the LanguageResult shape. No issues spotted here.

src/__tests__/nlp/keywords.test.ts (1)

10-197: Keyword utility tests look comprehensive and aligned with implementation

These tests exercise core behaviors (stopword removal, min length, multilingual handling, frequency sorting/normalization, overlap, and case-insensitive containment). They match the keywords.ts API and should give good regression protection.

src/registry/index.ts (1)

1-8: Registry barrel exports look correct

Centralizing ActionProviderRegistry, createActionProviderRegistry, and the associated types here gives a clean public API for the registry module. No issues from this file.

src/pipeline/mini/language.ts (1)

26-141: LGTM on the overall MP1 structure.

The failable pattern with English fallback is well-implemented. The profiling integration is correct, and the metadata caching for reusing prior detection results is a nice optimization.

src/__tests__/pipeline/context.test.ts (2)

249-266: Good test coverage for originalId deduplication.

This test correctly verifies that messages with different UUIDs but the same originalId (e.g., swizzled per-agent copies of the same Discord message) share the same context. This is critical for the multi-agent coordination feature.


1-299: Comprehensive test coverage for pipeline context module.

The tests cover all key behaviors: context creation, result storage/retrieval, async awaiting with polling, and the MultiMessageContextManager lifecycle. The mocking approach is clean and appropriate.

src/pipeline/mini/entity.ts (1)

77-79: Correct error handling for non-failable MP.

Throwing an error is the right approach for a non-failable mini-pipeline, as it signals that the pipeline cannot proceed without this result.

src/pipeline/mini/embedding.ts (1)

65-81: LGTM on error handling for failable MP.

The catch block correctly returns a failed result with the error message, allowing the pipeline to continue without blocking on embedding queue failures.

src/pipeline/mini/action-provider.ts (1)

110-134: LGTM on the overall MP8 structure.

The failable pattern is correctly implemented with proper error handling, and the dependency on MP7 is properly awaited.

NLP_UTILITIES.md (1)

1-346: Excellent documentation for the NLP utilities module.

The API reference is comprehensive, well-structured, and includes practical usage examples. The limitations section is particularly helpful for developers to understand the constraints of phonetic matching with non-ASCII text.

src/pipeline/mini/agent-match.ts (2)

1-21: Excellent inline documentation explaining design rationale.

The WHY comments provide valuable context for future maintainers, explaining the cost savings from early filtering and the multi-factor relevance approach.


164-166: Correct error handling for non-failable MP.

Throwing with context about the MP being non-failable is appropriate and will surface issues clearly during debugging.

src/pipeline/mini/complexity.ts (2)

1-28: LGTM! Clear documentation and proper imports.

The module-level documentation effectively explains the purpose of complexity estimation and why phase2Hint exists. The imports are minimal and appropriate.


141-184: LGTM! Clean decision logic for Phase 2 hints.

The function uses clear early-return patterns with well-documented thresholds. The decision criteria align with the complexity estimation logic above.

src/nlp/index.ts (1)

1-52: LGTM! Well-organized barrel module.

Clean re-exports with appropriate section comments. The consolidated API surface makes the NLP utilities easy to consume.

src/__tests__/nlp/entities.test.ts (4)

1-14: LGTM! Proper test imports.

Imports are clean and include all necessary utilities and constants from the entities module.


98-129: Verify the Ethereum address regex pattern in the test.

Line 104's regex ^0x[a-fA-F0-9]{39,42}$ expects 39-42 hex characters after 0x, but Ethereum addresses are exactly 40 hex characters (42 total including 0x). This test may pass for the wrong reasons or fail unexpectedly.

-      expect(addresses[0]).toMatch(/^0x[a-fA-F0-9]{39,42}$/);
+      expect(addresses[0]).toMatch(/^0x[a-fA-F0-9]{40}$/);

155-187: LGTM! Good coverage of code block extraction.

Tests cover both fenced and inline code blocks, with proper limit enforcement including the byte limit for large code blocks.


268-289: LGTM! Presence and summary tests are appropriate.

hasEntities correctly tests the OR logic for entity type detection, and getEntitySummary validates the count structure.

src/pipeline/preprocessor.ts (5)

1-40: LGTM! Excellent documentation and clean imports.

The module-level documentation clearly explains the execution pattern, timing expectations, and design rationale. Imports are well-organized by category.


66-84: LGTM! Clean parallel execution pattern.

The wave-based parallelism with internal dependency resolution via ctx.await() is well-designed. MPs start together and naturally serialize where needed.


86-131: LGTM! Comprehensive result collection and profiling.

The profiling aggregation with bottleneck detection (excluding the known-slow translation MP) provides good observability for pipeline optimization.


133-211: LGTM! Safe fallback defaults for failed MPs.

The fallback values are well-chosen—defaulting to action: 'ignore' and shouldProcess: false ensures the system fails safely when MPs encounter errors.


216-221: LGTM! Clean public API exports.

The re-exports provide a clear public surface for the preprocessor module.

src/registry/types.ts (2)

10-16: LGTM with minor note on possibleValues typing.

The ParameterHint interface is well-structured. The possibleValues?: any[] provides flexibility but sacrifices type safety. This is acceptable for a generic registry, but if specific value types are known at compile time, a generic type parameter could be considered in the future.


21-50: LGTM! Well-designed registry types.

CompiledAction, CompiledProvider, and RegistryState provide a clean model for precompiled metadata. The fileHash for cache invalidation and compiling flag for coordination are thoughtful additions.

src/nlp/phonetic.ts (4)

23-25: The control character regex is intentional and correct.

The static analysis tool flags /[^\x00-\x7F]/ as containing control characters. This is a false positive — the regex intentionally matches characters outside the ASCII range (0x00-0x7F) to detect non-ASCII input. This is a standard pattern for ASCII detection.


37-51: LGTM - Clean phonetic encoding implementation.

The function properly cleans input (lowercase, alphabetic only), handles empty strings gracefully, and correctly normalizes the secondary encoding to null when absent. The WHY comments explain the design rationale well.


67-105: LGTM - Well-structured similarity scoring.

The tiered scoring (1.0 → primary match, 0.9 → primary-secondary cross-match, 0.8 → secondary-secondary match) provides meaningful differentiation. Early return for non-ASCII correctly signals to callers that they should use alternative algorithms.
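The tier logic reduces to a few comparisons over precomputed encodings; a sketch (the encoding values are illustrative, not guaranteed double-metaphone output):

```typescript
// Tiered comparison over Double Metaphone-style encodings: exact primary
// match scores highest, primary/secondary cross matches score lower.
interface Encoding { primary: string; secondary: string | null; }

function tieredScore(a: Encoding, b: Encoding): number {
  if (a.primary === b.primary) return 1.0; // strongest signal
  if (a.primary === b.secondary || a.secondary === b.primary) return 0.9;
  // Guard on null so two absent secondaries never match each other.
  if (a.secondary !== null && a.secondary === b.secondary) return 0.8;
  return 0.0;
}

const smith: Encoding = { primary: "SM0", secondary: "XMT" };
const smyth: Encoding = { primary: "SM0", secondary: "XMT" };
const other: Encoding = { primary: "TRN", secondary: null };
const same = tieredScore(smith, smyth); // 1.0
const none = tieredScore(smith, other); // 0.0
```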


117-129: LGTM - Functional composition for phonetic matching.

findPhoneticMatches correctly composes phoneticSimilarity, filters by threshold, and sorts by descending similarity. The implementation is concise and correct.

src/pipeline/profiler.ts (1)

153-193: LGTM - Summary statistics implementation.

The summarizeProfiles function correctly handles empty profiles, computes aggregate metrics, and identifies the slowest profile. The implementation is straightforward and efficient.

src/nlp/keywords.ts (4)

18-54: LGTM - Good language coverage with sensible fallback.

The STOPWORD_MAP covers major languages, and falling back to English is a reasonable default. The WHY comments explaining the 11-language coverage are helpful.


81-117: LGTM - Robust keyword extraction.

The implementation handles empty input, preserves word-internal punctuation (apostrophes, hyphens), applies language-specific stopwords with fallback, and returns useful metadata about the extraction process.


174-189: LGTM - Correct Jaccard similarity implementation.

The overlap calculation properly handles edge cases (empty keyword sets return 0) and uses the standard Jaccard formula: intersection size / union size.
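The formula in question, as a self-contained sketch:

```typescript
// Jaccard overlap: |A ∩ B| / |A ∪ B|, with 0 for an empty keyword set.
function jaccard(a: string[], b: string[]): number {
  if (a.length === 0 || b.length === 0) return 0;
  const setA = new Set(a);
  const setB = new Set(b);
  const intersection = Array.from(setA).filter((k) => setB.has(k)).length;
  const union = new Set([...setA, ...setB]).size;
  return intersection / union;
}

// {swap, price} shared out of {token, swap, price, chart} → 2/4 = 0.5
const overlap = jaccard(["token", "swap", "price"], ["swap", "price", "chart"]);
```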


194-211: LGTM - Case-insensitive keyword matching.

The containsKeywords function correctly lowercases both the extracted keywords and search keywords before comparison, ensuring consistent matching behavior.

src/nlp/entities.ts (2)

55-79: Ethereum address regex is intentionally permissive.

The comment notes "39-42 hex chars for flexibility" which allows some tolerance. Standard Ethereum addresses are exactly 40 hex chars. This flexibility may be intentional for edge cases, but could also match invalid addresses.

Consider whether strict validation (exactly 40 chars) would be preferable:

-  // Ethereum/hex addresses (0x followed by 39-42 hex chars for flexibility)
-  ethAddress: /\b0x[a-fA-F0-9]{39,42}\b/g,
+  // Ethereum/hex addresses (0x followed by exactly 40 hex chars)
+  ethAddress: /\b0x[a-fA-F0-9]{40}\b/g,

186-199: LGTM - Clean aggregation of entity extraction.

extractAllEntities composes all individual extractors with configurable limits. The implementation is straightforward and maintains consistent resource protection across all entity types.

src/registry/action-provider-registry.ts (1)

157-205: Action compilation and parameter hint extraction look sound

Keyword/topic extraction, complexity estimation, and simple URL/address parameter hint heuristics are coherent and defensive (null‑checks, deduplication). The try/catch with structured logging ensures bad actions fail soft without breaking the whole compile.

No blocking issues here.

src/pipeline/mini/character.ts (1)

173-197: Trait extraction helper is straightforward and type‑safe

extractRelevantTraits cleanly scans bio and lore arrays using containsKeywords, guards shapes, and caps the output to 5 entries. This keeps traits bounded and avoids surprising growth as characters evolve.

No changes needed.

src/pipeline/types.ts (1)

170-231: Phase 0 MP result wiring looks internally consistent

Phase0Result’s fields (language, classification, embedding, keywords, agentScores, complexity, entities, actionProviderMatches, characterContext, decision) align with the individual MP result types defined above and with the mini‑pipeline modules that consume/produce them.

No structural problems spotted here; this should give the preprocessor a clear, typed contract.

src/pipeline/index.ts (1)

10-28: Central pipeline barrel exports are well-organized

The index cleanly re-exports the preprocessor surface, all pipeline types, profiler utilities, and the individual MPs (MP1–MP9). Paths look consistent with the mini‑pipeline modules and will simplify downstream imports.

Looks good as a single entry point.

src/pipeline/context.ts (1)

25-78: PipelineContext and ctx.await implementation match the documented contract

The context wiring (message/runtime IDs, shared results map, and await<T> with a 5s timeout) behaves as described in the comments and the PipelineContext type. Failed MPs and timeouts both surface as null with appropriate logging, which is what dependent MPs can reasonably handle.

For now, the 10ms polling interval is a pragmatic choice given the small number of MPs.
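The described contract (poll the shared results map, resolve null on failure or after the 5s timeout) can be sketched roughly like this; the result and map shapes here are simplified assumptions, not the actual PipelineContext types:

```typescript
// Poll-based await over a shared results map: resolve the value when it
// appears, resolve null after the timeout. Names are illustrative.
type MpResult<T> = { data: T } | { error: string };

async function awaitResult<T>(
  results: Map<string, MpResult<T>>,
  key: string,
  timeoutMs = 5000,
  pollMs = 10
): Promise<T | null> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const entry = results.get(key);
    if (entry) {
      // A failed MP surfaces as null, the same as a timeout.
      return 'data' in entry ? entry.data : null;
    }
    await new Promise((r) => setTimeout(r, pollMs));
  }
  return null; // timed out
}
```

Both failure modes collapsing to null keeps dependent MPs on a single fallback path, which matches the behavior the comment describes.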

Comment on lines +303 to +310
const messageText = 'Send 1 ETH to 0x742d35Cc6634C0532925a3b844Bc9e7595f0bEb';
const entities = extractAllEntities(messageText);

if (entities.addresses.length > 0) {
// Suggest transfer action
const suggestedAction = 'transfer-token';
const parameters = { address: entities.addresses[0], amount: '1' };
}

⚠️ Potential issue | 🟡 Minor

Truncated Ethereum address in example.

The example Ethereum address 0x742d35Cc6634C0532925a3b844Bc9e7595f0bEb is 41 characters instead of the standard 42 characters (0x + 40 hex chars). This could confuse readers or cause copy-paste issues.

-const messageText = 'Send 1 ETH to 0x742d35Cc6634C0532925a3b844Bc9e7595f0bEb';
+const messageText = 'Send 1 ETH to 0x742d35Cc6634C0532925a3b844Bc9e7595f0bEbf';

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In NLP_UTILITIES.md around lines 303–310 the example Ethereum address is missing
one hex character (it is 41 chars total), so replace the truncated address with
a valid 42-character Ethereum address (i.e., "0x" followed by 40 hex
characters); update the example in that block to use a correct full address (you
can append a single hex digit to the existing string or swap in a known
canonical example) and verify the surrounding text or examples still parse/copy
correctly.

Comment on lines +225 to +245
describe('extractAllEntities', () => {
test('should extract all entity types', () => {
const text = `
Contact @alice at alice@example.com
Website: https://example.com
Address: 0x742d35Cc6634C0532925a3b844Bc9e7595f0bEb
ID: 550e8400-e29b-41d4-a716-446655440000
Code: \`console.log('test')\`
People: Bob, Charlie
`;

const entities = extractAllEntities(text);

expect(entities.mentions.length).toBeGreaterThan(0);
expect(entities.emails.length).toBeGreaterThan(0);
expect(entities.urls.length).toBeGreaterThan(0);
expect(entities.addresses.length).toBeGreaterThan(0);
expect(entities.uuids.length).toBeGreaterThan(0);
expect(entities.codeBlocks.length).toBeGreaterThan(0);
expect(entities.names.length).toBeGreaterThan(0);
});

⚠️ Potential issue | 🟡 Minor

Test data contains a truncated Ethereum address.

Line 230 has 0x742d35Cc6634C0532925a3b844Bc9e7595f0bEb which is only 39 hex characters (should be 40). This may cause the test to fail or pass for wrong reasons depending on the implementation's strictness.

-        Address: 0x742d35Cc6634C0532925a3b844Bc9e7595f0bEb
+        Address: 0x742d35Cc6634C0532925a3b844Bc9e7595f0bEbf
🤖 Prompt for AI Agents
In src/__tests__/nlp/entities.test.ts around lines 225 to 245, the Ethereum
address in the test data (line ~230) is truncated
(0x742d35Cc6634C0532925a3b844Bc9e7595f0bEb — 39 hex chars); update it to a valid
40-hex-character address (0x + 40 hex chars) so the entity extractor sees a
properly formed address; simply replace the string with a correctly-length
example address (keeping the 0x prefix and case as desired) and re-run the
tests.

Comment on lines +168 to +184
export function extractNames(text: string, limit = ENTITY_LIMITS.maxNames): string[] {
const matches = text.match(PATTERNS.capitalizedWord) || [];

// Filter out common false positives
const commonWords = new Set([
'The', 'A', 'An', 'This', 'That', 'These', 'Those',
'I', 'You', 'He', 'She', 'It', 'We', 'They',
'My', 'Your', 'His', 'Her', 'Its', 'Our', 'Their',
'Is', 'Are', 'Was', 'Were', 'Be', 'Been', 'Being',
'Have', 'Has', 'Had', 'Do', 'Does', 'Did',
'Will', 'Would', 'Should', 'Could', 'May', 'Might', 'Must',
'Can', 'Could', 'Should', 'Would',
]);

const names = matches.filter((word) => !commonWords.has(word));
return [...new Set(names)].slice(0, limit);
}

⚠️ Potential issue | 🟡 Minor

Duplicate entry in commonWords set.

'Could' appears twice in the commonWords set (lines 178 and 179). While Sets deduplicate automatically, this is unnecessary repetition.

     'Will', 'Would', 'Should', 'Could', 'May', 'Might', 'Must',
-    'Can', 'Could', 'Should', 'Would',
+    'Can',
   ]);
🤖 Prompt for AI Agents
In src/nlp/entities.ts around lines 168 to 184, the commonWords Set contains a
duplicate entry ('Could' appears twice); remove the redundant 'Could' entry so
the list has each word only once (no behavior change needed since Set
deduplicates, just clean up the duplicate to avoid confusion).

Comment on lines +27 to +78
function jaroSimilarity(str1: string, str2: string): number {
if (str1 === str2) return 1.0;
if (!str1 || !str2) return 0.0;

const len1 = str1.length;
const len2 = str2.length;

// Calculate match window
const matchWindow = Math.floor(Math.max(len1, len2) / 2) - 1;
if (matchWindow < 1) {
return str1 === str2 ? 1.0 : 0.0;
}

const matches1 = new Array(len1).fill(false);
const matches2 = new Array(len2).fill(false);

let matches = 0;
let transpositions = 0;

// Find matches
for (let i = 0; i < len1; i++) {
const start = Math.max(0, i - matchWindow);
const end = Math.min(i + matchWindow + 1, len2);

for (let j = start; j < end; j++) {
if (matches2[j] || str1[i] !== str2[j]) continue;
matches1[i] = true;
matches2[j] = true;
matches++;
break;
}
}

if (matches === 0) return 0.0;

// Find transpositions
let k = 0;
for (let i = 0; i < len1; i++) {
if (!matches1[i]) continue;
while (!matches2[k]) k++;
if (str1[i] !== str2[k]) transpositions++;
k++;
}

const jaro = (
matches / len1 +
matches / len2 +
(matches - transpositions / 2) / matches
) / 3;

return jaro;
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "*.test.*" -o -name "*.spec.*" | grep -i similarity

Repository: elizaos-plugins/plugin-autonomous

Length of output: 115


🏁 Script executed:

cd src && find . -type f \( -name "*.ts" -o -name "*.js" \) | head -20

Repository: elizaos-plugins/plugin-autonomous

Length of output: 756


🏁 Script executed:

rg -l "jaroSimilarity|jaro" --type ts --type js

Repository: elizaos-plugins/plugin-autonomous

Length of output: 152


🏁 Script executed:

cat -n src/__tests__/nlp/similarity.test.ts

Repository: elizaos-plugins/plugin-autonomous

Length of output: 6699


🏁 Script executed:

cat -n src/nlp/similarity.ts

Repository: elizaos-plugins/plugin-autonomous

Length of output: 8232


🌐 Web query:

Jaro distance algorithm match window formula short strings

💡 Result:

The Jaro matching window (maximum offset a character may match) is:

match_window = max(0, floor( max(|s1|, |s2|) / 2 ) - 1).

So for example:

  • if the longer string length = 5 → match_window = floor(5/2) - 1 = 2 - 1 = 1.
  • if the longer length = 2 → match_window = floor(2/2) - 1 = 1 - 1 = 0 (no offset allowed).

Sources: Wikipedia (Jaro–Winkler) and common implementations. [1][2]

References
[1] Wikipedia — Jaro–Winkler distance.
[2] GeeksforGeeks — Jaro and Jaro-Winkler similarity.


🏁 Script executed:

git log --oneline -p src/nlp/similarity.ts | grep -A 20 -B 5 "matchWindow < 1" | head -50

Repository: elizaos-plugins/plugin-autonomous

Length of output: 807


🏁 Script executed:

# Verify the actual behavior by simulating the algorithm with matchWindow = 0
python3 << 'EOF'
# Test case: "ab" vs "ac" with matchWindow = 0
str1 = "ab"
str2 = "ac"
len1, len2 = 2, 2
matchWindow = 0  # floor(2/2) - 1 = 0

matches1 = [False] * len1
matches2 = [False] * len2
matches = 0
transpositions = 0

# Find matches with window = 0
for i in range(len1):
    start = max(0, i - matchWindow)
    end = min(i + matchWindow + 1, len2)
    for j in range(start, end):
        if matches2[j] or str1[i] != str2[j]:
            continue
        matches1[i] = True
        matches2[j] = True
        matches += 1
        break

print(f"Matches found: {matches}")
print(f"matches1: {matches1}")
print(f"matches2: {matches2}")

if matches > 0:
    # Find transpositions
    k = 0
    for i in range(len1):
        if not matches1[i]:
            continue
        while k < len2 and not matches2[k]:
            k += 1
        if k < len2 and str1[i] != str2[k]:
            transpositions += 1
        if k < len2:
            k += 1
    
    print(f"Transpositions: {transpositions}")
    jaro = (matches/len1 + matches/len2 + (matches - transpositions/2)/matches) / 3
    print(f"Jaro score for '{str1}' vs '{str2}': {jaro:.4f}")
else:
    print("No matches found -> Jaro = 0.0")

print("\nWith current guard (if matchWindow < 1, return 0.0):")
print("Returns: 0.0 (INCORRECT)")
EOF

Repository: elizaos-plugins/plugin-autonomous

Length of output: 163


jaroSimilarity guard incorrectly rejects valid short-string similarity calculations

The early return:

const matchWindow = Math.floor(Math.max(len1, len2) / 2) - 1;
if (matchWindow < 1) {
  return str1 === str2 ? 1.0 : 0.0;
}

violates the standard Jaro algorithm specification. The algorithm explicitly permits matchWindow = 0 (which occurs when the longer string has length ≤ 2), and characters must match at their exact positions. For example, comparing "ab" vs "ac" should yield a Jaro score of ~0.667 (one matching character 'a'), not 0.0.

Remove the guard to allow the algorithm to execute with matchWindow = 0:

-  // Calculate match window
-  const matchWindow = Math.floor(Math.max(len1, len2) / 2) - 1;
-  if (matchWindow < 1) {
-    return str1 === str2 ? 1.0 : 0.0;
-  }
+  // Calculate match window (can be 0 for very short strings)
+  const matchWindow = Math.floor(Math.max(len1, len2) / 2) - 1;

This aligns the implementation with standard Jaro distance behavior and properly handles short text comparisons.
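To sanity-check the claimed score, a compact Jaro implementation with the window clamped to zero (mirroring the suggested fix; this is a sketch, not the project's code) gives ~0.667 for 'ab' vs 'ac':

```typescript
function jaro(s1: string, s2: string): number {
  if (s1 === s2) return 1;
  if (!s1 || !s2) return 0;
  // Clamp the match window to >= 0 instead of bailing out for short strings.
  const w = Math.max(0, Math.floor(Math.max(s1.length, s2.length) / 2) - 1);
  const m1 = new Array(s1.length).fill(false);
  const m2 = new Array(s2.length).fill(false);
  let matches = 0;
  for (let i = 0; i < s1.length; i++) {
    const end = Math.min(i + w + 1, s2.length);
    for (let j = Math.max(0, i - w); j < end; j++) {
      if (m2[j] || s1[i] !== s2[j]) continue;
      m1[i] = m2[j] = true;
      matches++;
      break;
    }
  }
  if (matches === 0) return 0;
  let t = 0; // half-transposition count
  let k = 0;
  for (let i = 0; i < s1.length; i++) {
    if (!m1[i]) continue;
    while (!m2[k]) k++;
    if (s1[i] !== s2[k]) t++;
    k++;
  }
  return (matches / s1.length + matches / s2.length + (matches - t / 2) / matches) / 3;
}

console.log(jaro('ab', 'ac').toFixed(3)); // 0.667
```

With the window at zero, characters may still match at their exact positions, which is what the early guard was incorrectly preventing.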

🤖 Prompt for AI Agents
In src/nlp/similarity.ts around lines 27 to 78, the early guard that returns
when matchWindow < 1 incorrectly short-circuits valid short-string Jaro
calculations; remove that guard and instead clamp matchWindow to a non-negative
value (e.g., const matchWindow = Math.max(0, Math.floor(Math.max(len1, len2) /
2) - 1)) so the algorithm runs with matchWindow = 0 for short strings and
computes correct matches/transpositions per the standard Jaro algorithm.

Comment on lines +25 to +28
export async function executeMP8(
ctx: PipelineContext,
actionProviderRegistry?: any
): Promise<MiniPipelineResult<ActionProviderResult>> {

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for ActionProviderRegistry type definition
rg -n 'ActionProviderRegistry' --type=ts -i

Repository: elizaos-plugins/plugin-autonomous

Length of output: 1300


🏁 Script executed:

#!/bin/bash
# Look for registry-related interfaces and types
rg -n 'interface.*Registry|type.*Registry' --type=ts | head -20

Repository: elizaos-plugins/plugin-autonomous

Length of output: 493


🏁 Script executed:

#!/bin/bash
# Read the executeMP8 function to see how actionProviderRegistry is used
sed -n '25,134p' src/pipeline/mini/action-provider.ts | head -80

Repository: elizaos-plugins/plugin-autonomous

Length of output: 2649


Type actionProviderRegistry parameter with the ActionProviderRegistry class.

The ActionProviderRegistry class is already defined and exported from src/registry/index.ts. Using the proper type instead of any would improve type safety and IDE support.

+import { ActionProviderRegistry } from '../registry/index.ts';
+
 export async function executeMP8(
   ctx: PipelineContext,
-  actionProviderRegistry?: any
+  actionProviderRegistry?: ActionProviderRegistry
 ): Promise<MiniPipelineResult<ActionProviderResult>> {

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/pipeline/mini/action-provider.ts around lines 25 to 28, the parameter
actionProviderRegistry is currently typed as any; update its type to the
exported ActionProviderRegistry from src/registry/index.ts. Import
ActionProviderRegistry at the top of the file from 'src/registry' (or correct
relative path) and change the function signature to use that type so callers and
the implementation gain proper type checking and IDE completion.

Comment on lines +36 to +39
const agentRole = ctx.results.get('mp5-agent-match')?.data?.scores?.get(runtime.agentId);
const priorityScore = agentRole
? calculatePriority(message, runtime.agentId, agentRole.agentName)
: { priority: 'normal' as const, score: 50, factors: {} };

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

rg -n 'queueEmbeddingGeneration' --type=ts

Repository: elizaos-plugins/plugin-autonomous

Length of output: 551


🏁 Script executed:

cd src/pipeline/mini && wc -l embedding.ts

Repository: elizaos-plugins/plugin-autonomous

Length of output: 92


🏁 Script executed:

sed -n '1,60p' src/pipeline/mini/embedding.ts | cat -n

Repository: elizaos-plugins/plugin-autonomous

Length of output: 2380


🏁 Script executed:

rg -n 'ctx\.await\|ctx\.results\.get' src/pipeline/mini/embedding.ts -A 2 -B 2

Repository: elizaos-plugins/plugin-autonomous

Length of output: 59


🏁 Script executed:

rg -n 'mp5-agent-match' --type=ts -B 2 -A 2

Repository: elizaos-plugins/plugin-autonomous

Length of output: 2291


🏁 Script executed:

rg -n 'MP5\|mp5' src/pipeline/mini --type=ts

Repository: elizaos-plugins/plugin-autonomous

Length of output: 59


🏁 Script executed:

fd '.*\.ts$' src/pipeline/mini --exec basename {} \;

Repository: elizaos-plugins/plugin-autonomous

Length of output: 194


🏁 Script executed:

rg -n '_DEPENDENCIES = ' src/pipeline/mini --type=ts -B 1

Repository: elizaos-plugins/plugin-autonomous

Length of output: 1169


Add MP5 dependency or await it explicitly.

MP3 declares no dependencies (MP3_DEPENDENCIES = []) but synchronously reads ctx.results.get('mp5-agent-match') at line 36. Compare this to MP9 (character.ts), which properly declares ['mp4-topics', 'mp5-agent-match'] as dependencies and uses ctx.await<any>('mp5-agent-match'). Either add 'mp5-agent-match' to MP3_DEPENDENCIES and use ctx.await() to guarantee MP5 has completed, or document that the silent fallback to default priority is intentional when MP5 hasn't finished.
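Concretely, the MP9 pattern referenced above would look roughly like this when applied to MP3 (the Ctx and result shapes are assumed from the review's description, not the real types):

```typescript
// Declare the dependency so the scheduler knows MP3 needs MP5, and await it
// explicitly instead of reading ctx.results synchronously.
export const MP3_DEPENDENCIES = ['mp5-agent-match'];

// Hypothetical shapes mirroring the described ctx.await contract.
type AgentMatchData = { scores: Map<string, { agentName: string }> };
interface Ctx {
  await<T>(key: string): Promise<T | null>;
}

async function resolveAgentRole(ctx: Ctx, agentId: string) {
  const match = await ctx.await<AgentMatchData>('mp5-agent-match');
  // null here means MP5 failed or timed out: fall back to default priority.
  return match?.scores.get(agentId) ?? null;
}
```

If the silent fallback is intentional instead, a comment on MP3_DEPENDENCIES stating that would be enough.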

🤖 Prompt for AI Agents
In src/pipeline/mini/embedding.ts around lines 36 to 39, the code reads
ctx.results.get('mp5-agent-match') synchronously while MP3 currently declares no
dependencies; update MP3_DEPENDENCIES to include 'mp5-agent-match' and replace
the direct ctx.results.get call with an awaited retrieval (e.g., await
ctx.await('mp5-agent-match')) before using the result so MP5 has completed, or
if the silent fallback to default priority is intentional, add a clear comment
in the MP3 declaration explaining that absence of MP5 results is expected and
that the default priority path is deliberate.

Comment on lines +53 to +54
detectorUsed: detected.detector,
} as LanguageResult;

⚠️ Potential issue | 🟡 Minor

Inconsistent property name: detected.detector vs detected.detectorUsed

Line 53 accesses detected.detector but line 100 accesses detected.detectorUsed. This inconsistency will cause one path to return undefined for detectorUsed if the property names differ on the actual detectLanguage return type.

-            detectorUsed: detected.detector,
+            detectorUsed: detected.detectorUsed,
🤖 Prompt for AI Agents
In src/pipeline/mini/language.ts around lines 53-54 (and also update the usage
at ~line 100), the code sets detectorUsed: detected.detector while elsewhere it
reads detected.detectorUsed; make the property name consistent with the actual
detectLanguage return type by replacing detected.detector with
detected.detectorUsed (or vice versa if the source function uses detector) in
both places, and update any corresponding type/interface (LanguageResult or the
detectLanguage return type) to match the chosen property name so detectorUsed is
never undefined.

Comment on lines +43 to +94
): Promise<[T, MiniPipelineProfile]> {
const startMs = Date.now();

let result: T;

try {
result = await fn();
} catch (err) {
throw err;
} finally {
const endMs = Date.now();
const durationMs = endMs - startMs;
const bottleneck = durationMs > config.bottleneckThresholdMs;

// Log if bottleneck or detailed logging enabled
if (bottleneck || config.enableDetailedLogging) {
const level = bottleneck ? 'warn' : 'debug';
const emoji = bottleneck ? '🐌' : '⚡';

logger[level](
{
src: 'pipeline:profiler',
mp: name,
durationMs,
cached,
bottleneck,
},
`${emoji} MP ${name}: ${durationMs}ms${cached ? ' (cached)' : ''}${bottleneck ? ' [BOTTLENECK]' : ''}`
);
}

// Track to resource tracker
if (config.trackToResourceTracker && resourceTracker) {
// Use custom category for MP timing (not LLM, not DB)
// We'll add a new tracking method if needed, or use generic timing
// For now, just track as a data point
}

const profile: MiniPipelineProfile = {
startMs,
endMs,
durationMs,
cached,
bottleneck,
};

return [result!, profile];
}

// Should never reach here, but TypeScript needs it
throw new Error('Unreachable');
}

⚠️ Potential issue | 🔴 Critical

Critical: return inside finally block causes incorrect behavior.

The return statement on line 89 is inside the finally block. This overwrites the control flow — if fn() throws an exception, the throw err on line 51 is silently discarded, and the function returns [undefined!, profile] instead of propagating the error. Additionally, result! is unsafe when an error occurred.

Apply this diff to fix the control flow:

 export async function profile<T>(
   name: string,
   fn: () => Promise<T>,
   cached = false,
   config = DEFAULT_PROFILER_CONFIG,
   resourceTracker?: IResourceTracker
 ): Promise<[T, MiniPipelineProfile]> {
   const startMs = Date.now();
-  
-  let result: T;
-  
-  try {
-    result = await fn();
-  } catch (err) {
-    throw err;
-  } finally {
-    const endMs = Date.now();
-    const durationMs = endMs - startMs;
-    const bottleneck = durationMs > config.bottleneckThresholdMs;
-    
-    // Log if bottleneck or detailed logging enabled
-    if (bottleneck || config.enableDetailedLogging) {
-      const level = bottleneck ? 'warn' : 'debug';
-      const emoji = bottleneck ? '🐌' : '⚡';
-      
-      logger[level](
-        {
-          src: 'pipeline:profiler',
-          mp: name,
-          durationMs,
-          cached,
-          bottleneck,
-        },
-        `${emoji} MP ${name}: ${durationMs}ms${cached ? ' (cached)' : ''}${bottleneck ? ' [BOTTLENECK]' : ''}`
-      );
-    }
-    
-    // Track to resource tracker
-    if (config.trackToResourceTracker && resourceTracker) {
-      // Use custom category for MP timing (not LLM, not DB)
-      // We'll add a new tracking method if needed, or use generic timing
-      // For now, just track as a data point
-    }
-    
-    const profile: MiniPipelineProfile = {
-      startMs,
-      endMs,
-      durationMs,
-      cached,
-      bottleneck,
-    };
-    
-    return [result!, profile];
-  }
-  
-  // Should never reach here, but TypeScript needs it
-  throw new Error('Unreachable');
+
+  const result = await fn();
+  
+  const endMs = Date.now();
+  const durationMs = endMs - startMs;
+  const bottleneck = durationMs > config.bottleneckThresholdMs;
+
+  // Log if bottleneck or detailed logging enabled
+  if (bottleneck || config.enableDetailedLogging) {
+    const level = bottleneck ? 'warn' : 'debug';
+    const emoji = bottleneck ? '🐌' : '⚡';
+
+    logger[level](
+      {
+        src: 'pipeline:profiler',
+        mp: name,
+        durationMs,
+        cached,
+        bottleneck,
+      },
+      `${emoji} MP ${name}: ${durationMs}ms${cached ? ' (cached)' : ''}${bottleneck ? ' [BOTTLENECK]' : ''}`
+    );
+  }
+
+  // Track to resource tracker
+  if (config.trackToResourceTracker && resourceTracker) {
+    // TODO: Add custom category for MP timing
+  }
+
+  const profileData: MiniPipelineProfile = {
+    startMs,
+    endMs,
+    durationMs,
+    cached,
+    bottleneck,
+  };
+
+  return [result, profileData];
 }

If you need timing even on failure, consider a wrapper that catches errors, calculates duration, logs, then re-throws.

🧰 Tools
🪛 Biome (2.1.2)

[error] 89-90: Unsafe usage of 'return'.

'return' in 'finally' overwrites the control flow statements inside 'try' and 'catch'.

(lint/correctness/noUnsafeFinally)

🤖 Prompt for AI Agents
In src/pipeline/profiler.ts around lines 43 to 94, the function currently
returns from inside the finally block which swallows exceptions and uses result!
unsafely; fix by moving the return out of the finally and explicitly preserving
any thrown error: declare local vars for result and caughtError, in try assign
result = await fn(), in catch assign caughtError = err, in finally compute
endMs/durationMs, do logging and resource-tracking there (but do not return or
throw), then after the finally if caughtError is set re-throw it, otherwise
return [result, profile]; this preserves timing/logging while correctly
propagating errors and avoids using result! unsafely.

Comment on lines +93 to +105
export interface ClassifyResult {
action: MessageAction; // ignore | react | reply | stop
confidence: number; // 0-1
reasons: string[]; // Why this classification
heuristics: {
isGreeting: boolean;
isQuestion: boolean;
isCommand: boolean;
isThanks: boolean;
isShort: boolean; // < 10 words
hasAddressedName: boolean;
};
}

⚠️ Potential issue | 🟡 Minor

ClassifyResult.action comment mentions stop which isn’t in MessageAction

The comment says:

action: MessageAction;         // ignore | react | reply | stop

but MessageAction (from src/types.ts) only allows 'ignore' | 'reply' | 'react'. This can confuse consumers into expecting a stop variant that can never appear.

Either add 'stop' to MessageAction or, more likely, update the comment to match the current union.

🤖 Prompt for AI Agents
In src/pipeline/types.ts around lines 93 to 105, the inline comment for
ClassifyResult.action lists "ignore | react | reply | stop" but MessageAction
(src/types.ts) only defines 'ignore' | 'reply' | 'react'; update the code so the
comment and type align: either remove "stop" from the comment and change it to
"ignore | react | reply" to reflect the current MessageAction union, or if
"stop" is required by design, add 'stop' to the MessageAction union in
src/types.ts (and update any consumers/handlers to accept the new variant).
Ensure the chosen fix keeps the comment consistent with the actual MessageAction
type across the codebase.

Comment on lines +116 to +132
// Compile actions
const actions = runtime.actions || [];
for (const action of actions) {
const compiled = await this.compileAction(action);
if (compiled) {
this.state.actions.set(compiled.name, compiled);
}
}

// Compile providers
const providers = runtime.providers || [];
for (const provider of providers) {
const compiled = await this.compileProvider(provider);
if (compiled) {
this.state.providers.set(compiled.name, compiled);
}
}

⚠️ Potential issue | 🟠 Major

Provider name resolution will collapse many providers to unknown

compileProvider sets the provider name to 'unknown' whenever typeof provider.get === 'function', which is likely true for most providers. This means multiple providers will be stored under the same key, overwriting each other in this.state.providers and making lookups unusable.

Consider deriving a stable, unique-ish name without discarding provider.name just because .get exists, e.g.:

-      const description = provider.description || '';
-      const name = typeof provider.get === 'function' ? 'unknown' : provider.name || 'unknown';
+      const description = provider.description || '';
+      const name =
+        provider.name ||
+        (provider.id as string | undefined) ||
+        (provider.constructor && provider.constructor.name) ||
+        'unknown';

Adjust the fallback chain to match your actual provider shape, but avoid the blanket 'unknown' for function-based providers.
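A minimal sketch of such a fallback chain (the ProviderLike shape and the index suffix are illustrative assumptions, not the actual provider type):

```typescript
type ProviderLike = {
  name?: string;
  id?: string;
  get?: (...args: unknown[]) => unknown;
};

// Derive a stable registry key instead of collapsing function-based
// providers to the literal 'unknown'.
function providerKey(provider: ProviderLike, index: number): string {
  return (
    provider.name ||
    provider.id ||
    `provider:${index}` // last resort: deterministic per registration order
  );
}
```

The positional suffix keeps keys unique even for fully anonymous providers, at the cost of stability across reorderings.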

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/registry/action-provider-registry.ts around lines 116 to 132, provider
name resolution currently forces name to 'unknown' whenever typeof provider.get
=== 'function', causing many providers to collide; change the logic to preserve
provider.name when present and otherwise derive a stable name from available
runtime info (e.g. provider.name || provider.get?.name ||
provider.constructor?.name || a deterministic fallback like `${provider.type ||
'provider'}:${hashOrIndex}`), so function-based providers do not all resolve to
the literal 'unknown' and keys in this.state.providers remain unique and
predictable.

@odilitime changed the title from V2 to Implement new foundation with three-phase preprocessing and mini-pipelines on Dec 8, 2025
Copilot AI left a comment


Pull request overview

This PR implements a comprehensive v2 foundation for the autonomous message processing system with a three-phase preprocessing pipeline that aims to reduce LLM costs by 90-98% through intelligent preprocessing and coordination.

Summary

The PR introduces an ambitious new architecture centered around:

  • Phase 0: 9 parallel mini-pipelines for no-LLM preprocessing (language detection, classification, entity extraction, etc.)
  • Phase 1: LLM-based review with character flavor injection (planned)
  • Phase 2: Conditional response generation (planned)

Key Additions:

  • Comprehensive NLP utilities (phonetic matching, similarity, keywords, entity extraction)
  • Flexible mini-pipeline framework with dependency resolution
  • Action/provider registry with startup compilation
  • Multi-message context management for deduplication
  • Extensive profiling and bottleneck detection
  • 267 total tests (245 passing, 15 skipped, 7 pre-existing failures)

Reviewed changes

Copilot reviewed 50 out of 50 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/nlp/*.ts (5 files) New NLP utilities for text processing - phonetic matching, similarity scoring, keyword extraction, entity extraction
src/pipeline/*.ts (14 files) New pipeline system with 9 mini-pipelines, context management, profiling, and orchestration
src/registry/*.ts (3 files) New action/provider registry with startup compilation and stability monitoring
src/__tests__/nlp/*.ts (4 files) Comprehensive test suite for NLP utilities (77 tests passing)
src/__tests__/pipeline/*.ts (4 files) Pipeline tests (42 passing, 15 skipped due to language detector requirements)
src/topic-registry.ts Added keyword-based relevance calculation
src/services/resource-tracker.ts Added per-agent cost breakdown visualization
src/services/autonomous-message-service.ts Added originalUseModel storage to prevent nested proxy wrapping
src/planning-engine.ts Added per-agent token tracking
src/name-variation-registry.ts Added placeholder for phonetic agent matching
package.json Added dependencies: double-metaphone, stopword
Documentation files (10+ files) Extensive documentation covering architecture, implementation, design philosophy, roadmap

Status: Foundation complete (Sprints 1-4, 7), integration pending (Sprints 5-6). No breaking changes to existing functionality.



Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
src/__tests__/nlp/entities.test.ts (1)

224-244: Fix truncated Ethereum address in extractAllEntities fixture

Line 229 still uses a 39-hex-character Ethereum address (0x + 39 chars):

        Address: 0x742d35Cc6634C0532925a3b844Bc9e7595f0bEb

If extractAddresses ever enforces correct length, this test will fail for the wrong reason. Align it with a full 40-hex-character address (e.g. the one used earlier in the dedicated Ethereum test):

-        Address: 0x742d35Cc6634C0532925a3b844Bc9e7595f0bEb
+        Address: 0x742d35Cc6634C0532925a3b844Bc9e7595f0bEbf

That keeps the combined-entities fixture consistent and avoids relying on a malformed value.

🧹 Nitpick comments (2)
src/__tests__/nlp/entities.test.ts (2)

97-104: Optional: tighten Ethereum address expectation to exact 40 hex chars

In the Ethereum address test, the expectation currently accepts any length between 39 and 42 hex characters:

expect(addresses[0]).toMatch(/^0x[a-fA-F0-9]{39,42}$/);

If extractAddresses is meant to validate proper Ethereum-style addresses, consider tightening this to exactly 40 hex characters after 0x:

-      expect(addresses[0]).toMatch(/^0x[a-fA-F0-9]{39,42}$/);
+      expect(addresses[0]).toMatch(/^0x[a-fA-F0-9]{40}$/);

This prevents truncated or over-long values from passing the test while still keeping the test implementation simple.


246-265: Optional: assert limits for all entity types in extractAllEntities

The "should respect all limits" test defines limits for every entity type but currently only asserts limits for URLs, emails, and mentions:

expect(entities.urls.length).toBeLessThanOrEqual(limits.maxUrls);
expect(entities.emails.length).toBeLessThanOrEqual(limits.maxEmails);
expect(entities.mentions.length).toBeLessThanOrEqual(limits.maxMentions);

If you want this test to guard against regressions across the whole surface, consider:

  • Including at least one address, UUID, name, and code block in the fixture text, and
  • Adding similar length <= limit assertions for addresses, uuids, codeBlocks, and names, optionally also checking that maxCodeBlockBytes is enforced.

Not mandatory, but it would convert this into a single, strong regression test for all limit knobs.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between db813ba and db9f901.

📒 Files selected for processing (1)
  • src/__tests__/nlp/entities.test.ts (1 hunks)
🔇 Additional comments (1)
src/__tests__/nlp/entities.test.ts (1)

1-289: Comprehensive, well-structured coverage for all entity extractors

The suite does a solid job exercising each extractor (URLs, emails, mentions, addresses, UUIDs, code blocks, names) plus the aggregate helpers, including deduplication and limit behavior. The tests are readable, focused, and align well with the intended API surface. Aside from the small address fixture tweak and optional tightening below, this looks ready.

odilitime and others added 2 commits December 8, 2025 16:07
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…nhancements

This commit represents a major evolution of the autonomous plugin's decision-making
architecture, shifting from rigid "hard rules" to a flexible "signals not rules"
approach that empowers LLMs to make nuanced decisions while maintaining cost control.

## Major Features

### Planning Modes (Cost vs Quality Control)
- **Full Mode** (default): Always uses LLM for highest quality decisions
- **Hybrid Mode**: Uses signal-based rules for high-confidence decisions (≥75%),
  escalates ambiguous cases to LLM. Typical savings: ~70% of batches skip LLM
- **Signals-Only Mode**: Pure deterministic rules, zero LLM cost for planning
- Configurable via AUTONOMOUS_PLANNING_MODE env var or per-agent settings
- 13 prioritized decision rules covering direct address, action intent, group
  commands, conversation continuity, and more

### Action Intent Detection (NLP-Enhanced)
- Detects when users want to EXECUTE actions vs just chat
- NLP-based noun/verb analysis using compromise library
- Special "Agent + Verb" pattern detection (e.g., "Quill can you draft...")
- Command pattern hints (e.g., "roll call", "mesh announce")
- Confidence scoring with context-aware adjustments
- Integrates with action registry for capability matching

### Entity Language Provider
- Persistent language preference tracking per user and room
- 1-2-1 confirmation pattern for language trust building
- Explicit preference storage ("speak to me in Spanish")
- Eliminates per-message detection for short messages ("gm", "ok", "thx")
- Room-level language context for new users

### Signals Not Rules Architecture
- Converts most "hard rules" to contextual signals for LLM interpretation
- Addressing context: TEXT→Agent, REPLY→BOT, REPLY→AGENT signals
- Conversation continuity: USES_YOU, FOLLOW_UP, BACK_REF signals
- Comprehensive LLM guidance for pronoun resolution and reply-to handling
- Text-based addressing takes precedence over reply-to metadata

### LLM Overload Handling (DRY)
- Centralized detection of LLM API capacity errors (Anthropic, OpenAI)
- User-friendly error messages instead of silent failures
- Integrated into both direct and batch message processing paths
- Graceful degradation when external services are overloaded

### Enhanced Logging & Observability
- 2-line planning decision table format
- Full agent names with emoji reactions
- Multi-line reasoning display (truncated at 90 chars)
- Signal-based decision logging for hybrid/signals-only modes
- Comprehensive debug output for troubleshooting

### Pattern Detection (Mini-Pipeline)
- Conversation continuity signals: addressing pronouns, follow-ups, back-references
- Linguistic patterns: similes, examples, comparisons, conditionals
- Action intent modifiers (patterns reduce intent, imperatives increase)
- All detection happens without LLM calls for speed and cost savings

### System Stability Improvements
- Defensive WASM module initialization (language detectors)
- Unhandled promise rejection handling in MySQL plugin
- Graceful degradation when language detection fails
- Proper error boundaries for external service failures

## Documentation Updates
- Comprehensive README with planning modes comparison table
- "Signals Not Rules" philosophy section
- Action intent detection documentation
- Entity language provider guide
- Removed 29 outdated documentation files
- Enhanced code comments with "WHY" explanations throughout

## Breaking Changes
None - all changes are backward compatible with sensible defaults.

## Configuration
New environment variables (all optional):
- AUTONOMOUS_PLANNING_MODE: 'full' (default), 'hybrid', 'signals-only'
- HYBRID_CONFIDENCE_THRESHOLD: 75 (default)

## Migration Notes
Existing deployments will default to 'full' planning mode (current behavior).
To enable cost savings, set AUTONOMOUS_PLANNING_MODE=hybrid.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/__tests__/execution-flow.test.ts (1)

65-71: ⚠️ Potential issue | 🟠 Major

QueuedMessage requires queuedAt (type mismatch).

QueuedMessage includes queuedAt, so these literals won’t type-check. Add a timestamp to each instance.

🛠️ Suggested fix
 const queuedMessage: QueuedMessage = {
   message,
   roomId: roomUUID,
   agentId: agentUUID,
   runtime: mockRuntime,
   callback: callbackMock,
+  queuedAt: Date.now(),
 };
 const queuedMessage: QueuedMessage = {
   message,
   roomId: roomUUID,
   agentId: agentUUID,
   runtime: mockRuntime,
   callback: mock(),
+  queuedAt: Date.now(),
 };
 const queuedMsg1: QueuedMessage = {
   message,
   roomId: roomUUID,
   agentId: agent1UUID,
   runtime: createMockRuntime(),
   callback: callback1,
+  queuedAt: Date.now(),
 };
 const queuedMsg2: QueuedMessage = {
   message,
   roomId: roomUUID,
   agentId: agent2UUID,
   runtime: createMockRuntime(),
   callback: callback2,
+  queuedAt: Date.now(),
 };

Also applies to: 152-158, 250-256

🤖 Fix all issues with AI agents
In `@CHANGELOG.md`:
- Around line 56-107: The changelog triggers markdownlint MD040 and MD060: add
explicit fenced code language tags to any code fences (fix code blocks
referenced in the diff such as the "Result" or any triple-backtick regions) and
normalize the table pipe spacing to a single space on each side of pipes for the
performance table header and rows (the table starting with "| Operation | Time |
Notes |"); update the table rows to align with the header spacing to satisfy
MD060 and ensure all triple-backtick blocks include a language tag (e.g.,
```json or ```text) to satisfy MD040.

In `@DEVELOPER_GUIDE.md`:
- Around line 674-681: In the DEVELOPER_GUIDE.md Resources section update the
broken cross-references: replace the nonexistent reference ARCHITECTURE_V2.md
with the correct ARCHITECTURE.md (update the list entry that currently names
ARCHITECTURE_V2.md) and remove the V2_IMPLEMENTATION_STATUS.md entry entirely
(delete the list bullet that references V2_IMPLEMENTATION_STATUS.md); ensure the
remaining bullets (PIPELINE.md, NLP_UTILITIES.md, FUTURE_ROADMAP.md, and This
file) remain unchanged.

In `@package.json`:
- Around line 23-34: Update the two out-of-date dependencies in package.json:
bump "compromise-dates" from ^2.0.1 to ^3.7.1 and "cld3-asm" from ^3.1.1 to
^4.0.0, then run your package manager to refresh the lockfile (npm install /
pnpm install / yarn install) and run the test suite to surface any breaking
changes; if anything breaks, adapt code that imports or uses compromise-dates or
cld3-asm APIs to the new major versions.

In `@README.md`:
- Around line 50-95: The README has markdownlint issues: add explicit fenced
code block languages and normalize table pipe spacing; update each
triple-backtick block (e.g., the examples under "The Problem" and "The
Solution") to include a language token like ```text or ```bash, and reformat the
ASCII tables (the Mode/Cost/Speed/Quality table) so pipes align consistently
(single space padding around cells) or run a Markdown formatter (markdownlint
--fix / prettier) to normalize table pipes and fix MD040/MD060 automatically;
search for the fenced blocks and the table header in the snippet to locate the
changes.

In `@src/__tests__/benchmark-nlp.test.ts`:
- Around line 52-77: The test titled "MP10: should process complex message in <
10ms (after warmup)" has a mismatched assertion using
expect(duration).toBeLessThan(15);—align them by either updating the test name
to state "< 15ms" or changing the assertion to
expect(duration).toBeLessThan(10); so the title and the check match; update the
string in the it(...) call or the numeric threshold in the expect(...) inside
the MP10 test accordingly.
- Around line 27-50: The test name "MP10: should process simple greeting in <
5ms (after warmup)" and the assertion expect(duration).toBeLessThan(10) are
inconsistent; update either the test name or the assertion to match: locate the
test case (the it(...) string) and either change the description to "< 10ms" to
match executeMP10's expectation, or tighten the assertion to
expect(duration).toBeLessThan(5) if you want to enforce the <5ms target; the
relevant symbols to edit are the it(...) description and the expect(...) line in
the test that uses createMockRuntime(), createPipelineContext(), executeMP1, and
executeMP10.

In `@src/docs/dual-pressure-scoring.md`:
- Line 22: The quoted Discord sentence in the doc is unclear—replace the
fragment "why's both ways and then make it numeric" with a clarified,
preservation-of-intent alternate (e.g., "why not use both and make it numeric")
or mark the original as verbatim with [sic]; update the quoted message string in
src/docs/dual-pressure-scoring.md (the quoted Discord line starting with
**User**:) to either the edited clearer phrasing or append [sic] immediately
after the original to indicate it's verbatim.

In `@src/engagement/engine.ts`:
- Around line 182-184: register() currently overwrites the policies map but
doesn't update the cached budgetPolicy, causing getBudgetStatus() to read stale
data; modify the register(policy: EngagementPolicy) method to detect when
policy.name === "budget" and update the instance property this.budgetPolicy =
policy after setting the map (so replacements keep the cache in sync), and
ensure types match (EngagementPolicy) so getBudgetStatus() always reads the
current budget policy.
- Around line 302-305: The code directly parses the engagement threshold using
parseInt(runtime.getSetting(SETTINGS.ENGAGEMENT_THRESHOLD) ||
String(DEFAULT_ENGAGEMENT_THRESHOLD)) which can produce NaN and diverges from
the project convention; replace this manual parse with the shared helper
getNumericSetting from src/utils/config-helpers.ts (use
getNumericSetting(runtime, SETTINGS.ENGAGEMENT_THRESHOLD,
DEFAULT_ENGAGEMENT_THRESHOLD) or equivalent) to ensure validation and
finite-number handling; make the same replacement for the analogous parse at the
other engagement threshold occurrence (around lines 354-356) and in
src/engagement/policies/budget.ts to keep behavior consistent across the
codebase.
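As a rough TypeScript sketch of the cache-sync fix described above (the `EngagementEngine` shape and `evaluate` method here are illustrative, not the real class):

```typescript
interface EngagementPolicy {
  name: string;
  evaluate(): number;
}

class EngagementEngine {
  private policies = new Map<string, EngagementPolicy>();
  private budgetPolicy?: EngagementPolicy;

  register(policy: EngagementPolicy): void {
    this.policies.set(policy.name, policy);
    // Refresh the cached reference whenever the budget policy is replaced,
    // so getBudgetStatus() never reads a stale instance.
    if (policy.name === 'budget') {
      this.budgetPolicy = policy;
    }
  }

  getBudgetPolicy(): EngagementPolicy | undefined {
    return this.budgetPolicy;
  }
}
```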

In `@src/index.ts`:
- Around line 345-383: The interval created in the block that assigns
memoryCleanupInterval never gets cleared, so add explicit teardown logic to
clearInterval(memoryCleanupInterval) and set memoryCleanupInterval = undefined
when the plugin/system shuts down or before reinitializing; hook this into the
runtime's shutdown/unload handler (e.g., a provided cleanup/shutdown function or
process.on('exit')/module unload hook) and ensure any code path that re-creates
the interval checks and clears an existing memoryCleanupInterval first; also
ensure you stop using sharedMessageService after clearing the interval to avoid
accessing stale state (references: memoryCleanupInterval, sharedMessageService,
cleanupStaleData, MEMORY_CLEANUP_INTERVAL_MS).
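A minimal sketch of the teardown pattern this instruction asks for; `memoryCleanupInterval` and `MEMORY_CLEANUP_INTERVAL_MS` mirror the names quoted above, while the start/stop helper functions are hypothetical:

```typescript
let memoryCleanupInterval: ReturnType<typeof setInterval> | undefined;
const MEMORY_CLEANUP_INTERVAL_MS = 60_000;

function startMemoryCleanup(cleanup: () => void): void {
  // Any code path that re-creates the interval clears the old one first.
  stopMemoryCleanup();
  memoryCleanupInterval = setInterval(cleanup, MEMORY_CLEANUP_INTERVAL_MS);
}

function stopMemoryCleanup(): void {
  // Hook this into the plugin's shutdown/unload handler.
  if (memoryCleanupInterval !== undefined) {
    clearInterval(memoryCleanupInterval);
    memoryCleanupInterval = undefined;
  }
}

function memoryCleanupRunning(): boolean {
  return memoryCleanupInterval !== undefined;
}
```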

In `@src/name-variation-registry.ts`:
- Around line 67-72: processPendingBatch currently awaits an existing
this.batchPromise and returns, which leaves any new queued agents unprocessed;
change it so that after awaiting an in-flight this.batchPromise it does not
return but continues to check and drain the pending queue (e.g., loop or
fallthrough to the existing batch-start logic), so that any agents added while
the previous batch was running get processed immediately; reference the
processPendingBatch method, this.batchPromise, and the pendingAgents queue (or
whatever internal queue name) and ensure you preserve setting/clearing
this.batchPromise around new batch work to avoid races.
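One way to sketch the drain-after-await behavior in TypeScript, assuming a simplified queue (the real class holds more state; `runBatch` here is a stand-in for the actual batch work):

```typescript
class BatchProcessor {
  private pendingAgents: string[] = [];
  private batchPromise?: Promise<void>;
  processed: string[] = [];

  enqueue(agent: string): void {
    this.pendingAgents.push(agent);
  }

  async processPendingBatch(): Promise<void> {
    // Wait for any in-flight batch, but do NOT return afterwards:
    // fall through and drain whatever queued while it was running.
    while (this.batchPromise) {
      await this.batchPromise;
    }
    while (this.pendingAgents.length > 0) {
      const batch = this.pendingAgents.splice(0);
      this.batchPromise = this.runBatch(batch);
      await this.batchPromise;
      this.batchPromise = undefined;
    }
  }

  private async runBatch(batch: string[]): Promise<void> {
    this.processed.push(...batch);
  }
}
```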

In `@src/phase3/action-context.ts`:
- Around line 190-239: The getActionUsageHistory function currently iterates
memories returned by runtime.getMemories in oldest-first order, so the
keep-last-N logic misapplies to older entries; after fetching memories (the
result of runtime.getMemories) call .reverse() (e.g., replace the fetched
memories assignment with a reversed array) so that the loop over memories
processes newest entries first and the seenActions success/failure counters
(seenActions map, counts.successes/counts.failures) correctly enforce
PHASE3_ACTION_MEMORY_SUCCESS_KEEP and PHASE3_ACTION_MEMORY_FAILURE_KEEP on
recent records.
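The newest-first counting logic can be sketched as follows; the `ActionMemory` shape and the `keepRecent` helper are assumptions for illustration, with the keep limits standing in for `PHASE3_ACTION_MEMORY_SUCCESS_KEEP` / `PHASE3_ACTION_MEMORY_FAILURE_KEEP`:

```typescript
interface ActionMemory {
  action: string;
  success: boolean;
  createdAt: number;
}

const SUCCESS_KEEP = 2;
const FAILURE_KEEP = 2;

function keepRecent(memories: ActionMemory[]): ActionMemory[] {
  const counts = new Map<string, { successes: number; failures: number }>();
  const kept: ActionMemory[] = [];
  // getMemories returns oldest-first; reverse a copy so the newest
  // entries are counted against the keep limits first.
  for (const m of [...memories].reverse()) {
    const c = counts.get(m.action) ?? { successes: 0, failures: 0 };
    if (m.success && c.successes < SUCCESS_KEEP) {
      c.successes++;
      kept.push(m);
    } else if (!m.success && c.failures < FAILURE_KEEP) {
      c.failures++;
      kept.push(m);
    }
    counts.set(m.action, c);
  }
  return kept;
}
```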

In `@src/phase3/budget.ts`:
- Around line 250-254: selectTasksForBudget is being passed an empty focused
list because CycleContext doesn't carry parsed focused task IDs from
TriageOutput; update CycleContext to include a focused: string[] (or appropriate
type) field populated from the parser/TriageOutput and pass that field into the
selectTasksForBudget call (replace the [] with context.focused), or if
prioritization isn't intended yet add a clear TODO in the budget composition
code noting that focused tasks must be propagated from TriageOutput into
CycleContext before calling selectTasksForBudget; reference CycleContext,
TriageOutput, and selectTasksForBudget when making the change.
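A speculative sketch of propagating focused task IDs; every type and field name here is an assumption based on the comment above (`TriageOutput`, `CycleContext`, `selectTasksForBudget`), not the real API:

```typescript
interface TriageOutput {
  focusedTaskIds: string[];
}

interface CycleContext {
  focused: string[];
}

function buildCycleContext(triage: TriageOutput): CycleContext {
  // Carry the parsed focused IDs forward instead of dropping them.
  return { focused: [...triage.focusedTaskIds] };
}

function selectTasksForBudget(taskIds: string[], focused: string[], budget: number): string[] {
  // Focused tasks first, then the rest, truncated to the budget.
  const rest = taskIds.filter((id) => !focused.includes(id));
  return [...focused.filter((id) => taskIds.includes(id)), ...rest].slice(0, budget);
}
```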

In `@src/phase3/executor.ts`:
- Around line 350-365: The task ID generation in createTask currently uses
stringToUuid(`phase3-task-${Date.now()}-${taskName}`) which can collide when
called multiple times within the same millisecond; update createTask to append a
high-entropy or monotonic component (e.g., a crypto-random suffix or an
incrementing counter) to the seed before calling stringToUuid so each invocation
produces a unique taskId, ensuring the change is applied where taskId is set and
preserves existing naming (taskName) and metadata logic.
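The collision-free seed could look like the sketch below, which combines the timestamp with a monotonic counter and random bytes before the `stringToUuid` call (the seed-builder function name is illustrative):

```typescript
import { randomBytes } from 'node:crypto';

let taskCounter = 0;

function makeTaskSeed(taskName: string): string {
  // Timestamp alone collides for same-millisecond calls; a counter plus
  // random nonce guarantees each invocation produces a unique seed.
  const nonce = randomBytes(4).toString('hex');
  return `phase3-task-${Date.now()}-${taskCounter++}-${nonce}-${taskName}`;
}
```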
🧹 Nitpick comments (23)
src/docs/deferred-intent-tracking.md (3)

19-28: Add language identifiers to fenced code blocks.

MD040 flags these blocks. Please label them (e.g., text, typescript, mermaid, or plain) to satisfy markdownlint and improve readability.

Suggested patch
-```
+```text
 Timeline WITHOUT Deferral:
 ...
-```
+```

-```
+```text
 Timeline WITH Deferral:
 ...
-```
+```

-```
+```text
 [You were going to respond to these messages but deferred:]
 ...
-```
+```

-```
+```text
 ┌─────────────────────────────────────────────────────────────────────────┐
 ...
-```
+```

Also applies to: 34-45, 109-114, 189-234


181-186: Fix table spacing to satisfy markdownlint MD060.

Add spaces inside the pipes for consistent table style.

Suggested patch
-| File | Purpose |
-|------|---------|
-| `types.ts` | `DeferredIntent`, `DeferredDraft` type definitions |
-| `services/autonomous-message-service.ts` | `DeferredIntentStore` class, checkpoints |
-| `queue-manager.ts` | `hasNewerMessagesForRoom()` method |
+| File | Purpose |
+| --- | --- |
+| `types.ts` | `DeferredIntent`, `DeferredDraft` type definitions |
+| `services/autonomous-message-service.ts` | `DeferredIntentStore` class, checkpoints |
+| `queue-manager.ts` | `hasNewerMessagesForRoom()` method |

11-13: Consider hyphenating “up-to-date” in the quoted text.

This is a minor style tweak; if you prefer to preserve the quote verbatim, feel free to ignore.

Suggested patch
-...we have a more up to date state"
+...we have a more up-to-date state"
src/docs/entity-language-provider.md (3)

13-22: Add a language tag to this fenced code block.

Markdownlint flags this block for missing a language specifier.

Proposed fix
-```
+```text
 User: "Gm"                     → Language detection: 'und' (undetermined)
 User: "ok"                     → Language detection: 'und'  
 User: "thx"                    → Language detection: 'und'
 User: "Hola amigo"             → Language detection: 'es' (Spanish)
 User: "gracias"                → Language detection: 'es'
 User: "k"                      → Language detection: 'und'

 Agent has NO IDEA this user primarily speaks Spanish!
-```
+```

33-45: Add a language tag to this fenced code block.

Markdownlint flags this block for missing a language specifier.

Proposed fix
-```
+```text
 EntityLanguageProfile for user:
   confirmed: ['es', 'en']      ← User has confirmed both
   preferred: 'es'              ← User explicitly said "habla español"
   primary: 'es'                ← Most messages are Spanish
   messageCount: { es: 15, en: 3 }

 Agent now KNOWS:
 - User speaks Spanish and English
 - User PREFERS Spanish
 - Most messages are in Spanish
 - Respond in Spanish!
-```
+```

51-78: Add a language tag to this fenced code block.

Markdownlint flags this block for missing a language specifier.

Proposed fix
-```
+```text
 ┌─────────────────────────────────────────────────────────────────────────┐
 │                         Message Processing                               │
 ├─────────────────────────────────────────────────────────────────────────┤
 │                                                                         │
 │  1. USER SENDS MESSAGE                                                  │
 │     └─► detectLanguageWithTrust() runs                                  │
 │         └─► Updates EntityLanguageProfile                               │
 │             └─► 1-2-1 pattern detection                                 │
 │                                                                         │
 │  2. PROVIDER RUNS                                                       │
 │     └─► entityLanguageProvider.get()                                    │
 │         └─► getEntityProfile(agentId, entityId)                         │
 │         └─► getRoomProfile(agentId, roomId)                             │
 │         └─► Format for agent context                                    │
 │                                                                         │
 │  3. AGENT SEES                                                          │
 │     # Sender Language Context                                           │
 │     **Preferred**: Spanish (explicitly requested)                       │
 │     **Primary**: Spanish (most used)                                    │
 │     **Confirmed**: Spanish, English                                     │
 │                                                                         │
 │     > Respond in Spanish (user's preference).                           │
 │                                                                         │
 │  4. AGENT RESPONDS IN SPANISH                                           │
 │                                                                         │
 └─────────────────────────────────────────────────────────────────────────┘
-```
+```
src/__tests__/test-helpers.ts (1)

79-91: Consider adding a default return value for getEfficiencySummary.

Unlike other mock methods in this factory (checkBudget, getMetrics, etc.), getEfficiencySummary at line 91 lacks a default return value. This inconsistency could cause tests to receive undefined unexpectedly if they call this method without overriding it.

🔧 Suggested fix
-        getEfficiencySummary: mock(),
+        getEfficiencySummary: mock().mockReturnValue({}),
src/__tests__/benchmark-nlp.test.ts (2)

116-129: Unused expectedRouting and category fields.

These fields are defined but never used for assertions. Either remove them to reduce noise, or add assertions to verify that each test case routes as expected:

Option 1: Remove unused fields
     const testCases = [
-      { text: 'hi', expectedRouting: 'template', category: 'greeting' },
-      { text: 'hello there', expectedRouting: 'template', category: 'greeting' },
+      { text: 'hi' },
+      { text: 'hello there' },
       // ... similarly for other cases
     ];
Option 2: Add assertions to verify expected routing
// After line 153
if (testCase.expectedRouting) {
  expect(decision.action).toBe(testCase.expectedRouting);
}

Note: Option 2 would make this a correctness test rather than a pure benchmark, which may not be desired due to the "weighted randomization" mentioned in the output.


195-214: Router instantiation included in timing measurement.

The SmartRouter is created inside the timed section (line 203), which includes instantiation overhead in the benchmark. If the intent is to measure only the routing decision time, consider moving instantiation before the timing block.

Proposed fix if measuring only execution time is desired
+    const router = new SmartRouter();
+    
     // Benchmark full pipeline
     const start = performance.now();
     
     // MP1 + MP10 + Routing
     await executeMP1(ctx);
     const mp10Result = await executeMP10(ctx);
     
     if (mp10Result.data) {
       const phase0: Phase0Result = { compromise: mp10Result.data } as Phase0Result;
-      const router = new SmartRouter();
       await router.route(phase0, mockRuntime);
     }
src/__tests__/planning-engine.test.ts (1)

245-311: Strengthen index-based mapping coverage with a non-trivial index.
With only one message, an “always pick first” bug could still pass. Consider adding a second message and using msg=2 to validate true index mapping.

Suggested test enhancement
       const agentUUID = 'd508a7a4-1234-5678-90ab-cdef12345678' as UUID;
       const roomUUID = '226d71f7-1234-5678-90ab-cdef12345678' as UUID;
       const messageUUID = 'a4efd5dd-1234-5678-90ab-cdef12345678' as UUID;
+      const messageUUID2 = 'b6c1d2e3-1234-5678-90ab-cdef12345678' as UUID;
@@
       const message: Memory = {
         id: messageUUID,
         entityId: 'user-123' as UUID,
         agentId: agentUUID,
         roomId: roomUUID,
         content: { text: 'test' },
         createdAt: Date.now(),
       };
+      const message2: Memory = {
+        id: messageUUID2,
+        entityId: 'user-123' as UUID,
+        agentId: agentUUID,
+        roomId: roomUUID,
+        content: { text: 'test-2' },
+        createdAt: Date.now(),
+      };
@@
       const queuedMessage: QueuedMessage = {
         message,
         roomId: roomUUID,
         agentId: agentUUID,
         runtime: mockRuntime,
         callback: mock(),
       };
+      const queuedMessage2: QueuedMessage = {
+        message: message2,
+        roomId: roomUUID,
+        agentId: agentUUID,
+        runtime: mockRuntime,
+        callback: mock(),
+      };
@@
-              messages: new Map([[messageUUID, queuedMessage]]),
+              messages: new Map([
+                [messageUUID, queuedMessage],
+                [messageUUID2, queuedMessage2],
+              ]),
@@
-        messages: [queuedMessage],
+        messages: [queuedMessage, queuedMessage2],
@@
-        `1|1|Tempest|reply|7|Test|85|50|50|50|Tempest`
+        `1|2|Tempest|reply|7|Test|85|50|50|50|Tempest`
@@
-      expect(result.plans[0].messageId).toBe(messageUUID); // Full UUID mapped from index
+      expect(result.plans[0].messageId).toBe(messageUUID2); // Full UUID mapped from index
src/docs/action-intent-detection.md (2)

63-67: Table may have rendering issues due to pipe characters in cells.

Line 66's example column contains pipe characters inside the regex pattern \d+ (ETH|USDC|SOL) which could interfere with Markdown table parsing in some renderers. Consider escaping the pipes or reformatting:

📝 Suggested fix
 | Parameter | Pattern | Example |
 |-----------|---------|---------|
 | Blockchain address | `0x[a-fA-F0-9]{40}` | `0x742d35Cc6634C0532925a3b844Bc9e7595f...` |
-| Token amount | `\d+ (ETH|USDC|SOL)` | `100 USDC`, `0.5 ETH` |
+| Token amount | `\d+ (ETH\|USDC\|SOL)` | `100 USDC`, `0.5 ETH` |
 | URL | `https?://...` | With intent phrase |

266-272: Remove trailing blank lines.

Multiple unnecessary blank lines at the end of the file.

ARCHITECTURE.md (1)

101-143: Minor inconsistency in ASCII diagram.

The ASCII diagram shows MP8 (A/P) connecting down to MP9 (lines 115-119), suggesting MP9 depends on MP8. However, the MP table (line 42) shows MP9 depends on "MP4, MP5" with no mention of MP8.

Verify the dependency chain is accurately represented. If MP9 doesn't depend on MP8, consider adjusting the diagram layout to avoid visual confusion:

Current (implies MP9 depends on MP8):
│  │ MP8: A/P  │
│  └─────┬─────┘
│        │
│  ┌─────▼─────┐
│  │ MP9: Char │

If MP9 is independent of MP8, consider:
│  │ MP8: A/P  │  │ MP9: Char │
│  └───────────┘  └───────────┘
src/name-variation-registry.ts (1)

375-393: Consider making the placeholder behavior more explicit.

Since this is a public method, returning [] can hide missing behavior. Consider logging a warning or marking it @internal until implemented.

src/banner.ts (1)

7-13: Misleading comment: ANSI codes don't match described colors.

The comment mentions "Chrome/Circuit Green" theme, but:

  • chrome (line 12) uses \x1b[31m which is red, not chrome/silver
  • circuit (line 13) uses \x1b[35m which is magenta, not green

Either update the comment to reflect the actual colors, or change the ANSI codes to match the intended theme. If green was intended for circuit, use \x1b[32m (green) or \x1b[92m (bright green).

src/engagement/policies/budget.ts (2)

186-199: Simplify: remove redundant condition and unreachable default case.

Line 186's if (pct >= 0.5) is always true at this point since all lower cases (pct <= 0, pct < 0.2, pct < 0.5) have already returned. The entire if-block and default case are redundant.

♻️ Proposed simplification
-    // ========================================================================
-    // BUDGET HEALTHY (> 50%)
-    // ========================================================================
-    // WHY 0 MODIFIER:
-    // No reduction. Let other policies decide.
-    if (pct >= 0.5) {
-      return {
-        modifier: 0,
-        effort: 'full',
-        reason: `ok(${Math.round(pct * 100)}%)`,
-      };
-    }
-
-    // Default case (shouldn't reach here, but TypeScript wants it)
+    // ========================================================================
+    // BUDGET HEALTHY (>= 50%)
+    // ========================================================================
+    // WHY 0 MODIFIER:
+    // No reduction. Let other policies decide.
     return {
       modifier: 0,
       effort: 'full',
       reason: `ok(${Math.round(pct * 100)}%)`,
     };

243-249: getBudgetStatus may return stale data if window has expired.

Unlike getWillingnessModifier and recordOutcome, this method doesn't check if the usage window has expired. If maxTokens is 50000 and an agent used 40000 tokens 2 hours ago, this will still report 40000 used even though the rolling window should have reset.

Consider resetting stale windows here for consistency:

♻️ Proposed fix
 getBudgetStatus(agentId: UUID, maxTokens: number): { used: number; remaining: number; pct: number } {
-  const u = this.usage.get(agentId);
-  const used = u?.tokens || 0;
+  const hourMs = 60 * 60 * 1000;
+  const now = Date.now();
+  let u = this.usage.get(agentId);
+  
+  // Reset if window expired (consistent with other methods)
+  if (u && now - u.windowStart > hourMs) {
+    u = undefined;
+  }
+  
+  const used = u?.tokens ?? 0;
   const remaining = maxTokens - used;
   const pct = remaining / maxTokens;
   return { used, remaining, pct };
 }
src/phase3/parser.ts (1)

141-149: Memory strength cast trusts the regex whitelist.

Note that the i flag applies to the whole pattern, not just the tag names, so strength="Weak" or strength="STRONG" already match, and .toLowerCase() normalizes the capture before the cast. The remaining fragility is the `as MemoryStrength` cast itself: it is only safe while the alternation in the regex stays in sync with the MemoryStrength union.

Consider a two-step approach that decouples the two: match any strength value, then validate:

 function parseMemory(content: string): MemoryOutput | undefined {
-  const match = content.match(/<memory\s+strength="(weak|moderate|strong|permanent)"[^>]*>([\s\S]*?)<\/memory>/i);
+  const match = content.match(/<memory\s+strength="([^"]+)"[^>]*>([\s\S]*?)<\/memory>/i);
   if (!match) return undefined;

+  const strength = match[1].toLowerCase();
+  if (!['weak', 'moderate', 'strong', 'permanent'].includes(strength)) {
+    return undefined;
+  }
+
   return {
-    strength: match[1].toLowerCase() as MemoryStrength,
+    strength: strength as MemoryStrength,
     content: match[2].trim(),
   };
 }
src/phase3/executor.ts (1)

802-818: Fragile hardcoded constraint pattern matching.

The fallback constraint matching (lines 806-817) uses hardcoded keyword patterns like "transfer", "spend", "risk", "confront". This is brittle because:

  1. New risky actions won't be caught
  2. False positives on legitimate uses (e.g., "knowledge transfer")
  3. Easy to bypass with synonyms

Consider either:

  • Requiring constraints to always have explicit triggers
  • Using a more sophisticated matching approach
  • Documenting this as a temporary implementation with a TODO
💡 Alternative: require explicit triggers
     } else {
-      // No triggers - check against common risky patterns based on constraint description
-      const descLower = constraint.description.toLowerCase();
-      
-      // Match common patterns
-      if (descLower.includes('transfer') && (lowerActionName.includes('transfer') || lowerDraft.includes('transfer'))) {
-        return constraint;
-      }
-      // ... other patterns ...
+      // No triggers specified - constraint cannot be automatically checked
+      // Log a warning for debugging
+      // runtime.logger?.warn(`[P3:Exec] Constraint "${constraint.id}" has no triggers - cannot auto-check`);
+      continue;
     }
src/phase3/catchup.ts (1)

31-36: Global module state limits testability and multi-agent scenarios.

roomAttentionMap and roomMessageHistory are global singletons. This means:

  1. Tests can't run in parallel without state pollution
  2. Multiple agents in the same process share attention data

For a "Chill" review, this is acceptable for MVP but consider extracting into a class or passing state explicitly for better testability:

💡 Future improvement: class-based state
export class CatchUpManager {
  private roomAttentionMap = new Map<UUID, RoomAttention>();
  private roomMessageHistory = new Map<UUID, number[]>();
  
  // Move all functions as methods...
}

Then inject per-agent or use a factory.
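
A per-agent factory on top of that class could look like the following sketch. The `CatchUpManager` shape follows the suggestion above; the method names, `RoomAttention` fields, and the factory itself are hypothetical illustrations, not the plugin's actual API:

```typescript
// Hypothetical sketch: per-agent catch-up state instead of module-level singletons.
type UUID = string;

interface RoomAttention {
  lastSeenAt: number;
  unreadCount: number;
}

class CatchUpManager {
  private roomAttentionMap = new Map<UUID, RoomAttention>();
  private roomMessageHistory = new Map<UUID, number[]>();

  recordMessage(roomId: UUID, timestamp: number): void {
    const history = this.roomMessageHistory.get(roomId) ?? [];
    history.push(timestamp);
    this.roomMessageHistory.set(roomId, history);
    const attention = this.roomAttentionMap.get(roomId) ?? { lastSeenAt: 0, unreadCount: 0 };
    attention.unreadCount += 1;
    this.roomAttentionMap.set(roomId, attention);
  }

  markCaughtUp(roomId: UUID, now: number): void {
    this.roomAttentionMap.set(roomId, { lastSeenAt: now, unreadCount: 0 });
  }

  unread(roomId: UUID): number {
    return this.roomAttentionMap.get(roomId)?.unreadCount ?? 0;
  }
}

// Factory keyed by agent ID, so tests and co-located agents get isolated state.
const managers = new Map<UUID, CatchUpManager>();
function catchUpManagerFor(agentId: UUID): CatchUpManager {
  let m = managers.get(agentId);
  if (!m) {
    m = new CatchUpManager();
    managers.set(agentId, m);
  }
  return m;
}
```

Tests can then construct a fresh `CatchUpManager` per case, and two agents in one process no longer share attention data.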

src/phase3/budget.ts (1)

75-88: Consider adding a guard against negative remaining.

consumeBudget doesn't prevent remaining from going negative. If callers pass more tokens than available, downstream code checking budget.remaining may behave unexpectedly.

🛡️ Optional: clamp remaining to non-negative
 export function consumeBudget(
   budget: TokenBudget,
   category: keyof TokenBudget['used'],
   tokens: number
 ): TokenBudget {
   return {
     ...budget,
     used: {
       ...budget.used,
       [category]: budget.used[category] + tokens,
     },
-    remaining: budget.remaining - tokens,
+    remaining: Math.max(0, budget.remaining - tokens),
   };
 }
src/phase3/action-memory.ts (1)

139-145: Consider batching deletes for better performance.

The sequential await runtime.deleteMemory(mem.id) calls inside a loop could be slow if many old memories exist. Consider batching with Promise.all if the runtime supports concurrent deletes.

♻️ Optional: batch deletes with Promise.all
       if (oldMems.length > 0) {
         // Delete old individual records
-        for (const mem of oldMems) {
-          if (mem.id) {
-            await runtime.deleteMemory(mem.id);
-          }
-        }
+        await Promise.all(
+          oldMems
+            .filter(mem => mem.id)
+            .map(mem => runtime.deleteMemory(mem.id!))
+        );
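
If the backend cannot absorb an unbounded `Promise.all`, a chunked variant bounds concurrency while still batching. This is a sketch; `deleteInChunks` is a hypothetical helper, and it assumes the `runtime.deleteMemory(id)` shape shown in the diff above:

```typescript
// Sketch: delete in bounded chunks rather than firing every delete at once.
async function deleteInChunks<T>(
  items: T[],
  deleteOne: (item: T) => Promise<void>,
  chunkSize = 10
): Promise<void> {
  for (let i = 0; i < items.length; i += chunkSize) {
    // Each chunk runs concurrently; chunks themselves run sequentially.
    await Promise.all(items.slice(i, i + chunkSize).map(deleteOne));
  }
}
```

Usage in the loop above would be along the lines of `await deleteInChunks(oldMems.filter(m => m.id), (m) => runtime.deleteMemory(m.id!))`.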
src/phase3/prompt.ts (1)

216-229: Consider adding a warning for low focus state.

Low energy and high social need have warning messages, but low focus does not. If focus is important for task prioritization, consider adding:

${focus < 0.3 ? '⚠️ Low focus - consider reducing distractions or simplifying tasks.' : ''}

Comment on lines +56 to +107
- Formality matching (casual/neutral/formal)
- Anti-repetition tracking (last 3 used per type)
- Memory-based persistence

#### Addressee Detection
- **Priority #1** in planning: Check who message is addressed to
- Text pattern detection: `@Name`, `Name,`, `Name:`
- Reply-to metadata detection (Discord `inReplyTo` field)
- Fast-path ignore for messages addressed to other agents
- Name variation registry with phonetic matching

#### Resource Tracking Enhancements
- Per-agent cost attribution (shows agent names, not UUIDs)
- NLP metrics tracking (routing decisions, template usage, ambiguity scores)
- Efficiency summary with LLM bypass rate
- Token usage breakdowns per model type

#### Discord Improvements
- Reply threading for busy room responses
- Emoji reaction sanitization (`:wave:` → `👋` conversion)
- Graceful permission error handling with invite link generation
- Bot-to-bot reply detection (prevents reply loops)
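
The emoji sanitization step can be sketched like this. The `EMOJI_MAP` contents and the function name are illustrative only; the plugin's actual shortcode table may differ:

```typescript
// Illustrative sketch of `:shortcode:` -> Unicode emoji conversion for
// Discord reactions; unknown shortcodes pass through untouched.
const EMOJI_MAP: Record<string, string> = {
  wave: '👋',
  thumbsup: '👍',
  heart: '❤️',
};

function sanitizeEmojiShortcodes(text: string): string {
  return text.replace(/:([a-z0-9_+-]+):/gi, (match, name: string) => {
    return EMOJI_MAP[name.toLowerCase()] ?? match;
  });
}
```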

### Changed

#### Multi-Agent Batching Fix
**Problem**: Agents making separate planning calls instead of batching
**Root Cause**: 500ms slice duration too short for all agents to enqueue
**Fix**: Dynamic slice duration based on registered agent count:
- 1-2 agents: 1000ms
- 3-4 agents: 2000ms
- 5+ agents: 3000ms

**Result**: Proper batching, up to 8× fewer LLM calls
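
The tiered slice durations above reduce to a small lookup. This is a sketch; the function name is hypothetical:

```typescript
// Sketch of the dynamic slice duration described above: more registered
// agents get a longer window so everyone can enqueue before planning runs.
function sliceDurationMs(registeredAgentCount: number): number {
  if (registeredAgentCount <= 2) return 1000; // 1-2 agents
  if (registeredAgentCount <= 4) return 2000; // 3-4 agents
  return 3000;                                // 5+ agents
}
```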

#### Nested Proxy Fix
**Problem**: `runtime.useModel` wrapped by multiple proxies if `withMaxTokens()` called repeatedly
**Fix**: Store original `useModel` once at agent registration, always restore to original

#### Response Brevity
- Added brevity scoring in planning (0-100)
- High brevity triggers strict token limits
- Minimum 150 tokens to prevent XML truncation
- Busy room detection enables "one response per batch" preference

### Performance

| Operation | Time | Notes |
|-----------|------|-------|
| Phase 0 (all MPs) | <30ms | Free intelligence |
| MP10 (Compromise) | 1-3ms | 120ms first run for init |
| Template response | <5ms | vs 500-2000ms for LLM |

⚠️ Potential issue | 🟡 Minor

markdownlint warnings in tables/code fences.

MD040 (missing fenced code language) and MD060 (table pipe spacing) are reported in this section. Adding language tags and normalizing pipe spacing will keep lint clean.

🧰 Tools
🪛 LanguageTool

[grammar] ~83-~83: Ensure spelling is correct
Context: ...s instead of batching Root Cause: 500ms slice duration too short for all agents...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.20.0)

[warning] 56-56: Table column style: table pipe is missing space to the left/right for style "compact" (MD060, table-column-style; reported 6×)

[warning] 66-66: Table column style: table pipe is missing space to the left/right for style "compact" (MD060, table-column-style; reported 6×)

[warning] 78-78: Fenced code blocks should have a language specified (MD040, fenced-code-language)

[warning] 99-99: Fenced code blocks should have a language specified (MD040, fenced-code-language)

[warning] 104-104: Fenced code blocks should have a language specified (MD040, fenced-code-language)

🤖 Prompt for AI Agents
In `@CHANGELOG.md` around lines 56 - 107, The changelog triggers markdownlint
MD040 and MD060: add explicit fenced code language tags to any code fences (fix
code blocks referenced in the diff such as the "Result" or any triple-backtick
regions) and normalize the table pipe spacing to a single space on each side of
pipes for the performance table header and rows (the table starting with "|
Operation | Time | Notes |"); update the table rows to align with the header
spacing to satisfy MD060 and ensure all triple-backtick blocks include a
language tag (e.g., ```json or ```text) to satisfy MD040.

Comment on lines +674 to +681
## Resources

- **ARCHITECTURE_V2.md** - High-level system design
- **PIPELINE.md** - Per-MP technical details
- **NLP_UTILITIES.md** - API reference for text processing
- **FUTURE_ROADMAP.md** - Ideas and enhancements
- **V2_IMPLEMENTATION_STATUS.md** - Current state and next steps
- **This file** - Developer guide and philosophy

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if referenced documentation files exist in the repository
echo "Checking for referenced documentation files..."
echo ""

for file in "ARCHITECTURE_V2.md" "ARCHITECTURE.md" "V2_IMPLEMENTATION_STATUS.md" "PIPELINE.md" "NLP_UTILITIES.md" "FUTURE_ROADMAP.md" "DEVELOPER_GUIDE.md"; do
  if fd --quiet -t f "^${file}$"; then
    echo "✓ Found: $file"
  else
    echo "✗ Missing: $file"
  fi
done

echo ""
echo "All markdown files in repository root:"
fd -t f "\.md$" --max-depth 1

Repository: elizaos-plugins/plugin-autonomous

Length of output: 593


Fix broken documentation cross-references in Resources section.

The Resources section references two files that don't exist in the repository:

  • Line 676: ARCHITECTURE_V2.md should be ARCHITECTURE.md
  • Line 680: V2_IMPLEMENTATION_STATUS.md does not exist and should be removed
Suggested fix
 ## Resources
 
-- **ARCHITECTURE_V2.md** - High-level system design
+- **ARCHITECTURE.md** - High-level system design
 - **PIPELINE.md** - Per-MP technical details
 - **NLP_UTILITIES.md** - API reference for text processing
 - **FUTURE_ROADMAP.md** - Ideas and enhancements
-- **V2_IMPLEMENTATION_STATUS.md** - Current state and next steps
 - **This file** - Developer guide and philosophy
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
## Resources
- **ARCHITECTURE_V2.md** - High-level system design
- **PIPELINE.md** - Per-MP technical details
- **NLP_UTILITIES.md** - API reference for text processing
- **FUTURE_ROADMAP.md** - Ideas and enhancements
- **V2_IMPLEMENTATION_STATUS.md** - Current state and next steps
- **This file** - Developer guide and philosophy
## Resources
- **ARCHITECTURE.md** - High-level system design
- **PIPELINE.md** - Per-MP technical details
- **NLP_UTILITIES.md** - API reference for text processing
- **FUTURE_ROADMAP.md** - Ideas and enhancements
- **This file** - Developer guide and philosophy
🤖 Prompt for AI Agents
In `@DEVELOPER_GUIDE.md` around lines 674 - 681, In the DEVELOPER_GUIDE.md
Resources section update the broken cross-references: replace the nonexistent
reference ARCHITECTURE_V2.md with the correct ARCHITECTURE.md (update the list
entry that currently names ARCHITECTURE_V2.md) and remove the
V2_IMPLEMENTATION_STATUS.md entry entirely (delete the list bullet that
references V2_IMPLEMENTATION_STATUS.md); ensure the remaining bullets
(PIPELINE.md, NLP_UTILITIES.md, FUTURE_ROADMAP.md, and This file) remain
unchanged.

Comment on lines 23 to 34
"dependencies": {
"@echogarden/fasttext-wasm": "^0.1.0",
"@elizaos/core": "workspace:*",
"cld3-asm": "^3.1.1",
"@echogarden/fasttext-wasm": "^0.1.0"
"compromise": "^14.14.0",
"compromise-dates": "^2.0.1",
"compromise-numbers": "^1.4.0",
"double-metaphone": "^2.0.1",
"franc": "^6.2.0",
"tinyld": "^1.3.4",
"stopword": "^3.1.5"
},

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

latest versions compromise double-metaphone franc tinyld stopword npm packages 2025

💡 Result:

Latest versions (npm) and 2025 compromise status (as of Jan 30, 2026):

| Package | Latest version found | Evidence | Reported as "compromised" in 2025 incidents? |
|---|---|---|---|
| double-metaphone | 2.0.1 | npm package page shows 2.0.1 (npmjs.com) | Not found in the major 2025 npm compromise advisories checked (cisa.gov) |
| franc | 6.2.0 | npm package page shows 6.2.0 (npmjs.com) | Not found in major 2025 advisories checked (cisa.gov) |
| tinyld | 1.3.4 | npm package page shows 1.3.4 (npmjs.com) | Not found in major 2025 advisories checked (cisa.gov) |
| stopword | 3.1.5 | Snyk lists latest as 3.1.5 with "no vulnerabilities found in the latest version" (security.snyk.io) | Not found in major 2025 advisories checked (cisa.gov) |

Note on “compromised in 2025”: There were large-scale npm supply-chain compromise campaigns in 2025 (notably “Shai-Hulud”) impacting 500+ packages, with guidance to review dependencies and rotate credentials. (cisa.gov) I did not find these four package names called out in the major public advisories I reviewed.


🌐 Web query:

compromise compromise-dates compromise-numbers latest versions npm 2025

💡 Result:

Latest npm versions (as of Jan 30, 2026):

  • compromise: 14.14.5 (latest) [1][2]
  • compromise-dates: 3.7.1 (latest) [3][4]
  • compromise-numbers: 1.4.0 (latest) [5]

Sources: [1] Socket (compromise) [2] Snyk (compromise) [3] Socket (compromise-dates) [4] Snyk (compromise-dates) [5] Socket (compromise-numbers)


🌐 Web query:

cld3-asm @echogarden/fasttext-wasm latest versions npm 2025

💡 Result:

  • cld3-asm: v4.0.0 (npm “latest”; published ~2 years ago, i.e., no newer npm release during 2025). [1]
  • @echogarden/fasttext-wasm: v0.1.0 (npm “latest”; published ~2 years ago, i.e., no newer npm release during 2025). [2]

Sources: npm package pages for cld3-asm and @echogarden/fasttext-wasm. [1] [2]


Update compromise-dates and cld3-asm to latest major versions.

The added dependencies correctly support the NLP utilities. Most versions are current:

  • double-metaphone (2.0.1), franc (6.2.0), tinyld (1.3.4), stopword (3.1.5), and compromise-numbers (1.4.0) match their latest releases
  • compromise (^14.14.0) is compatible with latest 14.14.5
  • None flagged in 2025 npm supply-chain advisories; stopword reports no vulnerabilities

However, two dependencies lag behind major releases:

  • compromise-dates at ^2.0.1; latest is 3.7.1
  • cld3-asm at ^3.1.1; latest is 4.0.0

Consider updating these to their latest major versions to access improvements and ensure long-term maintainability.

🤖 Prompt for AI Agents
In `@package.json` around lines 23 - 34, Update the two out-of-date dependencies
in package.json: bump "compromise-dates" from ^2.0.1 to ^3.7.1 and "cld3-asm"
from ^3.1.1 to ^4.0.0, then run your package manager to refresh the lockfile
(npm install / pnpm install / yarn install) and run the test suite to surface
any breaking changes; if anything breaks, adapt code that imports or uses
compromise-dates or cld3-asm APIs to the new major versions.

Comment on lines +50 to +95
| Mode | Cost | Speed | Quality | Use Case |
|------|------|-------|---------|----------|
| `hybrid` | ~$0.0003/batch | <10ms-2s | High | **Recommended** - best of both |
| `full` | ~$0.001/batch | ~500ms-2s | Highest | Complex multi-agent coordination |
| `signals-only` | $0 | <10ms | Medium | High-volume, cost-sensitive |

**Hybrid Mode (Recommended):**
- Runs signal rules first on ALL decisions
- If ALL are high-confidence (≥75%): **Skip LLM entirely** → $0 cost!
- If any are ambiguous (<75%): Call LLM for the batch
- Typical savings: ~70% of batches skip LLM completely

That's it! Your agents now coordinate through batching.
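
The hybrid gate described above reduces to one confidence check per batch. A minimal sketch, assuming a per-decision confidence score from the signal rules (the `SignalDecision` shape and function name are hypothetical):

```typescript
// Sketch of hybrid-mode routing: skip the LLM only when every signal-rule
// decision in the batch clears the confidence threshold.
interface SignalDecision {
  agentId: string;
  shouldRespond: boolean;
  confidence: number; // 0..1 from the free signal rules
}

const CONFIDENCE_THRESHOLD = 0.75;

function needsLlmReview(decisions: SignalDecision[]): boolean {
  // One ambiguous decision is enough to send the whole batch to the LLM.
  return decisions.some((d) => d.confidence < CONFIDENCE_THRESHOLD);
}
```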

---

## 💰 Economics: Why This Plugin Saves Money

### The Problem

**Traditional multi-agent approach:**
```
1 message arrives
Agent 1: "Should I respond?" (LLM call $0.001)
Agent 1: "Yes! Here's my response..." (LLM call $0.01)
Agent 2: "Should I respond?" (LLM call $0.001)
Agent 2: "Yes! Here's my response..." (LLM call $0.01)
... × 8 agents = 16 LLM calls per message
```

**Cost per message: $0.088** (16 calls × avg $0.0055/call)

### The Solution

**Time-sliced batching:**
```
8 messages arrive across 5 rooms within 500ms
One planning call: "Here are 8 messages for 8 agents across 5 rooms.
Who should respond to what?" (1 LLM call)
Agents execute their assigned responses (8 LLM calls, not 64)
```

⚠️ Potential issue | 🟡 Minor

markdownlint warnings: add code-fence languages and normalize table pipes.

MD040 (missing fenced code block language) and MD060 (table column spacing) are flagged in multiple sections (e.g., around Line 50 for tables and Line 71 for fenced blocks). Consider adding language identifiers (e.g., text, bash, env) and running a formatter to normalize pipe spacing.

🧰 Tools
🪛 markdownlint-cli2 (0.20.0)

[warning] 51-51: Table column style: table pipe is missing space to the left/right for style "compact" (MD060, table-column-style; reported 10×)

[warning] 71-71: Fenced code blocks should have a language specified (MD040, fenced-code-language)

[warning] 88-88: Fenced code blocks should have a language specified (MD040, fenced-code-language)

🤖 Prompt for AI Agents
In `@README.md` around lines 50 - 95, The README has markdownlint issues: add
explicit fenced code block languages and normalize table pipe spacing; update
each triple-backtick block (e.g., the examples under "The Problem" and "The
Solution") to include a language token like ```text or ```bash, and reformat the
ASCII tables (the Mode/Cost/Speed/Quality table) so pipes align consistently
(single space padding around cells) or run a Markdown formatter (markdownlint
--fix / prettier) to normalize table pipes and fix MD040/MD060 automatically;
search for the fenced blocks and the table header in the snippet to locate the
changes.

Comment on lines +27 to +50
it('MP10: should process simple greeting in < 5ms (after warmup)', async () => {
const mockRuntime = createMockRuntime();
const message: Memory = {
id: 'test-1',
entityId: 'user-1',
roomId: 'room-1',
content: { text: 'hello' },
createdAt: Date.now(),
};

const ctx = createPipelineContext(message, mockRuntime);

// Warmup run (initialize compromise)
await executeMP1(ctx);
await executeMP10(ctx);

// Actual benchmark run
const start = performance.now();
await executeMP10(ctx);
const duration = performance.now() - start;

console.log(` ⚡ MP10 (greeting): ${duration.toFixed(2)}ms`);
expect(duration).toBeLessThan(10); // Should be < 10ms after warmup
});

⚠️ Potential issue | 🟡 Minor

Test name and assertion threshold mismatch.

The test name claims < 5ms but the assertion uses < 10ms. This inconsistency is misleading when reviewing test results.

Proposed fix: Align test name with assertion
-  it('MP10: should process simple greeting in < 5ms (after warmup)', async () => {
+  it('MP10: should process simple greeting in < 10ms (after warmup)', async () => {

Or tighten the assertion to match the name:

-    expect(duration).toBeLessThan(10); // Should be < 10ms after warmup
+    expect(duration).toBeLessThan(5); // Should be < 5ms after warmup
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
it('MP10: should process simple greeting in < 5ms (after warmup)', async () => {
const mockRuntime = createMockRuntime();
const message: Memory = {
id: 'test-1',
entityId: 'user-1',
roomId: 'room-1',
content: { text: 'hello' },
createdAt: Date.now(),
};
const ctx = createPipelineContext(message, mockRuntime);
// Warmup run (initialize compromise)
await executeMP1(ctx);
await executeMP10(ctx);
// Actual benchmark run
const start = performance.now();
await executeMP10(ctx);
const duration = performance.now() - start;
console.log(` ⚡ MP10 (greeting): ${duration.toFixed(2)}ms`);
expect(duration).toBeLessThan(10); // Should be < 10ms after warmup
});
it('MP10: should process simple greeting in < 10ms (after warmup)', async () => {
const mockRuntime = createMockRuntime();
const message: Memory = {
id: 'test-1',
entityId: 'user-1',
roomId: 'room-1',
content: { text: 'hello' },
createdAt: Date.now(),
};
const ctx = createPipelineContext(message, mockRuntime);
// Warmup run (initialize compromise)
await executeMP1(ctx);
await executeMP10(ctx);
// Actual benchmark run
const start = performance.now();
await executeMP10(ctx);
const duration = performance.now() - start;
console.log(` ⚡ MP10 (greeting): ${duration.toFixed(2)}ms`);
expect(duration).toBeLessThan(10); // Should be < 10ms after warmup
});
🤖 Prompt for AI Agents
In `@src/__tests__/benchmark-nlp.test.ts` around lines 27 - 50, The test name
"MP10: should process simple greeting in < 5ms (after warmup)" and the assertion
expect(duration).toBeLessThan(10) are inconsistent; update either the test name
or the assertion to match: locate the test case (the it(...) string) and either
change the description to "< 10ms" to match executeMP10's expectation, or
tighten the assertion to expect(duration).toBeLessThan(5) if you want to enforce
the <5ms target; the relevant symbols to edit are the it(...) description and
the expect(...) line in the test that uses createMockRuntime(),
createPipelineContext(), executeMP1, and executeMP10.

Comment on lines +345 to +383
if (!memoryCleanupInterval) {
let lastLogTime = 0;

memoryCleanupInterval = setInterval(() => {
const now = Date.now();

// Log memory stats periodically (every 5 minutes)
if (now - lastLogTime >= MEMORY_LOG_INTERVAL_MS) {
lastLogTime = now;
const stats = sharedMessageService?.getMemoryStats();
if (stats) {
const mem = process.memoryUsage();
logger.info(
{
src: 'plugin:autonomous:memory',
heapUsedMB: Math.round(mem.heapUsed / 1024 / 1024),
rssMB: Math.round(mem.rss / 1024 / 1024),
...stats,
},
`[Memory] heap=${Math.round(mem.heapUsed / 1024 / 1024)}MB, rss=${Math.round(mem.rss / 1024 / 1024)}MB, ` +
`rooms=${stats.roomActivity}, convTracker=${stats.conversationTracker.totalRooms}`
);
}
}

// Cleanup stale data
const cleanupResult = sharedMessageService?.cleanupStaleData(MEMORY_CLEANUP_INTERVAL_MS);
if (cleanupResult && (cleanupResult.roomRatesCleaned > 0 || cleanupResult.roomActivityCleaned > 0)) {
logger.info(
{ src: 'plugin:autonomous:cleanup', ...cleanupResult },
`[Cleanup] Cleaned ${cleanupResult.roomRatesCleaned + cleanupResult.roomActivityCleaned} stale entries`
);
}
}, MEMORY_CLEANUP_INTERVAL_MS);

logger.info(
`[plugin-autonomous] 🧹 Memory monitoring started (cleanup every ${MEMORY_CLEANUP_INTERVAL_MS / 60000}min, log every ${MEMORY_LOG_INTERVAL_MS / 60000}min)`
);
}

⚠️ Potential issue | 🟡 Minor

Memory cleanup interval is never cleared on shutdown.

The memoryCleanupInterval is set but never cleared. If the plugin is unloaded, reinitialized, or during tests, this interval will continue running, potentially causing memory leaks or errors when sharedMessageService becomes stale.

Consider clearing the interval in a shutdown/cleanup hook if the runtime supports it.

🤖 Prompt for AI Agents
In `@src/index.ts` around lines 345 - 383, The interval created in the block that
assigns memoryCleanupInterval never gets cleared, so add explicit teardown logic
to clearInterval(memoryCleanupInterval) and set memoryCleanupInterval =
undefined when the plugin/system shuts down or before reinitializing; hook this
into the runtime's shutdown/unload handler (e.g., a provided cleanup/shutdown
function or process.on('exit')/module unload hook) and ensure any code path that
re-creates the interval checks and clears an existing memoryCleanupInterval
first; also ensure you stop using sharedMessageService after clearing the
interval to avoid accessing stale state (references: memoryCleanupInterval,
sharedMessageService, cleanupStaleData, MEMORY_CLEANUP_INTERVAL_MS).

Comment on lines +67 to +72
private async processPendingBatch(): Promise<void> {
// If already processing, wait for it
if (this.batchPromise) {
await this.batchPromise;
return;
}

⚠️ Potential issue | 🟠 Major

Pending agents can be stranded while a batch is in-flight.

If processPendingBatch runs during an active batchPromise, it awaits and returns, leaving any queued agents unprocessed unless another registration happens later. This can stall variation generation for those agents.

✅ Suggested fix
-    if (this.batchPromise) {
-      await this.batchPromise;
-      return;
-    }
+    if (this.batchPromise) {
+      await this.batchPromise;
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
private async processPendingBatch(): Promise<void> {
// If already processing, wait for it
if (this.batchPromise) {
await this.batchPromise;
return;
}
private async processPendingBatch(): Promise<void> {
// If already processing, wait for it
if (this.batchPromise) {
await this.batchPromise;
}
🤖 Prompt for AI Agents
In `@src/name-variation-registry.ts` around lines 67 - 72, processPendingBatch
currently awaits an existing this.batchPromise and returns, which leaves any new
queued agents unprocessed; change it so that after awaiting an in-flight
this.batchPromise it does not return but continues to check and drain the
pending queue (e.g., loop or fallthrough to the existing batch-start logic), so
that any agents added while the previous batch was running get processed
immediately; reference the processPendingBatch method, this.batchPromise, and
the pendingAgents queue (or whatever internal queue name) and ensure you
preserve setting/clearing this.batchPromise around new batch work to avoid
races.
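
One way to avoid stranding queued agents is a drain loop. The sketch below is self-contained and illustrative; internal names like `pending` and `batchPromise` follow the review's description and may differ from the actual file:

```typescript
// Sketch of a drain loop: after any in-flight batch finishes, keep
// processing until the queue is empty, so late registrations are not stranded.
class BatchDrainer<T> {
  private pending: T[] = [];
  private batchPromise: Promise<void> | null = null;
  public processed: T[][] = [];

  enqueue(item: T): void {
    this.pending.push(item);
  }

  async processPendingBatch(): Promise<void> {
    // Wait out an in-flight batch, then fall through instead of returning.
    while (this.batchPromise) {
      await this.batchPromise;
    }
    while (this.pending.length > 0) {
      const batch = this.pending.splice(0, this.pending.length);
      this.batchPromise = this.runBatch(batch);
      try {
        await this.batchPromise;
      } finally {
        this.batchPromise = null;
      }
    }
  }

  private async runBatch(batch: T[]): Promise<void> {
    await new Promise((r) => setTimeout(r, 1)); // stand-in for real batch work
    this.processed.push(batch);
  }
}
```

The outer `while` on `pending` is what guarantees items enqueued mid-batch are picked up by the same call rather than waiting for the next registration.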

Comment on lines +190 to +239
export async function getActionUsageHistory(runtime: IAgentRuntime): Promise<ActionUsageRecord[]> {
try {
// Try to load from memory
const memories = await runtime.getMemories({
roomId: runtime.agentId,
count: 50,
tableName: 'phase3-action-usage',
});

const records: ActionUsageRecord[] = [];
const seenActions = new Map<string, { successes: number; failures: number }>();

// Process memories newest first
for (const mem of memories) {
const meta = mem.metadata as any;
if (!meta?.actionName) continue;

const actionName = meta.actionName;
const outcome = meta.outcome || 'error';

// Track counts per action
if (!seenActions.has(actionName)) {
seenActions.set(actionName, { successes: 0, failures: 0 });
}
const counts = seenActions.get(actionName)!;

// Apply keep limits
if (outcome === 'success') {
if (counts.successes >= PHASE3_ACTION_MEMORY_SUCCESS_KEEP) continue;
counts.successes++;
} else {
if (counts.failures >= PHASE3_ACTION_MEMORY_FAILURE_KEEP) continue;
counts.failures++;
}

records.push({
actionName,
draft: meta.draft || '',
outcome,
feedback: meta.feedback,
timestamp: mem.createdAt || Date.now(),
});
}

return records;
} catch {
// No action history yet
return [];
}
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find the IAgentRuntime interface and getMemories definition
rg -n 'interface IAgentRuntime|class.*IAgentRuntime|getMemories' --type=ts -B2 -A10 | head -150

Repository: elizaos-plugins/plugin-autonomous

Length of output: 7230


🏁 Script executed:

# Look for the actual implementation of getMemories
fd -e ts -e js | xargs grep -l "getMemories" | head -20

Repository: elizaos-plugins/plugin-autonomous

Length of output: 281


🏁 Script executed:

# Check action-context.ts to see complete context
wc -l src/phase3/action-context.ts

Repository: elizaos-plugins/plugin-autonomous

Length of output: 109


🏁 Script executed:

# Check imports and interface definitions in action-context.ts
head -50 src/phase3/action-context.ts | grep -n "import\|interface\|export"

Repository: elizaos-plugins/plugin-autonomous

Length of output: 329


🏁 Script executed:

# Search for IAgentRuntime definition in the plugin
rg -n "interface IAgentRuntime|type IAgentRuntime" --type=ts

Repository: elizaos-plugins/plugin-autonomous

Length of output: 2671


🏁 Script executed:

# Check if getMemories has JSDoc comments explaining ordering
rg -n "getMemories" --type=ts -B5 | grep -E "@param|@returns|newest|oldest|order|sort" | head -30

Repository: elizaos-plugins/plugin-autonomous

Length of output: 158


🏁 Script executed:

# Look at action-memory.ts which has similar logic - check if it sorts
rg -n "getMemories|sort|createdAt" src/phase3/action-memory.ts -A15 | head -80



🏁 Script executed:

# Read the full function context
sed -n '185,239p' src/phase3/action-context.ts



🏁 Script executed:

# Check if there's any sorting or reverse() calls in the file
rg -n "sort|reverse|createdAt" src/phase3/action-context.ts



🏁 Script executed:

# Compare with action-memory.ts to see if it has similar patterns
cat src/phase3/action-memory.ts | head -120



🏁 Script executed:

# Check if there are any tests or comments about getMemories ordering
rg -n "getMemories" --type=ts -B3 -A3 | grep -E "sort|order|newest|oldest|reverse" | head -20



🏁 Script executed:

# Check planning-engine.ts for more clues about ordering expectations
sed -n '1480,1510p' src/planning-engine.ts



🏁 Script executed:

# Search for other .reverse() calls on getMemories results
rg -n "getMemories" --type=ts -A10 | grep -E "reverse|sort" -B5 | head -50



🏁 Script executed:

# Check if action-memory.ts also uses reverse
sed -n '80,110p' src/phase3/action-memory.ts



Add .reverse() to process memories in reverse-chronological (newest-first) order.

Based on the codebase pattern in planning-engine.ts (line 1493), runtime.getMemories returns results in oldest-first order. The current implementation processes memories without reversing, which means the "keep last N successes/failures" logic incorrectly applies to the oldest records rather than the newest. Add .reverse() after fetching memories to ensure the retention limits capture recent patterns as intended.

Suggested fix
const memories = await runtime.getMemories({
  roomId: runtime.agentId,
  count: 50,
  tableName: 'phase3-action-usage',
});

+ const records: ActionUsageRecord[] = [];
+ const seenActions = new Map<string, { successes: number; failures: number }>();
+
+ // Process memories newest first
+ for (const mem of memories.reverse()) {
🤖 Prompt for AI Agents
In `@src/phase3/action-context.ts` around lines 190-239: the
getActionUsageHistory function currently iterates memories returned by
runtime.getMemories in oldest-first order, so the keep-last-N logic misapplies
to older entries; after fetching memories (the result of runtime.getMemories)
call .reverse() (e.g., replace the fetched memories assignment with a reversed
array) so that the loop over memories processes newest entries first and the
seenActions success/failure counters (seenActions map,
counts.successes/counts.failures) correctly enforce
PHASE3_ACTION_MEMORY_SUCCESS_KEEP and PHASE3_ACTION_MEMORY_FAILURE_KEEP on
recent records.
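To make the retention behavior concrete, here is a minimal standalone sketch (the `Rec` type and `keepLastN` helper are hypothetical stand-ins, not the plugin's actual code) showing why iterating newest-first is what makes a keep-last-N counter retain recent records:

```typescript
// Hypothetical record shape; the real memories carry metadata and timestamps.
type Rec = { action: string; outcome: 'success' | 'failure'; ts: number };

function keepLastN(memories: Rec[], keep: number): Rec[] {
  const kept: Rec[] = [];
  const counts = new Map<string, number>();
  // Copy then reverse so the stored oldest-first array is not mutated,
  // and the per-action counter fills up with the NEWEST records first.
  for (const mem of [...memories].reverse()) {
    const n = counts.get(mem.action) ?? 0;
    if (n >= keep) continue;
    counts.set(mem.action, n + 1);
    kept.push(mem);
  }
  return kept;
}

// Oldest-first storage order, as runtime.getMemories is assumed to return:
const mems: Rec[] = [
  { action: 'search', outcome: 'success', ts: 1 },
  { action: 'search', outcome: 'success', ts: 2 },
  { action: 'search', outcome: 'success', ts: 3 },
];

console.log(keepLastN(mems, 2).map((m) => m.ts)); // [3, 2] — the two newest survive
```

Without the reverse, the same loop would keep `ts: 1` and `ts: 2` and drop the most recent record, which is exactly the bug described above.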

Comment on lines +250 to +254
  // Tasks
  const selectedTasks = selectTasksForBudget(context.tasks, [], budget);
  budget = consumeBudget(budget, 'tasks', selectedTasks.reduce(
    (sum, t) => sum + estimateTokens(t.name + (t.description || '')) + 30, 0
  ));

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's check the repository structure to understand the codebase
fd --type f --name "*.ts" | head -20

Repository: elizaos-plugins/plugin-autonomous

Length of output: 327


🏁 Script executed:

# Search for focusedIds and related patterns
rg -n 'focusedIds|focusedTaskIds|focused.*[Ii]d' --type=ts -C2 | head -100

Repository: elizaos-plugins/plugin-autonomous

Length of output: 1055


🏁 Script executed:

# Find the selectTasksForBudget function
rg -n 'selectTasksForBudget' --type=ts -B2 -A10

Repository: elizaos-plugins/plugin-autonomous

Length of output: 2009


🏁 Script executed:

# Check the CycleContext type definition
rg -n 'type CycleContext|interface CycleContext' --type=ts -A20

Repository: elizaos-plugins/plugin-autonomous

Length of output: 1167


🏁 Script executed:

# Get the full composeWithBudget function to see its signature and all parameters
rg -n 'export function composeWithBudget' --type=ts -A50

Repository: elizaos-plugins/plugin-autonomous

Length of output: 3226


🏁 Script executed:

# Check if there's a focused field added to CycleContext dynamically or in extended interfaces
rg -n 'focused' --type=ts -B2 -A5 | grep -E '(focused|CycleContext|interface|type)' | head -50

Repository: elizaos-plugins/plugin-autonomous

Length of output: 1441


🏁 Script executed:

# Look at the parser output and how focused tasks are used
rg -n 'focused:' --type=ts -B5 -A10

Repository: elizaos-plugins/plugin-autonomous

Length of output: 1741


🏁 Script executed:

# Find all calls to composeWithBudget
rg -n 'composeWithBudget' --type=ts -B2 -A5

Repository: elizaos-plugins/plugin-autonomous

Length of output: 883


🏁 Script executed:

# Check if TriageOutput is used alongside composeWithBudget calls
rg -n 'TriageOutput|parseTriage' --type=ts -B3 -A3

Repository: elizaos-plugins/plugin-autonomous

Length of output: 2080


🏁 Script executed:

# Check the full CycleContext definition
rg -n 'export interface CycleContext' --type=ts -A50

Repository: elizaos-plugins/plugin-autonomous

Length of output: 2700


Add focused task IDs to CycleContext to enable prioritization in selectTasksForBudget

selectTasksForBudget is designed to prioritize focused tasks but receives an empty array at line 251 because CycleContext doesn't carry the focused task data. The parser extracts focused tasks into TriageOutput, but this data never reaches the budget composition function. Either add a focused field to CycleContext to propagate the parsed focused tasks, or add a TODO comment if this prioritization is not yet intended.

🤖 Prompt for AI Agents
In `@src/phase3/budget.ts` around lines 250-254: selectTasksForBudget is being
passed an empty focused list because CycleContext doesn't carry parsed focused
task IDs from TriageOutput; update CycleContext to include a focused: string[]
(or appropriate type) field populated from the parser/TriageOutput and pass that
field into the selectTasksForBudget call (replace the [] with context.focused),
or if prioritization isn't intended yet add a clear TODO in the budget
composition code noting that focused tasks must be propagated from TriageOutput
into CycleContext before calling selectTasksForBudget; reference CycleContext,
TriageOutput, and selectTasksForBudget when making the change.
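As a sketch of what the propagated prioritization would do, the following standalone snippet selects focused tasks before the rest until the token budget runs out. The `Task` shape, the chars/4 `estimateTokens` heuristic, and the body of `selectTasksForBudget` are assumptions for illustration, not the plugin's real implementations:

```typescript
type Task = { id: string; name: string; description?: string };

// Rough heuristic: ~4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function selectTasksForBudget(tasks: Task[], focusedIds: string[], budget: number): Task[] {
  // Focused tasks go first, then the rest, until the budget is exhausted.
  const ordered = [
    ...tasks.filter((t) => focusedIds.includes(t.id)),
    ...tasks.filter((t) => !focusedIds.includes(t.id)),
  ];
  const selected: Task[] = [];
  let used = 0;
  for (const t of ordered) {
    const cost = estimateTokens(t.name + (t.description || '')) + 30; // per-task overhead
    if (used + cost > budget) break;
    selected.push(t);
    used += cost;
  }
  return selected;
}

const tasks: Task[] = [
  { id: 'a', name: 'Task A' },
  { id: 'b', name: 'Task B' },
  { id: 'c', name: 'Task C' },
];

console.log(selectTasksForBudget(tasks, ['b'], 70).map((t) => t.id)); // ['b', 'a']
```

Passing `[]` for the focused list, as the current call does, silently degrades this to plain insertion-order selection — the function still works, but the triage signal is lost.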

Comment on lines +350 to +365
async function createTask(op: TaskOperation, runtime: IAgentRuntime): Promise<TaskResult> {
  const taskName = op.name || 'Unnamed Task';
  const taskId = stringToUuid(`phase3-task-${Date.now()}-${taskName}`);

  const task: Task = {
    id: taskId,
    name: taskName,
    description: op.context || taskName,
    roomId: runtime.agentId, // Self-room for Phase 3 tasks
    tags: ['phase3', op.type || 'action'],
    metadata: {
      type: op.type || 'action',
      createdBy: 'phase3',
      context: op.context,
    },
  };

⚠️ Potential issue | 🟡 Minor

Potential duplicate task IDs when called rapidly.

stringToUuid(\phase3-task-${Date.now()}-${taskName}`)can produce the same UUID ifcreateTask` is called multiple times within the same millisecond with the same task name. This could cause task overwrites or conflicts.

Consider adding a random component or incrementing counter:

🔧 Proposed fix
 async function createTask(op: TaskOperation, runtime: IAgentRuntime): Promise<TaskResult> {
   const taskName = op.name || 'Unnamed Task';
-  const taskId = stringToUuid(`phase3-task-${Date.now()}-${taskName}`);
+  const taskId = stringToUuid(`phase3-task-${Date.now()}-${Math.random().toString(36).slice(2, 9)}-${taskName}`);
🤖 Prompt for AI Agents
In `@src/phase3/executor.ts` around lines 350-365: the task ID generation in
createTask currently uses stringToUuid(`phase3-task-${Date.now()}-${taskName}`)
which can collide when called multiple times within the same millisecond; update
createTask to append a high-entropy or monotonic component (e.g., a
crypto-random suffix or an incrementing counter) to the seed before calling
stringToUuid so each invocation produces a unique taskId, ensuring the change is
applied where taskId is set and preserves existing naming (taskName) and
metadata logic.
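The collision mode is easy to reproduce with any deterministic seed hash. In this sketch, stringToUuid is stubbed as a SHA-1-based hash of its seed string — an assumption about its behavior (the real elizaos helper is deterministic but may use a different algorithm):

```typescript
import { createHash } from 'node:crypto';

// Stub: deterministic UUID-shaped hash of a seed string (assumption, not the real helper).
function stringToUuid(seed: string): string {
  const h = createHash('sha1').update(seed).digest('hex');
  return `${h.slice(0, 8)}-${h.slice(8, 12)}-${h.slice(12, 16)}-${h.slice(16, 20)}-${h.slice(20, 32)}`;
}

const now = 1733600000000; // two calls landing in the same millisecond
const a = stringToUuid(`phase3-task-${now}-Research`);
const b = stringToUuid(`phase3-task-${now}-Research`);
console.log(a === b); // true: same seed, same UUID → collision

// Adding a random component makes each seed unique.
const c = stringToUuid(`phase3-task-${now}-${Math.random().toString(36).slice(2, 9)}-Research`);
const d = stringToUuid(`phase3-task-${now}-${Math.random().toString(36).slice(2, 9)}-Research`);
console.log(c === d); // false (with overwhelming probability)
```

A monotonic in-process counter appended to the seed would give the same uniqueness guarantee without relying on randomness.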
