Skip to content

feat(stem): close BullMQ parity gaps across events, control, and stores#17

Merged
kingwill101 merged 23 commits intomasterfrom
feat/close-bullmq-gap-impl
Feb 24, 2026
Merged

feat(stem): close BullMQ parity gaps across events, control, and stores#17
kingwill101 merged 23 commits intomasterfrom
feat/close-bullmq-gap-impl

Conversation

@kingwill101
Copy link
Owner

@kingwill101 kingwill101 commented Feb 24, 2026

Summary

  • add first-class queue custom events via QueueEventsProducer / QueueEvents
  • introduce shared StemEvent and migrate signal/event payloads to emit StemEvent
  • expand StemSignals with worker-scoped listener filters and canonical event naming
  • add queue pause/resume worker control commands and persistent pause-state handling
  • add SQLite revoke-store implementation, migration, and adapter wiring
  • add group rate-limit task options (groupRateLimit, groupRateKey, failure mode)
  • add first-class Canvas batch APIs (submitBatch, inspectBatch) with durable lifecycle status
  • extend adapter contract suites for queue events and revoke store
  • update .site docs and snippets for all new user-facing behavior, including BullMQ parity mapping updates

Testing

  • dart test packages/stem/test/unit/signals/signal_test.dart
  • dart test packages/stem/test/unit/signals/stem_signals_test.dart
  • dart test packages/stem/test/unit/core/stem_event_test.dart
  • dart test packages/stem/test/unit/core/queue_events_test.dart
  • dart test packages/stem/test/unit/worker/worker_test.dart
  • dart test packages/stem_adapter_tests
  • dart test packages/stem_sqlite/test/control
  • dart test packages/stem_sqlite/test/broker/sqlite_broker_test.dart
  • npm --prefix .site run build

Summary by CodeRabbit

  • New Features

    • Queue-scoped custom events, durable batch submissions/status inspection, group-scoped rate limits, worker CLI pause/resume, SQLite revoke store, unified event model for lifecycle/workflow signals, and a zone-scoped clock for deterministic time.
  • Documentation

    • New "Stem vs BullMQ" comparison, expanded Quick Starts and guides for queue events, batches, revoke store, rate limits, CLI controls, and signals.
  • Tests

    • Added contract and unit tests for queue events, revoke store, batches, signals, rate-limiting, and worker/CLI flows.

@coderabbitai
Copy link

coderabbitai bot commented Feb 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds queue-scoped custom events, a durable batch submission API with inspectable lifecycle, unified StemEvent contract and expanded signals, group-scoped rate limiting, SQLite-backed revoke store (migrations/ORM/adapter), worker pause/resume CLI, broadcast-only consumer support, a zone-aware clock (stemNow), many docs, examples, and tests.

Changes

Cohort / File(s) Summary
Documentation
.site/docs/brokers/sqlite.md, .site/docs/comparisons/stem-vs-bullmq.md, .site/docs/core-concepts/..., .site/docs/getting-started/index.md, .site/docs/workers/worker-control.md, .site/sidebars.ts
New and expanded docs: queue events, batches, queue-events quickstarts, pause/resume CLI, revoke-store SQLite support, rate-limiting, signals refactor, and a Stem vs BullMQ comparison; sidebar updated.
Core — Events & Signals
packages/stem/lib/src/core/stem_event.dart, packages/stem/lib/src/signals/...
Introduce StemEvent interface; many payload types now implement it; Signal types constrained to StemEvent; SignalContext carries event and timestamp; added canonical signal names and workflow/worker signals with worker-scoped filters.
Queue Events API & Tests
packages/stem/lib/src/core/queue_events.dart, packages/stem/test/unit/core/queue_events_test.dart, packages/stem/example/docs_snippets/lib/queue_events.dart
Add QueueCustomEvent, QueueEventsProducer, QueueEvents listener (per-queue broadcast/fan-out), stream/filter API, validation, unit tests and examples.
Canvas Batch API
packages/stem/lib/src/canvas/canvas.dart, packages/stem/example/docs_snippets/lib/canvas_batch.dart, packages/stem/test/unit/canvas/canvas_test.dart
Add BatchLifecycleState, BatchSubmission, BatchStatus, submitBatch and inspectBatch, locking to serialize submissions, example snippet and unit tests.
Group Rate Limiting
packages/stem/lib/src/core/contracts.dart, packages/stem/example/docs_snippets/lib/rate_limiting.dart, packages/stem/test/unit/core/contracts_test.dart
Add TaskOptions groupRateLimit fields, groupRateKey/header, RateLimiterFailureMode enum, (de)serialization, copyWith/toJson and example/test coverage.
CLI — Worker Queue Control
packages/stem_cli/lib/src/cli/worker.dart, packages/stem/example/docs_snippets/lib/cli_control.dart, packages/stem_cli/test/unit/*
Add stem worker pause and stem worker resume commands with shared base class, CLI wiring, output changes, and pause/resume tests.
SQLite Revoke Store & ORM
packages/stem_sqlite/lib/src/control/sqlite_revoke_store.dart, .../models/stem_revoke_entry*.dart, .../database/migrations/*, .../orm_registry.g.dart, src/workflow/sqlite_factories.dart, src/stack/sqlite_adapter.dart, packages/stem_sqlite/lib/stem_sqlite.dart, tests
Implement SqliteRevokeStore (connect/open/fromDataSource), add migration and generated ORM model/codec/DTOs, register in ORM registry, provide factory and adapter wiring, exports and integration tests.
Adapter Contract Tests & Capabilities
packages/stem_adapter_tests/lib/src/*, packages/stem_adapter_tests/test/*
Add queue-events and revoke-store contract suites, capability classes (QueueEventsContractCapabilities, RevokeStoreContractCapabilities), runners and adapter test wiring for in-memory/redis/postgres/sqlite.
Broker Consumer Broadcast Support
packages/stem_postgres/.../postgres_broker.dart, packages/stem_redis/.../redis_broker.dart, packages/stem_sqlite/test/broker/sqlite_broker_test.dart
Support broadcast-only consumption (nullable queue) in consumer runners, introduce queueLabel logging, guard loops for broadcast-only flows, and add queue-events contract tests for adapters.
Clock & Time refactor
packages/stem/lib/src/core/clock.dart, many files across packages (envelope, backends, brokers, scheduler, CLI, dashboard, tests)
Introduce StemClock, stemNow(), withStemClock(); replace DateTime.now() usages with stemNow() across codebase for consistent, testable time handling.
Signals & Worker internals
packages/stem/lib/src/worker/worker.dart, packages/stem/lib/src/workflow/..., packages/stem/lib/stem.dart
Worker pause/resume state and control-plane handling, group-rate-limit enforcement in delivery path, WorkerEvent implements StemEvent, and exports updated to include clock/queue_events/stem_event.
Examples & Tests
packages/stem/example/docs_snippets/*, packages/stem/test/unit/*, packages/stem_cli/test/*, packages/stem_postgres/test/..., packages/stem_redis/test/..., packages/stem_sqlite/test/...
New/updated examples for queue events, canvas_batch, sqlite revoke-store, signals; extensive unit and integration tests for new features and adapter contracts.
Generated ORM & Large Codegen
packages/stem_sqlite/lib/src/models/stem_revoke_entry.orm.dart, orm_registry.g.dart
Large generated ORM additions for StemRevokeEntry (model, codecs, DTOs, tracked types, predicates, query helpers) and updated registry indices.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Producer as QueueEventsProducer
    participant Broker
    participant Listener as QueueEvents

    Client->>Producer: emit(queue, eventName, payload, headers)
    Producer->>Broker: publish(broadcastChannel, envelope)
    Broker->>Listener: deliver(envelope)
    Listener->>Listener: validate & convert -> QueueCustomEvent
    Listener-->>Client: stream QueueCustomEvent (on / events)
    Client->>Listener: close()
    Listener->>Broker: cancel subscription
Loading
sequenceDiagram
    participant Caller
    participant Canvas
    participant GroupMgr
    participant Broker as Worker/Broker

    Caller->>Canvas: submitBatch(signatures)
    Canvas->>GroupMgr: create durable group (batchId)
    Canvas->>Broker: dispatch grouped tasks (groupId=batchId)
    Canvas-->>Caller: return BatchSubmission(batchId, taskIds)
    Caller->>Canvas: inspectBatch(batchId)
    Canvas->>GroupMgr: read group status
    GroupMgr-->>Canvas: group lifecycle snapshot
    Canvas-->>Caller: BatchStatus(state, counts, meta)
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Poem

🐰 I hopped through queues and little code glades,
I stitched batch ids, events, and revoke-store shades,
Workers pause and signals hum in tune,
SQLite listens under the moon,
A rabbit cheers these bright new upgrades. 🥕✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately captures the primary changes: closing BullMQ parity gaps through events, control commands, and persistence stores, which aligns with the major additions of queue events, signals enhancements, worker pause/resume, and SQLite revoke store.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/close-bullmq-gap-impl

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

Package publishing

Package Version Status Publish tag (post-merge)
package:stem 0.1.1 ready to publish stem-v0.1.1
package:stem_adapter_tests 0.1.1 ready to publish stem_adapter_tests-v0.1.1
package:stem_builder 0.1.0 ready to publish stem_builder-v0.1.0
package:stem_cli 0.1.0 already published at pub.dev
package:stem_memory 0.1.0 ready to publish stem_memory-v0.1.0
package:stem_postgres 0.1.0 already published at pub.dev
package:stem_redis 0.1.1 ready to publish stem_redis-v0.1.1
package:stem_sqlite 0.1.1 ready to publish stem_sqlite-v0.1.1

Documentation at https://github.com/dart-lang/ecosystem/wiki/Publishing-automation.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6321d54294

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 31

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)
packages/stem_adapter_tests/README.md (1)

29-47: ⚠️ Potential issue | 🟡 Minor

Add runRevokeStoreContractTests to Quick Start and document RevokeStoreContractCapabilities.

The package exports both runRevokeStoreContractTests and RevokeStoreContractCapabilities (with verifyPruneExpired flag), but neither appears in the README. Adapter authors will lack guidance on wiring up the revoke-store contract suite or tuning its behavior.

Add the test runner to the Quick Start snippet between runQueueEventsContractTests and the workflow factory setup:

runRevokeStoreContractTests(
  adapterName: 'my-adapter',
  factory: RevokeStoreContractFactory(create: createRevokeStore),
);

Add a new RevokeStoreContractCapabilities section to the Capability Flags table (after QueueEventsContractCapabilities):

| `verifyPruneExpired` | `true` | Expiry pruning tests | Verifies expired revoke records are cleaned up. |

Also add a corresponding Adapter Recipe example showing how to disable verifyPruneExpired if needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem_adapter_tests/README.md` around lines 29 - 47, Add the missing
revoke-store contract docs: insert a call to runRevokeStoreContractTests (using
RevokeStoreContractFactory(create: createRevokeStore)) into the Quick Start
snippet between runQueueEventsContractTests and the workflow factory setup, and
add a new Capability Flags entry for RevokeStoreContractCapabilities with the
verifyPruneExpired flag (default true) describing its purpose ("Expiry pruning
tests — verifies expired revoke records are cleaned up"); also add a short
Adapter Recipe example showing how to construct RevokeStoreContractCapabilities
with verifyPruneExpired: false to disable those pruning tests.
packages/stem/test/unit/worker/worker_test.dart (1)

486-486: ⚠️ Potential issue | 🟡 Minor

Duplicate test name — "emits worker lifecycle signals on start and shutdown" at lines 344 and 486.

Both tests share the identical name, making it difficult to distinguish failures in test reports. Consider renaming one of them to give each a distinct description.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem/test/unit/worker/worker_test.dart` at line 486, Two tests share
the exact name "emits worker lifecycle signals on start and shutdown", causing
ambiguous test reports; locate the duplicate test declaration that uses that
string (the test(...) call in worker_test.dart) and rename one of them to a
distinct description (e.g. "emits worker lifecycle signals on start and stop" or
"emits lifecycle signals on start and shutdown for multiple workers") so each
test has a unique, descriptive name; keep the test body unchanged, only update
the test string passed to test().
.site/docs/brokers/sqlite.md (1)

74-88: ⚠️ Potential issue | 🟡 Minor

"Recommended layout" section is missing the revoke store file.

The section advises separate DB files for broker and backend but doesn't mention that the revoke store also requires its own file (e.g., stem_revoke.sqlite). The new configureSqliteRevokeStore() snippet uses a dedicated file, and the same WAL/single-writer constraints apply.

📝 Suggested addition
 A simple layout:
 

./stem_broker.sqlite # broker only
./stem_backend.sqlite # result backend only
+./stem_revoke.sqlite # revoke store only

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.site/docs/brokers/sqlite.md around lines 74 - 88, The "Recommended layout"
section omits the revoke store DB file; update the docs to mention that the
revoke store needs its own SQLite file (e.g., stem_revoke.sqlite) and that it
must follow the same WAL / single-writer guidance; reference the new
configureSqliteRevokeStore() snippet and add the revoke file to the example list
(./stem_revoke.sqlite   # revoke store only) so readers know to keep broker,
backend, and revoke stores in separate DB files.
.site/docs/core-concepts/canvas.md (1)

86-90: ⚠️ Potential issue | 🟡 Minor

Missing canvas_batch.dart in the "Running the examples" list.

The new batch example isn't included in the run instructions alongside chain, group, and chord.

Proposed fix
 cd packages/stem/example/docs_snippets
 dart run lib/canvas_chain.dart
 dart run lib/canvas_group.dart
+dart run lib/canvas_batch.dart
 dart run lib/canvas_chord.dart
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.site/docs/core-concepts/canvas.md around lines 86 - 90, Add the new batch
example to the run instructions list by including the missing command that runs
canvas_batch.dart alongside the existing entries (canvas_chain.dart,
canvas_group.dart, canvas_chord.dart) so the "Running the examples" block also
shows: dart run lib/canvas_batch.dart; update the snippet where those three run
commands are listed to include this fourth command.
.site/docs/workers/worker-control.md (1)

216-233: ⚠️ Potential issue | 🟡 Minor

Lifecycle-guards paragraph is orphaned under "Queue Pause/Resume".

Lines 226–233 ("Lifecycle guards can also recycle isolates automatically…") logically belong under "Shutdown Modes and Lifecycle Guards" (line 202) but now sit under the newly inserted "Queue Pause/Resume" heading. This makes the pause/resume section read as if lifecycle recycling is related to queue pausing.

Move the lifecycle-guards paragraph and its code snippet above the ## Queue Pause/Resume heading so it remains part of the shutdown section.

Proposed restructure
 - **Hard** immediately requeues active deliveries and terminates isolates.
 
 By default, workers install signal handlers ...
 with `WorkerLifecycleConfig(installSignalHandlers: false)` when embedding Stem
 inside a larger host that already owns signal routing.
 
+Lifecycle guards can also recycle isolates automatically:
+
+```dart file=<rootDir>/../packages/stem/example/docs_snippets/lib/worker_control.dart#worker-control-lifecycle
+
+```
+
+Recycling occurs after the active task finishes; the worker logs the recycle
+reason and spawns a fresh isolate before accepting new work.
+
 ## Queue Pause/Resume
 
 `stem worker pause` and `stem worker resume` target queue names (repeatable
 `--queue`) and optionally specific workers (`--worker`). Paused queues are
 requeued instead of executed until resumed.
 
 - Pause/resume state is persisted when a revoke store is configured.
 - Without a revoke store, pause/resume still works for active workers but does
   not survive worker restarts.
 
-Lifecycle guards can also recycle isolates automatically:
-
-```dart file=<rootDir>/../packages/stem/example/docs_snippets/lib/worker_control.dart#worker-control-lifecycle
-
-```
-
-Recycling occurs after the active task finishes; the worker logs the recycle
-reason and spawns a fresh isolate before accepting new work.
-
 ## Configuration Summary
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.site/docs/workers/worker-control.md around lines 216 - 233, Move the
lifecycle-guards paragraph and its associated code snippet (the block
referencing worker-control-lifecycle) out of the "Queue Pause/Resume" section
and place it directly under the "Shutdown Modes and Lifecycle Guards" heading so
that the sentence "Lifecycle guards can also recycle isolates automatically:"
plus the ```dart file=...#worker-control-lifecycle``` snippet and the following
sentence about recycling appear before the "## Queue Pause/Resume" heading;
ensure the snippet marker worker-control-lifecycle and the recycle explanation
remain intact and that "Queue Pause/Resume" begins immediately after that moved
content.
packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart (1)

22-22: 🧹 Nitpick | 🔵 Trivial

Dedicate a separate namespace variable for the queue events contract to avoid fragile shared state.

Both runBrokerContractTests.create and runQueueEventsContractTests.create write to the same contractNamespace variable. Their respective additionalBrokerFactory closures then read it. While Dart's default sequential test execution makes this safe today, the two suites clobbering the same variable is brittle — enabling --concurrency or reordering registration would cause the wrong namespace to be picked up.

♻️ Proposed refactor: dedicated namespace variables per contract suite
-  String? contractNamespace;
+  String? contractNamespace;
+  String? queueEventsContractNamespace;

Then update the queue events factory to use its own variable:

   runQueueEventsContractTests(
     adapterName: 'Redis',
     factory: QueueEventsContractFactory(
       create: () async {
         final namespace = _uniqueNamespace();
-        contractNamespace = namespace;
+        queueEventsContractNamespace = namespace;
         return RedisStreamsBroker.connect(
           redisUrl,
           namespace: namespace,
           ...
         );
       },
       ...
       additionalBrokerFactory: () async {
-        final namespace = contractNamespace;
+        final namespace = queueEventsContractNamespace;
         if (namespace == null || namespace.isEmpty) {
           throw StateError(
             'Redis queue-events contract requires primary broker namespace.',
           );
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart`
at line 22, The two test suites share the single variable contractNamespace
causing fragile shared state; introduce a dedicated variable (e.g.,
queueContractNamespace) for the queue-events suite and keep contractNamespace
for the broker suite, then update the additionalBrokerFactory closure passed to
runQueueEventsContractTests.create to read the new queueContractNamespace
instead of the shared contractNamespace so each suite writes/reads its own
namespace (refer to contractNamespace, runBrokerContractTests.create,
runQueueEventsContractTests.create, and the additionalBrokerFactory closures to
locate the changes).
packages/stem/lib/src/signals/payloads.dart (1)

661-733: ⚠️ Potential issue | 🟠 Major

occurredAt should be stable for control-command payloads.
Using DateTime.now() in the getter produces a different timestamp on every access. Capture the timestamp at construction and return a stored value.

🐛 Proposed fix
 class ControlCommandReceivedPayload implements StemEvent {
   /// Creates a new [ControlCommandReceivedPayload] instance.
-  const ControlCommandReceivedPayload({
+  ControlCommandReceivedPayload({
     required this.worker,
     required this.command,
-  });
+    DateTime? timestamp,
+  }) : _occurredAt = (timestamp ?? DateTime.now()).toUtc();
+
+  final DateTime _occurredAt;

@@
-  DateTime get occurredAt => DateTime.now().toUtc();
+  DateTime get occurredAt => _occurredAt;
@@
 class ControlCommandCompletedPayload implements StemEvent {
   /// Creates a new [ControlCommandCompletedPayload] instance.
-  const ControlCommandCompletedPayload({
+  ControlCommandCompletedPayload({
     required this.worker,
     required this.command,
     required this.status,
     this.response,
     this.error,
-  });
+    DateTime? timestamp,
+  }) : _occurredAt = (timestamp ?? DateTime.now()).toUtc();
+
+  final DateTime _occurredAt;

@@
-  DateTime get occurredAt => DateTime.now().toUtc();
+  DateTime get occurredAt => _occurredAt;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem/lib/src/signals/payloads.dart` around lines 661 - 733, The
occurredAt getters on ControlCommandReceivedPayload and
ControlCommandCompletedPayload are non-stable because they call DateTime.now()
on each access; fix by capturing the timestamp once during construction and
returning that stored value. Add a final DateTime field (e.g., _occurredAt or
occurredAtUtc) to each class, set it to DateTime.now().toUtc() in the
constructor (or accept an optional DateTime parameter defaulting to
now().toUtc()), and change the occurredAt getter to return that field instead of
calling DateTime.now() again; update the const constructors and any call sites
if you add an optional parameter.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.site/docs/comparisons/stem-vs-bullmq.md:
- Around line 46-48: The "## Update policy" heading is empty while the
procedural "When this matrix changes…" block currently sits after "## BullMQ
events parity notes"; move the entire "When this matrix changes…" block (the
procedural steps) so it appears directly under the "## Update policy" heading
(or delete the empty "## Update policy" heading if you prefer to keep the steps
where they are), ensuring the header and its body are contiguous and
semantically correct; look for the "## Update policy" and the paragraph
beginning "When this matrix changes" to relocate or remove accordingly.

In @.site/docs/core-concepts/queue-events.md:
- Around line 44-45: Change the phrase "built in to `QueueEvents`" to the
correct compound form "built into `QueueEvents`" in the documentation sentence
describing delivery behavior so the line reads: "Delivery follows the underlying
broker's broadcast behavior for active listeners (no historical replay API is
built into `QueueEvents`)."

In @.site/docs/core-concepts/rate-limiting.md:
- Around line 146-153: Update the docs for groupRateLimiterFailureMode to state
the default is "failOpen"; specifically, in the bullet list documenting
groupRateLimiterFailureMode (near the entries for groupRateLimit, groupRateKey,
groupRateKeyHeader) add a short sentence like "Default: failOpen" or annotate
the failOpen entry with "(default)" so readers know the API defaults to failOpen
when not specified; ensure the symbol name groupRateLimiterFailureMode and the
two options failOpen/failClosed are mentioned exactly as in the code/docs.

In @.site/docs/core-concepts/signals.md:
- Around line 36-37: Update the description for the signals to clarify ordering
and roles: change the `taskReceived` line to state it fires when the message is
claimed/dequeued, change the `taskPrerun` line to state it fires immediately
before the handler is invoked, and make the ordering explicit as `taskReceived →
taskPrerun → handler → taskPostrun` (keep `taskPostrun` as running after
completion) so readers can see the precise sequence and distinction between
`taskReceived` and `taskPrerun`.

In `@packages/stem_adapter_tests/lib/src/queue_events_contract_suite.dart`:
- Around line 245-251: The file-level mutable _counter used by _queueName() can
persist across suite runs in the same isolate; replace the top-level mutable
state by making the counter local to the generator or remove it entirely and
derive uniqueness from a non-mutable source (e.g., a UUID or Random value
combined with DateTime). Update _queueName to either maintain a local static
counter inside the function or build the suffix with
DateTime.now().microsecondsSinceEpoch plus a short random/UUID string so you no
longer need the top-level _counter variable.

In `@packages/stem_adapter_tests/lib/src/revoke_store_contract_suite.dart`:
- Around line 144-194: Add a subcase to the existing test that inserts a
RevokeEntry with namespace 'stem-contract-a' and expiresAt set to null via
current.upsertAll, then call current.pruneExpired(namespaceA, now) and assert
the pruned count does not remove that entry and that current.list(namespaceA)
still contains its taskId; reference the existing test scope using RevokeEntry,
current.upsertAll, current.pruneExpired, and current.list so the new assertion
verifies entries with null expiresAt survive pruning.

In `@packages/stem_adapter_tests/README.md`:
- Around line 72-77: Add an "Adapter Recipes" example showing how to opt out of
fan-out by demonstrating usage of QueueEventsContractCapabilities with
verifyFanout set to false; specifically add a snippet that calls
runQueueEventsContractTests with an adapterName (e.g.,
'single-listener-adapter'), a QueueEventsContractFactory(create: createBroker),
and settings: QueueEventsContractSettings(capabilities:
QueueEventsContractCapabilities(verifyFanout: false)), so adapter authors have a
copy-paste template for brokers that do not support fan-out.
- Around line 72-77: The README is missing documentation for the
RevokeStoreContractCapabilities class and its verifyPruneExpired flag; add a new
capability table titled "RevokeStoreContractCapabilities" mirroring the style of
other sections: include a row with Flag `verifyPruneExpired`, Default `true`,
Affects (describe what it affects, e.g., revoke store pruning tests or
prune-expired behavior), and Behavior when enabled (describe that it verifies
expired revocations are pruned). Ensure the table formatting matches the
existing markdown tables and place it alongside the other capability sections.

In `@packages/stem_cli/lib/src/cli/worker.dart`:
- Around line 1081-1095: The table header in _renderQueueControlReplies is
misleading for resume operations (e.g., WorkerResumeCommand) because
reply.payload['paused'] shows queues still paused after the operation; change
the header to make that explicit (e.g., "Paused queues (after)" or "Still
paused") and update the column formatting in the same function where the header
lines are written (the two dependencies.out.writeln calls and the subsequent row
formatting using reply.payload['paused']) so the label accurately reflects
"paused after" state; alternatively, if you prefer separate semantics, split
_renderQueueControlReplies into two renderers (one used by WorkerPauseCommand
and one by WorkerResumeCommand) with appropriate header text for each.
- Line 1099: The exit code logic currently returns 0 whenever any reply exists
(replies.isEmpty ? 70 : 0); change it to return 70 if replies is empty or if any
reply reports an error status. Locate the return using the replies variable and
update the condition to check replies.isEmpty || replies.any((r) => r['status']
== 'error') (or the codebase's Reply type/status accessor) so the command
mirrors WorkerPingCommand/WorkerInspectCommand/WorkerRevokeCommand behavior and
exits 70 when any worker reports an error.

In `@packages/stem_cli/test/unit/cli/cli_worker_stats_test.dart`:
- Around line 148-150: The test uses a fixed 180ms delay then checks
backend.get(taskId) for TaskState.queued which is flaky; replace the fixed sleep
with a deterministic wait: after issuing the pause command, either wait for an
explicit pause-ack/event from the worker (preferred) or poll backend.get(taskId)
in a short loop until you observe a stable queued state or a configurable
timeout, then assert pausedStatus?.state == TaskState.queued; ensure you
reference taskId, backend.get, and TaskState.queued and fail the test only after
the timeout to avoid timing-based flakes.
- Around line 129-143: The test allocates pauseErr and resumeErr StringBuffer
values but never asserts their contents, so stderr output from runStemCli calls
(for the 'worker pause' and 'worker resume' commands) can go unnoticed; update
the test around the runStemCli invocations to assert that pauseErr.toString()
and resumeErr.toString() are empty (or match expected stderr) after their
respective pauseCode and resumeCode checks, and keep existing assertions for
pauseOut/resumeOut and exit codes from runStemCli to ensure CLI errors are
detected.
- Around line 463-475: The _waitFor helper can exceed timeout because it calls
predicate before checking the deadline; change the loop to check the deadline
before each predicate invocation and before sleeping so the timeout is enforced
precisely. In practice, compute deadline as now+timeout (already done), then
loop: if DateTime.now().isAfter(deadline) throw TimeoutException(...); if (await
predicate()) return; await Future.delayed(pollInterval); This ensures _waitFor,
predicate, timeout and pollInterval semantics are preserved while preventing the
function from running past the specified timeout.
- Around line 111-175: The test 'pause/resume commands stop and restart queue
consumption' leaves the Worker and InMemoryBroker running if an assertion fails;
wrap the test's logic after creating/starting the Worker (after await
worker.start()) in a try/finally and move the cleanup calls into the finally
block so worker.shutdown() is always awaited and broker.dispose() always called;
reference the Worker instance named worker and the broker variable to ensure
those two resources are shut down in the finally block.

In
`@packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart`:
- Around line 90-106: The test's additionalBrokerFactory wraps
RedisStreamsBroker.connect in _NoCloseBroker which prevents closing the inner
connection and leaks Redis; remove the _NoCloseBroker wrapper and instead pass
an explicit additionalDispose that calls _safeCloseRedisBroker on the created
broker (i.e. keep using RedisStreamsBroker.connect in additionalBrokerFactory
but return the raw broker and supply additionalDispose: (b) =>
_safeCloseRedisBroker(b) so the underlying Redis connection is properly closed).

In
`@packages/stem_sqlite/lib/src/database/migrations/m_20260224103000_add_revoke_store.dart`:
- Around line 14-19: The issued_at and expires_at columns in the stem_revokes
migration use .timestamp() but should use timezone-aware .timestampTz() to match
audit columns and avoid serialization/comparison issues used by pruneExpired
(which compares expiresAt DateTimes); update the migration so the builder call
for issued_at uses timestampTz() and the expires_at column uses
timestampTz().nullable(), keeping other column definitions (terminate, reason,
requested_by) unchanged so timestamp handling is consistent with timestampsTz()
used elsewhere (task results, groups, worker heartbeats).

In `@packages/stem_sqlite/lib/src/models/stem_revoke_entry.dart`:
- Around line 34-35: The doc comment for the `terminate` field in
StemRevokeEntry is ambiguous about which integer values map to true/false;
update the comment on `final int terminate` to explicitly state the contract
(e.g., "0 = false, 1 = true; only 0/1 are valid" or "0 = false, any non-zero =
true") and, if the repository uses adapter methods (e.g., toDomain/fromDomain or
any conversion functions for RevokeEntry), add validation and explicit
conversion there so the domain model uses bool and the DB layer enforces or
normalizes the int to the chosen contract.

In `@packages/stem_sqlite/test/control/sqlite_revoke_store_test.dart`:
- Around line 28-31: Remove the redundant cast in the contract factory's dispose
closure: the parameter `store` already implements `RevokeStore` (which declares
`close()`), so update the `dispose` passed to `RevokeStoreContractFactory` to
call `store.close()` directly instead of `(store as SqliteRevokeStore).close()`;
locate the factory creation using `RevokeStoreContractFactory` and change the
`dispose` lambda accordingly.
- Around line 19-24: The tearDown block currently checks and deletes dbFile
explicitly before calling tempDir.delete(recursive: true), which is redundant
because tempDir.delete(recursive: true) already removes dbFile; remove the
dbFile.existsSync() and await dbFile.delete() lines from the tearDown so it
simply awaits tempDir.delete(recursive: true) (locate the tearDown function and
the dbFile/tempDir identifiers to apply the change).

In `@packages/stem/example/docs_snippets/lib/canvas_batch.dart`:
- Around line 32-36: The example calls
app.canvas.inspectBatch(submission.batchId) immediately after submitBatch which
will usually show a non-terminal state; update the snippet to either await a
short delay (e.g., sleep/pause) before calling inspectBatch or add an inline
comment clarifying this demonstrates the inspection API and may return
pending/running states—not a completed batch (refer to inspectBatch,
submitBatch, and submission.batchId in the snippet).

In `@packages/stem/lib/src/canvas/canvas.dart`:
- Around line 418-444: submitBatch currently always calls backend.initGroup and
group(...) which re-initializes metadata and re-publishes tasks for a
caller-provided batchId; change submitBatch to first check whether a group with
the computed id already exists in the backend (e.g., via a backend.getGroup /
backend.groupExists or similar method) and if it exists return an existing
BatchSubmission (fetch taskIds from the backend group record) without calling
backend.initGroup or group(...); only call backend.initGroup and publish tasks
with group(signatures, groupId: id) when the group does not already exist to
prevent re-initialization and duplicate task dispatching.
- Around line 685-727: The _buildBatchStatus function misclassifies batch state
because it treats all status.results.entries as completed and infers succeeded
via negative checks; fix it by first filtering entries to terminal states using
the existing TaskStateX.isTerminal (or TaskState.isTerminal) extension, compute
completed = terminalEntries.length, compute succeeded/failed/cancelled by
counting terminalEntries where entry.value.state ==
TaskState.succeeded/failed/cancelled, and then determine BatchLifecycleState in
positive terms (e.g., if completed == 0 -> pending, else if completed <
status.expected -> running, else if succeeded == expected -> succeeded, else if
failed == expected -> failed, else if cancelled == expected -> cancelled, else
-> partial) while returning BatchStatus as before.

In `@packages/stem/lib/src/core/contracts.dart`:
- Around line 1026-1036: _parseFailureMode currently only handles string inputs;
update it to accept RateLimiterFailureMode instances as well by checking if
value is a RateLimiterFailureMode and returning it directly, otherwise convert
strings via toString/trim/lowercase and compare against
RateLimiterFailureMode.values (keep the existing loop). Also fix the null check
order so you test for null before calling toString to avoid unnecessary toString
calls on null; reference the _parseFailureMode function and the
RateLimiterFailureMode enum.

In `@packages/stem/lib/src/core/queue_events.dart`:
- Around line 219-234: The _onDelivery handler can call _events.addError after
the stream is closed; add a top-level guard using a boolean flag (e.g., _closed)
to immediately return if the controller is closed, and also check that flag
before calling _events.add or _events.addError so neither is invoked when
closed; update the controller close path to set _closed = true; keep the
existing _eventFromEnvelope(...) logic and the broker.ack(...) finally block
unchanged.
- Around line 128-136: Validate the trimmed queue string inside the QueueEvents
constructor (not only in start()) and fail-fast when it's empty: in the
QueueEvents constructor (the initializer that sets queue = queue.trim()), check
if the resulting queue is empty and throw an ArgumentError (or use assert) with
a clear message so callers constructing QueueEvents(broker: ..., queue: '   ')
get immediate feedback; leave start() validation unchanged or remove its
redundant check after adding this constructor-level validation.

In `@packages/stem/lib/src/worker/worker.dart`:
- Around line 3729-3745: The _extractQueueTargets function currently calls
toString() on items from payload['queues'], which can produce unintended names
like "null" or "Instance of ..."; update the iteration to only accept String
entries (e.g., check value is String), trim and add non-empty strings, and skip
any non-string values so only explicit string queue identifiers are included;
keep the existing handling of payload['queue'] as a nullable String and the
final sort/return behavior.

In `@packages/stem/test/unit/canvas/canvas_test.dart`:
- Around line 126-142: Extract a generic polling helper (e.g., _pollUntil<T>)
and replace both _waitForBatchTerminal and _waitForSuccess to call it: the
helper should accept a polling closure that returns Future<T?>, a predicate (T?
-> bool) to decide terminal/success, a timeout Duration, and an interval
Duration (currently 50ms); implement the same loop/timeout/delay logic once
inside _pollUntil and have _waitForBatchTerminal call
_pollUntil(canvas.inspectBatch(batchId), predicate: (s) => s != null &&
s.isTerminal, timeout: ..., interval: ...), likewise adapt _waitForSuccess to
use _pollUntil with its own predicate so the duplicated polling code is removed.

In `@packages/stem/test/unit/worker/worker_test.dart`:
- Around line 1485-1566: Replace the fixed 180ms sleep with a short polling/wait
that first ensures workerB has attempted a poll cycle before checking the task
state: subscribe to workerB.events (workerB.events) and use _waitFor to wait for
an event that indicates the worker attempted consumption (e.g., an event with
WorkerEventType.poll/heartbeat or any WorkerEvent emitted after start), then
call backend.get(taskId) and assert the state is TaskState.queued; if your
codebase lacks a poll/heartbeat event, instead loop with _waitFor polling
backend.get until either the worker has emitted any event or a timeout, and only
then assert the queued state. Ensure you remove the fixed Future.delayed(const
Duration(milliseconds: 180)) and reference workerB.events, WorkerEventType,
backend.get, and _waitFor when making the change.
- Around line 1732-1751: The helper _waitForTaskState duplicates logic in
_waitForCallbackSuccess; replace the body of _waitForCallbackSuccess to delegate
to _waitForTaskState by calling _waitForTaskState(backend, taskId,
TaskState.succeeded, timeout: timeout) (or forwarding any named params) and
remove the duplicated polling loop; reference the functions _waitForTaskState
and _waitForCallbackSuccess to find and update the code accordingly so only the
generic polling function contains the loop and timeout handling.
- Around line 1428-1483: The test "group limiter fail-closed requeues while
limiter is unavailable" is timing-sensitive and may flake if retries are
exhausted before the fixed 220ms delay; update the TaskOptions for the
registered FunctionTaskHandler (name: 'tasks.group.failclosed') to include an
explicit maxRetries (e.g., maxRetries: 5) and replace the fixed Future.delayed
check with a polling helper that waits for TaskState.retried (use or add a
helper like _waitForTaskState(taskId, TaskState.retried, backend, timeout)) so
the assertion uses a bounded poll instead of a hard sleep and the test can
reliably observe the retried state without exhausting retries.
- Around line 1291-1378: Remove the redundant null-check assertion for the first
task result: delete the expect(await backend.get(firstId), isNotNull) line since
the subsequent expect((await backend.get(firstId))?.state, TaskState.succeeded)
already verifies the result and its non-null state; update the test 'shares
group limiter keys across task types' (references: Worker, Stem.enqueue,
backend.get, TaskState.succeeded) by removing that single redundant expect to
keep the test clearer.

---

Outside diff comments:
In @.site/docs/brokers/sqlite.md:
- Around line 74-88: The "Recommended layout" section omits the revoke store DB
file; update the docs to mention that the revoke store needs its own SQLite file
(e.g., stem_revoke.sqlite) and that it must follow the same WAL / single-writer
guidance; reference the new configureSqliteRevokeStore() snippet and add the
revoke file to the example list (./stem_revoke.sqlite   # revoke store only) so
readers know to keep broker, backend, and revoke stores in separate DB files.

In @.site/docs/core-concepts/canvas.md:
- Around line 86-90: Add the new batch example to the run instructions list by
including the missing command that runs canvas_batch.dart alongside the existing
entries (canvas_chain.dart, canvas_group.dart, canvas_chord.dart) so the
"Running the examples" block also shows: dart run lib/canvas_batch.dart; update
the snippet where those three run commands are listed to include this fourth
command.

In @.site/docs/workers/worker-control.md:
- Around line 216-233: Move the lifecycle-guards paragraph and its associated
code snippet (the block referencing worker-control-lifecycle) out of the "Queue
Pause/Resume" section and place it directly under the "Shutdown Modes and
Lifecycle Guards" heading so that the sentence "Lifecycle guards can also
recycle isolates automatically:" plus the ```dart
file=...#worker-control-lifecycle``` snippet and the following sentence about
recycling appear before the "## Queue Pause/Resume" heading; ensure the snippet
marker worker-control-lifecycle and the recycle explanation remain intact and
that "Queue Pause/Resume" begins immediately after that moved content.

In `@packages/stem_adapter_tests/README.md`:
- Around line 29-47: Add the missing revoke-store contract docs: insert a call
to runRevokeStoreContractTests (using RevokeStoreContractFactory(create:
createRevokeStore)) into the Quick Start snippet between
runQueueEventsContractTests and the workflow factory setup, and add a new
Capability Flags entry for RevokeStoreContractCapabilities with the
verifyPruneExpired flag (default true) describing its purpose ("Expiry pruning
tests — verifies expired revoke records are cleaned up"); also add a short
Adapter Recipe example showing how to construct RevokeStoreContractCapabilities
with verifyPruneExpired: false to disable those pruning tests.

In
`@packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart`:
- Line 22: The two test suites share the single variable contractNamespace
causing fragile shared state; introduce a dedicated variable (e.g.,
queueContractNamespace) for the queue-events suite and keep contractNamespace
for the broker suite, then update the additionalBrokerFactory closure passed to
runQueueEventsContractTests.create to read the new queueContractNamespace
instead of the shared contractNamespace so each suite writes/reads its own
namespace (refer to contractNamespace, runBrokerContractTests.create,
runQueueEventsContractTests.create, and the additionalBrokerFactory closures to
locate the changes).

In `@packages/stem/lib/src/signals/payloads.dart`:
- Around line 661-733: The occurredAt getters on ControlCommandReceivedPayload
and ControlCommandCompletedPayload are non-stable because they call
DateTime.now() on each access; fix by capturing the timestamp once during
construction and returning that stored value. Add a final DateTime field (e.g.,
_occurredAt or occurredAtUtc) to each class, set it to DateTime.now().toUtc() in
the constructor (or accept an optional DateTime parameter defaulting to
now().toUtc()), and change the occurredAt getter to return that field instead of
calling DateTime.now() again; update the const constructors and any call sites
if you add an optional parameter.

In `@packages/stem/test/unit/worker/worker_test.dart`:
- Line 486: Two tests share the exact name "emits worker lifecycle signals on
start and shutdown", causing ambiguous test reports; locate the duplicate test
declaration that uses that string (the test(...) call in worker_test.dart) and
rename one of them to a distinct description (e.g. "emits worker lifecycle
signals on start and stop" or "emits lifecycle signals on start and shutdown for
multiple workers") so each test has a unique, descriptive name; keep the test
body unchanged, only update the test string passed to test().

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a660d09 and 6321d54.

📒 Files selected for processing (62)
  • .site/docs/brokers/sqlite.md
  • .site/docs/comparisons/stem-vs-bullmq.md
  • .site/docs/core-concepts/canvas.md
  • .site/docs/core-concepts/cli-control.md
  • .site/docs/core-concepts/index.md
  • .site/docs/core-concepts/persistence.md
  • .site/docs/core-concepts/queue-events.md
  • .site/docs/core-concepts/rate-limiting.md
  • .site/docs/core-concepts/signals.md
  • .site/docs/getting-started/index.md
  • .site/docs/workers/worker-control.md
  • .site/sidebars.ts
  • packages/stem/example/docs_snippets/lib/canvas_batch.dart
  • packages/stem/example/docs_snippets/lib/cli_control.dart
  • packages/stem/example/docs_snippets/lib/persistence.dart
  • packages/stem/example/docs_snippets/lib/queue_events.dart
  • packages/stem/example/docs_snippets/lib/rate_limiting.dart
  • packages/stem/example/docs_snippets/lib/signals.dart
  • packages/stem/lib/src/canvas/canvas.dart
  • packages/stem/lib/src/core/contracts.dart
  • packages/stem/lib/src/core/queue_events.dart
  • packages/stem/lib/src/core/stem_event.dart
  • packages/stem/lib/src/signals/emitter.dart
  • packages/stem/lib/src/signals/payloads.dart
  • packages/stem/lib/src/signals/signal.dart
  • packages/stem/lib/src/signals/stem_signals.dart
  • packages/stem/lib/src/worker/worker.dart
  • packages/stem/lib/src/workflow/runtime/workflow_introspection.dart
  • packages/stem/lib/stem.dart
  • packages/stem/test/unit/canvas/canvas_test.dart
  • packages/stem/test/unit/core/contracts_test.dart
  • packages/stem/test/unit/core/queue_events_test.dart
  • packages/stem/test/unit/core/stem_event_test.dart
  • packages/stem/test/unit/signals/signal_test.dart
  • packages/stem/test/unit/signals/stem_signals_test.dart
  • packages/stem/test/unit/worker/worker_test.dart
  • packages/stem_adapter_tests/README.md
  • packages/stem_adapter_tests/lib/src/contract_capabilities.dart
  • packages/stem_adapter_tests/lib/src/queue_events_contract_suite.dart
  • packages/stem_adapter_tests/lib/src/revoke_store_contract_suite.dart
  • packages/stem_adapter_tests/lib/stem_adapter_tests.dart
  • packages/stem_adapter_tests/test/contract_suite_exports_test.dart
  • packages/stem_adapter_tests/test/queue_events_contract_suite_test.dart
  • packages/stem_adapter_tests/test/revoke_store_contract_suite_test.dart
  • packages/stem_cli/lib/src/cli/revoke_store_factory.dart
  • packages/stem_cli/lib/src/cli/worker.dart
  • packages/stem_cli/test/unit/cli/cli_worker_stats_test.dart
  • packages/stem_cli/test/unit/cli/revoke_store_factory_test.dart
  • packages/stem_postgres/test/integration/brokers/postgres_broker_integration_test.dart
  • packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart
  • packages/stem_sqlite/lib/orm_registry.g.dart
  • packages/stem_sqlite/lib/src/control/sqlite_revoke_store.dart
  • packages/stem_sqlite/lib/src/database/migrations.dart
  • packages/stem_sqlite/lib/src/database/migrations/m_20260224103000_add_revoke_store.dart
  • packages/stem_sqlite/lib/src/models/models.dart
  • packages/stem_sqlite/lib/src/models/stem_revoke_entry.dart
  • packages/stem_sqlite/lib/src/models/stem_revoke_entry.orm.dart
  • packages/stem_sqlite/lib/src/stack/sqlite_adapter.dart
  • packages/stem_sqlite/lib/src/workflow/sqlite_factories.dart
  • packages/stem_sqlite/lib/stem_sqlite.dart
  • packages/stem_sqlite/test/broker/sqlite_broker_test.dart
  • packages/stem_sqlite/test/control/sqlite_revoke_store_test.dart
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
🔇 Additional comments (82)
packages/stem/test/unit/worker/worker_test.dart (3)

1380-1426: LGTM!

Clean test of the fail-open behavior: the limiter always throws, and the test correctly verifies the task still succeeds. Resource cleanup is appropriate.


1753-1791: LGTM — clean control-command helper.

The finally block ensures the subscription is always cancelled, and the Completer.isCompleted guard prevents double-completion. Well structured.


1793-1821: LGTM — test doubles are minimal and fit for purpose.

_ScenarioRateLimiter correctly tracks per-key attempt counts and exposes the keys list for assertions. _FixedRetryStrategy is a clean constant-delay implementation.

.site/docs/core-concepts/rate-limiting.md (2)

12-14: Nice intro for group-scoped rate limits.
Clear and concise context-setting.


131-131: Good syntax parity callout.
This helps reduce ambiguity for readers.

packages/stem/lib/src/core/contracts.dart (6)

855-859: Constructor defaults for group limits look good.
Clear defaults (tenant, failOpen) are helpful.


879-894: Parsing logic is solid with safe fallback.
The default to failOpen is sensible.


918-929: Field docs are clear and aligned with behavior.
No issues here.


954-977: copyWith covers the new group-rate options.
Good parity with the constructor.


987-997: Serialization includes all new fields.
Good to persist the failure mode by name.


2062-2069: Enum addition is clean and scoped.
Good fit for the new failure mode behavior.

packages/stem/example/docs_snippets/lib/rate_limiting.dart (1)

73-93: GroupRateLimitedTask snippet looks solid.
It showcases the new options cleanly.

packages/stem/test/unit/core/contracts_test.dart (1)

127-157: Good coverage for new TaskOptions fields.
The enum mapping assertion is especially useful.

packages/stem/lib/src/core/stem_event.dart (1)

1-11: LGTM — clean minimal interface contract.

abstract interface class is the correct Dart 3 idiom for a pure interface, and the three members cover the canonical event shape (name, timestamp, attributes).

packages/stem/example/docs_snippets/lib/cli_control.dart (1)

38-44: LGTM — follows established snippet pattern.

Region markers, constant naming, and CLI syntax are consistent with existing entries. The --queue default argument correctly satisfies _WorkerQueueControlCommand's mandatory queue validation.

packages/stem/example/docs_snippets/lib/persistence.dart (1)

149-169: LGTM — cleanup order and wiring are correct.

Resource teardown follows the right dependency order, and the region tag matches the documentation reference.

packages/stem_sqlite/lib/src/database/migrations.dart (1)

41-47: LGTM — migration entry is correctly registered.

Timestamp DateTime.utc(2026, 2, 24, 10, 30) matches the filename prefix, the string ID is consistent, and chronological order in _entries is maintained.

.site/docs/getting-started/index.md (1)

21-21: LGTM!

Good placement — the comparison link is added after the onboarding steps as supplementary reference material, consistent with its sidebar placement under "Guides."

.site/sidebars.ts (1)

32-32: LGTM!

Both new sidebar entries are logically placed — the comparison page leads the Guides section, and queue-events follows signals in Core Concepts.

Also applies to: 56-56

.site/docs/core-concepts/cli-control.md (1)

23-24: LGTM!

The pause/resume additions are consistently reflected across the command listing, semantics description, snippet references, CLI examples, and the backend requirements table. The revoke-store scheme list update is also accurate.

Also applies to: 39-41, 92-98, 167-168, 238-238, 246-247

.site/docs/core-concepts/canvas.md (1)

34-45: Batches section looks good.

The API description is clear and the lifecycle states are comprehensive.

.site/docs/workers/worker-control.md (2)

29-30: LGTM!

Pause/resume rows in the CLI overview table are consistent with the existing format.


251-255: LGTM!

Good addition of a concrete SQLite revoke-store example for local/single-node deployments.

packages/stem/example/docs_snippets/lib/signals.dart (3)

53-76: LGTM!

The worker-scoped signal subscriptions clearly demonstrate the new filtering capabilities (taskName, workerId, commandType), and the payload access patterns are consistent with the StemEvent-based model.


78-89: LGTM!

Good defensive null check on context.event before accessing event properties.


47-49: LGTM!

The migration from .workerReady.connect(...).onWorkerReady(...) and .signalHookups.connect(...).onControlCommandCompleted(...) is consistent with the new convenience API surface introduced in this PR.

Also applies to: 101-103

packages/stem_adapter_tests/lib/src/contract_capabilities.dart (1)

84-105: LGTM!

Both new capability classes follow the existing pattern precisely—const constructors, named bool flags with sensible defaults, and proper doc comments. Clean additions.

packages/stem_adapter_tests/lib/src/revoke_store_contract_suite.dart (2)

1-49: Well-structured contract test scaffold.

Settings, factory, and lifecycle management follow the established patterns from other contract suites. The nullable store with null-check before dispose is clean.


51-112: Good coverage of upsert semantics and version conflict resolution.

The version-ordering test (line 79) and the stale-version-rejection test (lines 104–111) together verify the core idempotent upsert contract. Clean assertions.

packages/stem/lib/src/workflow/runtime/workflow_introspection.dart (1)

59-75: Clean StemEvent conformance.

The eventName derivation from type.name, delegation of occurredAt to timestamp, and conditional attribute inclusion are all idiomatic. The type and timestamp fields are properly surfaced through the StemEvent getters rather than duplicated in attributes.

packages/stem/test/unit/core/stem_event_test.dart (1)

4-45: Good StemEvent conformance tests.

Each test verifies the three StemEvent properties (eventName, occurredAt, attributes) for a distinct event type. The assertions are focused and sufficient for contract validation.

packages/stem/test/unit/canvas/canvas_test.dart (1)

83-104: Thorough batch lifecycle test.

Good coverage: verifies the submission response shape (batchId, taskIds), then polls for terminal state and asserts the full lifecycle summary (state, counts, absence of failures, metadata flag).

.site/docs/core-concepts/signals.md (1)

114-120: Good enumeration of worker-scoped helpers.

Listing all convenience helpers that support worker-scoped filtering provides a clear reference. This aligns well with the new tab example above.

packages/stem/lib/src/core/queue_events.dart (2)

82-122: Well-implemented producer with proper input validation.

The emit method correctly trims and validates both queue and eventName, constructs the envelope with the sentinel name and structured args, publishes via broadcast routing, and returns the stable envelope ID. Clean design.


237-269: Robust envelope parsing with sensible fallbacks.

The function gracefully handles missing queue (falls back to envelope.queue), missing emittedAt (falls back to envelope.enqueuedAt), and non-map payloads (defaults to empty map). The sentinel name check at line 238 properly filters non-event envelopes.

One note: the .cast<String, Object?>() on line 257 creates a lazy wrapper that defers type checking to access time. If an envelope from a misbehaving producer contains non-String keys, the error surfaces downstream rather than here. Since the sentinel name guards against third-party envelopes this is low-risk, but a Map.from with explicit casting would fail fast if preferred.

packages/stem_sqlite/lib/src/models/stem_revoke_entry.dart (1)

6-7: The composite primary key is correctly resolved through @OrmField(columnName:) mappings. The generated ORM file marks both namespace and taskId fields as primary keys and uses the correct column names in all operations: 'namespace' and 'task_id' in encoding, decoding, and DTO mappings.

.site/docs/core-concepts/index.md (1)

20-22: LGTM!

New feature highlights and navigation entry for Queue Events are consistent with the new API surface introduced in this PR.

packages/stem/example/docs_snippets/lib/queue_events.dart (2)

9-34: LGTM!

Clean producer/listener snippet with proper lifecycle management (start → subscribe → emit → cancel → close). Region markers align with the doc page references.


38-68: LGTM!

Fan-out snippet correctly demonstrates two listeners on the same queue, each receiving emitted events independently.

packages/stem/test/unit/core/queue_events_test.dart (2)

22-53: LGTM!

Good pattern: subscribing to the .first future before emitting avoids race conditions. Assertions cover all event fields comprehensively.


119-134: The emit test uses an incorrect matcher for async validation throws.

The expect(() => producer.emit('', 'evt'), throwsA(...)) pattern only catches synchronous throws. However, emit is declared async with validation that occurs inside the async body (after queue.trim()). Errors thrown inside an async function are captured in the returned Future and won't be caught by this matcher — you should use expectLater(producer.emit('', 'evt'), throwsA(...)) instead.

The listener.on() test is correct because on() is synchronous and throws directly before returning the Stream.

packages/stem_adapter_tests/lib/src/queue_events_contract_suite.dart (2)

50-68: LGTM!

Clean contract test harness with configurable settings, proper broker lifecycle via setUp/tearDown, and null-safe disposal.


190-241: LGTM!

Good design: the fan-out test supports an optional secondary broker for cross-broker verification, uses identical to avoid double-disposing the same instance, and gates on the verifyFanout capability flag.

packages/stem_sqlite/lib/src/models/stem_revoke_entry.orm.dart (1)

1-4: Generated ORM code — no manual modifications needed.

This is auto-generated code. The field definitions, composite primary key (namespace + taskId), codec, DTOs, and tracked model class are all consistent with the StemRevokeEntry model.

packages/stem_sqlite/lib/src/models/models.dart (1)

5-5: LGTM!

New export for stem_revoke_entry.dart is correctly placed in alphabetical order.

packages/stem_sqlite/lib/orm_registry.g.dart (1)

15-39: LGTM!

Generated registry correctly inserts StemRevokeEntry at index 4 and re-indexes all subsequent type aliases consistently in both buildOrmRegistry and registerGeneratedModels.

packages/stem_adapter_tests/test/contract_suite_exports_test.dart (1)

9-10: Export checks look good.

The new contract runners are validated consistently with existing exports.

packages/stem_sqlite/lib/stem_sqlite.dart (1)

5-17: Exports align with new revoke-store support.

Public surface now exposes the SQLite revoke store and factory as expected.

packages/stem_sqlite/lib/src/workflow/sqlite_factories.dart (1)

6-79: Factory wiring is consistent and safe.

The new revoke-store factory mirrors existing factories and closes the store cleanly.

packages/stem_cli/lib/src/cli/revoke_store_factory.dart (1)

7-49: SQLite scheme handling looks correct.

The new branch follows the same resolution style as Redis/Postgres.

packages/stem_sqlite/lib/src/stack/sqlite_adapter.dart (1)

84-87: Adapter now exposes revoke-store factory as expected.

This completes the SQLite adapter contract surface.

packages/stem_cli/test/unit/cli/revoke_store_factory_test.dart (1)

25-61: Good coverage for SQLite revoke-store resolution.

The tests validate both explicit and fallback paths with cleanup.

packages/stem_sqlite/test/broker/sqlite_broker_test.dart (1)

55-73: Queue-events contract coverage looks solid.

Factory and disposal mirror existing broker contract tests.

.site/docs/core-concepts/persistence.md (1)

77-89: Docs update matches new revoke-store backends.

The added snippets and wording align with the new SQLite/Postgres support.

packages/stem/lib/stem.dart (1)

95-98: LGTM — new public exports are correctly placed.

queue_events.dart and stem_event.dart are inserted in alphabetical order within the export list and match the new core modules introduced in this PR.

packages/stem_adapter_tests/lib/stem_adapter_tests.dart (1)

4-6: LGTM — contract suite exports are in the correct alphabetical position.

packages/stem_adapter_tests/test/revoke_store_contract_suite_test.dart (1)

1-12: LGTM — minimal, correct in-memory wiring for the revoke store contract suite.

packages/stem_postgres/test/integration/brokers/postgres_broker_integration_test.dart (1)

59-79: LGTM — additionalDispose correctly mirrors dispose, ensuring both Postgres broker connections are explicitly closed.

packages/stem/test/unit/signals/signal_test.dart (1)

88-146: LGTM — _TestEvent is a clean, minimal StemEvent test double; new test cases exercise payload identity and listener tracking correctly.

packages/stem_adapter_tests/test/queue_events_contract_suite_test.dart (1)

5-14: No action needed—state isolation is properly handled by design.

The shared namespace across create and additionalBrokerFactory is intentional and documented. InMemoryBroker uses reference counting to manage the broadcast hub lifecycle: each broker increments the ref count on creation and decrements on disposal. When the reference count reaches zero (after both brokers dispose), the broadcast hub is removed from the static registry, ensuring the next test gets a fresh instance. Queue state itself is instance-level (not static), so there is no cross-broker queue bleeding. The setUp/tearDown pattern in the contract suite ensures proper cleanup between tests.

Likely an incorrect or invalid review comment.

packages/stem/test/unit/signals/stem_signals_test.dart (3)

54-90: Worker/task scoping coverage looks solid.
The added workerId filter plus the extra failure emission validate that only the targeted worker/task is captured.


93-187: Worker-scoped lifecycle/control command tests are thorough.
Covers init/stop/shutdown plus command received/completed with worker filtering.


213-260: Canonical signal-name assertions are clear.
Validates eventName mapping for worker and workflow signals.

packages/stem/lib/src/worker/worker.dart (10)

589-638: Queue pause state is wired cleanly into worker startup.
The new pause tracking plus subscription refresh fits the control-plane design.


778-867: Pause short-circuit + group rate-limit flow is well structured.
Fail-open metrics/logging and retry behavior read cleanly.


2292-2338: Retry helper centralizes paused/rate-limited requeue behavior.
Nice consolidation of broker nack/publish + status metadata.


2460-2549: Subscription refresh/cleanup logic looks consistent.
Cancelling prior subscriptions before resubscribing to active queues is clean.


3143-3223: Queue pause sync/prune helpers look good.
The revoke-store integration and expiry pruning are cohesive.


3747-3815: Pause/resume command handling is cohesive.
Persists state, prunes expired entries, and refreshes subscriptions in one flow.


3934-3966: Control-plane queue pause/resume/status wiring looks good.
Switch cases integrate cleanly with the control command handler.


4048-4103: Paused queue visibility in stats/inspect is helpful.
Nice addition for operational introspection.


4299-4357: WorkerEvent StemEvent conformance is solid.
Attributes mapping is clear and consistent with the new event contract.


2276-2286: No guard needed — groupRateKeyHeader is non-nullable with a default value.

The field is declared as final String groupRateKeyHeader (not String?) and has a default value of 'tenant' in the TaskOptions constructor. Even when deserializing from JSON, null is coalesced to 'tenant'. Calling .trim() on this field is always safe.

Likely an incorrect or invalid review comment.

packages/stem/lib/src/signals/emitter.dart (3)

160-216: Worker lifecycle signal naming is now explicit.
Payloads now carry canonical signal names consistently.


232-255: Child lifecycle signal naming is correct.
Signal selection and payload wiring look clean.


340-399: Workflow signal naming via withSignalName is clean.
Keeps payloads consistent across workflow lifecycle emissions.

packages/stem_sqlite/lib/src/control/sqlite_revoke_store.dart (2)

9-37: Overall implementation aligns with the RevokeStore contract.
Factories plus transactional list/prune/upsert usage look solid.

Also applies to: 71-167


47-67: The code is valid as written. Dart 3.0+ does not require break statements in non-empty case clauses—each case body automatically exits the switch upon completion. The project targets Dart SDK >=3.9.2, which fully supports this behavior. No changes are needed.

Likely an incorrect or invalid review comment.

packages/stem/lib/src/signals/signal.dart (1)

3-188: StemEvent-typed signals and context propagation look good.
The new bounds and context event wiring align with the shared event contract.

packages/stem/lib/src/signals/payloads.dart (1)

43-659: StemEvent adoption across payloads is consistent.
Event naming and attribute mappings are clear for task/worker/workflow/schedule payloads.

packages/stem/lib/src/signals/stem_signals.dart (2)

35-372: Signal registry expansion looks consistent.
Workflow run signals are now part of the canonical list and configuration flow.


418-741: Merged filters and worker/command scoping are well integrated.
WorkerId and commandType filtering read cleanly across task and control signals.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/stem_redis/lib/src/brokers/redis_broker.dart`:
- Around line 574-576: The condition checking both queue == null and
streamKeys.isEmpty is redundant because streamKeys is derived from queue via
_priorityStreamKeys and that helper never returns an empty list; update the
guard in the method containing the variables queue and streamKeys to only check
if queue == null and return early (i.e., remove the streamKeys.isEmpty clause),
keeping the rest of the method logic unchanged.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6321d54 and ecb4f22.

📒 Files selected for processing (2)
  • packages/stem_postgres/lib/src/brokers/postgres_broker.dart
  • packages/stem_redis/lib/src/brokers/redis_broker.dart
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test
🔇 Additional comments (6)
packages/stem_redis/lib/src/brokers/redis_broker.dart (1)

488-502: LGTM — clean broadcast-only path.

The nullable queue derivation, validation, sentinel group, and empty stream keys are consistent. The guard at line 493 properly enforces that at least one of queue or broadcast channel is present.

packages/stem_postgres/lib/src/brokers/postgres_broker.dart (5)

231-242: LGTM — broadcast-only support is consistent with the Redis broker.

Nullable queue derivation, validation, and sentinel locker key follow the same pattern as the Redis broker. The __broadcast__ sentinel in the locker is inert in broadcast-only mode since _claimNextJob is never called.


248-261: LGTM — clean logging for broadcast-only cancel path.

The queueLabel fallback to <broadcast-only> provides clear observability in logs when no queue is set.


812-812: Correct use of queue! after null-guard — Dart field promotion caveat.

Since queue is a class field (final String? queue), Dart does not promote it even after a null check. The explicit queue! at line 850 inside the if (queue != null) block is the correct pattern here. The logic properly skips job claiming entirely in broadcast-only mode while still processing broadcast reservations below.

Also applies to: 848-854


821-838: LGTM — start/stop logging updated consistently.

Both start() and stop() use the same queueLabel fallback pattern for observability.


878-889: LGTM — error-path logging properly handles nullable queue.

The catch block correctly derives queueLabel for the error log context.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
packages/stem/lib/src/signals/payloads.dart (4)

630-664: ⚠️ Potential issue | 🟡 Minor

ScheduleEntryFailedPayload.occurredAt reports the scheduled time, not the failure time.

`@override`
DateTime get occurredAt => scheduledFor.toUtc(); // ← scheduled time, not when it failed

A schedule entry can fail well after scheduledFor (e.g., if execution is delayed). ScheduleEntryDispatchedPayload correctly uses executedAtScheduleEntryFailedPayload should do the same. Adding a failedAt (or executedAt) parameter is the minimal fix:

🐛 Proposed fix
 class ScheduleEntryFailedPayload implements StemEvent {
   const ScheduleEntryFailedPayload({
     required this.entry,
     required this.scheduledFor,
+    required this.failedAt,
     required this.error,
     required this.stackTrace,
   });

   final ScheduleEntry entry;
   final DateTime scheduledFor;
+  final DateTime failedAt;
   final Object error;
   final StackTrace stackTrace;

   `@override`
-  DateTime get occurredAt => scheduledFor.toUtc();
+  DateTime get occurredAt => failedAt.toUtc();

   `@override`
   Map<String, Object?> get attributes => {
     'entryId': entry.id,
     'scheduledFor': scheduledFor.toUtc().toIso8601String(),
+    'failedAt': failedAt.toUtc().toIso8601String(),
     'error': error.toString(),
     'stackTrace': stackTrace.toString(),
   };
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem/lib/src/signals/payloads.dart` around lines 630 - 664, The
occurredAt getter on ScheduleEntryFailedPayload currently returns scheduledFor
instead of the actual failure time; update ScheduleEntryFailedPayload by adding
a new DateTime field/constructor param (name it failedAt or executedAt to match
ScheduleEntryDispatchedPayload), set the occurredAt getter to return
failedAt.toUtc(), and include failedAt in the attributes map (e.g., 'failedAt':
failedAt.toUtc().toIso8601String()); ensure constructors and references use the
new symbol (failedAt/executedAt) and keep scheduledFor as-is for the originally
scheduled time.

233-284: ⚠️ Potential issue | 🟡 Minor

nextRetryAt is not UTC-normalized at construction, creating a field/attribute divergence.

emittedAt is stored as UTC ((emittedAt ?? DateTime.now()).toUtc()), but nextRetryAt is stored as passed. The attributes getter then calls nextRetryAt.toUtc() for serialization. If a local-timezone DateTime is passed, payload.nextRetryAt and DateTime.parse(payload.attributes['nextRetryAt']! as String) represent the same instant but in different timezone representations, which can surprise callers doing direct field comparisons or round-trip deserialisation.

🐛 Proposed fix
 TaskRetryPayload({
   required this.envelope,
   required this.worker,
   required this.reason,
-  required this.nextRetryAt,
+  required DateTime nextRetryAt,
   DateTime? emittedAt,
-}) : emittedAt = (emittedAt ?? DateTime.now()).toUtc();
+}) : nextRetryAt = nextRetryAt.toUtc(),
+     emittedAt = (emittedAt ?? DateTime.now()).toUtc();

With nextRetryAt already UTC, the explicit .toUtc() call in attributes (line 282) becomes a no-op and can be removed for consistency with emittedAt.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem/lib/src/signals/payloads.dart` around lines 233 - 284,
TaskRetryPayload stores emittedAt as UTC but leaves nextRetryAt as-is, causing
field vs serialized-attribute divergence; normalize nextRetryAt to UTC at
construction (in TaskRetryPayload constructor) similar to emittedAt, and then
update the attributes getter to use the now-UTC nextRetryAt (you can remove the
explicit .toUtc() there) so payload.nextRetryAt and attributes['nextRetryAt']
are consistent.

509-566: 🧹 Nitpick | 🔵 Trivial

eventName fallback maps running"workflow-run-running", diverging from the canonical "workflow-run-started".

String get eventName => signalName ?? 'workflow-run-${status.name}';
// WorkflowRunStatus.running.name == 'running'
// → 'workflow-run-running' ≠ StemSignals.workflowRunStartedName ('workflow-run-started')

The emitter always passes a signalName, so this fallback is unreachable in normal flow. However, any code that constructs a WorkflowRunPayload(status: WorkflowRunStatus.running) directly — e.g., in tests or custom event sources — will observe eventName == 'workflow-run-running', which won't match the canonical constant. Consider either renaming the enum member to started, or hardcoding the fallback to the canonical strings:

♻️ Option A — explicit fallback map
+  static const _statusEventName = {
+    WorkflowRunStatus.running: 'workflow-run-started',
+    WorkflowRunStatus.suspended: 'workflow-run-suspended',
+    WorkflowRunStatus.completed: 'workflow-run-completed',
+    WorkflowRunStatus.failed: 'workflow-run-failed',
+    WorkflowRunStatus.cancelled: 'workflow-run-cancelled',
+  };
+
   `@override`
-  String get eventName => signalName ?? 'workflow-run-${status.name}';
+  String get eventName => signalName ?? _statusEventName[status]!;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem/lib/src/signals/payloads.dart` around lines 509 - 566, The
eventName getter on WorkflowRunPayload produces non-canonical names like
"workflow-run-running" when signalName is null because it uses status.name;
update WorkflowRunPayload.eventName to map WorkflowRunStatus values to the
canonical signal strings (e.g., map WorkflowRunStatus.running ->
StemSignals.workflowRunStartedName) instead of using status.name, so any direct
constructions of WorkflowRunPayload produce the correct canonical eventName;
keep withSignalName unchanged so explicit signalName still overrides the
fallback.

667-695: ⚠️ Potential issue | 🟠 Major

occurredAt is a drifting clock, not the event timestamp — fix by storing at construction time.

Both ControlCommandReceivedPayload (line 684) and ControlCommandCompletedPayload (line 727) implement occurredAt as:

DateTime get occurredAt => DateTime.now().toUtc();

Because these are const constructors, no stable timestamp is captured. Every access to occurredAt returns a fresh wall-clock value, so a consumer that reads the field twice (log then serialize, dedup comparison, replay) gets different timestamps — breaking the StemEvent contract. Follow the same pattern as WorkerLifecyclePayload:

🐛 Proposed fix for both payloads
-class ControlCommandReceivedPayload implements StemEvent {
-  const ControlCommandReceivedPayload({
+class ControlCommandReceivedPayload implements StemEvent {
+  ControlCommandReceivedPayload({
     required this.worker,
     required this.command,
+    DateTime? occurredAt,
-  });
+  }) : _occurredAt = (occurredAt ?? DateTime.now()).toUtc();

   final WorkerInfo worker;
   final ControlCommandMessage command;
+  final DateTime _occurredAt;

   `@override`
   String get eventName => 'control-command-received';

   `@override`
-  DateTime get occurredAt => DateTime.now().toUtc();
+  DateTime get occurredAt => _occurredAt;
   // ... attributes unchanged
 }

Apply the identical change to ControlCommandCompletedPayload.

Also applies to: 698-738

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem/lib/src/signals/payloads.dart` around lines 667 - 695,
ControlCommandReceivedPayload (and likewise ControlCommandCompletedPayload)
currently implements occurredAt as a getter that returns DateTime.now().toUtc(),
producing a drifting clock; change this to a stored final DateTime field set at
construction time. Replace the getter override with a final DateTime occurredAt
field, update the constructor (remove const or accept an optional DateTime
occurredAt parameter) and initialize it like: this.occurredAt = occurredAt ??
DateTime.now().toUtc(); so the timestamp is captured once at construction; apply
the identical change to ControlCommandCompletedPayload and remove the dynamic
getter.
packages/stem/test/unit/signals/payloads_test.dart (1)

53-67: 🧹 Nitpick | 🔵 Trivial

LGTM – assertions are correct; consider extending attribute coverage.

The new emittedAt/occurredAt assertions are accurate against the implementation. Optionally, covering retry.eventName ('task-retry') and retry.attributes['emittedAt'] (the other new attribute key added in this PR) would complete the round-trip validation for the StemEvent surface.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem/test/unit/signals/payloads_test.dart` around lines 53 - 67, Add
assertions to fully validate the StemEvent surface for TaskRetryPayload: verify
retry.eventName equals 'task-retry' and assert retry.attributes['emittedAt']
equals DateTime.utc(2024).toIso8601String(); locate the TaskRetryPayload test
where the existing retry instance is created (symbol: TaskRetryPayload and
variable retry) and add these two expectations alongside the existing assertions
for taskId/attempt/occurredAt/nextRetryAt.
♻️ Duplicate comments (2)
packages/stem_cli/test/unit/cli/cli_worker_stats_test.dart (2)

132-177: ⚠️ Potential issue | 🟡 Minor

Assert pause/resume stderr buffers are empty.

pauseErr and resumeErr are never checked, so unexpected CLI errors can slip through.

✅ Proposed fix
         expect(pauseCode, equals(0));
+        expect(pauseErr.toString().trim(), isEmpty,
+            reason: 'pause command produced unexpected stderr output');

         final stem = Stem(broker: broker, registry: registry, backend: backend);
         final taskId = await stem.enqueue('tasks.blocking');
         await _assertTaskRemainsQueued(backend, taskId);
         expect(started.isCompleted, isFalse);

@@
         expect(resumeCode, equals(0));
+        expect(resumeErr.toString().trim(), isEmpty,
+            reason: 'resume command produced unexpected stderr output');
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem_cli/test/unit/cli/cli_worker_stats_test.dart` around lines 132
- 177, Add assertions to verify the CLI produced no stderr output by checking
the pauseErr and resumeErr buffers after the runStemCli calls: after obtaining
pauseCode and resumeCode, assert pauseErr.toString().isEmpty (or equals('')) and
resumeErr.toString().isEmpty to ensure no unexpected CLI errors; reference the
pauseErr and resumeErr StringBuffer variables and the corresponding runStemCli
invocations used to pause/resume the worker.

478-490: 🧹 Nitpick | 🔵 Trivial

Tighten timeout semantics in _waitFor.

The current loop checks the deadline after evaluating the predicate, which can overshoot the timeout by one predicate duration and poll interval. Consider checking the deadline before each predicate call.

♻️ Suggested adjustment
   final deadline = DateTime.now().add(timeout);
-  while (!await predicate()) {
-    if (DateTime.now().isAfter(deadline)) {
-      throw TimeoutException('Condition not met within $timeout');
-    }
-    await Future<void>.delayed(pollInterval);
-  }
+  while (true) {
+    if (DateTime.now().isAfter(deadline)) {
+      throw TimeoutException('Condition not met within $timeout');
+    }
+    if (await predicate()) return;
+    await Future<void>.delayed(pollInterval);
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem_cli/test/unit/cli/cli_worker_stats_test.dart` around lines 478
- 490, The loop in _waitFor can overshoot timeout because it evaluates predicate
before checking deadline; modify _waitFor to check the deadline before calling
await predicate() (and throw TimeoutException if passed), and also check the
deadline immediately after the delayed pollInterval before the next iteration to
avoid an extra sleep/predicate call past the timeout; keep the existing timeout,
pollInterval, and exception message, and update the loop in the _waitFor
function accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/stem_redis/lib/src/brokers/redis_broker.dart`:
- Around line 498-502: The code currently discards a caller-supplied
consumerGroup in broadcast-only mode; fix by threading consumerGroup into
_listenBroadcast and using it there instead of always deriving via
_broadcastGroupKey. Update loop() to compute group (retain consumerGroup if
non-null) and pass it into _listenBroadcast (add a consumerGroup parameter to
_listenBroadcast), have _listenBroadcast use group = consumerGroup ??
_broadcastGroupKey(channel, consumer) for XREADGROUP, and ensure any
claim/scheduler paths that expected a group use this passed-in group; remove the
unused '__broadcast__' literal or keep it only as a default inside
_listenBroadcast.

In `@packages/stem/lib/src/canvas/canvas.dart`:
- Around line 475-480: inspectBatch currently converts any group returned by
backend.getGroup into a BatchStatus; change it to first detect whether the
returned group represents a batch created by submitBatch (e.g. check a
type/role/metadata field on the returned group object from getGroup such as a
"type" or specific batch marker) and if that marker is missing return null (or
throw) instead of calling _buildBatchStatus; place this guard in inspectBatch
immediately after retrieving status and before calling _buildBatchStatus so only
genuine batch groups are converted.
- Around line 414-473: submitBatch currently races: two callers can both see
backend.getGroup null and both proceed to initGroup and dispatch; make the
operation idempotent by ensuring dispatch only occurs if initGroup actually
created the group. Change submitBatch to attempt backend.initGroup and then
check its result (or catch a “already exists” error) — using backend.getGroup
after initGroup if needed — and if the group exists and has meta['stem.batch']
== true return BatchSubmission using _batchTaskIdsFromGroup instead of
dispatching; only call group(...) and dispatch when initGroup confirms the group
was newly created. Ensure you handle initGroup failure that signals pre-existing
group and reuse GroupDescriptor/meta to build the return BatchSubmission.

In `@packages/stem/lib/src/signals/payloads.dart`:
- Around line 119-135: Several execution-phase payloads (TaskReceivedPayload,
TaskPrerunPayload, TaskPostrunPayload, TaskSuccessPayload, TaskFailurePayload,
TaskRevokedPayload) incorrectly set occurredAt to envelope.enqueuedAt; change
each payload to accept an optional DateTime? occurredAt parameter (defaulting to
DateTime.now().toUtc()), make the constructors non-const, store the value on the
instance, and have the occurredAt getter return that stored timestamp instead of
envelope.enqueuedAt.toUtc(); follow the same pattern used by
TaskRetryPayload/emittedAt for consistency so consumers get the actual runtime
event time for ordering/latency/audit.

---

Outside diff comments:
In `@packages/stem/lib/src/signals/payloads.dart`:
- Around line 630-664: The occurredAt getter on ScheduleEntryFailedPayload
currently returns scheduledFor instead of the actual failure time; update
ScheduleEntryFailedPayload by adding a new DateTime field/constructor param
(name it failedAt or executedAt to match ScheduleEntryDispatchedPayload), set
the occurredAt getter to return failedAt.toUtc(), and include failedAt in the
attributes map (e.g., 'failedAt': failedAt.toUtc().toIso8601String()); ensure
constructors and references use the new symbol (failedAt/executedAt) and keep
scheduledFor as-is for the originally scheduled time.
- Around line 233-284: TaskRetryPayload stores emittedAt as UTC but leaves
nextRetryAt as-is, causing field vs serialized-attribute divergence; normalize
nextRetryAt to UTC at construction (in TaskRetryPayload constructor) similar to
emittedAt, and then update the attributes getter to use the now-UTC nextRetryAt
(you can remove the explicit .toUtc() there) so payload.nextRetryAt and
attributes['nextRetryAt'] are consistent.
- Around line 509-566: The eventName getter on WorkflowRunPayload produces
non-canonical names like "workflow-run-running" when signalName is null because
it uses status.name; update WorkflowRunPayload.eventName to map
WorkflowRunStatus values to the canonical signal strings (e.g., map
WorkflowRunStatus.running -> StemSignals.workflowRunStartedName) instead of
using status.name, so any direct constructions of WorkflowRunPayload produce the
correct canonical eventName; keep withSignalName unchanged so explicit
signalName still overrides the fallback.
- Around line 667-695: ControlCommandReceivedPayload (and likewise
ControlCommandCompletedPayload) currently implements occurredAt as a getter that
returns DateTime.now().toUtc(), producing a drifting clock; change this to a
stored final DateTime field set at construction time. Replace the getter
override with a final DateTime occurredAt field, update the constructor (remove
const or accept an optional DateTime occurredAt parameter) and initialize it
like: this.occurredAt = occurredAt ?? DateTime.now().toUtc(); so the timestamp
is captured once at construction; apply the identical change to
ControlCommandCompletedPayload and remove the dynamic getter.

In `@packages/stem/test/unit/signals/payloads_test.dart`:
- Around line 53-67: Add assertions to fully validate the StemEvent surface for
TaskRetryPayload: verify retry.eventName equals 'task-retry' and assert
retry.attributes['emittedAt'] equals DateTime.utc(2024).toIso8601String();
locate the TaskRetryPayload test where the existing retry instance is created
(symbol: TaskRetryPayload and variable retry) and add these two expectations
alongside the existing assertions for taskId/attempt/occurredAt/nextRetryAt.

---

Duplicate comments:
In `@packages/stem_cli/test/unit/cli/cli_worker_stats_test.dart`:
- Around line 132-177: Add assertions to verify the CLI produced no stderr
output by checking the pauseErr and resumeErr buffers after the runStemCli
calls: after obtaining pauseCode and resumeCode, assert
pauseErr.toString().isEmpty (or equals('')) and resumeErr.toString().isEmpty to
ensure no unexpected CLI errors; reference the pauseErr and resumeErr
StringBuffer variables and the corresponding runStemCli invocations used to
pause/resume the worker.
- Around line 478-490: The loop in _waitFor can overshoot timeout because it
evaluates predicate before checking deadline; modify _waitFor to check the
deadline before calling await predicate() (and throw TimeoutException if
passed), and also check the deadline immediately after the delayed pollInterval
before the next iteration to avoid an extra sleep/predicate call past the
timeout; keep the existing timeout, pollInterval, and exception message, and
update the loop in the _waitFor function accordingly.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ecb4f22 and 5559a24.

📒 Files selected for processing (7)
  • packages/stem/lib/src/canvas/canvas.dart
  • packages/stem/lib/src/signals/emitter.dart
  • packages/stem/lib/src/signals/payloads.dart
  • packages/stem/test/unit/canvas/canvas_test.dart
  • packages/stem/test/unit/signals/payloads_test.dart
  • packages/stem_cli/test/unit/cli/cli_worker_stats_test.dart
  • packages/stem_redis/lib/src/brokers/redis_broker.dart
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
🔇 Additional comments (13)
packages/stem_cli/test/unit/cli/cli_worker_stats_test.dart (3)

406-412: Async predicate wait looks good.

The async predicate usage keeps the event wait consistent with the new _waitFor contract.


492-509: Helper reads well and improves pause assertions.

The queued-state guard is clear and keeps the pause/resume test intent explicit.


511-526: _FastTask looks fine.

Straightforward no-op task handler for test scenarios.

packages/stem_redis/lib/src/brokers/redis_broker.dart (1)

574-576: LGTM — correct early return for broadcast-only mode

The queue == null guard is the right single-condition check here and cleanly handles the addressed past review note about the redundant streamKeys.isEmpty test.

packages/stem/lib/src/canvas/canvas.dart (3)

154-245: Batch lifecycle models look solid.
Clear enum states and status snapshot fields make the new batch API easy to reason about.


714-758: Batch status classification is robust.
Filtering to terminal states and using positive equality checks makes the lifecycle computation stable.


760-789: Task ID extraction is defensive and order-preserving.
Deduping and meta-first ordering aligns with the batch immutability contract.

packages/stem/test/unit/canvas/canvas_test.dart (2)

82-154: Batch lifecycle tests provide good coverage.
Covers terminal status, idempotency, and non-terminal completion counting.


176-192: Polling helper is consistent with existing test patterns.
Timeout + interval behavior matches the suite’s conventions.

packages/stem/lib/src/signals/emitter.dart (4)

104-115: LGTM – pre-emit timestamp capture is the correct pattern.

Capturing emittedAt before the emit call ensures a stable, deterministic occurredAt on the payload. The constructor's redundant .toUtc() on an already-UTC value is a no-op.


163-220: LGTM – signalName injection aligns eventName with the concrete signal for all worker lifecycle payloads.


241-257: LGTM – signal name selection and emission branch are consistent.


343-403: LGTM – withSignalName correctly threads the canonical name into a timestamp-preserving copy before emit.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
packages/dashboard/lib/src/services/models.dart (1)

1-1: ⚠️ Potential issue | 🔴 Critical

Fix missing import for stemNow (compile error).

stemNow() is referenced but not included in the show list, so this won’t compile. Add it to the import.

🛠️ Proposed fix
-import 'package:stem/stem.dart' show QueueHeartbeat, WorkerHeartbeat;
+import 'package:stem/stem.dart' show QueueHeartbeat, WorkerHeartbeat, stemNow;

Also applies to: 158-158

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/dashboard/lib/src/services/models.dart` at line 1, Import statement
currently shows QueueHeartbeat and WorkerHeartbeat but omits stemNow, causing a
compile error where stemNow() is referenced; update the import that uses "show"
to include stemNow alongside QueueHeartbeat and WorkerHeartbeat (and mirror this
change at the other occurrence around line 158) so stemNow is available to the
code.
packages/stem/lib/src/brokers/in_memory_broker.dart (1)

342-347: 🧹 Nitpick | 🔵 Trivial

Misleading indentation on the cascade in purgeDeadLetters.

In Dart, .. has lower precedence than ?:, so ..forEach(state.deadLetters.remove) is applied to the whole ternary — not only to candidates in the else branch. The code is functionally correct, but the 6-space indent below candidates makes it look like the cascade only affects the else branch, which could lead to a future reader incorrectly concluding that the if-branch doesn't purge entries.

♻️ Clarify intent with explicit parentheses and consistent indentation
-    final toRemove =
-        limit != null && limit >= 0
-              ? candidates.take(limit).toList()
-              : candidates
-          ..forEach(state.deadLetters.remove);
+    final toRemove = (limit != null && limit >= 0
+        ? candidates.take(limit).toList()
+        : candidates)
+      ..forEach(state.deadLetters.remove);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem/lib/src/brokers/in_memory_broker.dart` around lines 342 - 347,
The ternary with a cascade in purgeDeadLetters is misleading because the
..forEach(state.deadLetters.remove) applies to the whole expression; make the
intent explicit by wrapping the else-side expression in parentheses (e.g.,
(candidates)..forEach(...)) or by applying the cascade to the selected value
after the ternary, and fix indentation so it's clear the removal runs for both
branches; reference the variables limit, candidates and the call to
state.deadLetters.remove to locate and adjust the code.
packages/stem/lib/src/signals/payloads.dart (1)

667-739: ⚠️ Potential issue | 🟡 Minor

Freeze control-command timestamps at creation.

occurredAt currently calls stemNow() in the getter, so the timestamp changes on every read. Store the timestamp when the payload is created.

♻️ Proposed fix
-class ControlCommandReceivedPayload implements StemEvent {
-  const ControlCommandReceivedPayload({
-    required this.worker,
-    required this.command,
-  });
+class ControlCommandReceivedPayload implements StemEvent {
+  ControlCommandReceivedPayload({
+    required this.worker,
+    required this.command,
+    DateTime? occurredAt,
+  }) : _occurredAt = (occurredAt ?? stemNow()).toUtc();
@@
-  DateTime get occurredAt => stemNow().toUtc();
+  final DateTime _occurredAt;
+  DateTime get occurredAt => _occurredAt;
 }
 
-class ControlCommandCompletedPayload implements StemEvent {
-  const ControlCommandCompletedPayload({
-    required this.worker,
-    required this.command,
-    required this.status,
-    this.response,
-    this.error,
-  });
+class ControlCommandCompletedPayload implements StemEvent {
+  ControlCommandCompletedPayload({
+    required this.worker,
+    required this.command,
+    required this.status,
+    this.response,
+    this.error,
+    DateTime? occurredAt,
+  }) : _occurredAt = (occurredAt ?? stemNow()).toUtc();
@@
-  DateTime get occurredAt => stemNow().toUtc();
+  final DateTime _occurredAt;
+  DateTime get occurredAt => _occurredAt;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem/lib/src/signals/payloads.dart` around lines 667 - 739, The
occurredAt getters in ControlCommandReceivedPayload and
ControlCommandCompletedPayload call stemNow() on each read and must be frozen at
creation; add a final DateTime field (e.g., final DateTime occurredAt) to each
class, initialize it in the constructor (provide a default of stemNow().toUtc()
if none supplied), and change the overridden occurredAt getter to return that
stored field instead of calling stemNow() directly; update constructors for
ControlCommandReceivedPayload and ControlCommandCompletedPayload to
accept/assign the timestamp.
♻️ Duplicate comments (10)
packages/stem/lib/src/canvas/canvas.dart (2)

476-480: ⚠️ Potential issue | 🟡 Minor

Guard inspectBatch against non-batch groups.
Line 479 should also verify the stem.batch marker to avoid misclassifying generic groups as batches.

🔧 Suggested fix
 Future<BatchStatus?> inspectBatch(String batchId) async {
   final status = await backend.getGroup(batchId);
-  if (status == null) return null;
+  if (status == null || status.meta['stem.batch'] != true) return null;
   return _buildBatchStatus(status);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem/lib/src/canvas/canvas.dart` around lines 476 - 480, The
inspectBatch method currently treats any group returned from
backend.getGroup(batchId) as a batch; update inspectBatch to verify the group
has the stem.batch marker before building a BatchStatus: after retrieving status
in inspectBatch, check the group's metadata/markers for the "stem.batch" marker
and return null if it's absent, and only call _buildBatchStatus(status) when
that marker is present (use the same status object returned from
backend.getGroup for the check).

415-474: ⚠️ Potential issue | 🟠 Major

Make submitBatch idempotent under concurrent callers.
Line 428’s pre-check is not atomic; two callers with the same batchId can both pass and dispatch duplicate tasks. Gate dispatch on an atomic initGroup (unique constraint) or handle an “already exists” result and return the existing batch instead of publishing.

#!/bin/bash
# Inspect ResultBackend.initGroup contract and implementations for atomicity/uniqueness.
rg -n "abstract class ResultBackend" --type dart -A 60
rg -n "initGroup\\(" --type dart -A 12 -B 4
rg -n "GroupDescriptor" --type dart -A 8
rg -n "already exists|unique|conflict" --type dart -i -C 2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem/lib/src/canvas/canvas.dart` around lines 415 - 474, submitBatch
is not atomic: the pre-check using backend.getGroup can race so two callers with
the same batchId both dispatch tasks; change submitBatch to attempt a single
atomic creation or handle initGroup conflicts by calling backend.initGroup and
treating "already exists" as success — i.e., remove reliance on the separate
pre-check or make it best-effort, call backend.initGroup(GroupDescriptor ...)
first (or catch the specific error/return value it emits when the group already
exists), and if initGroup reports the group exists then fetch the existing group
(backend.getGroup) and return its taskIds instead of dispatching; ensure the
code references the same id, GroupDescriptor, normalizedSignatures, and uses
group(..., groupId: id) only after successful initGroup to guarantee
idempotence.
packages/stem/lib/src/core/contracts.dart (1)

1028-1037: ⚠️ Potential issue | 🟡 Minor

Handle RateLimiterFailureMode instances in _parseFailureMode.
If a map already contains the enum (e.g., groupRateLimiterFailureMode: RateLimiterFailureMode.failClosed), toString() yields RateLimiterFailureMode.failClosed, so parsing falls back to failOpen and silently changes behavior. Accept the enum directly to avoid this downgrade.

🛠️ Suggested fix
 static RateLimiterFailureMode? _parseFailureMode(Object? value) {
+  if (value is RateLimiterFailureMode) return value;
   final raw = value?.toString().trim().toLowerCase();
   if (raw == null || raw.isEmpty) return null;
   for (final mode in RateLimiterFailureMode.values) {
     if (mode.name.toLowerCase() == raw) {
       return mode;
     }
   }
   return null;
 }

Verification (find map literals that might pass enums directly into TaskOptions.fromJson):

#!/bin/bash
# Search for uses of groupRateLimiterFailureMode in map literals or fromJson inputs.
rg -n --type=dart -C3 'groupRateLimiterFailureMode' .
rg -n --type=dart -C3 'TaskOptions\.fromJson' .
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem/lib/src/core/contracts.dart` around lines 1028 - 1037,
_parsFailureMode currently only parses strings and falls back incorrectly when
given a RateLimiterFailureMode instance (or a toString like
"RateLimiterFailureMode.failClosed"). Update _parseFailureMode to first check if
value is a RateLimiterFailureMode and return it immediately; otherwise convert
value to string, strip any enum qualifier (e.g., take substring after the last
'.' so "RateLimiterFailureMode.failClosed" becomes "failclosed"), trim/lowercase
and match against RateLimiterFailureMode.values by comparing mode.name; return
null if no match. This touches the _parseFailureMode function and references
RateLimiterFailureMode.
packages/stem_cli/lib/src/cli/worker.dart (2)

1099-1099: ⚠️ Potential issue | 🟡 Minor

Exit code should reflect error replies.

Returning 0 whenever any reply exists hides worker failures and diverges from other worker commands.

🐛 Suggested fix
-      return replies.isEmpty ? 70 : 0;
+      final hasError = replies.any((reply) => reply.status != 'ok');
+      return (replies.isEmpty || hasError) ? 70 : 0;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem_cli/lib/src/cli/worker.dart` at line 1099, The return
expression currently returns 70 when replies.isEmpty and 0 otherwise, which
inverts the intended exit semantics; change the return to return 0 when there
are no replies and a non‑zero error code when replies exist (flip the ternary to
return replies.isEmpty ? 0 : 70) so the process exit code reflects error
replies; locate the return that references "replies" (the return line currently
"return replies.isEmpty ? 70 : 0;") and replace it with the corrected ternary.

1081-1094: ⚠️ Potential issue | 🟡 Minor

Clarify the “Paused” column for resume output.

For resume operations, paused represents queues still paused after the command, not the queues resumed. Rename the header to avoid operator confusion.

💡 Suggested tweak
-        dependencies.out.writeln('Worker        | Status | Updated | Paused');
-        dependencies.out.writeln(
-          '--------------+--------+---------+----------------',
-        );
+        dependencies.out.writeln(
+          'Worker        | Status | Updated | Paused (after)',
+        );
+        dependencies.out.writeln(
+          '--------------+--------+---------+------------------',
+        );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem_cli/lib/src/cli/worker.dart` around lines 1081 - 1094, The
header "Paused" is misleading because the `paused` variable in the resume output
reflects queues still paused after the command; update the column header string
in the dependencies.out.writeln call (the line currently printing 'Worker       
| Status | Updated | Paused') to a clearer label such as 'Still Paused' or
'Paused (after)' and adjust the separator line printed below it if needed so
column alignment remains correct; ensure the rest of the row output that uses
`${paused}` and the `paused` variable logic remains unchanged.
packages/stem/lib/src/core/queue_events.dart (2)

129-137: ⚠️ Potential issue | 🟡 Minor

Queue emptiness still validated lazily in start() instead of the constructor.

♻️ Fail fast at construction time
  QueueEvents({
    required this.broker,
    required String queue,
    String? consumerName,
    this.prefetch = 10,
- }) : queue = queue.trim(),
+ }) : queue = queue.trim() {
+   if (this.queue.isEmpty) {
+     throw ArgumentError.value(queue, 'queue', 'Queue name must not be empty');
+   }
+  }
+  // ignore: prefer_initializing_formals
+  QueueEvents._init({
+    required this.broker,
+    required this.queue,
     consumerName =
         consumerName ??
-        'stem-queue-events-${generateEnvelopeId().replaceAll('-', '')}';
+        'stem-queue-events-${generateEnvelopeId().replaceAll('-', '')}',
+   required this.prefetch,
  });

(Alternatively, use an initializer list with an assert or a redirecting constructor.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem/lib/src/core/queue_events.dart` around lines 129 - 137, The
constructor for QueueEvents currently defers validating the provided queue
string until start(); change it to fail fast by validating in the constructor
(QueueEvents) itself — e.g., use an initializer-check or assert/throw in the
constructor’s initializer list for the trimmed `queue` parameter (or a
redirecting constructor) so empty/invalid queue names are rejected immediately
instead of in `start()`. Reference: QueueEvents constructor, the `queue` field
and `start()` method.

203-205: ⚠️ Potential issue | 🟡 Minor

_events.addError called without a _closed guard in both _onDelivery and the onError stream handler.

_events.addError throws StateError when the controller is already closed. Both the onError callback registered in start() (line 204) and the catch block in _onDelivery (line 227) can reach _events.addError after close() has been called.

🛡️ Proposed fix
  Future<void> _onDelivery(Delivery delivery) async {
+   if (_closed) {
+     await broker.ack(delivery);
+     return;
+   }
    try {
      final event = _eventFromEnvelope(delivery.envelope);
      if (event != null && event.queue == queue) {
-       _events.add(event);
+       if (!_closed) _events.add(event);
      }
    } on Object catch (error, stackTrace) {
-     _events.addError(error, stackTrace);
+     if (!_closed) _events.addError(error, stackTrace);
    } finally {

And guard the stream onError handler in start():

  .listen(
    _onDelivery,
    onError: (Object error, StackTrace stackTrace) {
-     _events.addError(error, stackTrace);
+     if (!_closed) _events.addError(error, stackTrace);
    },
  );

Also applies to: 220-234

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem/lib/src/core/queue_events.dart` around lines 203 - 205, Both
the onError handler in start() and the catch block inside _onDelivery call
_events.addError without checking whether the controller has been closed, which
can throw StateError; update both sites (the start() onError closure and the
catch in _onDelivery) to guard the call by checking a closed flag (e.g., if
(!_closed) or if (!_events.isClosed) before calling _events.addError(error,
stackTrace), and ensure the same guard is used consistently where close() may
have been called.
packages/stem_redis/lib/src/brokers/redis_broker.dart (1)

497-501: ⚠️ Potential issue | 🟡 Minor

consumerGroup is still silently discarded in broadcast-only mode.

When queue == null, group is set to consumerGroup ?? '__broadcast__', but this value is never used: loop() returns immediately (line 573), and _listenBroadcast derives its own group via _broadcastGroupKey(channel, consumer) without reading the outer group. A caller passing an explicit consumerGroup in broadcast-only mode has it silently dropped.

♻️ Options to resolve

Option A — fail fast (minimal):

 if (queue == null && broadcastChannels.isEmpty) {
   throw ArgumentError(
     'RedisStreamsBroker requires at least one queue or broadcast channel.',
   );
 }
+if (queue == null && consumerGroup != null) {
+  throw ArgumentError(
+    'consumerGroup is not supported in broadcast-only mode.',
+  );
+}

Option B — thread consumerGroup into _listenBroadcast so callers can share a consumer group across multiple broadcast subscriptions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem_redis/lib/src/brokers/redis_broker.dart` around lines 497 -
501, The code currently ignores an explicit consumerGroup when queue == null
because loop() returns early and _listenBroadcast computes its own group via
_broadcastGroupKey; thread the consumerGroup through instead: change the call
site in loop() to pass the resolved group into _listenBroadcast (using the
existing variable group when queue == null) and add an optional consumerGroup
parameter to _listenBroadcast so it uses the passed consumerGroup (if non-null)
rather than always calling _broadcastGroupKey(channel, consumer); update any
helper call sites and documentation/comments accordingly so broadcast-only
subscriptions honor an explicitly provided consumerGroup.
packages/stem/lib/src/worker/worker.dart (1)

3728-3741: ⚠️ Potential issue | 🟡 Minor

Validate queue targets are strings before adding.

toString() on null/objects can create unintended queue names (“null”, “Instance of …”). Skip non-string entries instead.

♻️ Proposed fix
-    if (rawQueues is List) {
-      for (final value in rawQueues) {
-        final queueName = value.toString().trim();
-        if (queueName.isNotEmpty) {
-          queues.add(queueName);
-        }
-      }
-    }
+    if (rawQueues is List) {
+      for (final value in rawQueues) {
+        if (value is! String) continue;
+        final queueName = value.trim();
+        if (queueName.isNotEmpty) {
+          queues.add(queueName);
+        }
+      }
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem/lib/src/worker/worker.dart` around lines 3728 - 3741, The
_extractQueueTargets function currently calls toString() on rawQueues elements
which can yield unintended names like "null" or "Instance of ..."; change the
loop over rawQueues to only accept values that are actually String (non-null) —
e.g., check `if (value is String)` then trim and add to the queues set,
otherwise skip the entry; similarly ensure the single `queue` extracted from
payload is validated as a String (already using as String? but keep the explicit
type check) so only genuine string queue names are added.
packages/stem/lib/src/signals/payloads.dart (1)

106-136: ⚠️ Potential issue | 🟡 Minor

occurredAt should reflect execution time, not enqueue time.

Execution-phase payloads still use envelope.enqueuedAt, which misstates when the event occurred. Capture a timestamp at payload creation (default stemNow().toUtc()), and return that for occurredAt. Apply the same pattern to TaskPrerunPayload, TaskPostrunPayload, TaskSuccessPayload, TaskFailurePayload, and TaskRevokedPayload.

♻️ Proposed fix pattern (example for TaskReceivedPayload)
-class TaskReceivedPayload implements StemEvent {
-  const TaskReceivedPayload({required this.envelope, required this.worker});
+class TaskReceivedPayload implements StemEvent {
+  TaskReceivedPayload({
+    required this.envelope,
+    required this.worker,
+    DateTime? occurredAt,
+  }) : _occurredAt = (occurredAt ?? stemNow()).toUtc();
@@
-  DateTime get occurredAt => envelope.enqueuedAt.toUtc();
+  final DateTime _occurredAt;
+  DateTime get occurredAt => _occurredAt;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem/lib/src/signals/payloads.dart` around lines 106 - 136, The
payloads for execution-phase events (TaskReceivedPayload, TaskPrerunPayload,
TaskPostrunPayload, TaskSuccessPayload, TaskFailurePayload, TaskRevokedPayload)
currently return envelope.enqueuedAt from their occurredAt getters; change them
to capture a timestamp at construction (e.g., add a final DateTime occurredAtUtc
field defaulting to stemNow().toUtc()) and have the occurredAt getter return
that field instead of envelope.enqueuedAt so the event reflects actual execution
time; ensure constructors accept an optional DateTime or initialize the private
field and update attributes where occurredAt is used.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/dashboard/lib/src/services/stem_service.dart`:
- Around line 195-196: The loop currently calls stemNow() twice per iteration
causing a TOCTOU gap; in the loop that uses deadline and remaining (the while
guard and the remaining = deadline.difference(...) calculation), call stemNow()
once at the top of the loop (e.g., assign final now = stemNow()) and use now for
both the while condition and computing remaining so remaining cannot become
negative between calls; update references in the same function/method where
stemNow(), deadline, and remaining are used to rely on the single captured now
value.

In `@packages/stem_postgres/lib/src/control/postgres_revoke_store.dart`:
- Line 130: The ternary expression setting "version" uses identical branches and
should be simplified: replace the expression "existing != null ? entry.version :
entry.version" with just "entry.version" wherever it appears (referencing the
variables existing, entry, and the version field) to remove the redundant
conditional.

In `@packages/stem/lib/src/core/clock.dart`:
- Around line 13-19: SystemStemClock.now() currently returns a local-kind
DateTime which causes inconsistent kinds between production and tests
(FakeStemClock uses UTC); change SystemStemClock (class SystemStemClock, method
now()) to normalize to UTC at the source by returning DateTime.now().toUtc() and
update the doc comment to state it always returns a UTC DateTime so callers (and
components like Envelope.enqueuedAt and FakeStem) can rely on UTC-kind
timestamps.

In `@packages/stem/lib/src/core/queue_events.dart`:
- Around line 53-60: The attributes map in the overridden getter Map<String,
Object?> get attributes is missing the event name, so update the attributes
returned by QueueCustomEvent (the attributes getter) to include the event
discriminator by adding a 'name' entry (i.e., include 'name': name or the
appropriate eventName property) alongside id, queue, payload, headers, and meta
so consumers that introspect attributes can see the event name.

In `@packages/stem/lib/src/scheduler/in_memory_lock_store.dart`:
- Around line 63-67: In renew(), capture stemNow() once into a local variable
(e.g., now) and use that single timestamp for both the isExpired check and for
computing the new expiration; specifically, replace the two stemNow() calls in
the renew logic so you call stemNow() once, use now for lock.isExpired(now) and
set lock.expiresAt = now.add(ttl), and keep the existing _locks.remove(key) and
return false behavior when expired.

In `@packages/stem/lib/src/workflow/core/workflow_clock.dart`:
- Around line 5-6: Remove the now-stale lint suppression by deleting the "//
ignore: one_member_abstracts" comment above the WorkflowClock declaration;
WorkflowClock already extends StemClock and does not declare its own abstract
members, so simply remove the ignore annotation from the WorkflowClock class
definition (referencing the WorkflowClock class and its extension of StemClock).

In `@packages/stem/lib/stem.dart`:
- Line 90: The public barrel currently re-exports test utility FakeStemClock via
export 'src/core/clock.dart' alongside production symbols (StemClock, stemNow,
withStemClock); remove FakeStemClock from the production barrel by either
(preferred) moving the FakeStemClock symbol into a testing-only export (create
lib/testing.dart that exports FakeStemClock from src/core/clock.dart and other
test utilities) and adjust export 'src/core/clock.dart' to only expose
production symbols, or (alternative) change the existing export to a show/hide
list that omits FakeStemClock so it is only re-exported from the testing barrel;
update any imports in tests to import package:stem/testing.dart.

---

Outside diff comments:
In `@packages/dashboard/lib/src/services/models.dart`:
- Line 1: Import statement currently shows QueueHeartbeat and WorkerHeartbeat
but omits stemNow, causing a compile error where stemNow() is referenced; update
the import that uses "show" to include stemNow alongside QueueHeartbeat and
WorkerHeartbeat (and mirror this change at the other occurrence around line 158)
so stemNow is available to the code.

In `@packages/stem/lib/src/brokers/in_memory_broker.dart`:
- Around line 342-347: The ternary with a cascade in purgeDeadLetters is
misleading because the ..forEach(state.deadLetters.remove) applies to the whole
expression; make the intent explicit by wrapping the else-side expression in
parentheses (e.g., (candidates)..forEach(...)) or by applying the cascade to the
selected value after the ternary, and fix indentation so it's clear the removal
runs for both branches; reference the variables limit, candidates and the call
to state.deadLetters.remove to locate and adjust the code.

In `@packages/stem/lib/src/signals/payloads.dart`:
- Around line 667-739: The occurredAt getters in ControlCommandReceivedPayload
and ControlCommandCompletedPayload call stemNow() on each read and must be
frozen at creation; add a final DateTime field (e.g., final DateTime occurredAt)
to each class, initialize it in the constructor (provide a default of
stemNow().toUtc() if none supplied), and change the overridden occurredAt getter
to return that stored field instead of calling stemNow() directly; update
constructors for ControlCommandReceivedPayload and
ControlCommandCompletedPayload to accept/assign the timestamp.

---

Duplicate comments:
In `@packages/stem_cli/lib/src/cli/worker.dart`:
- Line 1099: The return expression currently returns 70 when replies.isEmpty and
0 otherwise, which inverts the intended exit semantics; change the return to
return 0 when there are no replies and a non‑zero error code when replies exist
(flip the ternary to return replies.isEmpty ? 0 : 70) so the process exit code
reflects error replies; locate the return that references "replies" (the return
line currently "return replies.isEmpty ? 70 : 0;") and replace it with the
corrected ternary.
- Around line 1081-1094: The header "Paused" is misleading because the `paused`
variable in the resume output reflects queues still paused after the command;
update the column header string in the dependencies.out.writeln call (the line
currently printing 'Worker        | Status | Updated | Paused') to a clearer
label such as 'Still Paused' or 'Paused (after)' and adjust the separator line
printed below it if needed so column alignment remains correct; ensure the rest
of the row output that uses `${paused}` and the `paused` variable logic remains
unchanged.

In `@packages/stem_redis/lib/src/brokers/redis_broker.dart`:
- Around line 497-501: The code currently ignores an explicit consumerGroup when
queue == null because loop() returns early and _listenBroadcast computes its own
group via _broadcastGroupKey; thread the consumerGroup through instead: change
the call site in loop() to pass the resolved group into _listenBroadcast (using
the existing variable group when queue == null) and add an optional
consumerGroup parameter to _listenBroadcast so it uses the passed consumerGroup
(if non-null) rather than always calling _broadcastGroupKey(channel, consumer);
update any helper call sites and documentation/comments accordingly so
broadcast-only subscriptions honor an explicitly provided consumerGroup.

In `@packages/stem/lib/src/canvas/canvas.dart`:
- Around line 476-480: The inspectBatch method currently treats any group
returned from backend.getGroup(batchId) as a batch; update inspectBatch to
verify the group has the stem.batch marker before building a BatchStatus: after
retrieving status in inspectBatch, check the group's metadata/markers for the
"stem.batch" marker and return null if it's absent, and only call
_buildBatchStatus(status) when that marker is present (use the same status
object returned from backend.getGroup for the check).
- Around line 415-474: submitBatch is not atomic: the pre-check using
backend.getGroup can race so two callers with the same batchId both dispatch
tasks; change submitBatch to attempt a single atomic creation or handle
initGroup conflicts by calling backend.initGroup and treating "already exists"
as success — i.e., remove reliance on the separate pre-check or make it
best-effort, call backend.initGroup(GroupDescriptor ...) first (or catch the
specific error/return value it emits when the group already exists), and if
initGroup reports the group exists then fetch the existing group
(backend.getGroup) and return its taskIds instead of dispatching; ensure the
code references the same id, GroupDescriptor, normalizedSignatures, and uses
group(..., groupId: id) only after successful initGroup to guarantee
idempotence.

In `@packages/stem/lib/src/core/contracts.dart`:
- Around line 1028-1037: _parsFailureMode currently only parses strings and
falls back incorrectly when given a RateLimiterFailureMode instance (or a
toString like "RateLimiterFailureMode.failClosed"). Update _parseFailureMode to
first check if value is a RateLimiterFailureMode and return it immediately;
otherwise convert value to string, strip any enum qualifier (e.g., take
substring after the last '.' so "RateLimiterFailureMode.failClosed" becomes
"failclosed"), trim/lowercase and match against RateLimiterFailureMode.values by
comparing mode.name; return null if no match. This touches the _parseFailureMode
function and references RateLimiterFailureMode.

In `@packages/stem/lib/src/core/queue_events.dart`:
- Around line 129-137: The constructor for QueueEvents currently defers
validating the provided queue string until start(); change it to fail fast by
validating in the constructor (QueueEvents) itself — e.g., use an
initializer-check or assert/throw in the constructor’s initializer list for the
trimmed `queue` parameter (or a redirecting constructor) so empty/invalid queue
names are rejected immediately instead of in `start()`. Reference: QueueEvents
constructor, the `queue` field and `start()` method.
- Around line 203-205: Both the onError handler in start() and the catch block
inside _onDelivery call _events.addError without checking whether the controller
has been closed, which can throw StateError; update both sites (the start()
onError closure and the catch in _onDelivery) to guard the call by checking a
closed flag (e.g., if (!_closed) or if (!_events.isClosed) before calling
_events.addError(error, stackTrace), and ensure the same guard is used
consistently where close() may have been called.

In `@packages/stem/lib/src/signals/payloads.dart`:
- Around line 106-136: The payloads for execution-phase events
(TaskReceivedPayload, TaskPrerunPayload, TaskPostrunPayload, TaskSuccessPayload,
TaskFailurePayload, TaskRevokedPayload) currently return envelope.enqueuedAt
from their occurredAt getters; change them to capture a timestamp at
construction (e.g., add a final DateTime occurredAtUtc field defaulting to
stemNow().toUtc()) and have the occurredAt getter return that field instead of
envelope.enqueuedAt so the event reflects actual execution time; ensure
constructors accept an optional DateTime or initialize the private field and
update attributes where occurredAt is used.

In `@packages/stem/lib/src/worker/worker.dart`:
- Around line 3728-3741: The _extractQueueTargets function currently calls
toString() on rawQueues elements which can yield unintended names like "null" or
"Instance of ..."; change the loop over rawQueues to only accept values that are
actually String (non-null) — e.g., check `if (value is String)` then trim and
add to the queues set, otherwise skip the entry; similarly ensure the single
`queue` extracted from payload is validated as a String (already using as
String? but keep the explicit type check) so only genuine string queue names are
added.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5559a24 and 8e8dc15.

📒 Files selected for processing (43)
  • packages/dashboard/lib/src/services/models.dart
  • packages/dashboard/lib/src/services/stem_service.dart
  • packages/dashboard/lib/src/state/dashboard_state.dart
  • packages/dashboard/lib/src/ui/content.dart
  • packages/stem/lib/src/backend/in_memory_backend.dart
  • packages/stem/lib/src/bootstrap/workflow_app.dart
  • packages/stem/lib/src/brokers/in_memory_broker.dart
  • packages/stem/lib/src/canvas/canvas.dart
  • packages/stem/lib/src/control/revoke_store.dart
  • packages/stem/lib/src/core/clock.dart
  • packages/stem/lib/src/core/contracts.dart
  • packages/stem/lib/src/core/envelope.dart
  • packages/stem/lib/src/core/queue_events.dart
  • packages/stem/lib/src/core/stem.dart
  • packages/stem/lib/src/observability/metrics.dart
  • packages/stem/lib/src/scheduler/beat.dart
  • packages/stem/lib/src/scheduler/in_memory_lock_store.dart
  • packages/stem/lib/src/scheduler/in_memory_schedule_store.dart
  • packages/stem/lib/src/signals/emitter.dart
  • packages/stem/lib/src/signals/payloads.dart
  • packages/stem/lib/src/signals/signal.dart
  • packages/stem/lib/src/testing/fake_stem.dart
  • packages/stem/lib/src/worker/worker.dart
  • packages/stem/lib/src/workflow/core/run_state.dart
  • packages/stem/lib/src/workflow/core/workflow_clock.dart
  • packages/stem/lib/src/workflow/core/workflow_watcher.dart
  • packages/stem/lib/stem.dart
  • packages/stem/test/unit/core/clock_test.dart
  • packages/stem/test/unit/workflow/workflow_clock_test.dart
  • packages/stem_cli/lib/src/cli/dlq.dart
  • packages/stem_cli/lib/src/cli/observer.dart
  • packages/stem_cli/lib/src/cli/schedule.dart
  • packages/stem_cli/lib/src/cli/worker.dart
  • packages/stem_postgres/lib/src/backend/postgres_backend.dart
  • packages/stem_postgres/lib/src/brokers/postgres_broker.dart
  • packages/stem_postgres/lib/src/control/postgres_revoke_store.dart
  • packages/stem_postgres/lib/src/scheduler/postgres_lock_store.dart
  • packages/stem_postgres/lib/src/scheduler/postgres_schedule_store.dart
  • packages/stem_redis/lib/src/backend/redis_backend.dart
  • packages/stem_redis/lib/src/brokers/redis_broker.dart
  • packages/stem_redis/lib/src/scheduler/redis_schedule_store.dart
  • packages/stem_sqlite/lib/src/backend/sqlite_result_backend.dart
  • packages/stem_sqlite/lib/src/broker/sqlite_broker.dart
📜 Review details
🔇 Additional comments (79)
packages/stem_postgres/lib/src/scheduler/postgres_schedule_store.dart (2)

125-127: Good standardization on Stem clock for writes.

This aligns upsert timestamps with the shared clock abstraction and improves determinism.


251-256: Consistent updatedAt clock source.

Using stemNow() here keeps execution metadata aligned with the shared time source.

packages/dashboard/lib/src/ui/content.dart (2)

9-9: LGTM — centralized clock import.

Nice, this keeps the UI aligned with the shared clock abstraction.


581-583: LGTM — relative time now uses stemNow.

Consistent time source for relative timestamps; no issues.

packages/stem/lib/src/canvas/canvas.dart (4)

11-11: Clock import aligns with stemNow usage.
Keeps created-at timestamps consistent with the new clock abstraction.


155-246: Batch lifecycle contracts look solid.
Enums and immutable snapshots are clear and cohesive for public API usage.


715-759: Terminal-state filtering in _buildBatchStatus is correct.
Good use of isTerminal and positive equality checks to avoid misclassification.


761-790: _batchTaskIdsFromGroup dedupe/trim logic is robust.
Handles meta-vs-results fallback safely and returns an immutable list.

packages/dashboard/lib/src/services/stem_service.dart (2)

160-160: LGTM — explicit UTC normalization is preserved.


160-196: stemNow() is correctly exported and returns DateTime.

The function is defined in packages/stem/lib/src/core/clock.dart and re-exported via package:stem/stem.dart, which is already imported in the file. The return type DateTime is correct for all method calls (.toUtc(), .isBefore(), .add(), .difference()).

packages/stem/lib/src/core/contracts.dart (3)

851-901: Group rate-limit options are wired through cleanly.

Also applies to: 919-930, 950-998


41-41: Clock abstraction applied to enqueue delays.
Using stemNow() keeps scheduling consistent with the shared clock.

Also applies to: 2004-2007


2063-2070: RateLimiterFailureMode enum looks good.

packages/stem_cli/lib/src/cli/worker.dart (8)

23-24: Pause/resume subcommands are registered correctly.

Keeps the new controls discoverable alongside existing worker actions.


677-677: Revoke timestamps now use the shared stem clock.

Good alignment with the new clock abstraction and UTC normalization.


932-1080: Shared queue-control base class is cleanly structured.

Centralized parsing/payload/reply collection reduces duplication and keeps outputs consistent.


1106-1126: Pause/resume commands correctly specialize the base control.

Clear command naming, descriptions, and default reasons.


1452-1452: Heartbeat age now uses the shared clock.

Consistent with the new clock abstraction.


1885-1889: Consistent timestamp generation for multi-node operations.

Using the stem clock across start/stop/status keeps template timestamps aligned.

Also applies to: 1927-1932, 1970-1975


2097-2097: Uptime calculations now rely on the stem clock.

Keeps healthcheck and diagnostics aligned with the shared time source.

Also applies to: 2423-2424


2678-2681: Wait loop deadline aligned with the stem clock.

Consistent time source for timeout handling.

packages/stem_postgres/lib/src/control/postgres_revoke_store.dart (1)

131-131: LGTM — stemNow() migration is correct.

Consistent with the centralized clock migration across the codebase.

packages/stem_cli/lib/src/cli/schedule.dart (1)

62-62: LGTM — stemNow() replacements across the schedule commands are correct and consistent.

All three call sites (lines 62, 377, 706) are mechanical clock-source swaps with no logic changes.

packages/stem/lib/src/control/revoke_store.dart (1)

3-3: LGTM — clock migration preserves monotonic version semantics.

stemNow().toUtc().microsecondsSinceEpoch correctly replaces the previous DateTime.now().toUtc().microsecondsSinceEpoch.

Also applies to: 124-124

packages/stem/lib/src/workflow/core/workflow_watcher.dart (1)

1-2: LGTM — fallback timestamp now uses the centralized clock.

The stemNow().toUtc() fallback in fromJson correctly replaces DateTime.now().toUtc() while preserving the same behavior when createdAt is missing or unparseable.

Also applies to: 21-21

packages/stem/lib/src/scheduler/in_memory_lock_store.dart (1)

12-12: LGTM — stemNow() migration in acquire and ownerOf is clean.

Both methods correctly use the centralized clock for lock acquisition and expiration checks.

Also applies to: 27-27, 46-46

packages/stem_postgres/lib/src/backend/postgres_backend.dart (1)

112-112: LGTM — comprehensive stemNow() migration across the Postgres backend.

All 11 call sites consistently replace DateTime.now() with the centralized clock, with no logic changes. Each method captures the timestamp once where needed.

Also applies to: 171-171, 191-191, 243-243, 302-302, 365-365, 375-375, 404-404, 417-417, 460-460, 470-470

packages/stem/lib/src/signals/emitter.dart (3)

6-6: LGTM — emittedAt timestamp for taskRetry uses the centralized clock.

Clean integration: stemNow().toUtc() is captured once and passed into the payload.

Also applies to: 105-112


166-170: LGTM — worker lifecycle payloads now carry canonical signal names.

Each lifecycle emission consistently maps to its corresponding StemSignals.*Name constant. The workerChildLifecycle method correctly derives the signal name from the initializing flag.

Also applies to: 182-186, 198-202, 214-218, 242-248


349-349: LGTM — workflow run emissions use withSignalName builder pattern consistently.

All six workflow lifecycle methods correctly attach their corresponding signal name via the builder.

Also applies to: 360-360, 371-371, 382-382, 390-390, 401-401

packages/stem_postgres/lib/src/brokers/postgres_broker.dart (4)

328-328: LGTM — stemNow().toUtc() migration across broker operations is consistent.

All time-sensitive paths (nack, dead-letter, lease extension, pending/inflight counts, sweeper, and consumer loop) correctly use the centralized clock.

Also applies to: 359-359, 406-406, 417-417, 439-439, 526-526, 590-590, 710-710, 729-729, 866-866


231-242: LGTM — nullable queue with broadcast-only consumer support is well-guarded.

The logic correctly:

  • Derives a nullable queue when subscription.queues is empty (line 231–233).
  • Rejects if neither queue nor broadcast channels are provided (line 237–241).
  • Uses a '__broadcast__' sentinel for the locker identity in broadcast-only mode (line 242).

848-853: LGTM — job claiming is correctly gated on non-null queue.

The if (queue != null) guard ensures _claimNextJob is only called when a queue subscription exists, while broadcast delivery proceeds unconditionally below.


812-812: The required-nullable queue field is intentional and correctly designed.

The queue: queue parameter passed at instantiation (line 267) can legitimately be null to represent broadcast-only consumers, as evidenced by the null-coalescing checks throughout the code (queue ?? '__broadcast__', queue ?? '<broadcast-only>'). The required modifier enforces explicit caller intent rather than allowing accidental omission, making this the appropriate API contract.

packages/stem/lib/src/core/clock.dart (1)

42-53: LGTM — zone-based clock scoping is correctly implemented.

Zone values propagate through async continuations, so withStemClock composes correctly with async bodies. Nested calls shadow the outer clock in the expected direction. The is StemClock guard in stemNow() and the fallback to _systemClock are both safe.

packages/stem/lib/src/testing/fake_stem.dart (1)

78-78: LGTM — stemNow() correctly replaces DateTime.now() in both enqueue paths.

Both enqueueCall and enqueue are updated consistently, so a withStemClock scope in tests will control the recorded enqueuedAt timestamp in both paths.

Also applies to: 104-104

packages/stem_cli/lib/src/cli/observer.dart (1)

393-393: LGTM — stemNow() correctly anchors schedule drift calculations to the active clock.

The stemNow() export concern (public barrel) is tracked in the redis_backend.dart comment above.

packages/stem_postgres/lib/src/scheduler/postgres_lock_store.dart (1)

73-73: LGTM — all three time sites correctly substitute stemNow().toUtc().

Also applies to: 119-119, 167-167

packages/stem/lib/src/observability/metrics.dart (1)

35-35: LGTM — all three usages correctly substitute stemNow().toUtc() and the field initializer at line 289 is evaluated at construction time as intended.

Also applies to: 289-289, 294-294

packages/stem/lib/src/scheduler/in_memory_schedule_store.dart (1)

61-61: LGTM.

packages/stem/lib/src/core/envelope.dart (1)

194-194: LGTM — preserves existing local-time semantics; Envelope.fromJson handles the timezone offset correctly via DateTime.parse.

packages/stem_redis/lib/src/backend/redis_backend.dart (1)

142-142: Verification confirms: stemNow() is properly re-exported from package:stem/stem.dart.

The public barrel at line 90 exports the entire src/core/clock.dart module, making stemNow() available to all packages that import package:stem/stem.dart. All three cross-package consumers (packages/stem_redis, packages/stem_postgres, packages/stem_cli) correctly rely on this public re-export without requiring direct imports of clock.dart. Compilation will succeed.

packages/stem/lib/src/bootstrap/workflow_app.dart (1)

150-160: Consistent timeout tracking via Stem clock.

This keeps workflow completion polling aligned with the centralized clock abstraction.

packages/stem/lib/src/workflow/core/run_state.dart (1)

41-41: Aligned createdAt fallback with centralized clock.

This keeps run state hydration consistent with the Stem time source.

packages/stem_cli/lib/src/cli/dlq.dart (1)

337-337: Replay metadata now uses Stem clock.

Keeps replay annotations consistent with other time-based metadata.

packages/stem_redis/lib/src/scheduler/redis_schedule_store.dart (1)

221-221: Schedule upsert now aligned with Stem clock.

Good consistency for deterministic scheduling behavior.

packages/stem/lib/src/backend/in_memory_backend.dart (9)

60-76: In-memory task timestamps now use Stem clock.

Keeps result TTLs and update timestamps consistent with shared time.


88-90: Expiry checks now use Stem clock.

Consistent TTL evaluation for task status retrieval.


160-165: Group initialization TTLs now use Stem clock.

Aligns group expiry with centralized time source.


185-190: Group expiry checks now use Stem clock.

Ensures consistent expiration behavior across stores.


201-205: Expire() now uses Stem clock.

Consistent TTL extension for task entries.


233-238: Heartbeat TTLs now use Stem clock.

Matches the centralized time source for worker health tracking.


244-249: Heartbeat expiry check now uses Stem clock.

Keeps heartbeat validity aligned with shared time.


332-339: Heartbeat pruning now uses Stem clock.

Consistent cleanup behavior for worker heartbeats.


342-349: Task status pruning now uses Stem clock.

Unified time source for expiry eviction.

packages/stem_sqlite/lib/src/backend/sqlite_result_backend.dart (11)

174-176: SQLite set() now uses Stem clock.

Consistent TTL computation with centralized time.


204-210: SQLite get() now filters by Stem clock.

Keeps expiry checks aligned with shared time source.


240-251: List pagination now uses Stem clock for expiry filtering.

Consistent filtering across backends.


299-315: Worker heartbeat TTLs now use Stem clock.

Matches centralized time semantics for health tracking.


325-331: Heartbeat lookup now uses Stem clock.

Consistent TTL evaluation for heartbeat reads.


337-342: Heartbeat listing now uses Stem clock.

Aligns expiry filtering with centralized time.


349-359: Group init TTLs now use Stem clock.

Ensures group expiry aligns with shared time.


397-403: Group retrieval now uses Stem clock.

Consistent group expiry checks.


436-439: Expire() now uses Stem clock.

Keeps TTL extension consistent with shared time.


489-495: Cleanup cycle now uses Stem clock.

Unified time source for pruning expired rows.


524-530: Group existence checks now use Stem clock.

Consistent TTL semantics with other queries.

packages/stem/lib/src/core/stem.dart (3)

304-311: Unique task expiry now uses Stem clock.

Keeps uniqueness TTL aligned with the centralized time source.


466-467: Countdown scheduling now uses Stem clock.

Consistent ETA computation across the system.


645-646: Duplicate attempt timestamp now uses Stem clock.

Aligned metadata timestamps for deduplication tracking.

packages/stem/lib/src/scheduler/beat.dart (2)

110-111: Scheduler tick now uses Stem clock.

Ensures lag metrics and due filtering use the centralized time source.


218-257: Dispatch timing now uses Stem clock.

Keeps duration and drift measurements consistent across success and failure paths.

Also applies to: 297-306

packages/stem_sqlite/lib/src/broker/sqlite_broker.dart (1)

231-243: Clock migration LGTM — all stemNow() call-sites are correct.

Every changed segment calls stemNow() at the time of use (not at broker construction), preserving correct semantics for visibility timeouts, dead-letter timestamps, and lease expiry calculations.

Also applies to: 256-280, 289-293, 307-323, 327-335, 402-402, 460-461, 518-519, 567-568, 611-611, 754-756

packages/stem/test/unit/workflow/workflow_clock_test.dart (1)

15-22: LGTM — SystemWorkflowClock correctly delegates through stemNow() to the zone clock.

The test correctly exercises the zone-scoped override and the clock.now tear-off as a DateTime Function().

packages/stem/test/unit/core/clock_test.dart (1)

1-26: LGTM — tests correctly cover both synchronous and async-boundary scenarios for the clock abstraction.

packages/stem/lib/src/brokers/in_memory_broker.dart (1)

9-9: LGTM — stemNow() call-sites are consistent with the clock migration.

Also applies to: 23-28, 121-121, 319-319, 448-458, 513-513, 549-549

packages/stem_redis/lib/src/brokers/redis_broker.dart (1)

367-367: LGTM — stemNow() substitutions are correct throughout.

Also applies to: 430-430, 791-791, 839-839, 855-855, 965-965, 1033-1033, 1091-1091

packages/stem/lib/stem.dart (1)

96-99: LGTM — queue_events.dart and stem_event.dart exports are appropriate public API surface.

packages/stem/lib/src/signals/signal.dart (1)

156-168: Good: SignalContext now carries the StemEvent payload and occurredAt.

This keeps dispatch timing aligned with event time and gives handlers unified event metadata.

packages/dashboard/lib/src/state/dashboard_state.dart (1)

84-105: LGTM: dashboard timestamps now align with the Stem clock.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
packages/stem_redis/lib/src/brokers/redis_broker.dart (2)

719-729: ⚠️ Potential issue | 🔴 Critical

Bug: NOGROUP recovery creates the wrong consumer group — will loop indefinitely.

Line 724 correctly invalidates the cache entry for group, but line 725 passes consumer (the consumer name) instead of group (the consumer group) to _ensureBroadcastGroup. This creates a group named after the consumer rather than the actual group used by XREADGROUP on line 709, so the next iteration will hit NOGROUP again, causing an infinite retry loop.

🐛 Proposed fix
             if ('$error'.contains('NOGROUP')) {
               _groupsCreated.remove('$streamKey|$group');
-              await _ensureBroadcastGroup(channel, consumer);
+              await _ensureBroadcastGroup(channel, group);
               continue;
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem_redis/lib/src/brokers/redis_broker.dart` around lines 719 -
729, The NOGROUP recovery path invalidates the cached group but then calls
_ensureBroadcastGroup with the consumer name instead of the consumer group,
causing a new group with the wrong name and an infinite retry; update the catch
block so that after _groupsCreated.remove('$streamKey|$group') it calls await
_ensureBroadcastGroup(channel, group) (use the group variable passed to
XREADGROUP) rather than passing consumer, leaving the _shouldSuppressClosedError
and rethrow behavior intact.

853-875: 🧹 Nitpick | 🔵 Trivial

Minor: _envelopeFromMap generates and immediately re-parses a timestamp string when enqueuedAt is missing.

Line 861 calls stemNow().toIso8601String() to produce a default, which DateTime.parse(...) then immediately parses back. This round-trip is wasteful and risks subtle precision loss from ISO 8601 serialization.

♻️ Suggested simplification
-      enqueuedAt: DateTime.parse(
-        map['enqueuedAt'] ?? stemNow().toIso8601String(),
-      ),
+      enqueuedAt: map['enqueuedAt'] != null
+          ? DateTime.parse(map['enqueuedAt']!)
+          : stemNow(),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem_redis/lib/src/brokers/redis_broker.dart` around lines 853 -
875, The _envelopeFromMap method currently creates a default enqueuedAt by
serializing stemNow() to ISO string then immediately parsing it back; change
this to avoid the round-trip by using the parsed DateTime only when
map['enqueuedAt'] is present and otherwise use stemNow() directly (i.e., replace
the DateTime.parse(map['enqueuedAt'] ?? stemNow().toIso8601String()) pattern
with a conditional that calls DateTime.parse(map['enqueuedAt']!) when non-null/
non-empty, else uses stemNow()), keeping the rest of the Envelope fields and
types unchanged.
packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart (2)

22-22: ⚠️ Potential issue | 🟡 Minor

Use a separate namespace variable for the queue-events factory.

Both runBrokerContractTests.create (line 28) and runQueueEventsContractTests.create (line 72) write to the same contractNamespace variable. Both additionalBrokerFactory callbacks read from it. Within a single file Dart tests run sequentially, so this is currently safe in practice — but if the two suites' setUp calls were ever interleaved (e.g., a framework update, @Timeout interaction, or parallel test runner), the queue-events create would silently overwrite the broker contract's namespace, causing additionalBrokerFactory to connect to the wrong stream group.

🔧 Proposed fix: scope each namespace variable to its factory
  String? contractNamespace;
+  String? queueEventsContractNamespace;
   runBrokerContractTests(
     ...
     factory: BrokerContractFactory(
       create: () async {
         final namespace = _uniqueNamespace();
         contractNamespace = namespace;
         ...
       },
       additionalBrokerFactory: () async {
         final namespace = contractNamespace;
         ...
       },
     ),
   );

   runQueueEventsContractTests(
     ...
     factory: QueueEventsContractFactory(
       create: () async {
         final namespace = _uniqueNamespace();
-        contractNamespace = namespace;
+        queueEventsContractNamespace = namespace;
         ...
       },
       additionalBrokerFactory: () async {
-        final namespace = contractNamespace;
+        final namespace = queueEventsContractNamespace;
         if (namespace == null || namespace.isEmpty) {
           throw StateError(
             'Redis queue-events contract requires primary broker namespace.',
           );
         }

Also applies to: 67-72

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart`
at line 22, The shared top-level variable contractNamespace is reused by both
runBrokerContractTests.create and runQueueEventsContractTests.create causing
both additionalBrokerFactory callbacks to potentially read/write the same value;
to fix, remove the shared contractNamespace and create separate, locally scoped
namespace variables inside each factory create block (e.g.,
brokerContractNamespace inside runBrokerContractTests.create and
queueEventsContractNamespace inside runQueueEventsContractTests.create), update
any references in additionalBrokerFactory closures to use the corresponding
local variable, and ensure no external code relies on the old shared variable.

325-335: ⚠️ Potential issue | 🟠 Major

CI failure: 50 ms warm-up is too tight for the broadcast consumer-group race.

The pipeline shows worker-one timing out (10 s) waiting for the fan-out message. The only synchronization between consume() and publish() is a fixed 50 ms delay (line 325). Under Redis load — now heavier with the newly added runQueueEventsContractTests suite sharing the same Redis instance — consumer-group creation and the first blocking XREADGROUP call can take longer than 50 ms, leaving the subscriber not yet registered when the message lands. A message published before the consumer group is fully set up in Redis is silently missed (it lands in the stream but the group's delivered-ID cursor is initialized after it, so XREADGROUP never returns it).

Increase the warm-up or, better, actively wait for consumer readiness:

🔧 Suggested robustness improvement
-      await Future<void>.delayed(const Duration(milliseconds: 50));
+      // Wait long enough for both consumer-group subscriptions to register on CI.
+      await Future<void>.delayed(const Duration(milliseconds: 500));

Or add a retryUntil-style poll (preferred — environment-independent):

// Poll until both consumers have created their groups.
// (Requires broker to expose a group-exists check, or use a sentinel publish/ack.)

Additionally, consider isolating the broadcast test's Redis namespace from the new contract-test suites to eliminate cross-suite load as a contributing factor.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart`
around lines 325 - 335, The 50ms fixed delay before publisher.publish is too
short and causes a race where the consumer group/consumer may not be registered
before the broadcast is sent; replace the fixed await Future.delayed(const
Duration(milliseconds: 50)) with an active wait that polls for consumer
readiness (e.g., a retryUntil or loop that checks the broker/consumer group
exists or waits for a sentinel message/ack) before calling
publisher.publish(broadcast, routing: RoutingInfo.broadcast(channel: channel));
ensure the poll returns only once both consumers are ready so
workerOne.next.timeout no longer intermittently times out.
packages/stem/lib/src/workflow/core/workflow_clock.dart (1)

10-16: ⚠️ Potential issue | 🟡 Minor

Update SystemWorkflowClock docstring to match stemNow() behavior.

It now delegates to stemNow() (which can be scoped), not strictly DateTime.now().

♻️ Suggested doc update
-/// Default clock that proxies to [DateTime.now].
+/// Default clock that proxies to [stemNow].
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem/lib/src/workflow/core/workflow_clock.dart` around lines 10 -
16, The docstring for SystemWorkflowClock is inaccurate: it says it proxies to
DateTime.now but the implementation calls stemNow(), which can be scoped; update
the class and/or constructor docstring to state that SystemWorkflowClock
delegates to stemNow() (the scoped system time helper) rather than
DateTime.now(), and mention that stemNow() may be overridden/scoped in tests or
runtime environments; reference the class SystemWorkflowClock, its override of
WorkflowClock.now(), and the stemNow() helper so readers know where the
delegation occurs.
packages/stem/test/unit/worker/worker_test.dart (1)

486-557: ⚠️ Potential issue | 🟡 Minor

Duplicate test: 'emits worker lifecycle signals on start and shutdown' is defined twice.

This test (lines 486–557) is a near-exact duplicate of the test at lines 344–415 — same name, same consumerName: 'worker-life', same signal subscriptions, and same assertions. Both will execute in the test suite, and since they share StemSignals global state, the second invocation could interfere with or be influenced by the first.

Remove one of the two definitions.

#!/bin/bash
# Verify the duplication
rg -n "emits worker lifecycle signals on start and shutdown" packages/stem/test/unit/worker/worker_test.dart
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem/test/unit/worker/worker_test.dart` around lines 486 - 557, The
test "emits worker lifecycle signals on start and shutdown" is duplicated;
remove one definition to avoid interfering global StemSignals state — locate the
duplicate test block (the test with name 'emits worker lifecycle signals on
start and shutdown' that constructs InMemoryBroker, InMemoryResultBackend,
SimpleTaskRegistry, and Worker with consumerName 'worker-life' and the
subscriptions to
StemSignals.workerInit/workerReady/workerStopping/workerShutdown) and delete it
(or consolidate any unique assertions if needed) so only a single test for
Worker lifecycle signals remains.
packages/stem/lib/src/signals/payloads.dart (2)

651-686: ⚠️ Potential issue | 🟡 Minor

ScheduleEntryFailedPayload.occurredAt returns scheduledFor — not when the failure actually occurred.

Line 677 returns scheduledFor.toUtc() as occurredAt, but the failure happens after the scheduled time (when execution was attempted and errored). This could be significantly later than scheduledFor if there was scheduling drift or retries.

Since no explicit failedAt field exists, consider adding one or defaulting to stemNow() like the other execution-phase payloads:

Proposed fix
 class ScheduleEntryFailedPayload implements StemEvent {
-  const ScheduleEntryFailedPayload({
+  ScheduleEntryFailedPayload({
     required this.entry,
     required this.scheduledFor,
     required this.error,
     required this.stackTrace,
-  });
+    DateTime? occurredAt,
+  }) : _occurredAt = (occurredAt ?? stemNow()).toUtc();

   final ScheduleEntry entry;
   final DateTime scheduledFor;
   final Object error;
   final StackTrace stackTrace;
+  final DateTime _occurredAt;

   `@override`
   String get eventName => 'schedule-entry-failed';

   `@override`
-  DateTime get occurredAt => scheduledFor.toUtc();
+  DateTime get occurredAt => _occurredAt;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem/lib/src/signals/payloads.dart` around lines 651 - 686,
ScheduleEntryFailedPayload currently returns scheduledFor.toUtc() from the
occurredAt getter, which misrepresents when the failure was observed; update
ScheduleEntryFailedPayload to record the actual failure time by adding a new
DateTime failedAt (or defaulting failedAt = stemNow() when not provided) and
change the occurredAt getter to return failedAt.toUtc(); also update the
constructor and the attributes map (replace or add 'failedAt' with
failedAt.toUtc().toIso8601String()) so the payload reflects the real
execution-failure timestamp while retaining entry, scheduledFor, error, and
stackTrace.

688-717: ⚠️ Potential issue | 🔴 Critical

occurredAt getter calls stemNow() on every access — returns a different time each call.

ControlCommandReceivedPayload is const-constructible (Line 691), so it can't store a timestamp field. But occurredAt (Line 706) invokes stemNow().toUtc() in the getter body, meaning every access yields a new timestamp. This violates the StemEvent contract where occurredAt should represent when the event occurred.

The same issue affects ControlCommandCompletedPayload at Line 749.

Make the constructors non-const and capture the timestamp at construction time, consistent with other payloads like TaskReceivedPayload:

Proposed fix for ControlCommandReceivedPayload
 class ControlCommandReceivedPayload implements StemEvent {
-  const ControlCommandReceivedPayload({
+  ControlCommandReceivedPayload({
     required this.worker,
     required this.command,
-  });
+    DateTime? occurredAt,
+  }) : _occurredAt = (occurredAt ?? stemNow()).toUtc();

   final WorkerInfo worker;
   final ControlCommandMessage command;
+  final DateTime _occurredAt;

   `@override`
   String get eventName => 'control-command-received';

   `@override`
-  DateTime get occurredAt => stemNow().toUtc();
+  DateTime get occurredAt => _occurredAt;

Apply the same pattern to ControlCommandCompletedPayload.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem/lib/src/signals/payloads.dart` around lines 688 - 717, The
occurredAt getter currently calls stemNow() on each access, so change
ControlCommandReceivedPayload and ControlCommandCompletedPayload to capture the
event timestamp at construction: remove the const constructor, add a final
DateTime occurredAt field (initialized to stemNow().toUtc() in the constructor)
and replace the existing occurredAt getter implementation with that field; keep
eventName as-is and ensure both classes' constructors accept the same required
parameters (worker, command / command, result) while setting occurredAt once at
construction time.
♻️ Duplicate comments (6)
packages/stem_cli/test/unit/cli/cli_worker_stats_test.dart (1)

132-176: pauseErr and resumeErr are still never asserted.

Both buffers are passed to runStemCli but their contents are never verified. A silent CLI error would go undetected. This was raised in a previous review and has not been addressed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem_cli/test/unit/cli/cli_worker_stats_test.dart` around lines 132
- 176, The test creates pauseErr and resumeErr and passes them to runStemCli but
never asserts their contents; add assertions after each runStemCli call to
verify the CLI produced no error output (e.g., expect(pauseErr.toString(),
isEmpty) after pauseCode check and expect(resumeErr.toString(), isEmpty) after
resumeCode check) so silent errors from runStemCli are detected; reference the
pauseErr/resumeErr buffers and the runStemCli calls in the test to locate where
to insert these expects.
packages/stem_cli/lib/src/cli/worker.dart (1)

1081-1099: Both previous review issues are resolved — column header and exit code.

The column header is now Paused queues (after) (line 1082), and the exit code correctly accounts for error status replies at line 1102. These directly address the past comments.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem_cli/lib/src/cli/worker.dart` around lines 1081 - 1099, No code
changes required: the column header has been corrected to 'Paused queues
(after)' in the dependencies.out.writeln call and the exit-code logic correctly
accounts for error-status replies in the worker reply handling (see the ordered
list iteration over replies and usage of reply.status and reply.payload), so
mark this PR approved and merge.
packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart (2)

40-55: Partial fix for the past additionalDispose concern — verify framework cleanup.

The _NoCloseBroker wrapper has been removed (line 54), addressing the root cause flagged in the previous review. However, additionalDispose was not added. Whether the additional broker's connection is now properly reclaimed depends on whether BrokerContractFactory calls broker.close() on the additional broker implicitly, or relies solely on additionalDispose. Please confirm this.

#!/bin/bash
# Description: Check whether BrokerContractFactory disposes the additional broker
# automatically (i.e. calls close() on it) or only via additionalDispose.
# Also verify whether QueueEventsContractFactory has an additionalDispose parameter.

# 1. Find BrokerContractFactory definition
ast-grep --pattern 'class BrokerContractFactory {
  $$$
}'

# 2. Check how additionalBrokerFactory result is closed inside the broker contract runner
rg -n 'additionalDispose\|additionalBroker' --type dart -A 5

# 3. Check QueueEventsContractFactory for additionalDispose support
ast-grep --pattern 'class QueueEventsContractFactory {
  $$$
}'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart`
around lines 40 - 55, The test removed the _NoCloseBroker wrapper but didn’t add
an additionalDispose handler, so the additional broker returned by
additionalBrokerFactory (created via RedisStreamsBroker.connect) may not be
closed; update BrokerContractFactory to either call close() on any broker
produced by additionalBrokerFactory or add an additionalDispose parameter and
ensure the runner invokes it to close the broker, and similarly verify/extend
QueueEventsContractFactory to accept and invoke additionalDispose if it supports
additional brokers; reference BrokerContractFactory, additionalBrokerFactory,
additionalDispose, RedisStreamsBroker.connect and QueueEventsContractFactory
when making the change.

84-99: ⚠️ Potential issue | 🟡 Minor

Missing additionalDispose for the queue-events contract's additional broker.

additionalBrokerFactory (lines 84-99) returns a RedisStreamsBroker with no corresponding additionalDispose. The QueueEventsContractFactory only cleans up the additional broker if additionalDispose is provided—without it, the connection is leaked for every test iteration. All other adapter tests (SQLite, PostgreSQL, in-memory) provide this callback; the redis queue-events contract should match.

🔧 Proposed fix
      additionalBrokerFactory: () async {
        final namespace = contractNamespace;
        if (namespace == null || namespace.isEmpty) {
          throw StateError(
            'Redis queue-events contract requires primary broker namespace.',
          );
        }
        final broker = await RedisStreamsBroker.connect(
          redisUrl,
          namespace: namespace,
          defaultVisibilityTimeout: const Duration(seconds: 1),
          claimInterval: const Duration(milliseconds: 200),
          blockTime: const Duration(milliseconds: 100),
        );
        return broker;
      },
+     additionalDispose: (broker) =>
+         _safeCloseRedisBroker(broker as RedisStreamsBroker),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart`
around lines 84 - 99, The additionalBrokerFactory currently creates a
RedisStreamsBroker via RedisStreamsBroker.connect but there is no
additionalDispose, causing leaked connections; add an additionalDispose callback
alongside additionalBrokerFactory (used by QueueEventsContractFactory) that
accepts the broker returned by additionalBrokerFactory and properly
closes/disposes it (e.g., await broker.close()/disconnect()) to mirror other
adapters and ensure the RedisStreamsBroker connection is cleaned up after each
test.
packages/stem/lib/src/worker/worker.dart (1)

3728-3747: Past review concern about non-string queue identifiers has been addressed.

Lines 3737-3738 now correctly skip non-String values with if (value is! String) continue;.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem/lib/src/worker/worker.dart` around lines 3728 - 3747, The
previous issue was non-string queue identifiers being processed in
_extractQueueTargets; the current change correctly guards against that by
checking value is String and continuing otherwise, so no further code changes
are needed—keep the guard (if (value is! String) continue;) in
_extractQueueTargets and ensure the function still trims, filters empty names,
deduplicates using the Set queues, and returns a sorted List.
packages/stem/lib/src/signals/payloads.dart (1)

107-142: Execution-phase payloads now capture runtime timestamps — past concern addressed.

TaskReceivedPayload (and similarly TaskPrerunPayload, TaskPostrunPayload, TaskSuccessPayload, TaskFailurePayload, TaskRevokedPayload) now accept an optional occurredAt parameter and default to stemNow().toUtc(), replacing the previous use of envelope.enqueuedAt. This correctly reflects when the event actually occurred rather than when the task was enqueued.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/stem/lib/src/signals/payloads.dart` around lines 107 - 142, The
payloads should record runtime timestamps instead of envelope.enqueuedAt; update
each execution-phase payload (e.g., TaskReceivedPayload, TaskPrerunPayload,
TaskPostrunPayload, TaskSuccessPayload, TaskFailurePayload, TaskRevokedPayload)
to accept an optional occurredAt parameter, default it to (occurredAt ??
stemNow()).toUtc(), stop using envelope.enqueuedAt, and ensure the class exposes
occurredAt via the getter and includes relevant identifiers (e.g., taskId,
taskName, queue, workerId) in attributes so the event reflects when it actually
occurred.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/stem_adapter_tests/README.md`:
- Around line 29-32: Quick Start is missing the runRevokeStoreContractTests
call; add a line alongside runQueueEventsContractTests that invokes
runRevokeStoreContractTests with the same adapterName and an appropriate factory
(e.g., runRevokeStoreContractTests(adapterName: 'my-adapter', factory:
RevokeStoreContractFactory(create: createBroker))), matching the pattern used
for runQueueEventsContractTests and using the existing createBroker helper.

In `@packages/stem_cli/lib/src/cli/worker.dart`:
- Around line 1089-1098: The table rendering loop in
WorkerPingCommand/WorkerInspectCommand suppresses the error details because
updated/paused are computed only from reply.payload; change the loop in
worker.dart where ordered is iterated so that when reply.status != 'ok' you
extract the error string from reply.error?['message'] (falling back to
reply.error?.toString() or '-' if null) and display that in the table (e.g., in
the Updated or a new Error column) instead of the '-' placeholders; update the
assignments for updated and/or paused to use this error value when status !=
'ok' so operators see the message in the text output without needing --json.

In `@packages/stem_cli/test/unit/cli/cli_worker_stats_test.dart`:
- Around line 514-529: _Remove the unused _FastTask class since it is never
registered in any SimpleTaskRegistry or referenced by tests; delete the entire
class definition (the _FastTask class implementing TaskHandler<void>) from the
test file to eliminate dead code and keep only the tasks actually used (e.g.,
_BlockingTask) in the pause/resume tests._

In `@packages/stem_redis/lib/src/brokers/redis_broker.dart`:
- Around line 572-575: Add a one-line comment above the early return in the
RedisBroker.loop() method explaining that the check for queue == null
intentionally skips the queue-consumption loop for broadcast-only subscriptions;
update the loop() function (where the variable queue is checked) to include that
explanatory comment so future readers know this early return is deliberate
rather than a bug.

In `@packages/stem/lib/src/core/queue_events.dart`:
- Around line 246-278: The lazy cast rawPayload.cast<String, Object?>() in
_eventFromEnvelope can defer a TypeError until access time if keys aren't
Strings; replace it with an eager, fail-fast conversion (e.g., build a new Map
by iterating rawPayload or use Map.from and validate/convert each key to String)
so non-string keys produce an immediate, clear error (throw a FormatException
with context if a key cannot be converted) and return a Map<String, Object?> for
payload.

In `@packages/stem/lib/src/worker/worker.dart`:
- Around line 786-867: The group rate-limiter block currently catches on Object
(in the try around rateLimiter!.acquire) which swallows programming errors;
change the catch to a narrower exception type (e.g., "on Exception catch (error,
stack)" or the specific rate-limiter exception class provided by your rate
limiter) so only runtime errors from the rate limiter are handled by the
fail-open/fail-closed logic (handler.options.groupRateLimiterFailureMode,
RateLimiterFailureMode.failOpen) and let non-Exception errors (TypeError,
NoSuchMethodError, etc.) propagate; keep the existing handling paths that call
retryStrategy.nextDelay and _retryDelivery, and adjust any log/error fields
(groupKey, envelope, delivery, resultEncoder) accordingly.
- Around line 2459-2540: _refreshQueueSubscriptions currently cancels every
tracked subscription then recreates only active ones, causing unnecessary
disruption; change it to a differential update: compute activeQueues from
_effectiveQueues.where(!_isQueuePaused) and then (1) determine toCancel =
subscriptions currently in _queueSubscriptionNames or _subscriptions that are
not in activeQueues and only cancel/remove those (using subscription.cancel()
with the existing error handling), and (2) determine toCreate = activeQueues
that are not already present in _subscriptions/_queueSubscriptionNames and only
call broker.consume/stream.listen to create those subscriptions (adding them to
_subscriptions and _queueSubscriptionNames as done now). Keep existing error
handling and the broadcast logic (index==0) when creating subscriptions and do
not change _cancelAllSubscriptions() semantics.
- Around line 3749-3817: _in _processQueuePauseCommand_, the code uses
expiresAt: now when paused is false to create an immediately-expired revoke
entry (so resume is implemented by writing an expired pause rather than deleting
it), which risks a brief race where another path (_syncRevocations_) could
reapply the pause before pruneExpired runs; update the function to include a
clear comment above the entries creation (and/or above the
upsertAll/pruneExpired block) stating this intentional pattern: that resume is
represented by an expired RevokeEntry, that _applyQueuePauseEntry removes
expired entries, and noting the tiny race window between upsertAll and
pruneExpired (and why pruneExpired is called immediately) so future readers
understand the behavior and tradeoffs involving revokeStore.upsertAll,
revokeStore.pruneExpired, and _applyQueuePauseEntry.
- Around line 2204-2215: The code calls stemNow() multiple times causing drift;
fix by capturing a single Timestamp (e.g., final now = stemNow()) at the start
of the block and use it when computing scheduledAt, delay, and notBefore so
scheduledAt = request.eta ?? (request.countdown != null ?
now.add(request.countdown!) : null), delay = scheduledAt != null ?
scheduledAt.difference(now) : _computeRetryDelay(envelope.attempt, request,
StackTrace.current, policy), and notBefore = scheduledAt ?? now.add(delay);
update references to scheduledAt, delay, and notBefore accordingly.
- Around line 2029-2036: The code calls stemNow() twice producing mismatched
timestamps: capture a single timestamp (e.g., final now = stemNow()) before
computing nextRunAt and before building the republished envelope, then use
now.add(delay) for both nextRunAt and the envelope.copyWith(... notBefore: ... )
so the broker.nack / broker.publish sequence uses a consistent retry timestamp;
update usages around nextRunAt, envelope.copyWith, and notBefore (and keep
maxRetries and attempt handling the same).
- Around line 3936-3957: The queue_pause and queue_resume case handlers call
_processQueuePauseCommand without per-case error handling and can leak stack
traces; wrap each of the queue_pause and queue_resume case bodies in a try/catch
like the revoke case: call _processQueuePauseCommand inside try, set reply to
ControlReplyMessage with status 'ok' and the result payload on success, and in
catch construct a ControlReplyMessage with status 'error', requestId from
command.requestId, workerId _workerIdentifier, and a payload containing only the
error message (e.g. error.toString() or a safe message) — do not include stack
traces.
- Around line 2291-2321: The backend is recording TaskState.retried with the
original envelope.attempt in _retryDelivery (used for rate-limit requeues) which
differs from _handleFailure and _handleRetryRequest that publish with attempt +
1; add a concise comment near the _retryDelivery function (and optionally
reference the related _handleFailure and _handleRetryRequest) clarifying that
rate-limit backoff intentionally does not increment envelope.attempt and that
TaskState.retried in this path represents a requeue due to rate-limiting (not a
consumed retry attempt), so monitoring systems should treat these retried
records differently; reference backend.set, _statusMeta, and the
envelope.attempt behavior in the comment.
- Around line 2323-2337: The current _handlePausedQueueDelivery always uses a
fixed 250ms backoff which causes tight requeue loops; change it to compute an
exponential backoff (e.g., base 250ms * 2^n, capped at a max like 30s, with
jitter) keyed to the envelope (use envelope.id) or envelope.queue so repeated
requeues for the same in-flight message back off progressively; implement a
small Map<String,int> (e.g., _pausedRequeueAttempts) incremented each time
_handlePausedQueueDelivery runs for that envelope and reset when queue is
unpaused (see _refreshQueueSubscriptions and RevokeEntry), compute backoff from
that counter and pass it to _retryDelivery, and keep envelope.attempt unchanged
and preserve the extra metadata ({'queuePaused': true, 'queue':
envelope.queue}).

---

Outside diff comments:
In `@packages/stem_redis/lib/src/brokers/redis_broker.dart`:
- Around line 719-729: The NOGROUP recovery path invalidates the cached group
but then calls _ensureBroadcastGroup with the consumer name instead of the
consumer group, causing a new group with the wrong name and an infinite retry;
update the catch block so that after _groupsCreated.remove('$streamKey|$group')
it calls await _ensureBroadcastGroup(channel, group) (use the group variable
passed to XREADGROUP) rather than passing consumer, leaving the
_shouldSuppressClosedError and rethrow behavior intact.
- Around line 853-875: The _envelopeFromMap method currently creates a default
enqueuedAt by serializing stemNow() to ISO string then immediately parsing it
back; change this to avoid the round-trip by using the parsed DateTime only when
map['enqueuedAt'] is present and otherwise use stemNow() directly (i.e., replace
the DateTime.parse(map['enqueuedAt'] ?? stemNow().toIso8601String()) pattern
with a conditional that calls DateTime.parse(map['enqueuedAt']!) when non-null/
non-empty, else uses stemNow()), keeping the rest of the Envelope fields and
types unchanged.

In
`@packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart`:
- Line 22: The shared top-level variable contractNamespace is reused by both
runBrokerContractTests.create and runQueueEventsContractTests.create causing
both additionalBrokerFactory callbacks to potentially read/write the same value;
to fix, remove the shared contractNamespace and create separate, locally scoped
namespace variables inside each factory create block (e.g.,
brokerContractNamespace inside runBrokerContractTests.create and
queueEventsContractNamespace inside runQueueEventsContractTests.create), update
any references in additionalBrokerFactory closures to use the corresponding
local variable, and ensure no external code relies on the old shared variable.
- Around line 325-335: The 50ms fixed delay before publisher.publish is too
short and causes a race where the consumer group/consumer may not be registered
before the broadcast is sent; replace the fixed await Future.delayed(const
Duration(milliseconds: 50)) with an active wait that polls for consumer
readiness (e.g., a retryUntil or loop that checks the broker/consumer group
exists or waits for a sentinel message/ack) before calling
publisher.publish(broadcast, routing: RoutingInfo.broadcast(channel: channel));
ensure the poll returns only once both consumers are ready so
workerOne.next.timeout no longer intermittently times out.

In `@packages/stem/lib/src/signals/payloads.dart`:
- Around line 651-686: ScheduleEntryFailedPayload currently returns
scheduledFor.toUtc() from the occurredAt getter, which misrepresents when the
failure was observed; update ScheduleEntryFailedPayload to record the actual
failure time by adding a new DateTime failedAt (or defaulting failedAt =
stemNow() when not provided) and change the occurredAt getter to return
failedAt.toUtc(); also update the constructor and the attributes map (replace or
add 'failedAt' with failedAt.toUtc().toIso8601String()) so the payload reflects
the real execution-failure timestamp while retaining entry, scheduledFor, error,
and stackTrace.
- Around line 688-717: The occurredAt getter currently calls stemNow() on each
access, so change ControlCommandReceivedPayload and
ControlCommandCompletedPayload to capture the event timestamp at construction:
remove the const constructor, add a final DateTime occurredAt field (initialized
to stemNow().toUtc() in the constructor) and replace the existing occurredAt
getter implementation with that field; keep eventName as-is and ensure both
classes' constructors accept the same required parameters (worker, command /
command, result) while setting occurredAt once at construction time.

In `@packages/stem/lib/src/workflow/core/workflow_clock.dart`:
- Around line 10-16: The docstring for SystemWorkflowClock is inaccurate: it
says it proxies to DateTime.now but the implementation calls stemNow(), which
can be scoped; update the class and/or constructor docstring to state that
SystemWorkflowClock delegates to stemNow() (the scoped system time helper)
rather than DateTime.now(), and mention that stemNow() may be overridden/scoped
in tests or runtime environments; reference the class SystemWorkflowClock, its
override of WorkflowClock.now(), and the stemNow() helper so readers know where
the delegation occurs.

In `@packages/stem/test/unit/worker/worker_test.dart`:
- Around line 486-557: The test "emits worker lifecycle signals on start and
shutdown" is duplicated; remove one definition to avoid interfering global
StemSignals state — locate the duplicate test block (the test with name 'emits
worker lifecycle signals on start and shutdown' that constructs InMemoryBroker,
InMemoryResultBackend, SimpleTaskRegistry, and Worker with consumerName
'worker-life' and the subscriptions to
StemSignals.workerInit/workerReady/workerStopping/workerShutdown) and delete it
(or consolidate any unique assertions if needed) so only a single test for
Worker lifecycle signals remains.

---

Duplicate comments:
In `@packages/stem_cli/lib/src/cli/worker.dart`:
- Around line 1081-1099: No code changes required: the column header has been
corrected to 'Paused queues (after)' in the dependencies.out.writeln call and
the exit-code logic correctly accounts for error-status replies in the worker
reply handling (see the ordered list iteration over replies and usage of
reply.status and reply.payload), so mark this PR approved and merge.

In `@packages/stem_cli/test/unit/cli/cli_worker_stats_test.dart`:
- Around line 132-176: The test creates pauseErr and resumeErr and passes them
to runStemCli but never asserts their contents; add assertions after each
runStemCli call to verify the CLI produced no error output (e.g.,
expect(pauseErr.toString(), isEmpty) after pauseCode check and
expect(resumeErr.toString(), isEmpty) after resumeCode check) so silent errors
from runStemCli are detected; reference the pauseErr/resumeErr buffers and the
runStemCli calls in the test to locate where to insert these expects.

In
`@packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart`:
- Around line 40-55: The test removed the _NoCloseBroker wrapper but didn’t add
an additionalDispose handler, so the additional broker returned by
additionalBrokerFactory (created via RedisStreamsBroker.connect) may not be
closed; update BrokerContractFactory to either call close() on any broker
produced by additionalBrokerFactory or add an additionalDispose parameter and
ensure the runner invokes it to close the broker, and similarly verify/extend
QueueEventsContractFactory to accept and invoke additionalDispose if it supports
additional brokers; reference BrokerContractFactory, additionalBrokerFactory,
additionalDispose, RedisStreamsBroker.connect and QueueEventsContractFactory
when making the change.
- Around line 84-99: The additionalBrokerFactory currently creates a
RedisStreamsBroker via RedisStreamsBroker.connect but there is no
additionalDispose, causing leaked connections; add an additionalDispose callback
alongside additionalBrokerFactory (used by QueueEventsContractFactory) that
accepts the broker returned by additionalBrokerFactory and properly
closes/disposes it (e.g., await broker.close()/disconnect()) to mirror other
adapters and ensure the RedisStreamsBroker connection is cleaned up after each
test.

In `@packages/stem/lib/src/signals/payloads.dart`:
- Around line 107-142: The payloads should record runtime timestamps instead of
envelope.enqueuedAt; update each execution-phase payload (e.g.,
TaskReceivedPayload, TaskPrerunPayload, TaskPostrunPayload, TaskSuccessPayload,
TaskFailurePayload, TaskRevokedPayload) to accept an optional occurredAt
parameter, default it to (occurredAt ?? stemNow()).toUtc(), stop using
envelope.enqueuedAt, and ensure the class exposes occurredAt via the getter and
includes relevant identifiers (e.g., taskId, taskName, queue, workerId) in
attributes so the event reflects when it actually occurred.

In `@packages/stem/lib/src/worker/worker.dart`:
- Around line 3728-3747: The previous issue was non-string queue identifiers
being processed in _extractQueueTargets; the current change correctly guards
against that by checking value is String and continuing otherwise, so no further
code changes are needed—keep the guard (if (value is! String) continue;) in
_extractQueueTargets and ensure the function still trims, filters empty names,
deduplicates using the Set queues, and returns a sorted List.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8e8dc15 and 477ed1a.

📒 Files selected for processing (31)
  • .site/docs/comparisons/stem-vs-bullmq.md
  • .site/docs/core-concepts/queue-events.md
  • .site/docs/core-concepts/rate-limiting.md
  • .site/docs/core-concepts/signals.md
  • packages/dashboard/lib/src/services/stem_service.dart
  • packages/stem/example/docs_snippets/lib/canvas_batch.dart
  • packages/stem/lib/src/canvas/canvas.dart
  • packages/stem/lib/src/core/clock.dart
  • packages/stem/lib/src/core/contracts.dart
  • packages/stem/lib/src/core/queue_events.dart
  • packages/stem/lib/src/scheduler/in_memory_lock_store.dart
  • packages/stem/lib/src/signals/payloads.dart
  • packages/stem/lib/src/worker/worker.dart
  • packages/stem/lib/src/workflow/core/workflow_clock.dart
  • packages/stem/lib/stem.dart
  • packages/stem/lib/testing.dart
  • packages/stem/test/unit/canvas/canvas_test.dart
  • packages/stem/test/unit/core/clock_test.dart
  • packages/stem/test/unit/worker/worker_test.dart
  • packages/stem/test/unit/workflow/workflow_clock_test.dart
  • packages/stem_adapter_tests/README.md
  • packages/stem_adapter_tests/lib/src/queue_events_contract_suite.dart
  • packages/stem_adapter_tests/lib/src/revoke_store_contract_suite.dart
  • packages/stem_cli/lib/src/cli/worker.dart
  • packages/stem_cli/test/unit/cli/cli_worker_stats_test.dart
  • packages/stem_postgres/lib/src/control/postgres_revoke_store.dart
  • packages/stem_redis/lib/src/brokers/redis_broker.dart
  • packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart
  • packages/stem_sqlite/lib/src/database/migrations/m_20260224103000_add_revoke_store.dart
  • packages/stem_sqlite/lib/src/models/stem_revoke_entry.dart
  • packages/stem_sqlite/test/control/sqlite_revoke_store_test.dart
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
🧰 Additional context used
🪛 GitHub Actions: stem_redis
packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart

[error] 335-335: Redis broadcast fan-out delivers to all subscribers failed. worker-one timed out waiting for broadcast message.


[error] 332-332: Asynchronous gap detected in test traces (possible flaky test or timeout).

🔇 Additional comments (55)
packages/stem_sqlite/lib/src/database/migrations/m_20260224103000_add_revoke_store.dart (1)

8-33: LGTM — migration schema is consistent and TZ-aware.

No issues spotted in table definition, indexes, or down migration.

packages/stem_sqlite/lib/src/models/stem_revoke_entry.dart (1)

5-46: LGTM — ORM model matches the migration and is clear.

No issues found with field mappings or annotations.

packages/stem_redis/lib/src/brokers/redis_broker.dart (4)

330-344: LGTM: Parameter rename from consumer to group clarifies intent.

The _ensureBroadcastGroup method now correctly receives the resolved group name rather than deriving it internally, which aligns with the updated broadcast group resolution logic in consume.


486-501: Broadcast-only mode routing looks correct.

The queue == null guard, queueGroup derivation, and empty streamKeys for broadcast-only mode are well-structured. Downstream uses of queueGroup! are all safely guarded by the early return in loop() and by streamKeys being empty (so the _scheduleClaim loop on line 628 is a no-op).


633-643: consumerGroup is now properly threaded into broadcast listeners.

This addresses the prior concern about consumerGroup being silently discarded in broadcast-only mode. The caller-supplied consumerGroup is honored as the group name for both _listenBroadcast and _scheduleClaim.


366-366: stemNow() migration is consistent and correct across all time-sensitive paths.

All DateTime.now() call sites (delay comparison, drain scoring, lease expiry, envelope defaults, dead-letter timestamps, replay scheduling, and lease extension) have been uniformly migrated to stemNow(). This enables deterministic testing and a single clock source.

Also applies to: 429-429, 797-797, 845-845, 861-861, 971-971, 1039-1039, 1097-1097

packages/stem_cli/test/unit/cli/cli_worker_stats_test.dart (3)

131-189: LGTM — try/finally cleanup is properly wired.

The finally block guards release.complete() against double-completion and ensures worker.shutdown() / broker.dispose() are always called, even on assertion failure. This correctly addresses the past comment.


478-493: LGTM — deadline is now checked before each predicate invocation.

Moving the deadline guard to the top of the while (true) loop (line 485) and making predicate a Future<bool> Function() addresses both the semantic imprecision from the past review and the synchronous-predicate mismatch. All call sites have been updated with () async => lambdas.


495-512: LGTM — active-polling negative assertion is a solid replacement for the fixed sleep.

_assertTaskRemainsQueued first awaits a non-null state (removing write-lag false failures), then continuously validates the state is queued throughout the holdFor window. Any early state transition causes an immediate StateError, which is strictly better than the prior single-snapshot check after a blind wait.

packages/stem_cli/lib/src/cli/worker.dart (5)

23-24: LGTM — new commands correctly registered.


1109-1129: LGTM — clean concrete implementations.


677-677: LGTM — stemNow().toUtc() consistently replaces DateTime.now().toUtc() here and in _renderHeartbeat (line 1455).


1888-1892: LGTM — stemNow().toUtc() used consistently for timestamp generation in _start, _stop, _status (lines 1888, 1930, 1973), WorkerHealthcheckCommand (line 2100), and _describeUptime (line 2426).


2681-2683: LGTM — stemNow() used consistently on both sides of the deadline comparison; the absence of .toUtc() is harmless since both DateTime values share the same timezone.

packages/stem_postgres/lib/src/control/postgres_revoke_store.dart (1)

130-131: Redundant ternary removed; clock abstraction applied — LGTM.

Both the previously flagged redundant ternary (existing != null ? entry.version : entry.versionentry.version) and the DateTime.now()stemNow() migration are correctly applied. The UTC semantics are consistent with SystemStemClock.

.site/docs/core-concepts/queue-events.md (1)

44-45: Grammar fix applied — LGTM.

"built into" is now correct.

packages/stem/lib/testing.dart (1)

1-5: Well-scoped testing barrel — LGTM.

The selective show FakeStemClock keeps production clock types (SystemStemClock, withStemClock) out of the testing surface while exposing exactly what test authors need.

packages/stem_adapter_tests/README.md (1)

72-76: Both past-flagged gaps resolved — LGTM.

QueueEventsContractCapabilities and RevokeStoreContractCapabilities tables are added and the fan-out opt-out recipe is included.

Also applies to: 95-99, 149-161

packages/stem/example/docs_snippets/lib/canvas_batch.dart (1)

32-40: Polling loop addresses past feedback — LGTM.

Correctly breaks on terminal state before the delay, so the last successful poll isn't penalized with a redundant 50 ms sleep.

packages/stem/lib/src/core/clock.dart (2)

17-19: UTC normalization applied — LGTM.

DateTime.now().toUtc() eliminates the production/test clock-kind mismatch previously flagged.


39-54: Zone-based clock scoping — LGTM.

_zoneClockKey = Object() (non-const) correctly creates a unique runtime identity that won't be canonicalized across library loads. The is StemClock guard in stemNow() provides a clean fallback. Zone values propagate through async continuations in Dart, so stemNow() will resolve correctly across await points within a withStemClock scope.

packages/stem/lib/src/scheduler/in_memory_lock_store.dart (1)

63-68: Single stemNow() capture in renew — LGTM.

Captures stemNow() once and reuses for both the expiry check and the new expiresAt assignment, removing the subtle two-read window flagged previously.

packages/stem/test/unit/workflow/workflow_clock_test.dart (1)

16-23: Zone-scoped clock integration test — LGTM.

Passing the clock.now tear-off directly into withStemClock cleanly verifies that SystemWorkflowClock delegates to stemNow() rather than calling DateTime.now() directly. This doubles as a regression guard if the delegation is ever accidentally removed.

packages/dashboard/lib/src/services/stem_service.dart (2)

160-174: Centralized clock for control envelopes looks good.

Using stemNow().toUtc() for enqueuedAt keeps timestamps aligned with the shared clock source.


192-200: Timeout loop now uses a single timestamp per iteration.

This avoids deadline/remaining TOCTOU gaps and reads cleanly.

packages/stem/test/unit/core/clock_test.dart (1)

5-25: Clock abstraction tests are clear and deterministic.

Nice coverage for scoped clock behavior and async propagation.

.site/docs/core-concepts/signals.md (2)

11-46: Signal lifecycle + StemEvent shape documentation reads cleanly.

Nice consolidation of semantics and ordering details.


115-121: No issues found. All documented helper names match the actual StemSignals API exactly. The method names onTaskSuccess, onTaskFailure, and all others listed are correct as documented.

packages/stem_adapter_tests/lib/src/queue_events_contract_suite.dart (1)

7-242: Queue events contract suite looks comprehensive.

The scenarios cover core delivery, filtering, metadata, and fan-out behavior well.

packages/stem_adapter_tests/lib/src/revoke_store_contract_suite.dart (1)

7-199: Revoke store contract suite covers the core semantics well.

packages/stem/lib/src/core/contracts.dart (3)

848-1040: Group rate-limit options and failure-mode parsing look consistent.


2007-2010: Delay now uses stemNow() for deterministic timing.


2066-2073: RateLimiterFailureMode enum is concise and clear.

.site/docs/core-concepts/rate-limiting.md (1)

12-153: Group rate-limiting docs are clear and explicit about defaults.

.site/docs/comparisons/stem-vs-bullmq.md (1)

1-62: Documentation structure looks good.

The comparison matrix is well-organized with clear status semantics, evidence-backed rationale links, and the previously flagged empty "Update policy" heading is now properly populated with content. The BullMQ events parity mapping table is a useful addition for users migrating from BullMQ.

packages/stem_sqlite/test/control/sqlite_revoke_store_test.dart (1)

1-101: Test structure is clean and past review feedback has been addressed.

The redundant dbFile.delete() in tearDown and the unnecessary cast (store as SqliteRevokeStore) from prior reviews are both resolved. The three test paths (fromDataSource, connect URL, adapter factory) provide good coverage with proper try/finally cleanup.

packages/stem/lib/src/core/queue_events.dart (3)

14-73: QueueCustomEvent implementation looks correct; past feedback addressed.

The name field is now included in attributes (line 57), and the class properly implements the StemEvent contract. The toJson method correctly serializes emittedAt to ISO 8601 format.


76-125: QueueEventsProducer.emit — input validation and broadcast routing look correct.

Proper trimming and empty-check on both queue and eventName, UTC timestamp via stemNow(), and broadcast routing per-queue channel. Clean implementation.


128-236: QueueEvents lifecycle management is robust; past feedback addressed.

The constructor now validates the queue name via _normalizeQueue (fail-fast). The _closed guard in _onDelivery (line 216) and _emitError (lines 231-233) properly protect against adding to a closed stream. The _waitForCallbackSuccess consolidation from prior review is also reflected in the delegated helper pattern.

packages/stem/lib/stem.dart (1)

90-99: New exports are correctly scoped; past feedback addressed.

FakeStemClock is now hidden from the production barrel (line 90), and the new queue_events.dart and stem_event.dart exports properly surface the new public API.

packages/stem/lib/src/canvas/canvas.dart (5)

155-246: Batch data model is well-designed.

BatchLifecycleState, BatchSubmission, and BatchStatus provide a clean, complete surface for batch lifecycle management. pendingCount properly clamps to avoid negatives, and isTerminal covers all terminal states including partial.


418-479: submitBatch idempotency and immutability — past feedback fully addressed.

The method now:

  1. Guards against resubmission by checking backend.getGroup(id) before initGroup (line 432).
  2. Distinguishes batch vs. non-batch groups with stem.batch meta check (line 434).
  3. Serializes per-batchId within the process via _withBatchSubmissionLock.

481-486: inspectBatch correctly guards against non-batch groups.

Line 484 checks status.meta['stem.batch'] != true, returning null for groups not created via submitBatch. Past feedback addressed.


738-782: _buildBatchStatus terminal filtering and positive state checks — past feedback fully addressed.

Entries are filtered to isTerminal states before counting (line 740), and lifecycle state classification uses positive equality (succeeded == completed, etc.) instead of fragile negative exclusion.


488-504: The lock provides no protection against concurrent submissions with the same batchId from separate Canvas instances.

While _withBatchSubmissionLock serializes operations within a single Canvas instance, concurrent submissions from different Canvas instances (or separate processes) can both pass the getGroup check before either calls initGroup, causing a race condition. This is verified in the backend implementations:

  • PostgreSQL (insertOrIgnore): Only the first initGroup succeeds; duplicates fail silently. Provides some safety but the check-then-act pattern remains racy.
  • SQLite (upsert): Duplicates overwrite the group metadata, potentially causing a mismatch between expected count and actual dispatched tasks.
  • Redis and in-memory backend (SET/map assignment): Duplicates overwrite the group, creating the same mismatch issue.

If multiple Canvas instances can be submitted against the same batchId, consider either enforcing uniqueness constraints at the backend level or using a distributed lock (not instance-local).

packages/stem/test/unit/canvas/canvas_test.dart (2)

83-154: Batch test coverage is thorough and well-structured.

The three new tests cover the key batch scenarios: basic lifecycle, idempotency with caller-provided batchId, and terminal-only entry counting. The inspectBatch test (line 124) directly manipulates backend state to isolate the counting logic — a clean unit-test approach. The shared _waitForNonNull helper (past feedback addressed) reduces duplication nicely.


194-210: Generic polling helper is clean.

_waitForNonNull properly abstracts the polling pattern with configurable timeout and descriptive error messages. Both _waitForSuccess and _waitForBatchTerminal delegate to it, eliminating the previously flagged duplication.

packages/stem/test/unit/worker/worker_test.dart (5)

1291-1377: Group limiter cross-type key sharing test is well-constructed.

The _ScenarioRateLimiter precisely controls when rate limiting kicks in, and the test validates both the shared key behavior and the retry event metadata. Past feedback about the redundant isNotNull assertion has been addressed.


1427-1481: Fail-closed test: past timing feedback addressed.

maxRetries: 5 is now explicitly set (line 1447), and _waitForTaskState (line 1475) replaces the fixed delay, making this test more resilient to CI variability.


1483-1562: Pause/resume persistence test looks solid.

The _assertTaskRemainsQueued helper (line 1533) is an improvement over a fixed delay — it first confirms the task exists, then actively asserts it stays queued for the hold period. The full pause → restart → verify-paused → resume → verify-completed flow provides good end-to-end coverage.


1694-1722: Polling helpers: past feedback addressed.

_waitFor now accepts FutureOr<bool> predicates (line 1695), and _waitForCallbackSuccess correctly delegates to _waitForTaskState (lines 1716–1721), eliminating the previously flagged duplication.


1745-1765: _assertTaskRemainsQueued — reasonable approach for paused-queue assertion.

The helper first polls until the task exists (line 1750), then asserts the state remains queued for holdFor duration. This is inherently timing-sensitive but a pragmatic improvement: the two-phase approach (wait-for-existence then assert-hold) reduces the chance of vacuous passing while keeping the test straightforward.

packages/stem/lib/src/worker/worker.dart (1)

4301-4359: WorkerEvent now implements StemEvent — clean addition.

The timestamp field defaulting to stemNow() and the eventName / occurredAt / attributes getters are well-structured. The attributes map conditionally includes fields only when present, avoiding null noise. This aligns well with the broader StemEvent contract.

packages/stem/lib/src/signals/payloads.dart (2)

44-71: BeforeTaskPublishPayload and AfterTaskPublishPayload use envelope.enqueuedAt — correct for publish-phase events.

For these two payloads, the event is the publish action, so enqueuedAt is the appropriate semantic timestamp.

Also applies to: 73-104


530-588: WorkflowRunPayload.withSignalName — clean copy pattern for rebinding signal context.

The withSignalName method correctly propagates all fields including the captured _occurredAt, avoiding the multiple-stemNow() problem.

@kingwill101 kingwill101 merged commit 446aa88 into master Feb 24, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant