feat(stem): close BullMQ parity gaps across events, control, and stores#17
feat(stem): close BullMQ parity gaps across events, control, and stores#17kingwill101 merged 23 commits intomasterfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds queue-scoped custom events, a durable batch submission API with inspectable lifecycle, unified StemEvent contract and expanded signals, group-scoped rate limiting, SQLite-backed revoke store (migrations/ORM/adapter), worker pause/resume CLI, broadcast-only consumer support, a zone-aware clock (stemNow), many docs, examples, and tests. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Producer as QueueEventsProducer
participant Broker
participant Listener as QueueEvents
Client->>Producer: emit(queue, eventName, payload, headers)
Producer->>Broker: publish(broadcastChannel, envelope)
Broker->>Listener: deliver(envelope)
Listener->>Listener: validate & convert -> QueueCustomEvent
Listener-->>Client: stream QueueCustomEvent (on / events)
Client->>Listener: close()
Listener->>Broker: cancel subscription
sequenceDiagram
participant Caller
participant Canvas
participant GroupMgr
participant Broker as Worker/Broker
Caller->>Canvas: submitBatch(signatures)
Canvas->>GroupMgr: create durable group (batchId)
Canvas->>Broker: dispatch grouped tasks (groupId=batchId)
Canvas-->>Caller: return BatchSubmission(batchId, taskIds)
Caller->>Canvas: inspectBatch(batchId)
Canvas->>GroupMgr: read group status
GroupMgr-->>Canvas: group lifecycle snapshot
Canvas-->>Caller: BatchStatus(state, counts, meta)
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Package publishing
Documentation at https://github.com/dart-lang/ecosystem/wiki/Publishing-automation. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6321d54294
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 31
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
packages/stem_adapter_tests/README.md (1)
29-47:⚠️ Potential issue | 🟡 MinorAdd
runRevokeStoreContractTeststo Quick Start and documentRevokeStoreContractCapabilities.The package exports both
runRevokeStoreContractTestsandRevokeStoreContractCapabilities(withverifyPruneExpiredflag), but neither appears in the README. Adapter authors will lack guidance on wiring up the revoke-store contract suite or tuning its behavior.Add the test runner to the Quick Start snippet between
runQueueEventsContractTestsand the workflow factory setup:runRevokeStoreContractTests( adapterName: 'my-adapter', factory: RevokeStoreContractFactory(create: createRevokeStore), );Add a new RevokeStoreContractCapabilities section to the Capability Flags table (after QueueEventsContractCapabilities):
| `verifyPruneExpired` | `true` | Expiry pruning tests | Verifies expired revoke records are cleaned up. |Also add a corresponding Adapter Recipe example showing how to disable
verifyPruneExpiredif needed.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem_adapter_tests/README.md` around lines 29 - 47, Add the missing revoke-store contract docs: insert a call to runRevokeStoreContractTests (using RevokeStoreContractFactory(create: createRevokeStore)) into the Quick Start snippet between runQueueEventsContractTests and the workflow factory setup, and add a new Capability Flags entry for RevokeStoreContractCapabilities with the verifyPruneExpired flag (default true) describing its purpose ("Expiry pruning tests — verifies expired revoke records are cleaned up"); also add a short Adapter Recipe example showing how to construct RevokeStoreContractCapabilities with verifyPruneExpired: false to disable those pruning tests.packages/stem/test/unit/worker/worker_test.dart (1)
486-486:⚠️ Potential issue | 🟡 MinorDuplicate test name — "emits worker lifecycle signals on start and shutdown" at lines 344 and 486.
Both tests share the identical name, making it difficult to distinguish failures in test reports. Consider renaming one of them to give each a distinct description.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem/test/unit/worker/worker_test.dart` at line 486, Two tests share the exact name "emits worker lifecycle signals on start and shutdown", causing ambiguous test reports; locate the duplicate test declaration that uses that string (the test(...) call in worker_test.dart) and rename one of them to a distinct description (e.g. "emits worker lifecycle signals on start and stop" or "emits lifecycle signals on start and shutdown for multiple workers") so each test has a unique, descriptive name; keep the test body unchanged, only update the test string passed to test()..site/docs/brokers/sqlite.md (1)
74-88:⚠️ Potential issue | 🟡 Minor"Recommended layout" section is missing the revoke store file.
The section advises separate DB files for broker and backend but doesn't mention that the revoke store also requires its own file (e.g.,
stem_revoke.sqlite). The newconfigureSqliteRevokeStore()snippet uses a dedicated file, and the same WAL/single-writer constraints apply.📝 Suggested addition
A simple layout:./stem_broker.sqlite # broker only
./stem_backend.sqlite # result backend only
+./stem_revoke.sqlite # revoke store only🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.site/docs/brokers/sqlite.md around lines 74 - 88, The "Recommended layout" section omits the revoke store DB file; update the docs to mention that the revoke store needs its own SQLite file (e.g., stem_revoke.sqlite) and that it must follow the same WAL / single-writer guidance; reference the new configureSqliteRevokeStore() snippet and add the revoke file to the example list (./stem_revoke.sqlite # revoke store only) so readers know to keep broker, backend, and revoke stores in separate DB files..site/docs/core-concepts/canvas.md (1)
86-90:⚠️ Potential issue | 🟡 MinorMissing
canvas_batch.dartin the "Running the examples" list.The new batch example isn't included in the run instructions alongside chain, group, and chord.
Proposed fix
cd packages/stem/example/docs_snippets dart run lib/canvas_chain.dart dart run lib/canvas_group.dart +dart run lib/canvas_batch.dart dart run lib/canvas_chord.dart🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.site/docs/core-concepts/canvas.md around lines 86 - 90, Add the new batch example to the run instructions list by including the missing command that runs canvas_batch.dart alongside the existing entries (canvas_chain.dart, canvas_group.dart, canvas_chord.dart) so the "Running the examples" block also shows: dart run lib/canvas_batch.dart; update the snippet where those three run commands are listed to include this fourth command..site/docs/workers/worker-control.md (1)
216-233:⚠️ Potential issue | 🟡 MinorLifecycle-guards paragraph is orphaned under "Queue Pause/Resume".
Lines 226–233 ("Lifecycle guards can also recycle isolates automatically…") logically belong under "Shutdown Modes and Lifecycle Guards" (line 202) but now sit under the newly inserted "Queue Pause/Resume" heading. This makes the pause/resume section read as if lifecycle recycling is related to queue pausing.
Move the lifecycle-guards paragraph and its code snippet above the
## Queue Pause/Resumeheading so it remains part of the shutdown section.Proposed restructure
- **Hard** immediately requeues active deliveries and terminates isolates. By default, workers install signal handlers ... with `WorkerLifecycleConfig(installSignalHandlers: false)` when embedding Stem inside a larger host that already owns signal routing. +Lifecycle guards can also recycle isolates automatically: + +```dart file=<rootDir>/../packages/stem/example/docs_snippets/lib/worker_control.dart#worker-control-lifecycle + +``` + +Recycling occurs after the active task finishes; the worker logs the recycle +reason and spawns a fresh isolate before accepting new work. + ## Queue Pause/Resume `stem worker pause` and `stem worker resume` target queue names (repeatable `--queue`) and optionally specific workers (`--worker`). Paused queues are requeued instead of executed until resumed. - Pause/resume state is persisted when a revoke store is configured. - Without a revoke store, pause/resume still works for active workers but does not survive worker restarts. -Lifecycle guards can also recycle isolates automatically: - -```dart file=<rootDir>/../packages/stem/example/docs_snippets/lib/worker_control.dart#worker-control-lifecycle - -``` - -Recycling occurs after the active task finishes; the worker logs the recycle -reason and spawns a fresh isolate before accepting new work. - ## Configuration Summary🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.site/docs/workers/worker-control.md around lines 216 - 233, Move the lifecycle-guards paragraph and its associated code snippet (the block referencing worker-control-lifecycle) out of the "Queue Pause/Resume" section and place it directly under the "Shutdown Modes and Lifecycle Guards" heading so that the sentence "Lifecycle guards can also recycle isolates automatically:" plus the ```dart file=...#worker-control-lifecycle``` snippet and the following sentence about recycling appear before the "## Queue Pause/Resume" heading; ensure the snippet marker worker-control-lifecycle and the recycle explanation remain intact and that "Queue Pause/Resume" begins immediately after that moved content.packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart (1)
22-22: 🧹 Nitpick | 🔵 TrivialDedicate a separate namespace variable for the queue events contract to avoid fragile shared state.
Both
runBrokerContractTests.createandrunQueueEventsContractTests.createwrite to the samecontractNamespacevariable. Their respectiveadditionalBrokerFactoryclosures then read it. While Dart's default sequential test execution makes this safe today, the two suites clobbering the same variable is brittle — enabling--concurrencyor reordering registration would cause the wrong namespace to be picked up.♻️ Proposed refactor: dedicated namespace variables per contract suite
- String? contractNamespace; + String? contractNamespace; + String? queueEventsContractNamespace;Then update the queue events factory to use its own variable:
runQueueEventsContractTests( adapterName: 'Redis', factory: QueueEventsContractFactory( create: () async { final namespace = _uniqueNamespace(); - contractNamespace = namespace; + queueEventsContractNamespace = namespace; return RedisStreamsBroker.connect( redisUrl, namespace: namespace, ... ); }, ... additionalBrokerFactory: () async { - final namespace = contractNamespace; + final namespace = queueEventsContractNamespace; if (namespace == null || namespace.isEmpty) { throw StateError( 'Redis queue-events contract requires primary broker namespace.', ); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart` at line 22, The two test suites share the single variable contractNamespace causing fragile shared state; introduce a dedicated variable (e.g., queueContractNamespace) for the queue-events suite and keep contractNamespace for the broker suite, then update the additionalBrokerFactory closure passed to runQueueEventsContractTests.create to read the new queueContractNamespace instead of the shared contractNamespace so each suite writes/reads its own namespace (refer to contractNamespace, runBrokerContractTests.create, runQueueEventsContractTests.create, and the additionalBrokerFactory closures to locate the changes).packages/stem/lib/src/signals/payloads.dart (1)
661-733:⚠️ Potential issue | 🟠 Major
occurredAtshould be stable for control-command payloads.
UsingDateTime.now()in the getter produces a different timestamp on every access. Capture the timestamp at construction and return a stored value.🐛 Proposed fix
class ControlCommandReceivedPayload implements StemEvent { /// Creates a new [ControlCommandReceivedPayload] instance. - const ControlCommandReceivedPayload({ + ControlCommandReceivedPayload({ required this.worker, required this.command, - }); + DateTime? timestamp, + }) : _occurredAt = (timestamp ?? DateTime.now()).toUtc(); + + final DateTime _occurredAt; @@ - DateTime get occurredAt => DateTime.now().toUtc(); + DateTime get occurredAt => _occurredAt; @@ class ControlCommandCompletedPayload implements StemEvent { /// Creates a new [ControlCommandCompletedPayload] instance. - const ControlCommandCompletedPayload({ + ControlCommandCompletedPayload({ required this.worker, required this.command, required this.status, this.response, this.error, - }); + DateTime? timestamp, + }) : _occurredAt = (timestamp ?? DateTime.now()).toUtc(); + + final DateTime _occurredAt; @@ - DateTime get occurredAt => DateTime.now().toUtc(); + DateTime get occurredAt => _occurredAt;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem/lib/src/signals/payloads.dart` around lines 661 - 733, The occurredAt getters on ControlCommandReceivedPayload and ControlCommandCompletedPayload are non-stable because they call DateTime.now() on each access; fix by capturing the timestamp once during construction and returning that stored value. Add a final DateTime field (e.g., _occurredAt or occurredAtUtc) to each class, set it to DateTime.now().toUtc() in the constructor (or accept an optional DateTime parameter defaulting to now().toUtc()), and change the occurredAt getter to return that field instead of calling DateTime.now() again; update the const constructors and any call sites if you add an optional parameter.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.site/docs/comparisons/stem-vs-bullmq.md:
- Around line 46-48: The "## Update policy" heading is empty while the
procedural "When this matrix changes…" block currently sits after "## BullMQ
events parity notes"; move the entire "When this matrix changes…" block (the
procedural steps) so it appears directly under the "## Update policy" heading
(or delete the empty "## Update policy" heading if you prefer to keep the steps
where they are), ensuring the header and its body are contiguous and
semantically correct; look for the "## Update policy" and the paragraph
beginning "When this matrix changes" to relocate or remove accordingly.
In @.site/docs/core-concepts/queue-events.md:
- Around line 44-45: Change the phrase "built in to `QueueEvents`" to the
correct compound form "built into `QueueEvents`" in the documentation sentence
describing delivery behavior so the line reads: "Delivery follows the underlying
broker's broadcast behavior for active listeners (no historical replay API is
built into `QueueEvents`)."
In @.site/docs/core-concepts/rate-limiting.md:
- Around line 146-153: Update the docs for groupRateLimiterFailureMode to state
the default is "failOpen"; specifically, in the bullet list documenting
groupRateLimiterFailureMode (near the entries for groupRateLimit, groupRateKey,
groupRateKeyHeader) add a short sentence like "Default: failOpen" or annotate
the failOpen entry with "(default)" so readers know the API defaults to failOpen
when not specified; ensure the symbol name groupRateLimiterFailureMode and the
two options failOpen/failClosed are mentioned exactly as in the code/docs.
In @.site/docs/core-concepts/signals.md:
- Around line 36-37: Update the description for the signals to clarify ordering
and roles: change the `taskReceived` line to state it fires when the message is
claimed/dequeued, change the `taskPrerun` line to state it fires immediately
before the handler is invoked, and make the ordering explicit as `taskReceived →
taskPrerun → handler → taskPostrun` (keep `taskPostrun` as running after
completion) so readers can see the precise sequence and distinction between
`taskReceived` and `taskPrerun`.
In `@packages/stem_adapter_tests/lib/src/queue_events_contract_suite.dart`:
- Around line 245-251: The file-level mutable _counter used by _queueName() can
persist across suite runs in the same isolate; replace the top-level mutable
state by making the counter local to the generator or remove it entirely and
derive uniqueness from a non-mutable source (e.g., a UUID or Random value
combined with DateTime). Update _queueName to either maintain a local static
counter inside the function or build the suffix with
DateTime.now().microsecondsSinceEpoch plus a short random/UUID string so you no
longer need the top-level _counter variable.
In `@packages/stem_adapter_tests/lib/src/revoke_store_contract_suite.dart`:
- Around line 144-194: Add a subcase to the existing test that inserts a
RevokeEntry with namespace 'stem-contract-a' and expiresAt set to null via
current.upsertAll, then call current.pruneExpired(namespaceA, now) and assert
the pruned count does not remove that entry and that current.list(namespaceA)
still contains its taskId; reference the existing test scope using RevokeEntry,
current.upsertAll, current.pruneExpired, and current.list so the new assertion
verifies entries with null expiresAt survive pruning.
In `@packages/stem_adapter_tests/README.md`:
- Around line 72-77: Add an "Adapter Recipes" example showing how to opt out of
fan-out by demonstrating usage of QueueEventsContractCapabilities with
verifyFanout set to false; specifically add a snippet that calls
runQueueEventsContractTests with an adapterName (e.g.,
'single-listener-adapter'), a QueueEventsContractFactory(create: createBroker),
and settings: QueueEventsContractSettings(capabilities:
QueueEventsContractCapabilities(verifyFanout: false)), so adapter authors have a
copy-paste template for brokers that do not support fan-out.
- Around line 72-77: The README is missing documentation for the
RevokeStoreContractCapabilities class and its verifyPruneExpired flag; add a new
capability table titled "RevokeStoreContractCapabilities" mirroring the style of
other sections: include a row with Flag `verifyPruneExpired`, Default `true`,
Affects (describe what it affects, e.g., revoke store pruning tests or
prune-expired behavior), and Behavior when enabled (describe that it verifies
expired revocations are pruned). Ensure the table formatting matches the
existing markdown tables and place it alongside the other capability sections.
In `@packages/stem_cli/lib/src/cli/worker.dart`:
- Around line 1081-1095: The table header in _renderQueueControlReplies is
misleading for resume operations (e.g., WorkerResumeCommand) because
reply.payload['paused'] shows queues still paused after the operation; change
the header to make that explicit (e.g., "Paused queues (after)" or "Still
paused") and update the column formatting in the same function where the header
lines are written (the two dependencies.out.writeln calls and the subsequent row
formatting using reply.payload['paused']) so the label accurately reflects
"paused after" state; alternatively, if you prefer separate semantics, split
_renderQueueControlReplies into two renderers (one used by WorkerPauseCommand
and one by WorkerResumeCommand) with appropriate header text for each.
- Line 1099: The exit code logic currently returns 0 whenever any reply exists
(replies.isEmpty ? 70 : 0); change it to return 70 if replies is empty or if any
reply reports an error status. Locate the return using the replies variable and
update the condition to check replies.isEmpty || replies.any((r) => r['status']
== 'error') (or the codebase's Reply type/status accessor) so the command
mirrors WorkerPingCommand/WorkerInspectCommand/WorkerRevokeCommand behavior and
exits 70 when any worker reports an error.
In `@packages/stem_cli/test/unit/cli/cli_worker_stats_test.dart`:
- Around line 148-150: The test uses a fixed 180ms delay then checks
backend.get(taskId) for TaskState.queued which is flaky; replace the fixed sleep
with a deterministic wait: after issuing the pause command, either wait for an
explicit pause-ack/event from the worker (preferred) or poll backend.get(taskId)
in a short loop until you observe a stable queued state or a configurable
timeout, then assert pausedStatus?.state == TaskState.queued; ensure you
reference taskId, backend.get, and TaskState.queued and fail the test only after
the timeout to avoid timing-based flakes.
- Around line 129-143: The test allocates pauseErr and resumeErr StringBuffer
values but never asserts their contents, so stderr output from runStemCli calls
(for the 'worker pause' and 'worker resume' commands) can go unnoticed; update
the test around the runStemCli invocations to assert that pauseErr.toString()
and resumeErr.toString() are empty (or match expected stderr) after their
respective pauseCode and resumeCode checks, and keep existing assertions for
pauseOut/resumeOut and exit codes from runStemCli to ensure CLI errors are
detected.
- Around line 463-475: The _waitFor helper can exceed timeout because it calls
predicate before checking the deadline; change the loop to check the deadline
before each predicate invocation and before sleeping so the timeout is enforced
precisely. In practice, compute deadline as now+timeout (already done), then
loop: if DateTime.now().isAfter(deadline) throw TimeoutException(...); if (await
predicate()) return; await Future.delayed(pollInterval); This ensures _waitFor,
predicate, timeout and pollInterval semantics are preserved while preventing the
function from running past the specified timeout.
- Around line 111-175: The test 'pause/resume commands stop and restart queue
consumption' leaves the Worker and InMemoryBroker running if an assertion fails;
wrap the test's logic after creating/starting the Worker (after await
worker.start()) in a try/finally and move the cleanup calls into the finally
block so worker.shutdown() is always awaited and broker.dispose() always called;
reference the Worker instance named worker and the broker variable to ensure
those two resources are shut down in the finally block.
In
`@packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart`:
- Around line 90-106: The test's additionalBrokerFactory wraps
RedisStreamsBroker.connect in _NoCloseBroker which prevents closing the inner
connection and leaks Redis; remove the _NoCloseBroker wrapper and instead pass
an explicit additionalDispose that calls _safeCloseRedisBroker on the created
broker (i.e. keep using RedisStreamsBroker.connect in additionalBrokerFactory
but return the raw broker and supply additionalDispose: (b) =>
_safeCloseRedisBroker(b) so the underlying Redis connection is properly closed).
In
`@packages/stem_sqlite/lib/src/database/migrations/m_20260224103000_add_revoke_store.dart`:
- Around line 14-19: The issued_at and expires_at columns in the stem_revokes
migration use .timestamp() but should use timezone-aware .timestampTz() to match
audit columns and avoid serialization/comparison issues used by pruneExpired
(which compares expiresAt DateTimes); update the migration so the builder call
for issued_at uses timestampTz() and the expires_at column uses
timestampTz().nullable(), keeping other column definitions (terminate, reason,
requested_by) unchanged so timestamp handling is consistent with timestampsTz()
used elsewhere (task results, groups, worker heartbeats).
In `@packages/stem_sqlite/lib/src/models/stem_revoke_entry.dart`:
- Around line 34-35: The doc comment for the `terminate` field in
StemRevokeEntry is ambiguous about which integer values map to true/false;
update the comment on `final int terminate` to explicitly state the contract
(e.g., "0 = false, 1 = true; only 0/1 are valid" or "0 = false, any non-zero =
true") and, if the repository uses adapter methods (e.g., toDomain/fromDomain or
any conversion functions for RevokeEntry), add validation and explicit
conversion there so the domain model uses bool and the DB layer enforces or
normalizes the int to the chosen contract.
In `@packages/stem_sqlite/test/control/sqlite_revoke_store_test.dart`:
- Around line 28-31: Remove the redundant cast in the contract factory's dispose
closure: the parameter `store` already implements `RevokeStore` (which declares
`close()`), so update the `dispose` passed to `RevokeStoreContractFactory` to
call `store.close()` directly instead of `(store as SqliteRevokeStore).close()`;
locate the factory creation using `RevokeStoreContractFactory` and change the
`dispose` lambda accordingly.
- Around line 19-24: The tearDown block currently checks and deletes dbFile
explicitly before calling tempDir.delete(recursive: true), which is redundant
because tempDir.delete(recursive: true) already removes dbFile; remove the
dbFile.existsSync() and await dbFile.delete() lines from the tearDown so it
simply awaits tempDir.delete(recursive: true) (locate the tearDown function and
the dbFile/tempDir identifiers to apply the change).
In `@packages/stem/example/docs_snippets/lib/canvas_batch.dart`:
- Around line 32-36: The example calls
app.canvas.inspectBatch(submission.batchId) immediately after submitBatch which
will usually show a non-terminal state; update the snippet to either await a
short delay (e.g., sleep/pause) before calling inspectBatch or add an inline
comment clarifying this demonstrates the inspection API and may return
pending/running states—not a completed batch (refer to inspectBatch,
submitBatch, and submission.batchId in the snippet).
In `@packages/stem/lib/src/canvas/canvas.dart`:
- Around line 418-444: submitBatch currently always calls backend.initGroup and
group(...) which re-initializes metadata and re-publishes tasks for a
caller-provided batchId; change submitBatch to first check whether a group with
the computed id already exists in the backend (e.g., via a backend.getGroup /
backend.groupExists or similar method) and if it exists return an existing
BatchSubmission (fetch taskIds from the backend group record) without calling
backend.initGroup or group(...); only call backend.initGroup and publish tasks
with group(signatures, groupId: id) when the group does not already exist to
prevent re-initialization and duplicate task dispatching.
- Around line 685-727: The _buildBatchStatus function misclassifies batch state
because it treats all status.results.entries as completed and infers succeeded
via negative checks; fix it by first filtering entries to terminal states using
the existing TaskStateX.isTerminal (or TaskState.isTerminal) extension, compute
completed = terminalEntries.length, compute succeeded/failed/cancelled by
counting terminalEntries where entry.value.state ==
TaskState.succeeded/failed/cancelled, and then determine BatchLifecycleState in
positive terms (e.g., if completed == 0 -> pending, else if completed <
status.expected -> running, else if succeeded == expected -> succeeded, else if
failed == expected -> failed, else if cancelled == expected -> cancelled, else
-> partial) while returning BatchStatus as before.
In `@packages/stem/lib/src/core/contracts.dart`:
- Around line 1026-1036: _parseFailureMode currently only handles string inputs;
update it to accept RateLimiterFailureMode instances as well by checking if
value is a RateLimiterFailureMode and returning it directly, otherwise convert
strings via toString/trim/lowercase and compare against
RateLimiterFailureMode.values (keep the existing loop). Also fix the null check
order so you test for null before calling toString to avoid unnecessary toString
calls on null; reference the _parseFailureMode function and the
RateLimiterFailureMode enum.
In `@packages/stem/lib/src/core/queue_events.dart`:
- Around line 219-234: The _onDelivery handler can call _events.addError after
the stream is closed; add a top-level guard using a boolean flag (e.g., _closed)
to immediately return if the controller is closed, and also check that flag
before calling _events.add or _events.addError so neither is invoked when
closed; update the controller close path to set _closed = true; keep the
existing _eventFromEnvelope(...) logic and the broker.ack(...) finally block
unchanged.
- Around line 128-136: Validate the trimmed queue string inside the QueueEvents
constructor (not only in start()) and fail-fast when it's empty: in the
QueueEvents constructor (the initializer that sets queue = queue.trim()), check
if the resulting queue is empty and throw an ArgumentError (or use assert) with
a clear message so callers constructing QueueEvents(broker: ..., queue: ' ')
get immediate feedback; leave start() validation unchanged or remove its
redundant check after adding this constructor-level validation.
In `@packages/stem/lib/src/worker/worker.dart`:
- Around line 3729-3745: The _extractQueueTargets function currently calls
toString() on items from payload['queues'], which can produce unintended names
like "null" or "Instance of ..."; update the iteration to only accept String
entries (e.g., check value is String), trim and add non-empty strings, and skip
any non-string values so only explicit string queue identifiers are included;
keep the existing handling of payload['queue'] as a nullable String and the
final sort/return behavior.
In `@packages/stem/test/unit/canvas/canvas_test.dart`:
- Around line 126-142: Extract a generic polling helper (e.g., _pollUntil<T>)
and replace both _waitForBatchTerminal and _waitForSuccess to call it: the
helper should accept a polling closure that returns Future<T?>, a predicate (T?
-> bool) to decide terminal/success, a timeout Duration, and an interval
Duration (currently 50ms); implement the same loop/timeout/delay logic once
inside _pollUntil and have _waitForBatchTerminal call
_pollUntil(canvas.inspectBatch(batchId), predicate: (s) => s != null &&
s.isTerminal, timeout: ..., interval: ...), likewise adapt _waitForSuccess to
use _pollUntil with its own predicate so the duplicated polling code is removed.
In `@packages/stem/test/unit/worker/worker_test.dart`:
- Around line 1485-1566: Replace the fixed 180ms sleep with a short polling/wait
that first ensures workerB has attempted a poll cycle before checking the task
state: subscribe to workerB.events (workerB.events) and use _waitFor to wait for
an event that indicates the worker attempted consumption (e.g., an event with
WorkerEventType.poll/heartbeat or any WorkerEvent emitted after start), then
call backend.get(taskId) and assert the state is TaskState.queued; if your
codebase lacks a poll/heartbeat event, instead loop with _waitFor polling
backend.get until either the worker has emitted any event or a timeout, and only
then assert the queued state. Ensure you remove the fixed Future.delayed(const
Duration(milliseconds: 180)) and reference workerB.events, WorkerEventType,
backend.get, and _waitFor when making the change.
- Around line 1732-1751: The helper _waitForTaskState duplicates logic in
_waitForCallbackSuccess; replace the body of _waitForCallbackSuccess to delegate
to _waitForTaskState by calling _waitForTaskState(backend, taskId,
TaskState.succeeded, timeout: timeout) (or forwarding any named params) and
remove the duplicated polling loop; reference the functions _waitForTaskState
and _waitForCallbackSuccess to find and update the code accordingly so only the
generic polling function contains the loop and timeout handling.
- Around line 1428-1483: The test "group limiter fail-closed requeues while
limiter is unavailable" is timing-sensitive and may flake if retries are
exhausted before the fixed 220ms delay; update the TaskOptions for the
registered FunctionTaskHandler (name: 'tasks.group.failclosed') to include an
explicit maxRetries (e.g., maxRetries: 5) and replace the fixed Future.delayed
check with a polling helper that waits for TaskState.retried (use or add a
helper like _waitForTaskState(taskId, TaskState.retried, backend, timeout)) so
the assertion uses a bounded poll instead of a hard sleep and the test can
reliably observe the retried state without exhausting retries.
- Around line 1291-1378: Remove the redundant null-check assertion for the first
task result: delete the expect(await backend.get(firstId), isNotNull) line since
the subsequent expect((await backend.get(firstId))?.state, TaskState.succeeded)
already verifies the result and its non-null state; update the test 'shares
group limiter keys across task types' (references: Worker, Stem.enqueue,
backend.get, TaskState.succeeded) by removing that single redundant expect to
keep the test clearer.
---
Outside diff comments:
In @.site/docs/brokers/sqlite.md:
- Around line 74-88: The "Recommended layout" section omits the revoke store DB
file; update the docs to mention that the revoke store needs its own SQLite file
(e.g., stem_revoke.sqlite) and that it must follow the same WAL / single-writer
guidance; reference the new configureSqliteRevokeStore() snippet and add the
revoke file to the example list (./stem_revoke.sqlite # revoke store only) so
readers know to keep broker, backend, and revoke stores in separate DB files.
In @.site/docs/core-concepts/canvas.md:
- Around line 86-90: Add the new batch example to the run instructions list by
including the missing command that runs canvas_batch.dart alongside the existing
entries (canvas_chain.dart, canvas_group.dart, canvas_chord.dart) so the
"Running the examples" block also shows: dart run lib/canvas_batch.dart; update
the snippet where those three run commands are listed to include this fourth
command.
In @.site/docs/workers/worker-control.md:
- Around line 216-233: Move the lifecycle-guards paragraph and its associated
code snippet (the block referencing worker-control-lifecycle) out of the "Queue
Pause/Resume" section and place it directly under the "Shutdown Modes and
Lifecycle Guards" heading so that the sentence "Lifecycle guards can also
recycle isolates automatically:" plus the ```dart
file=...#worker-control-lifecycle``` snippet and the following sentence about
recycling appear before the "## Queue Pause/Resume" heading; ensure the snippet
marker worker-control-lifecycle and the recycle explanation remain intact and
that "Queue Pause/Resume" begins immediately after that moved content.
In `@packages/stem_adapter_tests/README.md`:
- Around line 29-47: Add the missing revoke-store contract docs: insert a call
to runRevokeStoreContractTests (using RevokeStoreContractFactory(create:
createRevokeStore)) into the Quick Start snippet between
runQueueEventsContractTests and the workflow factory setup, and add a new
Capability Flags entry for RevokeStoreContractCapabilities with the
verifyPruneExpired flag (default true) describing its purpose ("Expiry pruning
tests — verifies expired revoke records are cleaned up"); also add a short
Adapter Recipe example showing how to construct RevokeStoreContractCapabilities
with verifyPruneExpired: false to disable those pruning tests.
In
`@packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart`:
- Line 22: The two test suites share the single variable contractNamespace
causing fragile shared state; introduce a dedicated variable (e.g.,
queueContractNamespace) for the queue-events suite and keep contractNamespace
for the broker suite, then update the additionalBrokerFactory closure passed to
runQueueEventsContractTests.create to read the new queueContractNamespace
instead of the shared contractNamespace so each suite writes/reads its own
namespace (refer to contractNamespace, runBrokerContractTests.create,
runQueueEventsContractTests.create, and the additionalBrokerFactory closures to
locate the changes).
In `@packages/stem/lib/src/signals/payloads.dart`:
- Around line 661-733: The occurredAt getters on ControlCommandReceivedPayload
and ControlCommandCompletedPayload are non-stable because they call
DateTime.now() on each access; fix by capturing the timestamp once during
construction and returning that stored value. Add a final DateTime field (e.g.,
_occurredAt or occurredAtUtc) to each class, set it to DateTime.now().toUtc() in
the constructor (or accept an optional DateTime parameter defaulting to
now().toUtc()), and change the occurredAt getter to return that field instead of
calling DateTime.now() again; update the const constructors and any call sites
if you add an optional parameter.
In `@packages/stem/test/unit/worker/worker_test.dart`:
- Line 486: Two tests share the exact name "emits worker lifecycle signals on
start and shutdown", causing ambiguous test reports; locate the duplicate test
declaration that uses that string (the test(...) call in worker_test.dart) and
rename one of them to a distinct description (e.g. "emits worker lifecycle
signals on start and stop" or "emits lifecycle signals on start and shutdown for
multiple workers") so each test has a unique, descriptive name; keep the test
body unchanged, only update the test string passed to test().
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (62)
.site/docs/brokers/sqlite.md.site/docs/comparisons/stem-vs-bullmq.md.site/docs/core-concepts/canvas.md.site/docs/core-concepts/cli-control.md.site/docs/core-concepts/index.md.site/docs/core-concepts/persistence.md.site/docs/core-concepts/queue-events.md.site/docs/core-concepts/rate-limiting.md.site/docs/core-concepts/signals.md.site/docs/getting-started/index.md.site/docs/workers/worker-control.md.site/sidebars.tspackages/stem/example/docs_snippets/lib/canvas_batch.dartpackages/stem/example/docs_snippets/lib/cli_control.dartpackages/stem/example/docs_snippets/lib/persistence.dartpackages/stem/example/docs_snippets/lib/queue_events.dartpackages/stem/example/docs_snippets/lib/rate_limiting.dartpackages/stem/example/docs_snippets/lib/signals.dartpackages/stem/lib/src/canvas/canvas.dartpackages/stem/lib/src/core/contracts.dartpackages/stem/lib/src/core/queue_events.dartpackages/stem/lib/src/core/stem_event.dartpackages/stem/lib/src/signals/emitter.dartpackages/stem/lib/src/signals/payloads.dartpackages/stem/lib/src/signals/signal.dartpackages/stem/lib/src/signals/stem_signals.dartpackages/stem/lib/src/worker/worker.dartpackages/stem/lib/src/workflow/runtime/workflow_introspection.dartpackages/stem/lib/stem.dartpackages/stem/test/unit/canvas/canvas_test.dartpackages/stem/test/unit/core/contracts_test.dartpackages/stem/test/unit/core/queue_events_test.dartpackages/stem/test/unit/core/stem_event_test.dartpackages/stem/test/unit/signals/signal_test.dartpackages/stem/test/unit/signals/stem_signals_test.dartpackages/stem/test/unit/worker/worker_test.dartpackages/stem_adapter_tests/README.mdpackages/stem_adapter_tests/lib/src/contract_capabilities.dartpackages/stem_adapter_tests/lib/src/queue_events_contract_suite.dartpackages/stem_adapter_tests/lib/src/revoke_store_contract_suite.dartpackages/stem_adapter_tests/lib/stem_adapter_tests.dartpackages/stem_adapter_tests/test/contract_suite_exports_test.dartpackages/stem_adapter_tests/test/queue_events_contract_suite_test.dartpackages/stem_adapter_tests/test/revoke_store_contract_suite_test.dartpackages/stem_cli/lib/src/cli/revoke_store_factory.dartpackages/stem_cli/lib/src/cli/worker.dartpackages/stem_cli/test/unit/cli/cli_worker_stats_test.dartpackages/stem_cli/test/unit/cli/revoke_store_factory_test.dartpackages/stem_postgres/test/integration/brokers/postgres_broker_integration_test.dartpackages/stem_redis/test/integration/brokers/redis_broker_integration_test.dartpackages/stem_sqlite/lib/orm_registry.g.dartpackages/stem_sqlite/lib/src/control/sqlite_revoke_store.dartpackages/stem_sqlite/lib/src/database/migrations.dartpackages/stem_sqlite/lib/src/database/migrations/m_20260224103000_add_revoke_store.dartpackages/stem_sqlite/lib/src/models/models.dartpackages/stem_sqlite/lib/src/models/stem_revoke_entry.dartpackages/stem_sqlite/lib/src/models/stem_revoke_entry.orm.dartpackages/stem_sqlite/lib/src/stack/sqlite_adapter.dartpackages/stem_sqlite/lib/src/workflow/sqlite_factories.dartpackages/stem_sqlite/lib/stem_sqlite.dartpackages/stem_sqlite/test/broker/sqlite_broker_test.dartpackages/stem_sqlite/test/control/sqlite_revoke_store_test.dart
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (82)
packages/stem/test/unit/worker/worker_test.dart (3)
1380-1426: LGTM!Clean test of the fail-open behavior: the limiter always throws, and the test correctly verifies the task still succeeds. Resource cleanup is appropriate.
1753-1791: LGTM — clean control-command helper.The
finallyblock ensures the subscription is always cancelled, and theCompleter.isCompletedguard prevents double-completion. Well structured.
1793-1821: LGTM — test doubles are minimal and fit for purpose.
_ScenarioRateLimitercorrectly tracks per-key attempt counts and exposes the keys list for assertions._FixedRetryStrategyis a clean constant-delay implementation..site/docs/core-concepts/rate-limiting.md (2)
12-14: Nice intro for group-scoped rate limits.
Clear and concise context-setting.
131-131: Good syntax parity callout.
This helps reduce ambiguity for readers.packages/stem/lib/src/core/contracts.dart (6)
855-859: Constructor defaults for group limits look good.
Clear defaults (tenant,failOpen) are helpful.
879-894: Parsing logic is solid with safe fallback.
The default tofailOpenis sensible.
918-929: Field docs are clear and aligned with behavior.
No issues here.
954-977: copyWith covers the new group-rate options.
Good parity with the constructor.
987-997: Serialization includes all new fields.
Good to persist the failure mode by name.
2062-2069: Enum addition is clean and scoped.
Good fit for the new failure mode behavior.packages/stem/example/docs_snippets/lib/rate_limiting.dart (1)
73-93: GroupRateLimitedTask snippet looks solid.
It showcases the new options cleanly.packages/stem/test/unit/core/contracts_test.dart (1)
127-157: Good coverage for new TaskOptions fields.
The enum mapping assertion is especially useful.packages/stem/lib/src/core/stem_event.dart (1)
1-11: LGTM — clean minimal interface contract.
abstract interface classis the correct Dart 3 idiom for a pure interface, and the three members cover the canonical event shape (name, timestamp, attributes).packages/stem/example/docs_snippets/lib/cli_control.dart (1)
38-44: LGTM — follows established snippet pattern.Region markers, constant naming, and CLI syntax are consistent with existing entries. The
--queue defaultargument correctly satisfies_WorkerQueueControlCommand's mandatory queue validation.packages/stem/example/docs_snippets/lib/persistence.dart (1)
149-169: LGTM — cleanup order and wiring are correct.Resource teardown follows the right dependency order, and the region tag matches the documentation reference.
packages/stem_sqlite/lib/src/database/migrations.dart (1)
41-47: LGTM — migration entry is correctly registered.Timestamp
DateTime.utc(2026, 2, 24, 10, 30)matches the filename prefix, the string ID is consistent, and chronological order in_entriesis maintained..site/docs/getting-started/index.md (1)
21-21: LGTM!Good placement — the comparison link is added after the onboarding steps as supplementary reference material, consistent with its sidebar placement under "Guides."
.site/sidebars.ts (1)
32-32: LGTM!Both new sidebar entries are logically placed — the comparison page leads the Guides section, and queue-events follows signals in Core Concepts.
Also applies to: 56-56
.site/docs/core-concepts/cli-control.md (1)
23-24: LGTM!The pause/resume additions are consistently reflected across the command listing, semantics description, snippet references, CLI examples, and the backend requirements table. The revoke-store scheme list update is also accurate.
Also applies to: 39-41, 92-98, 167-168, 238-238, 246-247
.site/docs/core-concepts/canvas.md (1)
34-45: Batches section looks good.The API description is clear and the lifecycle states are comprehensive.
.site/docs/workers/worker-control.md (2)
29-30: LGTM!Pause/resume rows in the CLI overview table are consistent with the existing format.
251-255: LGTM!Good addition of a concrete SQLite revoke-store example for local/single-node deployments.
packages/stem/example/docs_snippets/lib/signals.dart (3)
53-76: LGTM!The worker-scoped signal subscriptions clearly demonstrate the new filtering capabilities (
taskName,workerId,commandType), and the payload access patterns are consistent with the StemEvent-based model.
78-89: LGTM!Good defensive null check on
context.eventbefore accessing event properties.
47-49: LGTM!The migration from
.workerReady.connect(...)→.onWorkerReady(...)and.signalHookups.connect(...)→.onControlCommandCompleted(...)is consistent with the new convenience API surface introduced in this PR.Also applies to: 101-103
packages/stem_adapter_tests/lib/src/contract_capabilities.dart (1)
84-105: LGTM!Both new capability classes follow the existing pattern precisely—
constconstructors, named bool flags with sensible defaults, and proper doc comments. Clean additions.packages/stem_adapter_tests/lib/src/revoke_store_contract_suite.dart (2)
1-49: Well-structured contract test scaffold.Settings, factory, and lifecycle management follow the established patterns from other contract suites. The nullable
storewith null-check before dispose is clean.
51-112: Good coverage of upsert semantics and version conflict resolution.The version-ordering test (line 79) and the stale-version-rejection test (lines 104–111) together verify the core idempotent upsert contract. Clean assertions.
packages/stem/lib/src/workflow/runtime/workflow_introspection.dart (1)
59-75: Clean StemEvent conformance.The
eventNamederivation fromtype.name, delegation ofoccurredAttotimestamp, and conditional attribute inclusion are all idiomatic. Thetypeandtimestampfields are properly surfaced through the StemEvent getters rather than duplicated inattributes.packages/stem/test/unit/core/stem_event_test.dart (1)
4-45: Good StemEvent conformance tests.Each test verifies the three StemEvent properties (
eventName,occurredAt,attributes) for a distinct event type. The assertions are focused and sufficient for contract validation.packages/stem/test/unit/canvas/canvas_test.dart (1)
83-104: Thorough batch lifecycle test.Good coverage: verifies the submission response shape (
batchId,taskIds), then polls for terminal state and asserts the full lifecycle summary (state, counts, absence of failures, metadata flag)..site/docs/core-concepts/signals.md (1)
114-120: Good enumeration of worker-scoped helpers.Listing all convenience helpers that support worker-scoped filtering provides a clear reference. This aligns well with the new tab example above.
packages/stem/lib/src/core/queue_events.dart (2)
82-122: Well-implemented producer with proper input validation.The
emitmethod correctly trims and validates bothqueueandeventName, constructs the envelope with the sentinel name and structured args, publishes via broadcast routing, and returns the stable envelope ID. Clean design.
237-269: Robust envelope parsing with sensible fallbacks.The function gracefully handles missing
queue(falls back toenvelope.queue), missingemittedAt(falls back toenvelope.enqueuedAt), and non-map payloads (defaults to empty map). The sentinel name check at line 238 properly filters non-event envelopes.One note: the
.cast<String, Object?>()on line 257 creates a lazy wrapper that defers type checking to access time. If an envelope from a misbehaving producer contains non-Stringkeys, the error surfaces downstream rather than here. Since the sentinel name guards against third-party envelopes this is low-risk, but aMap.fromwith explicit casting would fail fast if preferred.packages/stem_sqlite/lib/src/models/stem_revoke_entry.dart (1)
6-7: The composite primary key is correctly resolved through@OrmField(columnName:)mappings. The generated ORM file marks bothnamespaceandtaskIdfields as primary keys and uses the correct column names in all operations:'namespace'and'task_id'in encoding, decoding, and DTO mappings..site/docs/core-concepts/index.md (1)
20-22: LGTM!New feature highlights and navigation entry for Queue Events are consistent with the new API surface introduced in this PR.
packages/stem/example/docs_snippets/lib/queue_events.dart (2)
9-34: LGTM!Clean producer/listener snippet with proper lifecycle management (start → subscribe → emit → cancel → close). Region markers align with the doc page references.
38-68: LGTM!Fan-out snippet correctly demonstrates two listeners on the same queue, each receiving emitted events independently.
packages/stem/test/unit/core/queue_events_test.dart (2)
22-53: LGTM!Good pattern: subscribing to the
.firstfuture before emitting avoids race conditions. Assertions cover all event fields comprehensively.
119-134: Theemittest uses an incorrect matcher for async validation throws.The
expect(() => producer.emit('', 'evt'), throwsA(...))pattern only catches synchronous throws. However,emitis declaredasyncwith validation that occurs inside the async body (afterqueue.trim()). Errors thrown inside an async function are captured in the returnedFutureand won't be caught by this matcher — you should useexpectLater(producer.emit('', 'evt'), throwsA(...))instead.The
listener.on()test is correct becauseon()is synchronous and throws directly before returning the Stream.packages/stem_adapter_tests/lib/src/queue_events_contract_suite.dart (2)
50-68: LGTM!Clean contract test harness with configurable settings, proper broker lifecycle via setUp/tearDown, and null-safe disposal.
190-241: LGTM!Good design: the fan-out test supports an optional secondary broker for cross-broker verification, uses
identicalto avoid double-disposing the same instance, and gates on theverifyFanoutcapability flag.packages/stem_sqlite/lib/src/models/stem_revoke_entry.orm.dart (1)
1-4: Generated ORM code — no manual modifications needed.This is auto-generated code. The field definitions, composite primary key (namespace + taskId), codec, DTOs, and tracked model class are all consistent with the StemRevokeEntry model.
packages/stem_sqlite/lib/src/models/models.dart (1)
5-5: LGTM!New export for
stem_revoke_entry.dartis correctly placed in alphabetical order.packages/stem_sqlite/lib/orm_registry.g.dart (1)
15-39: LGTM!Generated registry correctly inserts
StemRevokeEntryat index 4 and re-indexes all subsequent type aliases consistently in bothbuildOrmRegistryandregisterGeneratedModels.packages/stem_adapter_tests/test/contract_suite_exports_test.dart (1)
9-10: Export checks look good.The new contract runners are validated consistently with existing exports.
packages/stem_sqlite/lib/stem_sqlite.dart (1)
5-17: Exports align with new revoke-store support.Public surface now exposes the SQLite revoke store and factory as expected.
packages/stem_sqlite/lib/src/workflow/sqlite_factories.dart (1)
6-79: Factory wiring is consistent and safe.The new revoke-store factory mirrors existing factories and closes the store cleanly.
packages/stem_cli/lib/src/cli/revoke_store_factory.dart (1)
7-49: SQLite scheme handling looks correct.The new branch follows the same resolution style as Redis/Postgres.
packages/stem_sqlite/lib/src/stack/sqlite_adapter.dart (1)
84-87: Adapter now exposes revoke-store factory as expected.This completes the SQLite adapter contract surface.
packages/stem_cli/test/unit/cli/revoke_store_factory_test.dart (1)
25-61: Good coverage for SQLite revoke-store resolution.The tests validate both explicit and fallback paths with cleanup.
packages/stem_sqlite/test/broker/sqlite_broker_test.dart (1)
55-73: Queue-events contract coverage looks solid.Factory and disposal mirror existing broker contract tests.
.site/docs/core-concepts/persistence.md (1)
77-89: Docs update matches new revoke-store backends.The added snippets and wording align with the new SQLite/Postgres support.
packages/stem/lib/stem.dart (1)
95-98: LGTM — new public exports are correctly placed.
queue_events.dartandstem_event.dartare inserted in alphabetical order within the export list and match the new core modules introduced in this PR.packages/stem_adapter_tests/lib/stem_adapter_tests.dart (1)
4-6: LGTM — contract suite exports are in the correct alphabetical position.packages/stem_adapter_tests/test/revoke_store_contract_suite_test.dart (1)
1-12: LGTM — minimal, correct in-memory wiring for the revoke store contract suite.packages/stem_postgres/test/integration/brokers/postgres_broker_integration_test.dart (1)
59-79: LGTM —additionalDisposecorrectly mirrorsdispose, ensuring both Postgres broker connections are explicitly closed.packages/stem/test/unit/signals/signal_test.dart (1)
88-146: LGTM —_TestEventis a clean, minimalStemEventtest double; new test cases exercise payload identity and listener tracking correctly.packages/stem_adapter_tests/test/queue_events_contract_suite_test.dart (1)
5-14: No action needed—state isolation is properly handled by design.The shared namespace across
createandadditionalBrokerFactoryis intentional and documented.InMemoryBrokeruses reference counting to manage the broadcast hub lifecycle: each broker increments the ref count on creation and decrements on disposal. When the reference count reaches zero (after both brokers dispose), the broadcast hub is removed from the static registry, ensuring the next test gets a fresh instance. Queue state itself is instance-level (not static), so there is no cross-broker queue bleeding. ThesetUp/tearDownpattern in the contract suite ensures proper cleanup between tests.Likely an incorrect or invalid review comment.
packages/stem/test/unit/signals/stem_signals_test.dart (3)
54-90: Worker/task scoping coverage looks solid.
The added workerId filter plus the extra failure emission validate that only the targeted worker/task is captured.
93-187: Worker-scoped lifecycle/control command tests are thorough.
Covers init/stop/shutdown plus command received/completed with worker filtering.
213-260: Canonical signal-name assertions are clear.
Validates eventName mapping for worker and workflow signals.packages/stem/lib/src/worker/worker.dart (10)
589-638: Queue pause state is wired cleanly into worker startup.
The new pause tracking plus subscription refresh fits the control-plane design.
778-867: Pause short-circuit + group rate-limit flow is well structured.
Fail-open metrics/logging and retry behavior read cleanly.
2292-2338: Retry helper centralizes paused/rate-limited requeue behavior.
Nice consolidation of broker nack/publish + status metadata.
2460-2549: Subscription refresh/cleanup logic looks consistent.
Cancelling prior subscriptions before resubscribing to active queues is clean.
3143-3223: Queue pause sync/prune helpers look good.
The revoke-store integration and expiry pruning are cohesive.
3747-3815: Pause/resume command handling is cohesive.
Persists state, prunes expired entries, and refreshes subscriptions in one flow.
3934-3966: Control-plane queue pause/resume/status wiring looks good.
Switch cases integrate cleanly with the control command handler.
4048-4103: Paused queue visibility in stats/inspect is helpful.
Nice addition for operational introspection.
4299-4357: WorkerEvent StemEvent conformance is solid.
Attributes mapping is clear and consistent with the new event contract.
2276-2286: No guard needed —groupRateKeyHeaderis non-nullable with a default value.The field is declared as
final String groupRateKeyHeader(notString?) and has a default value of'tenant'in the TaskOptions constructor. Even when deserializing from JSON, null is coalesced to'tenant'. Calling.trim()on this field is always safe.Likely an incorrect or invalid review comment.
packages/stem/lib/src/signals/emitter.dart (3)
160-216: Worker lifecycle signal naming is now explicit.
Payloads now carry canonical signal names consistently.
232-255: Child lifecycle signal naming is correct.
Signal selection and payload wiring look clean.
340-399: Workflow signal naming viawithSignalNameis clean.
Keeps payloads consistent across workflow lifecycle emissions.packages/stem_sqlite/lib/src/control/sqlite_revoke_store.dart (2)
9-37: Overall implementation aligns with the RevokeStore contract.
Factories plus transactional list/prune/upsert usage look solid.Also applies to: 71-167
47-67: The code is valid as written. Dart 3.0+ does not requirebreakstatements in non-emptycaseclauses—each case body automatically exits theswitchupon completion. The project targets Dart SDK >=3.9.2, which fully supports this behavior. No changes are needed.Likely an incorrect or invalid review comment.
packages/stem/lib/src/signals/signal.dart (1)
3-188: StemEvent-typed signals and context propagation look good.
The new bounds and context event wiring align with the shared event contract.packages/stem/lib/src/signals/payloads.dart (1)
43-659: StemEvent adoption across payloads is consistent.
Event naming and attribute mappings are clear for task/worker/workflow/schedule payloads.packages/stem/lib/src/signals/stem_signals.dart (2)
35-372: Signal registry expansion looks consistent.
Workflow run signals are now part of the canonical list and configuration flow.
418-741: Merged filters and worker/command scoping are well integrated.
WorkerId and commandType filtering read cleanly across task and control signals.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/stem_redis/lib/src/brokers/redis_broker.dart`:
- Around line 574-576: The condition checking both queue == null and
streamKeys.isEmpty is redundant because streamKeys is derived from queue via
_priorityStreamKeys and that helper never returns an empty list; update the
guard in the method containing the variables queue and streamKeys to only check
if queue == null and return early (i.e., remove the streamKeys.isEmpty clause),
keeping the rest of the method logic unchanged.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
packages/stem_postgres/lib/src/brokers/postgres_broker.dartpackages/stem_redis/lib/src/brokers/redis_broker.dart
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test
🔇 Additional comments (6)
packages/stem_redis/lib/src/brokers/redis_broker.dart (1)
488-502: LGTM — clean broadcast-only path.The nullable queue derivation, validation, sentinel group, and empty stream keys are consistent. The guard at line 493 properly enforces that at least one of queue or broadcast channel is present.
packages/stem_postgres/lib/src/brokers/postgres_broker.dart (5)
231-242: LGTM — broadcast-only support is consistent with the Redis broker.Nullable queue derivation, validation, and sentinel locker key follow the same pattern as the Redis broker. The
__broadcast__sentinel in the locker is inert in broadcast-only mode since_claimNextJobis never called.
248-261: LGTM — clean logging for broadcast-only cancel path.The
queueLabelfallback to<broadcast-only>provides clear observability in logs when no queue is set.
812-812: Correct use ofqueue!after null-guard — Dart field promotion caveat.Since
queueis a class field (final String? queue), Dart does not promote it even after a null check. The explicitqueue!at line 850 inside theif (queue != null)block is the correct pattern here. The logic properly skips job claiming entirely in broadcast-only mode while still processing broadcast reservations below.Also applies to: 848-854
821-838: LGTM — start/stop logging updated consistently.Both
start()andstop()use the samequeueLabelfallback pattern for observability.
878-889: LGTM — error-path logging properly handles nullable queue.The catch block correctly derives
queueLabelfor the error log context.
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
packages/stem/lib/src/signals/payloads.dart (4)
630-664:⚠️ Potential issue | 🟡 Minor
ScheduleEntryFailedPayload.occurredAtreports the scheduled time, not the failure time.`@override` DateTime get occurredAt => scheduledFor.toUtc(); // ← scheduled time, not when it failedA schedule entry can fail well after
scheduledFor(e.g., if execution is delayed).ScheduleEntryDispatchedPayloadcorrectly usesexecutedAt—ScheduleEntryFailedPayloadshould do the same. Adding afailedAt(orexecutedAt) parameter is the minimal fix:🐛 Proposed fix
class ScheduleEntryFailedPayload implements StemEvent { const ScheduleEntryFailedPayload({ required this.entry, required this.scheduledFor, + required this.failedAt, required this.error, required this.stackTrace, }); final ScheduleEntry entry; final DateTime scheduledFor; + final DateTime failedAt; final Object error; final StackTrace stackTrace; `@override` - DateTime get occurredAt => scheduledFor.toUtc(); + DateTime get occurredAt => failedAt.toUtc(); `@override` Map<String, Object?> get attributes => { 'entryId': entry.id, 'scheduledFor': scheduledFor.toUtc().toIso8601String(), + 'failedAt': failedAt.toUtc().toIso8601String(), 'error': error.toString(), 'stackTrace': stackTrace.toString(), }; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem/lib/src/signals/payloads.dart` around lines 630 - 664, The occurredAt getter on ScheduleEntryFailedPayload currently returns scheduledFor instead of the actual failure time; update ScheduleEntryFailedPayload by adding a new DateTime field/constructor param (name it failedAt or executedAt to match ScheduleEntryDispatchedPayload), set the occurredAt getter to return failedAt.toUtc(), and include failedAt in the attributes map (e.g., 'failedAt': failedAt.toUtc().toIso8601String()); ensure constructors and references use the new symbol (failedAt/executedAt) and keep scheduledFor as-is for the originally scheduled time.
233-284:⚠️ Potential issue | 🟡 Minor
nextRetryAtis not UTC-normalized at construction, creating a field/attribute divergence.
emittedAtis stored as UTC ((emittedAt ?? DateTime.now()).toUtc()), butnextRetryAtis stored as passed. Theattributesgetter then callsnextRetryAt.toUtc()for serialization. If a local-timezoneDateTimeis passed,payload.nextRetryAtandDateTime.parse(payload.attributes['nextRetryAt']! as String)represent the same instant but in different timezone representations, which can surprise callers doing direct field comparisons or round-trip deserialisation.🐛 Proposed fix
TaskRetryPayload({ required this.envelope, required this.worker, required this.reason, - required this.nextRetryAt, + required DateTime nextRetryAt, DateTime? emittedAt, -}) : emittedAt = (emittedAt ?? DateTime.now()).toUtc(); +}) : nextRetryAt = nextRetryAt.toUtc(), + emittedAt = (emittedAt ?? DateTime.now()).toUtc();With
nextRetryAtalready UTC, the explicit.toUtc()call inattributes(line 282) becomes a no-op and can be removed for consistency withemittedAt.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem/lib/src/signals/payloads.dart` around lines 233 - 284, TaskRetryPayload stores emittedAt as UTC but leaves nextRetryAt as-is, causing field vs serialized-attribute divergence; normalize nextRetryAt to UTC at construction (in TaskRetryPayload constructor) similar to emittedAt, and then update the attributes getter to use the now-UTC nextRetryAt (you can remove the explicit .toUtc() there) so payload.nextRetryAt and attributes['nextRetryAt'] are consistent.
509-566: 🧹 Nitpick | 🔵 Trivial
eventNamefallback mapsrunning→"workflow-run-running", diverging from the canonical"workflow-run-started".String get eventName => signalName ?? 'workflow-run-${status.name}'; // WorkflowRunStatus.running.name == 'running' // → 'workflow-run-running' ≠ StemSignals.workflowRunStartedName ('workflow-run-started')The emitter always passes a
signalName, so this fallback is unreachable in normal flow. However, any code that constructs aWorkflowRunPayload(status: WorkflowRunStatus.running)directly — e.g., in tests or custom event sources — will observeeventName == 'workflow-run-running', which won't match the canonical constant. Consider either renaming the enum member tostarted, or hardcoding the fallback to the canonical strings:♻️ Option A — explicit fallback map
+ static const _statusEventName = { + WorkflowRunStatus.running: 'workflow-run-started', + WorkflowRunStatus.suspended: 'workflow-run-suspended', + WorkflowRunStatus.completed: 'workflow-run-completed', + WorkflowRunStatus.failed: 'workflow-run-failed', + WorkflowRunStatus.cancelled: 'workflow-run-cancelled', + }; + `@override` - String get eventName => signalName ?? 'workflow-run-${status.name}'; + String get eventName => signalName ?? _statusEventName[status]!;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem/lib/src/signals/payloads.dart` around lines 509 - 566, The eventName getter on WorkflowRunPayload produces non-canonical names like "workflow-run-running" when signalName is null because it uses status.name; update WorkflowRunPayload.eventName to map WorkflowRunStatus values to the canonical signal strings (e.g., map WorkflowRunStatus.running -> StemSignals.workflowRunStartedName) instead of using status.name, so any direct constructions of WorkflowRunPayload produce the correct canonical eventName; keep withSignalName unchanged so explicit signalName still overrides the fallback.
667-695:⚠️ Potential issue | 🟠 Major
occurredAtis a drifting clock, not the event timestamp — fix by storing at construction time.Both
ControlCommandReceivedPayload(line 684) andControlCommandCompletedPayload(line 727) implementoccurredAtas:DateTime get occurredAt => DateTime.now().toUtc();Because these are
constconstructors, no stable timestamp is captured. Every access tooccurredAtreturns a fresh wall-clock value, so a consumer that reads the field twice (log then serialize, dedup comparison, replay) gets different timestamps — breaking theStemEventcontract. Follow the same pattern asWorkerLifecyclePayload:🐛 Proposed fix for both payloads
-class ControlCommandReceivedPayload implements StemEvent { - const ControlCommandReceivedPayload({ +class ControlCommandReceivedPayload implements StemEvent { + ControlCommandReceivedPayload({ required this.worker, required this.command, + DateTime? occurredAt, - }); + }) : _occurredAt = (occurredAt ?? DateTime.now()).toUtc(); final WorkerInfo worker; final ControlCommandMessage command; + final DateTime _occurredAt; `@override` String get eventName => 'control-command-received'; `@override` - DateTime get occurredAt => DateTime.now().toUtc(); + DateTime get occurredAt => _occurredAt; // ... attributes unchanged }Apply the identical change to
ControlCommandCompletedPayload.Also applies to: 698-738
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem/lib/src/signals/payloads.dart` around lines 667 - 695, ControlCommandReceivedPayload (and likewise ControlCommandCompletedPayload) currently implements occurredAt as a getter that returns DateTime.now().toUtc(), producing a drifting clock; change this to a stored final DateTime field set at construction time. Replace the getter override with a final DateTime occurredAt field, update the constructor (remove const or accept an optional DateTime occurredAt parameter) and initialize it like: this.occurredAt = occurredAt ?? DateTime.now().toUtc(); so the timestamp is captured once at construction; apply the identical change to ControlCommandCompletedPayload and remove the dynamic getter.packages/stem/test/unit/signals/payloads_test.dart (1)
53-67: 🧹 Nitpick | 🔵 TrivialLGTM – assertions are correct; consider extending attribute coverage.
The new
emittedAt/occurredAtassertions are accurate against the implementation. Optionally, coveringretry.eventName('task-retry') andretry.attributes['emittedAt'](the other new attribute key added in this PR) would complete the round-trip validation for theStemEventsurface.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem/test/unit/signals/payloads_test.dart` around lines 53 - 67, Add assertions to fully validate the StemEvent surface for TaskRetryPayload: verify retry.eventName equals 'task-retry' and assert retry.attributes['emittedAt'] equals DateTime.utc(2024).toIso8601String(); locate the TaskRetryPayload test where the existing retry instance is created (symbol: TaskRetryPayload and variable retry) and add these two expectations alongside the existing assertions for taskId/attempt/occurredAt/nextRetryAt.
♻️ Duplicate comments (2)
packages/stem_cli/test/unit/cli/cli_worker_stats_test.dart (2)
132-177:⚠️ Potential issue | 🟡 MinorAssert pause/resume stderr buffers are empty.
pauseErrandresumeErrare never checked, so unexpected CLI errors can slip through.✅ Proposed fix
expect(pauseCode, equals(0)); + expect(pauseErr.toString().trim(), isEmpty, + reason: 'pause command produced unexpected stderr output'); final stem = Stem(broker: broker, registry: registry, backend: backend); final taskId = await stem.enqueue('tasks.blocking'); await _assertTaskRemainsQueued(backend, taskId); expect(started.isCompleted, isFalse); @@ expect(resumeCode, equals(0)); + expect(resumeErr.toString().trim(), isEmpty, + reason: 'resume command produced unexpected stderr output');🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem_cli/test/unit/cli/cli_worker_stats_test.dart` around lines 132 - 177, Add assertions to verify the CLI produced no stderr output by checking the pauseErr and resumeErr buffers after the runStemCli calls: after obtaining pauseCode and resumeCode, assert pauseErr.toString().isEmpty (or equals('')) and resumeErr.toString().isEmpty to ensure no unexpected CLI errors; reference the pauseErr and resumeErr StringBuffer variables and the corresponding runStemCli invocations used to pause/resume the worker.
478-490: 🧹 Nitpick | 🔵 TrivialTighten timeout semantics in
_waitFor.The current loop checks the deadline after evaluating the predicate, which can overshoot the timeout by one predicate duration and poll interval. Consider checking the deadline before each predicate call.
♻️ Suggested adjustment
final deadline = DateTime.now().add(timeout); - while (!await predicate()) { - if (DateTime.now().isAfter(deadline)) { - throw TimeoutException('Condition not met within $timeout'); - } - await Future<void>.delayed(pollInterval); - } + while (true) { + if (DateTime.now().isAfter(deadline)) { + throw TimeoutException('Condition not met within $timeout'); + } + if (await predicate()) return; + await Future<void>.delayed(pollInterval); + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem_cli/test/unit/cli/cli_worker_stats_test.dart` around lines 478 - 490, The loop in _waitFor can overshoot timeout because it evaluates predicate before checking deadline; modify _waitFor to check the deadline before calling await predicate() (and throw TimeoutException if passed), and also check the deadline immediately after the delayed pollInterval before the next iteration to avoid an extra sleep/predicate call past the timeout; keep the existing timeout, pollInterval, and exception message, and update the loop in the _waitFor function accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/stem_redis/lib/src/brokers/redis_broker.dart`:
- Around line 498-502: The code currently discards a caller-supplied
consumerGroup in broadcast-only mode; fix by threading consumerGroup into
_listenBroadcast and using it there instead of always deriving via
_broadcastGroupKey. Update loop() to compute group (retain consumerGroup if
non-null) and pass it into _listenBroadcast (add a consumerGroup parameter to
_listenBroadcast), have _listenBroadcast use group = consumerGroup ??
_broadcastGroupKey(channel, consumer) for XREADGROUP, and ensure any
claim/scheduler paths that expected a group use this passed-in group; remove the
unused '__broadcast__' literal or keep it only as a default inside
_listenBroadcast.
In `@packages/stem/lib/src/canvas/canvas.dart`:
- Around line 475-480: inspectBatch currently converts any group returned by
backend.getGroup into a BatchStatus; change it to first detect whether the
returned group represents a batch created by submitBatch (e.g. check a
type/role/metadata field on the returned group object from getGroup such as a
"type" or specific batch marker) and if that marker is missing return null (or
throw) instead of calling _buildBatchStatus; place this guard in inspectBatch
immediately after retrieving status and before calling _buildBatchStatus so only
genuine batch groups are converted.
- Around line 414-473: submitBatch currently races: two callers can both see
backend.getGroup null and both proceed to initGroup and dispatch; make the
operation idempotent by ensuring dispatch only occurs if initGroup actually
created the group. Change submitBatch to attempt backend.initGroup and then
check its result (or catch a “already exists” error) — using backend.getGroup
after initGroup if needed — and if the group exists and has meta['stem.batch']
== true return BatchSubmission using _batchTaskIdsFromGroup instead of
dispatching; only call group(...) and dispatch when initGroup confirms the group
was newly created. Ensure you handle initGroup failure that signals pre-existing
group and reuse GroupDescriptor/meta to build the return BatchSubmission.
In `@packages/stem/lib/src/signals/payloads.dart`:
- Around line 119-135: Several execution-phase payloads (TaskReceivedPayload,
TaskPrerunPayload, TaskPostrunPayload, TaskSuccessPayload, TaskFailurePayload,
TaskRevokedPayload) incorrectly set occurredAt to envelope.enqueuedAt; change
each payload to accept an optional DateTime? occurredAt parameter (defaulting to
DateTime.now().toUtc()), make the constructors non-const, store the value on the
instance, and have the occurredAt getter return that stored timestamp instead of
envelope.enqueuedAt.toUtc(); follow the same pattern used by
TaskRetryPayload/emittedAt for consistency so consumers get the actual runtime
event time for ordering/latency/audit.
---
Outside diff comments:
In `@packages/stem/lib/src/signals/payloads.dart`:
- Around line 630-664: The occurredAt getter on ScheduleEntryFailedPayload
currently returns scheduledFor instead of the actual failure time; update
ScheduleEntryFailedPayload by adding a new DateTime field/constructor param
(name it failedAt or executedAt to match ScheduleEntryDispatchedPayload), set
the occurredAt getter to return failedAt.toUtc(), and include failedAt in the
attributes map (e.g., 'failedAt': failedAt.toUtc().toIso8601String()); ensure
constructors and references use the new symbol (failedAt/executedAt) and keep
scheduledFor as-is for the originally scheduled time.
- Around line 233-284: TaskRetryPayload stores emittedAt as UTC but leaves
nextRetryAt as-is, causing field vs serialized-attribute divergence; normalize
nextRetryAt to UTC at construction (in TaskRetryPayload constructor) similar to
emittedAt, and then update the attributes getter to use the now-UTC nextRetryAt
(you can remove the explicit .toUtc() there) so payload.nextRetryAt and
attributes['nextRetryAt'] are consistent.
- Around line 509-566: The eventName getter on WorkflowRunPayload produces
non-canonical names like "workflow-run-running" when signalName is null because
it uses status.name; update WorkflowRunPayload.eventName to map
WorkflowRunStatus values to the canonical signal strings (e.g., map
WorkflowRunStatus.running -> StemSignals.workflowRunStartedName) instead of
using status.name, so any direct constructions of WorkflowRunPayload produce the
correct canonical eventName; keep withSignalName unchanged so explicit
signalName still overrides the fallback.
- Around line 667-695: ControlCommandReceivedPayload (and likewise
ControlCommandCompletedPayload) currently implements occurredAt as a getter that
returns DateTime.now().toUtc(), producing a drifting clock; change this to a
stored final DateTime field set at construction time. Replace the getter
override with a final DateTime occurredAt field, update the constructor (remove
const or accept an optional DateTime occurredAt parameter) and initialize it
like: this.occurredAt = occurredAt ?? DateTime.now().toUtc(); so the timestamp
is captured once at construction; apply the identical change to
ControlCommandCompletedPayload and remove the dynamic getter.
In `@packages/stem/test/unit/signals/payloads_test.dart`:
- Around line 53-67: Add assertions to fully validate the StemEvent surface for
TaskRetryPayload: verify retry.eventName equals 'task-retry' and assert
retry.attributes['emittedAt'] equals DateTime.utc(2024).toIso8601String();
locate the TaskRetryPayload test where the existing retry instance is created
(symbol: TaskRetryPayload and variable retry) and add these two expectations
alongside the existing assertions for taskId/attempt/occurredAt/nextRetryAt.
---
Duplicate comments:
In `@packages/stem_cli/test/unit/cli/cli_worker_stats_test.dart`:
- Around line 132-177: Add assertions to verify the CLI produced no stderr
output by checking the pauseErr and resumeErr buffers after the runStemCli
calls: after obtaining pauseCode and resumeCode, assert
pauseErr.toString().isEmpty (or equals('')) and resumeErr.toString().isEmpty to
ensure no unexpected CLI errors; reference the pauseErr and resumeErr
StringBuffer variables and the corresponding runStemCli invocations used to
pause/resume the worker.
- Around line 478-490: The loop in _waitFor can overshoot timeout because it
evaluates predicate before checking deadline; modify _waitFor to check the
deadline before calling await predicate() (and throw TimeoutException if
passed), and also check the deadline immediately after the delayed pollInterval
before the next iteration to avoid an extra sleep/predicate call past the
timeout; keep the existing timeout, pollInterval, and exception message, and
update the loop in the _waitFor function accordingly.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (7)
packages/stem/lib/src/canvas/canvas.dartpackages/stem/lib/src/signals/emitter.dartpackages/stem/lib/src/signals/payloads.dartpackages/stem/test/unit/canvas/canvas_test.dartpackages/stem/test/unit/signals/payloads_test.dartpackages/stem_cli/test/unit/cli/cli_worker_stats_test.dartpackages/stem_redis/lib/src/brokers/redis_broker.dart
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (13)
packages/stem_cli/test/unit/cli/cli_worker_stats_test.dart (3)
406-412: Async predicate wait looks good.The async predicate usage keeps the event wait consistent with the new
_waitForcontract.
492-509: Helper reads well and improves pause assertions.The queued-state guard is clear and keeps the pause/resume test intent explicit.
511-526:_FastTasklooks fine.Straightforward no-op task handler for test scenarios.
packages/stem_redis/lib/src/brokers/redis_broker.dart (1)
574-576: LGTM — correct early return for broadcast-only modeThe
queue == nullguard is the right single-condition check here and cleanly handles the addressed past review note about the redundantstreamKeys.isEmptytest.packages/stem/lib/src/canvas/canvas.dart (3)
154-245: Batch lifecycle models look solid.
Clear enum states and status snapshot fields make the new batch API easy to reason about.
714-758: Batch status classification is robust.
Filtering to terminal states and using positive equality checks makes the lifecycle computation stable.
760-789: Task ID extraction is defensive and order-preserving.
Deduping and meta-first ordering aligns with the batch immutability contract.packages/stem/test/unit/canvas/canvas_test.dart (2)
82-154: Batch lifecycle tests provide good coverage.
Covers terminal status, idempotency, and non-terminal completion counting.
176-192: Polling helper is consistent with existing test patterns.
Timeout + interval behavior matches the suite’s conventions.packages/stem/lib/src/signals/emitter.dart (4)
104-115: LGTM – pre-emit timestamp capture is the correct pattern.Capturing
emittedAtbefore theemitcall ensures a stable, deterministicoccurredAton the payload. The constructor's redundant.toUtc()on an already-UTC value is a no-op.
163-220: LGTM –signalNameinjection alignseventNamewith the concrete signal for all worker lifecycle payloads.
241-257: LGTM – signal name selection and emission branch are consistent.
343-403: LGTM –withSignalNamecorrectly threads the canonical name into a timestamp-preserving copy before emit.
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
packages/dashboard/lib/src/services/models.dart (1)
1-1:⚠️ Potential issue | 🔴 CriticalFix missing import for
stemNow(compile error).
stemNow()is referenced but not included in theshowlist, so this won’t compile. Add it to the import.🛠️ Proposed fix
-import 'package:stem/stem.dart' show QueueHeartbeat, WorkerHeartbeat; +import 'package:stem/stem.dart' show QueueHeartbeat, WorkerHeartbeat, stemNow;Also applies to: 158-158
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/dashboard/lib/src/services/models.dart` at line 1, Import statement currently shows QueueHeartbeat and WorkerHeartbeat but omits stemNow, causing a compile error where stemNow() is referenced; update the import that uses "show" to include stemNow alongside QueueHeartbeat and WorkerHeartbeat (and mirror this change at the other occurrence around line 158) so stemNow is available to the code.packages/stem/lib/src/brokers/in_memory_broker.dart (1)
342-347: 🧹 Nitpick | 🔵 TrivialMisleading indentation on the cascade in
purgeDeadLetters.In Dart,
..has lower precedence than?:, so..forEach(state.deadLetters.remove)is applied to the whole ternary — not only tocandidatesin the else branch. The code is functionally correct, but the 6-space indent belowcandidatesmakes it look like the cascade only affects the else branch, which could lead to a future reader incorrectly concluding that the if-branch doesn't purge entries.♻️ Clarify intent with explicit parentheses and consistent indentation
- final toRemove = - limit != null && limit >= 0 - ? candidates.take(limit).toList() - : candidates - ..forEach(state.deadLetters.remove); + final toRemove = (limit != null && limit >= 0 + ? candidates.take(limit).toList() + : candidates) + ..forEach(state.deadLetters.remove);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem/lib/src/brokers/in_memory_broker.dart` around lines 342 - 347, The ternary with a cascade in purgeDeadLetters is misleading because the ..forEach(state.deadLetters.remove) applies to the whole expression; make the intent explicit by wrapping the else-side expression in parentheses (e.g., (candidates)..forEach(...)) or by applying the cascade to the selected value after the ternary, and fix indentation so it's clear the removal runs for both branches; reference the variables limit, candidates and the call to state.deadLetters.remove to locate and adjust the code.packages/stem/lib/src/signals/payloads.dart (1)
667-739:⚠️ Potential issue | 🟡 MinorFreeze control-command timestamps at creation.
occurredAtcurrently callsstemNow()in the getter, so the timestamp changes on every read. Store the timestamp when the payload is created.♻️ Proposed fix
-class ControlCommandReceivedPayload implements StemEvent { - const ControlCommandReceivedPayload({ - required this.worker, - required this.command, - }); +class ControlCommandReceivedPayload implements StemEvent { + ControlCommandReceivedPayload({ + required this.worker, + required this.command, + DateTime? occurredAt, + }) : _occurredAt = (occurredAt ?? stemNow()).toUtc(); @@ - DateTime get occurredAt => stemNow().toUtc(); + final DateTime _occurredAt; + DateTime get occurredAt => _occurredAt; } -class ControlCommandCompletedPayload implements StemEvent { - const ControlCommandCompletedPayload({ - required this.worker, - required this.command, - required this.status, - this.response, - this.error, - }); +class ControlCommandCompletedPayload implements StemEvent { + ControlCommandCompletedPayload({ + required this.worker, + required this.command, + required this.status, + this.response, + this.error, + DateTime? occurredAt, + }) : _occurredAt = (occurredAt ?? stemNow()).toUtc(); @@ - DateTime get occurredAt => stemNow().toUtc(); + final DateTime _occurredAt; + DateTime get occurredAt => _occurredAt; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem/lib/src/signals/payloads.dart` around lines 667 - 739, The occurredAt getters in ControlCommandReceivedPayload and ControlCommandCompletedPayload call stemNow() on each read and must be frozen at creation; add a final DateTime field (e.g., final DateTime occurredAt) to each class, initialize it in the constructor (provide a default of stemNow().toUtc() if none supplied), and change the overridden occurredAt getter to return that stored field instead of calling stemNow() directly; update constructors for ControlCommandReceivedPayload and ControlCommandCompletedPayload to accept/assign the timestamp.
♻️ Duplicate comments (10)
packages/stem/lib/src/canvas/canvas.dart (2)
476-480:⚠️ Potential issue | 🟡 MinorGuard
inspectBatchagainst non-batch groups.
Line 479 should also verify thestem.batchmarker to avoid misclassifying generic groups as batches.🔧 Suggested fix
Future<BatchStatus?> inspectBatch(String batchId) async { final status = await backend.getGroup(batchId); - if (status == null) return null; + if (status == null || status.meta['stem.batch'] != true) return null; return _buildBatchStatus(status); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem/lib/src/canvas/canvas.dart` around lines 476 - 480, The inspectBatch method currently treats any group returned from backend.getGroup(batchId) as a batch; update inspectBatch to verify the group has the stem.batch marker before building a BatchStatus: after retrieving status in inspectBatch, check the group's metadata/markers for the "stem.batch" marker and return null if it's absent, and only call _buildBatchStatus(status) when that marker is present (use the same status object returned from backend.getGroup for the check).
415-474:⚠️ Potential issue | 🟠 MajorMake
submitBatchidempotent under concurrent callers.
Line 428’s pre-check is not atomic; two callers with the samebatchIdcan both pass and dispatch duplicate tasks. Gate dispatch on an atomicinitGroup(unique constraint) or handle an “already exists” result and return the existing batch instead of publishing.#!/bin/bash # Inspect ResultBackend.initGroup contract and implementations for atomicity/uniqueness. rg -n "abstract class ResultBackend" --type dart -A 60 rg -n "initGroup\\(" --type dart -A 12 -B 4 rg -n "GroupDescriptor" --type dart -A 8 rg -n "already exists|unique|conflict" --type dart -i -C 2🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem/lib/src/canvas/canvas.dart` around lines 415 - 474, submitBatch is not atomic: the pre-check using backend.getGroup can race so two callers with the same batchId both dispatch tasks; change submitBatch to attempt a single atomic creation or handle initGroup conflicts by calling backend.initGroup and treating "already exists" as success — i.e., remove reliance on the separate pre-check or make it best-effort, call backend.initGroup(GroupDescriptor ...) first (or catch the specific error/return value it emits when the group already exists), and if initGroup reports the group exists then fetch the existing group (backend.getGroup) and return its taskIds instead of dispatching; ensure the code references the same id, GroupDescriptor, normalizedSignatures, and uses group(..., groupId: id) only after successful initGroup to guarantee idempotence.packages/stem/lib/src/core/contracts.dart (1)
1028-1037:⚠️ Potential issue | 🟡 MinorHandle
RateLimiterFailureModeinstances in_parseFailureMode.
If a map already contains the enum (e.g.,groupRateLimiterFailureMode: RateLimiterFailureMode.failClosed),toString()yieldsRateLimiterFailureMode.failClosed, so parsing falls back tofailOpenand silently changes behavior. Accept the enum directly to avoid this downgrade.🛠️ Suggested fix
static RateLimiterFailureMode? _parseFailureMode(Object? value) { + if (value is RateLimiterFailureMode) return value; final raw = value?.toString().trim().toLowerCase(); if (raw == null || raw.isEmpty) return null; for (final mode in RateLimiterFailureMode.values) { if (mode.name.toLowerCase() == raw) { return mode; } } return null; }Verification (find map literals that might pass enums directly into
TaskOptions.fromJson):#!/bin/bash # Search for uses of groupRateLimiterFailureMode in map literals or fromJson inputs. rg -n --type=dart -C3 'groupRateLimiterFailureMode' . rg -n --type=dart -C3 'TaskOptions\.fromJson' .🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem/lib/src/core/contracts.dart` around lines 1028 - 1037, _parsFailureMode currently only parses strings and falls back incorrectly when given a RateLimiterFailureMode instance (or a toString like "RateLimiterFailureMode.failClosed"). Update _parseFailureMode to first check if value is a RateLimiterFailureMode and return it immediately; otherwise convert value to string, strip any enum qualifier (e.g., take substring after the last '.' so "RateLimiterFailureMode.failClosed" becomes "failclosed"), trim/lowercase and match against RateLimiterFailureMode.values by comparing mode.name; return null if no match. This touches the _parseFailureMode function and references RateLimiterFailureMode.packages/stem_cli/lib/src/cli/worker.dart (2)
1099-1099:⚠️ Potential issue | 🟡 MinorExit code should reflect error replies.
Returning 0 whenever any reply exists hides worker failures and diverges from other worker commands.
🐛 Suggested fix
- return replies.isEmpty ? 70 : 0; + final hasError = replies.any((reply) => reply.status != 'ok'); + return (replies.isEmpty || hasError) ? 70 : 0;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem_cli/lib/src/cli/worker.dart` at line 1099, The return expression currently returns 70 when replies.isEmpty and 0 otherwise, which inverts the intended exit semantics; change the return to return 0 when there are no replies and a non‑zero error code when replies exist (flip the ternary to return replies.isEmpty ? 0 : 70) so the process exit code reflects error replies; locate the return that references "replies" (the return line currently "return replies.isEmpty ? 70 : 0;") and replace it with the corrected ternary.
1081-1094:⚠️ Potential issue | 🟡 MinorClarify the “Paused” column for resume output.
For resume operations,
pausedrepresents queues still paused after the command, not the queues resumed. Rename the header to avoid operator confusion.💡 Suggested tweak
- dependencies.out.writeln('Worker | Status | Updated | Paused'); - dependencies.out.writeln( - '--------------+--------+---------+----------------', - ); + dependencies.out.writeln( + 'Worker | Status | Updated | Paused (after)', + ); + dependencies.out.writeln( + '--------------+--------+---------+------------------', + );🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem_cli/lib/src/cli/worker.dart` around lines 1081 - 1094, The header "Paused" is misleading because the `paused` variable in the resume output reflects queues still paused after the command; update the column header string in the dependencies.out.writeln call (the line currently printing 'Worker | Status | Updated | Paused') to a clearer label such as 'Still Paused' or 'Paused (after)' and adjust the separator line printed below it if needed so column alignment remains correct; ensure the rest of the row output that uses `${paused}` and the `paused` variable logic remains unchanged.packages/stem/lib/src/core/queue_events.dart (2)
129-137:⚠️ Potential issue | 🟡 MinorQueue emptiness still validated lazily in
start()instead of the constructor.♻️ Fail fast at construction time
QueueEvents({ required this.broker, required String queue, String? consumerName, this.prefetch = 10, - }) : queue = queue.trim(), + }) : queue = queue.trim() { + if (this.queue.isEmpty) { + throw ArgumentError.value(queue, 'queue', 'Queue name must not be empty'); + } + } + // ignore: prefer_initializing_formals + QueueEvents._init({ + required this.broker, + required this.queue, consumerName = consumerName ?? - 'stem-queue-events-${generateEnvelopeId().replaceAll('-', '')}'; + 'stem-queue-events-${generateEnvelopeId().replaceAll('-', '')}', + required this.prefetch, });(Alternatively, use an initializer list with an
assertor a redirecting constructor.)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem/lib/src/core/queue_events.dart` around lines 129 - 137, The constructor for QueueEvents currently defers validating the provided queue string until start(); change it to fail fast by validating in the constructor (QueueEvents) itself — e.g., use an initializer-check or assert/throw in the constructor’s initializer list for the trimmed `queue` parameter (or a redirecting constructor) so empty/invalid queue names are rejected immediately instead of in `start()`. Reference: QueueEvents constructor, the `queue` field and `start()` method.
203-205:⚠️ Potential issue | 🟡 Minor
_events.addErrorcalled without a_closedguard in both_onDeliveryand theonErrorstream handler.
_events.addErrorthrowsStateErrorwhen the controller is already closed. Both theonErrorcallback registered instart()(line 204) and the catch block in_onDelivery(line 227) can reach_events.addErrorafterclose()has been called.🛡️ Proposed fix
Future<void> _onDelivery(Delivery delivery) async { + if (_closed) { + await broker.ack(delivery); + return; + } try { final event = _eventFromEnvelope(delivery.envelope); if (event != null && event.queue == queue) { - _events.add(event); + if (!_closed) _events.add(event); } } on Object catch (error, stackTrace) { - _events.addError(error, stackTrace); + if (!_closed) _events.addError(error, stackTrace); } finally {And guard the stream
onErrorhandler instart():.listen( _onDelivery, onError: (Object error, StackTrace stackTrace) { - _events.addError(error, stackTrace); + if (!_closed) _events.addError(error, stackTrace); }, );Also applies to: 220-234
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem/lib/src/core/queue_events.dart` around lines 203 - 205, Both the onError handler in start() and the catch block inside _onDelivery call _events.addError without checking whether the controller has been closed, which can throw StateError; update both sites (the start() onError closure and the catch in _onDelivery) to guard the call by checking a closed flag (e.g., if (!_closed) or if (!_events.isClosed) before calling _events.addError(error, stackTrace), and ensure the same guard is used consistently where close() may have been called.packages/stem_redis/lib/src/brokers/redis_broker.dart (1)
497-501:⚠️ Potential issue | 🟡 Minor
consumerGroupis still silently discarded in broadcast-only mode.When
queue == null,groupis set toconsumerGroup ?? '__broadcast__', but this value is never used:loop()returns immediately (line 573), and_listenBroadcastderives its own group via_broadcastGroupKey(channel, consumer)without reading the outergroup. A caller passing an explicitconsumerGroupin broadcast-only mode has it silently dropped.♻️ Options to resolve
Option A — fail fast (minimal):
if (queue == null && broadcastChannels.isEmpty) { throw ArgumentError( 'RedisStreamsBroker requires at least one queue or broadcast channel.', ); } +if (queue == null && consumerGroup != null) { + throw ArgumentError( + 'consumerGroup is not supported in broadcast-only mode.', + ); +}Option B — thread
consumerGroupinto_listenBroadcastso callers can share a consumer group across multiple broadcast subscriptions.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem_redis/lib/src/brokers/redis_broker.dart` around lines 497 - 501, The code currently ignores an explicit consumerGroup when queue == null because loop() returns early and _listenBroadcast computes its own group via _broadcastGroupKey; thread the consumerGroup through instead: change the call site in loop() to pass the resolved group into _listenBroadcast (using the existing variable group when queue == null) and add an optional consumerGroup parameter to _listenBroadcast so it uses the passed consumerGroup (if non-null) rather than always calling _broadcastGroupKey(channel, consumer); update any helper call sites and documentation/comments accordingly so broadcast-only subscriptions honor an explicitly provided consumerGroup.packages/stem/lib/src/worker/worker.dart (1)
3728-3741:⚠️ Potential issue | 🟡 MinorValidate queue targets are strings before adding.
toString()on null/objects can create unintended queue names (“null”, “Instance of …”). Skip non-string entries instead.♻️ Proposed fix
- if (rawQueues is List) { - for (final value in rawQueues) { - final queueName = value.toString().trim(); - if (queueName.isNotEmpty) { - queues.add(queueName); - } - } - } + if (rawQueues is List) { + for (final value in rawQueues) { + if (value is! String) continue; + final queueName = value.trim(); + if (queueName.isNotEmpty) { + queues.add(queueName); + } + } + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem/lib/src/worker/worker.dart` around lines 3728 - 3741, The _extractQueueTargets function currently calls toString() on rawQueues elements which can yield unintended names like "null" or "Instance of ..."; change the loop over rawQueues to only accept values that are actually String (non-null) — e.g., check `if (value is String)` then trim and add to the queues set, otherwise skip the entry; similarly ensure the single `queue` extracted from payload is validated as a String (already using as String? but keep the explicit type check) so only genuine string queue names are added.packages/stem/lib/src/signals/payloads.dart (1)
106-136:⚠️ Potential issue | 🟡 Minor
occurredAtshould reflect execution time, not enqueue time.Execution-phase payloads still use
envelope.enqueuedAt, which misstates when the event occurred. Capture a timestamp at payload creation (defaultstemNow().toUtc()), and return that foroccurredAt. Apply the same pattern toTaskPrerunPayload,TaskPostrunPayload,TaskSuccessPayload,TaskFailurePayload, andTaskRevokedPayload.♻️ Proposed fix pattern (example for TaskReceivedPayload)
-class TaskReceivedPayload implements StemEvent { - const TaskReceivedPayload({required this.envelope, required this.worker}); +class TaskReceivedPayload implements StemEvent { + TaskReceivedPayload({ + required this.envelope, + required this.worker, + DateTime? occurredAt, + }) : _occurredAt = (occurredAt ?? stemNow()).toUtc(); @@ - DateTime get occurredAt => envelope.enqueuedAt.toUtc(); + final DateTime _occurredAt; + DateTime get occurredAt => _occurredAt;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem/lib/src/signals/payloads.dart` around lines 106 - 136, The payloads for execution-phase events (TaskReceivedPayload, TaskPrerunPayload, TaskPostrunPayload, TaskSuccessPayload, TaskFailurePayload, TaskRevokedPayload) currently return envelope.enqueuedAt from their occurredAt getters; change them to capture a timestamp at construction (e.g., add a final DateTime occurredAtUtc field defaulting to stemNow().toUtc()) and have the occurredAt getter return that field instead of envelope.enqueuedAt so the event reflects actual execution time; ensure constructors accept an optional DateTime or initialize the private field and update attributes where occurredAt is used.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/dashboard/lib/src/services/stem_service.dart`:
- Around line 195-196: The loop currently calls stemNow() twice per iteration
causing a TOCTOU gap; in the loop that uses deadline and remaining (the while
guard and the remaining = deadline.difference(...) calculation), call stemNow()
once at the top of the loop (e.g., assign final now = stemNow()) and use now for
both the while condition and computing remaining so remaining cannot become
negative between calls; update references in the same function/method where
stemNow(), deadline, and remaining are used to rely on the single captured now
value.
In `@packages/stem_postgres/lib/src/control/postgres_revoke_store.dart`:
- Line 130: The ternary expression setting "version" uses identical branches and
should be simplified: replace the expression "existing != null ? entry.version :
entry.version" with just "entry.version" wherever it appears (referencing the
variables existing, entry, and the version field) to remove the redundant
conditional.
In `@packages/stem/lib/src/core/clock.dart`:
- Around line 13-19: SystemStemClock.now() currently returns a local-kind
DateTime which causes inconsistent kinds between production and tests
(FakeStemClock uses UTC); change SystemStemClock (class SystemStemClock, method
now()) to normalize to UTC at the source by returning DateTime.now().toUtc() and
update the doc comment to state it always returns a UTC DateTime so callers (and
components like Envelope.enqueuedAt and FakeStem) can rely on UTC-kind
timestamps.
In `@packages/stem/lib/src/core/queue_events.dart`:
- Around line 53-60: The attributes map in the overridden getter Map<String,
Object?> get attributes is missing the event name, so update the attributes
returned by QueueCustomEvent (the attributes getter) to include the event
discriminator by adding a 'name' entry (i.e., include 'name': name or the
appropriate eventName property) alongside id, queue, payload, headers, and meta
so consumers that introspect attributes can see the event name.
In `@packages/stem/lib/src/scheduler/in_memory_lock_store.dart`:
- Around line 63-67: In renew(), capture stemNow() once into a local variable
(e.g., now) and use that single timestamp for both the isExpired check and for
computing the new expiration; specifically, replace the two stemNow() calls in
the renew logic so you call stemNow() once, use now for lock.isExpired(now) and
set lock.expiresAt = now.add(ttl), and keep the existing _locks.remove(key) and
return false behavior when expired.
In `@packages/stem/lib/src/workflow/core/workflow_clock.dart`:
- Around line 5-6: Remove the now-stale lint suppression by deleting the "//
ignore: one_member_abstracts" comment above the WorkflowClock declaration;
WorkflowClock already extends StemClock and does not declare its own abstract
members, so simply remove the ignore annotation from the WorkflowClock class
definition (referencing the WorkflowClock class and its extension of StemClock).
In `@packages/stem/lib/stem.dart`:
- Line 90: The public barrel currently re-exports test utility FakeStemClock via
export 'src/core/clock.dart' alongside production symbols (StemClock, stemNow,
withStemClock); remove FakeStemClock from the production barrel by either
(preferred) moving the FakeStemClock symbol into a testing-only export (create
lib/testing.dart that exports FakeStemClock from src/core/clock.dart and other
test utilities) and adjust export 'src/core/clock.dart' to only expose
production symbols, or (alternative) change the existing export to a show/hide
list that omits FakeStemClock so it is only re-exported from the testing barrel;
update any imports in tests to import package:stem/testing.dart.
---
Outside diff comments:
In `@packages/dashboard/lib/src/services/models.dart`:
- Line 1: Import statement currently shows QueueHeartbeat and WorkerHeartbeat
but omits stemNow, causing a compile error where stemNow() is referenced; update
the import that uses "show" to include stemNow alongside QueueHeartbeat and
WorkerHeartbeat (and mirror this change at the other occurrence around line 158)
so stemNow is available to the code.
In `@packages/stem/lib/src/brokers/in_memory_broker.dart`:
- Around line 342-347: The ternary with a cascade in purgeDeadLetters is
misleading because the ..forEach(state.deadLetters.remove) applies to the whole
expression; make the intent explicit by wrapping the else-side expression in
parentheses (e.g., (candidates)..forEach(...)) or by applying the cascade to the
selected value after the ternary, and fix indentation so it's clear the removal
runs for both branches; reference the variables limit, candidates and the call
to state.deadLetters.remove to locate and adjust the code.
In `@packages/stem/lib/src/signals/payloads.dart`:
- Around line 667-739: The occurredAt getters in ControlCommandReceivedPayload
and ControlCommandCompletedPayload call stemNow() on each read and must be
frozen at creation; add a final DateTime field (e.g., final DateTime occurredAt)
to each class, initialize it in the constructor (provide a default of
stemNow().toUtc() if none supplied), and change the overridden occurredAt getter
to return that stored field instead of calling stemNow() directly; update
constructors for ControlCommandReceivedPayload and
ControlCommandCompletedPayload to accept/assign the timestamp.
---
Duplicate comments:
In `@packages/stem_cli/lib/src/cli/worker.dart`:
- Line 1099: The return expression currently returns 70 when replies.isEmpty and
0 otherwise, which inverts the intended exit semantics; change the return to
return 0 when there are no replies and a non‑zero error code when replies exist
(flip the ternary to return replies.isEmpty ? 0 : 70) so the process exit code
reflects error replies; locate the return that references "replies" (the return
line currently "return replies.isEmpty ? 70 : 0;") and replace it with the
corrected ternary.
- Around line 1081-1094: The header "Paused" is misleading because the `paused`
variable in the resume output reflects queues still paused after the command;
update the column header string in the dependencies.out.writeln call (the line
currently printing 'Worker | Status | Updated | Paused') to a clearer
label such as 'Still Paused' or 'Paused (after)' and adjust the separator line
printed below it if needed so column alignment remains correct; ensure the rest
of the row output that uses `${paused}` and the `paused` variable logic remains
unchanged.
In `@packages/stem_redis/lib/src/brokers/redis_broker.dart`:
- Around line 497-501: The code currently ignores an explicit consumerGroup when
queue == null because loop() returns early and _listenBroadcast computes its own
group via _broadcastGroupKey; thread the consumerGroup through instead: change
the call site in loop() to pass the resolved group into _listenBroadcast (using
the existing variable group when queue == null) and add an optional
consumerGroup parameter to _listenBroadcast so it uses the passed consumerGroup
(if non-null) rather than always calling _broadcastGroupKey(channel, consumer);
update any helper call sites and documentation/comments accordingly so
broadcast-only subscriptions honor an explicitly provided consumerGroup.
In `@packages/stem/lib/src/canvas/canvas.dart`:
- Around line 476-480: The inspectBatch method currently treats any group
returned from backend.getGroup(batchId) as a batch; update inspectBatch to
verify the group has the stem.batch marker before building a BatchStatus: after
retrieving status in inspectBatch, check the group's metadata/markers for the
"stem.batch" marker and return null if it's absent, and only call
_buildBatchStatus(status) when that marker is present (use the same status
object returned from backend.getGroup for the check).
- Around line 415-474: submitBatch is not atomic: the pre-check using
backend.getGroup can race so two callers with the same batchId both dispatch
tasks; change submitBatch to attempt a single atomic creation or handle
initGroup conflicts by calling backend.initGroup and treating "already exists"
as success — i.e., remove reliance on the separate pre-check or make it
best-effort, call backend.initGroup(GroupDescriptor ...) first (or catch the
specific error/return value it emits when the group already exists), and if
initGroup reports the group exists then fetch the existing group
(backend.getGroup) and return its taskIds instead of dispatching; ensure the
code references the same id, GroupDescriptor, normalizedSignatures, and uses
group(..., groupId: id) only after successful initGroup to guarantee
idempotence.
In `@packages/stem/lib/src/core/contracts.dart`:
- Around line 1028-1037: _parsFailureMode currently only parses strings and
falls back incorrectly when given a RateLimiterFailureMode instance (or a
toString like "RateLimiterFailureMode.failClosed"). Update _parseFailureMode to
first check if value is a RateLimiterFailureMode and return it immediately;
otherwise convert value to string, strip any enum qualifier (e.g., take
substring after the last '.' so "RateLimiterFailureMode.failClosed" becomes
"failclosed"), trim/lowercase and match against RateLimiterFailureMode.values by
comparing mode.name; return null if no match. This touches the _parseFailureMode
function and references RateLimiterFailureMode.
In `@packages/stem/lib/src/core/queue_events.dart`:
- Around line 129-137: The constructor for QueueEvents currently defers
validating the provided queue string until start(); change it to fail fast by
validating in the constructor (QueueEvents) itself — e.g., use an
initializer-check or assert/throw in the constructor’s initializer list for the
trimmed `queue` parameter (or a redirecting constructor) so empty/invalid queue
names are rejected immediately instead of in `start()`. Reference: QueueEvents
constructor, the `queue` field and `start()` method.
- Around line 203-205: Both the onError handler in start() and the catch block
inside _onDelivery call _events.addError without checking whether the controller
has been closed, which can throw StateError; update both sites (the start()
onError closure and the catch in _onDelivery) to guard the call by checking a
closed flag (e.g., if (!_closed) or if (!_events.isClosed) before calling
_events.addError(error, stackTrace), and ensure the same guard is used
consistently where close() may have been called.
In `@packages/stem/lib/src/signals/payloads.dart`:
- Around line 106-136: The payloads for execution-phase events
(TaskReceivedPayload, TaskPrerunPayload, TaskPostrunPayload, TaskSuccessPayload,
TaskFailurePayload, TaskRevokedPayload) currently return envelope.enqueuedAt
from their occurredAt getters; change them to capture a timestamp at
construction (e.g., add a final DateTime occurredAtUtc field defaulting to
stemNow().toUtc()) and have the occurredAt getter return that field instead of
envelope.enqueuedAt so the event reflects actual execution time; ensure
constructors accept an optional DateTime or initialize the private field and
update attributes where occurredAt is used.
In `@packages/stem/lib/src/worker/worker.dart`:
- Around line 3728-3741: The _extractQueueTargets function currently calls
toString() on rawQueues elements which can yield unintended names like "null" or
"Instance of ..."; change the loop over rawQueues to only accept values that are
actually String (non-null) — e.g., check `if (value is String)` then trim and
add to the queues set, otherwise skip the entry; similarly ensure the single
`queue` extracted from payload is validated as a String (already using as
String? but keep the explicit type check) so only genuine string queue names are
added.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (43)
packages/dashboard/lib/src/services/models.dartpackages/dashboard/lib/src/services/stem_service.dartpackages/dashboard/lib/src/state/dashboard_state.dartpackages/dashboard/lib/src/ui/content.dartpackages/stem/lib/src/backend/in_memory_backend.dartpackages/stem/lib/src/bootstrap/workflow_app.dartpackages/stem/lib/src/brokers/in_memory_broker.dartpackages/stem/lib/src/canvas/canvas.dartpackages/stem/lib/src/control/revoke_store.dartpackages/stem/lib/src/core/clock.dartpackages/stem/lib/src/core/contracts.dartpackages/stem/lib/src/core/envelope.dartpackages/stem/lib/src/core/queue_events.dartpackages/stem/lib/src/core/stem.dartpackages/stem/lib/src/observability/metrics.dartpackages/stem/lib/src/scheduler/beat.dartpackages/stem/lib/src/scheduler/in_memory_lock_store.dartpackages/stem/lib/src/scheduler/in_memory_schedule_store.dartpackages/stem/lib/src/signals/emitter.dartpackages/stem/lib/src/signals/payloads.dartpackages/stem/lib/src/signals/signal.dartpackages/stem/lib/src/testing/fake_stem.dartpackages/stem/lib/src/worker/worker.dartpackages/stem/lib/src/workflow/core/run_state.dartpackages/stem/lib/src/workflow/core/workflow_clock.dartpackages/stem/lib/src/workflow/core/workflow_watcher.dartpackages/stem/lib/stem.dartpackages/stem/test/unit/core/clock_test.dartpackages/stem/test/unit/workflow/workflow_clock_test.dartpackages/stem_cli/lib/src/cli/dlq.dartpackages/stem_cli/lib/src/cli/observer.dartpackages/stem_cli/lib/src/cli/schedule.dartpackages/stem_cli/lib/src/cli/worker.dartpackages/stem_postgres/lib/src/backend/postgres_backend.dartpackages/stem_postgres/lib/src/brokers/postgres_broker.dartpackages/stem_postgres/lib/src/control/postgres_revoke_store.dartpackages/stem_postgres/lib/src/scheduler/postgres_lock_store.dartpackages/stem_postgres/lib/src/scheduler/postgres_schedule_store.dartpackages/stem_redis/lib/src/backend/redis_backend.dartpackages/stem_redis/lib/src/brokers/redis_broker.dartpackages/stem_redis/lib/src/scheduler/redis_schedule_store.dartpackages/stem_sqlite/lib/src/backend/sqlite_result_backend.dartpackages/stem_sqlite/lib/src/broker/sqlite_broker.dart
📜 Review details
🔇 Additional comments (79)
packages/stem_postgres/lib/src/scheduler/postgres_schedule_store.dart (2)
125-127: Good standardization on Stem clock for writes.This aligns
upserttimestamps with the shared clock abstraction and improves determinism.
251-256: Consistent updatedAt clock source.Using
stemNow()here keeps execution metadata aligned with the shared time source.packages/dashboard/lib/src/ui/content.dart (2)
9-9: LGTM — centralized clock import.Nice, this keeps the UI aligned with the shared clock abstraction.
581-583: LGTM — relative time now uses stemNow.Consistent time source for relative timestamps; no issues.
packages/stem/lib/src/canvas/canvas.dart (4)
11-11: Clock import aligns with stemNow usage.
Keeps created-at timestamps consistent with the new clock abstraction.
155-246: Batch lifecycle contracts look solid.
Enums and immutable snapshots are clear and cohesive for public API usage.
715-759: Terminal-state filtering in_buildBatchStatusis correct.
Good use ofisTerminaland positive equality checks to avoid misclassification.
761-790:_batchTaskIdsFromGroupdedupe/trim logic is robust.
Handles meta-vs-results fallback safely and returns an immutable list.packages/dashboard/lib/src/services/stem_service.dart (2)
160-160: LGTM — explicit UTC normalization is preserved.
160-196:stemNow()is correctly exported and returnsDateTime.The function is defined in
packages/stem/lib/src/core/clock.dartand re-exported viapackage:stem/stem.dart, which is already imported in the file. The return typeDateTimeis correct for all method calls (.toUtc(),.isBefore(),.add(),.difference()).packages/stem/lib/src/core/contracts.dart (3)
851-901: Group rate-limit options are wired through cleanly.Also applies to: 919-930, 950-998
41-41: Clock abstraction applied to enqueue delays.
UsingstemNow()keeps scheduling consistent with the shared clock.Also applies to: 2004-2007
2063-2070:RateLimiterFailureModeenum looks good.packages/stem_cli/lib/src/cli/worker.dart (8)
23-24: Pause/resume subcommands are registered correctly.Keeps the new controls discoverable alongside existing worker actions.
677-677: Revoke timestamps now use the shared stem clock.Good alignment with the new clock abstraction and UTC normalization.
932-1080: Shared queue-control base class is cleanly structured.Centralized parsing/payload/reply collection reduces duplication and keeps outputs consistent.
1106-1126: Pause/resume commands correctly specialize the base control.Clear command naming, descriptions, and default reasons.
1452-1452: Heartbeat age now uses the shared clock.Consistent with the new clock abstraction.
1885-1889: Consistent timestamp generation for multi-node operations.Using the stem clock across start/stop/status keeps template timestamps aligned.
Also applies to: 1927-1932, 1970-1975
2097-2097: Uptime calculations now rely on the stem clock.Keeps healthcheck and diagnostics aligned with the shared time source.
Also applies to: 2423-2424
2678-2681: Wait loop deadline aligned with the stem clock.Consistent time source for timeout handling.
packages/stem_postgres/lib/src/control/postgres_revoke_store.dart (1)
131-131: LGTM —stemNow()migration is correct.Consistent with the centralized clock migration across the codebase.
packages/stem_cli/lib/src/cli/schedule.dart (1)
62-62: LGTM —stemNow()replacements across the schedule commands are correct and consistent.All three call sites (lines 62, 377, 706) are mechanical clock-source swaps with no logic changes.
packages/stem/lib/src/control/revoke_store.dart (1)
3-3: LGTM — clock migration preserves monotonic version semantics.
stemNow().toUtc().microsecondsSinceEpochcorrectly replaces the previousDateTime.now().toUtc().microsecondsSinceEpoch.Also applies to: 124-124
packages/stem/lib/src/workflow/core/workflow_watcher.dart (1)
1-2: LGTM — fallback timestamp now uses the centralized clock.The
stemNow().toUtc()fallback infromJsoncorrectly replacesDateTime.now().toUtc()while preserving the same behavior whencreatedAtis missing or unparseable.Also applies to: 21-21
packages/stem/lib/src/scheduler/in_memory_lock_store.dart (1)
12-12: LGTM —stemNow()migration inacquireandownerOfis clean.Both methods correctly use the centralized clock for lock acquisition and expiration checks.
Also applies to: 27-27, 46-46
packages/stem_postgres/lib/src/backend/postgres_backend.dart (1)
112-112: LGTM — comprehensivestemNow()migration across the Postgres backend.All 11 call sites consistently replace
DateTime.now()with the centralized clock, with no logic changes. Each method captures the timestamp once where needed.Also applies to: 171-171, 191-191, 243-243, 302-302, 365-365, 375-375, 404-404, 417-417, 460-460, 470-470
packages/stem/lib/src/signals/emitter.dart (3)
6-6: LGTM —emittedAttimestamp fortaskRetryuses the centralized clock.Clean integration:
stemNow().toUtc()is captured once and passed into the payload.Also applies to: 105-112
166-170: LGTM — worker lifecycle payloads now carry canonical signal names.Each lifecycle emission consistently maps to its corresponding
StemSignals.*Nameconstant. TheworkerChildLifecyclemethod correctly derives the signal name from theinitializingflag.Also applies to: 182-186, 198-202, 214-218, 242-248
349-349: LGTM — workflow run emissions usewithSignalNamebuilder pattern consistently.All six workflow lifecycle methods correctly attach their corresponding signal name via the builder.
Also applies to: 360-360, 371-371, 382-382, 390-390, 401-401
packages/stem_postgres/lib/src/brokers/postgres_broker.dart (4)
328-328: LGTM —stemNow().toUtc()migration across broker operations is consistent.All time-sensitive paths (nack, dead-letter, lease extension, pending/inflight counts, sweeper, and consumer loop) correctly use the centralized clock.
Also applies to: 359-359, 406-406, 417-417, 439-439, 526-526, 590-590, 710-710, 729-729, 866-866
231-242: LGTM — nullable queue with broadcast-only consumer support is well-guarded.The logic correctly:
- Derives a nullable
queuewhensubscription.queuesis empty (line 231–233).- Rejects if neither queue nor broadcast channels are provided (line 237–241).
- Uses a
'__broadcast__'sentinel for the locker identity in broadcast-only mode (line 242).
848-853: LGTM — job claiming is correctly gated on non-null queue.The
if (queue != null)guard ensures_claimNextJobis only called when a queue subscription exists, while broadcast delivery proceeds unconditionally below.
812-812: The required-nullablequeuefield is intentional and correctly designed.The
queue: queueparameter passed at instantiation (line 267) can legitimately be null to represent broadcast-only consumers, as evidenced by the null-coalescing checks throughout the code (queue ?? '__broadcast__',queue ?? '<broadcast-only>'). The required modifier enforces explicit caller intent rather than allowing accidental omission, making this the appropriate API contract.packages/stem/lib/src/core/clock.dart (1)
42-53: LGTM — zone-based clock scoping is correctly implemented.Zone values propagate through async continuations, so
withStemClockcomposes correctly withasyncbodies. Nested calls shadow the outer clock in the expected direction. Theis StemClockguard instemNow()and the fallback to_systemClockare both safe.packages/stem/lib/src/testing/fake_stem.dart (1)
78-78: LGTM —stemNow()correctly replacesDateTime.now()in both enqueue paths.Both
enqueueCallandenqueueare updated consistently, so awithStemClockscope in tests will control the recordedenqueuedAttimestamp in both paths.Also applies to: 104-104
packages/stem_cli/lib/src/cli/observer.dart (1)
393-393: LGTM —stemNow()correctly anchors schedule drift calculations to the active clock.The
stemNow()export concern (public barrel) is tracked in theredis_backend.dartcomment above.packages/stem_postgres/lib/src/scheduler/postgres_lock_store.dart (1)
73-73: LGTM — all three time sites correctly substitutestemNow().toUtc().Also applies to: 119-119, 167-167
packages/stem/lib/src/observability/metrics.dart (1)
35-35: LGTM — all three usages correctly substitutestemNow().toUtc()and the field initializer at line 289 is evaluated at construction time as intended.Also applies to: 289-289, 294-294
packages/stem/lib/src/scheduler/in_memory_schedule_store.dart (1)
61-61: LGTM.packages/stem/lib/src/core/envelope.dart (1)
194-194: LGTM — preserves existing local-time semantics;Envelope.fromJsonhandles the timezone offset correctly viaDateTime.parse.packages/stem_redis/lib/src/backend/redis_backend.dart (1)
142-142: Verification confirms:stemNow()is properly re-exported frompackage:stem/stem.dart.The public barrel at line 90 exports the entire
src/core/clock.dartmodule, makingstemNow()available to all packages that importpackage:stem/stem.dart. All three cross-package consumers (packages/stem_redis,packages/stem_postgres,packages/stem_cli) correctly rely on this public re-export without requiring direct imports ofclock.dart. Compilation will succeed.packages/stem/lib/src/bootstrap/workflow_app.dart (1)
150-160: Consistent timeout tracking via Stem clock.This keeps workflow completion polling aligned with the centralized clock abstraction.
packages/stem/lib/src/workflow/core/run_state.dart (1)
41-41: Aligned createdAt fallback with centralized clock.This keeps run state hydration consistent with the Stem time source.
packages/stem_cli/lib/src/cli/dlq.dart (1)
337-337: Replay metadata now uses Stem clock.Keeps replay annotations consistent with other time-based metadata.
packages/stem_redis/lib/src/scheduler/redis_schedule_store.dart (1)
221-221: Schedule upsert now aligned with Stem clock.Good consistency for deterministic scheduling behavior.
packages/stem/lib/src/backend/in_memory_backend.dart (9)
60-76: In-memory task timestamps now use Stem clock.Keeps result TTLs and update timestamps consistent with shared time.
88-90: Expiry checks now use Stem clock.Consistent TTL evaluation for task status retrieval.
160-165: Group initialization TTLs now use Stem clock.Aligns group expiry with centralized time source.
185-190: Group expiry checks now use Stem clock.Ensures consistent expiration behavior across stores.
201-205: Expire() now uses Stem clock.Consistent TTL extension for task entries.
233-238: Heartbeat TTLs now use Stem clock.Matches the centralized time source for worker health tracking.
244-249: Heartbeat expiry check now uses Stem clock.Keeps heartbeat validity aligned with shared time.
332-339: Heartbeat pruning now uses Stem clock.Consistent cleanup behavior for worker heartbeats.
342-349: Task status pruning now uses Stem clock.Unified time source for expiry eviction.
packages/stem_sqlite/lib/src/backend/sqlite_result_backend.dart (11)
174-176: SQLite set() now uses Stem clock.Consistent TTL computation with centralized time.
204-210: SQLite get() now filters by Stem clock.Keeps expiry checks aligned with shared time source.
240-251: List pagination now uses Stem clock for expiry filtering.Consistent filtering across backends.
299-315: Worker heartbeat TTLs now use Stem clock.Matches centralized time semantics for health tracking.
325-331: Heartbeat lookup now uses Stem clock.Consistent TTL evaluation for heartbeat reads.
337-342: Heartbeat listing now uses Stem clock.Aligns expiry filtering with centralized time.
349-359: Group init TTLs now use Stem clock.Ensures group expiry aligns with shared time.
397-403: Group retrieval now uses Stem clock.Consistent group expiry checks.
436-439: Expire() now uses Stem clock.Keeps TTL extension consistent with shared time.
489-495: Cleanup cycle now uses Stem clock.Unified time source for pruning expired rows.
524-530: Group existence checks now use Stem clock.Consistent TTL semantics with other queries.
packages/stem/lib/src/core/stem.dart (3)
304-311: Unique task expiry now uses Stem clock.Keeps uniqueness TTL aligned with the centralized time source.
466-467: Countdown scheduling now uses Stem clock.Consistent ETA computation across the system.
645-646: Duplicate attempt timestamp now uses Stem clock.Aligned metadata timestamps for deduplication tracking.
packages/stem/lib/src/scheduler/beat.dart (2)
110-111: Scheduler tick now uses Stem clock.Ensures lag metrics and due filtering use the centralized time source.
218-257: Dispatch timing now uses Stem clock.Keeps duration and drift measurements consistent across success and failure paths.
Also applies to: 297-306
packages/stem_sqlite/lib/src/broker/sqlite_broker.dart (1)
231-243: Clock migration LGTM — allstemNow()call-sites are correct.Every changed segment calls
stemNow()at the time of use (not at broker construction), preserving correct semantics for visibility timeouts, dead-letter timestamps, and lease expiry calculations.Also applies to: 256-280, 289-293, 307-323, 327-335, 402-402, 460-461, 518-519, 567-568, 611-611, 754-756
packages/stem/test/unit/workflow/workflow_clock_test.dart (1)
15-22: LGTM —SystemWorkflowClockcorrectly delegates throughstemNow()to the zone clock.The test correctly exercises the zone-scoped override and the
clock.nowtear-off as aDateTime Function().packages/stem/test/unit/core/clock_test.dart (1)
1-26: LGTM — tests correctly cover both synchronous and async-boundary scenarios for the clock abstraction.packages/stem/lib/src/brokers/in_memory_broker.dart (1)
9-9: LGTM —stemNow()call-sites are consistent with the clock migration.Also applies to: 23-28, 121-121, 319-319, 448-458, 513-513, 549-549
packages/stem_redis/lib/src/brokers/redis_broker.dart (1)
367-367: LGTM —stemNow()substitutions are correct throughout.Also applies to: 430-430, 791-791, 839-839, 855-855, 965-965, 1033-1033, 1091-1091
packages/stem/lib/stem.dart (1)
96-99: LGTM —queue_events.dartandstem_event.dartexports are appropriate public API surface.packages/stem/lib/src/signals/signal.dart (1)
156-168: Good: SignalContext now carries the StemEvent payload and occurredAt.This keeps dispatch timing aligned with event time and gives handlers unified event metadata.
packages/dashboard/lib/src/state/dashboard_state.dart (1)
84-105: LGTM: dashboard timestamps now align with the Stem clock.
packages/stem_postgres/lib/src/control/postgres_revoke_store.dart
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 13
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (8)
packages/stem_redis/lib/src/brokers/redis_broker.dart (2)
719-729:⚠️ Potential issue | 🔴 CriticalBug: NOGROUP recovery creates the wrong consumer group — will loop indefinitely.
Line 724 correctly invalidates the cache entry for
group, but line 725 passesconsumer(the consumer name) instead ofgroup(the consumer group) to_ensureBroadcastGroup. This creates a group named after the consumer rather than the actual group used byXREADGROUPon line 709, so the next iteration will hitNOGROUPagain, causing an infinite retry loop.🐛 Proposed fix
if ('$error'.contains('NOGROUP')) { _groupsCreated.remove('$streamKey|$group'); - await _ensureBroadcastGroup(channel, consumer); + await _ensureBroadcastGroup(channel, group); continue; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem_redis/lib/src/brokers/redis_broker.dart` around lines 719 - 729, The NOGROUP recovery path invalidates the cached group but then calls _ensureBroadcastGroup with the consumer name instead of the consumer group, causing a new group with the wrong name and an infinite retry; update the catch block so that after _groupsCreated.remove('$streamKey|$group') it calls await _ensureBroadcastGroup(channel, group) (use the group variable passed to XREADGROUP) rather than passing consumer, leaving the _shouldSuppressClosedError and rethrow behavior intact.
853-875: 🧹 Nitpick | 🔵 TrivialMinor:
_envelopeFromMapgenerates and immediately re-parses a timestamp string whenenqueuedAtis missing.Line 861 calls
stemNow().toIso8601String()to produce a default, whichDateTime.parse(...)then immediately parses back. This round-trip is wasteful and risks subtle precision loss from ISO 8601 serialization.♻️ Suggested simplification
- enqueuedAt: DateTime.parse( - map['enqueuedAt'] ?? stemNow().toIso8601String(), - ), + enqueuedAt: map['enqueuedAt'] != null + ? DateTime.parse(map['enqueuedAt']!) + : stemNow(),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem_redis/lib/src/brokers/redis_broker.dart` around lines 853 - 875, The _envelopeFromMap method currently creates a default enqueuedAt by serializing stemNow() to ISO string then immediately parsing it back; change this to avoid the round-trip by using the parsed DateTime only when map['enqueuedAt'] is present and otherwise use stemNow() directly (i.e., replace the DateTime.parse(map['enqueuedAt'] ?? stemNow().toIso8601String()) pattern with a conditional that calls DateTime.parse(map['enqueuedAt']!) when non-null/ non-empty, else uses stemNow()), keeping the rest of the Envelope fields and types unchanged.packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart (2)
22-22:⚠️ Potential issue | 🟡 MinorUse a separate namespace variable for the queue-events factory.
Both
runBrokerContractTests.create(line 28) andrunQueueEventsContractTests.create(line 72) write to the samecontractNamespacevariable. BothadditionalBrokerFactorycallbacks read from it. Within a single file Dart tests run sequentially, so this is currently safe in practice — but if the two suites'setUpcalls were ever interleaved (e.g., a framework update,@Timeoutinteraction, or parallel test runner), the queue-eventscreatewould silently overwrite the broker contract's namespace, causingadditionalBrokerFactoryto connect to the wrong stream group.🔧 Proposed fix: scope each namespace variable to its factory
String? contractNamespace; + String? queueEventsContractNamespace; runBrokerContractTests( ... factory: BrokerContractFactory( create: () async { final namespace = _uniqueNamespace(); contractNamespace = namespace; ... }, additionalBrokerFactory: () async { final namespace = contractNamespace; ... }, ), ); runQueueEventsContractTests( ... factory: QueueEventsContractFactory( create: () async { final namespace = _uniqueNamespace(); - contractNamespace = namespace; + queueEventsContractNamespace = namespace; ... }, additionalBrokerFactory: () async { - final namespace = contractNamespace; + final namespace = queueEventsContractNamespace; if (namespace == null || namespace.isEmpty) { throw StateError( 'Redis queue-events contract requires primary broker namespace.', ); }Also applies to: 67-72
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart` at line 22, The shared top-level variable contractNamespace is reused by both runBrokerContractTests.create and runQueueEventsContractTests.create causing both additionalBrokerFactory callbacks to potentially read/write the same value; to fix, remove the shared contractNamespace and create separate, locally scoped namespace variables inside each factory create block (e.g., brokerContractNamespace inside runBrokerContractTests.create and queueEventsContractNamespace inside runQueueEventsContractTests.create), update any references in additionalBrokerFactory closures to use the corresponding local variable, and ensure no external code relies on the old shared variable.
325-335:⚠️ Potential issue | 🟠 MajorCI failure: 50 ms warm-up is too tight for the broadcast consumer-group race.
The pipeline shows worker-one timing out (10 s) waiting for the fan-out message. The only synchronization between
consume()andpublish()is a fixed 50 ms delay (line 325). Under Redis load — now heavier with the newly addedrunQueueEventsContractTestssuite sharing the same Redis instance — consumer-group creation and the first blocking XREADGROUP call can take longer than 50 ms, leaving the subscriber not yet registered when the message lands. A message published before the consumer group is fully set up in Redis is silently missed (it lands in the stream but the group's delivered-ID cursor is initialized after it, so XREADGROUP never returns it).Increase the warm-up or, better, actively wait for consumer readiness:
🔧 Suggested robustness improvement
- await Future<void>.delayed(const Duration(milliseconds: 50)); + // Wait long enough for both consumer-group subscriptions to register on CI. + await Future<void>.delayed(const Duration(milliseconds: 500));Or add a
retryUntil-style poll (preferred — environment-independent):// Poll until both consumers have created their groups. // (Requires broker to expose a group-exists check, or use a sentinel publish/ack.)Additionally, consider isolating the broadcast test's Redis namespace from the new contract-test suites to eliminate cross-suite load as a contributing factor.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart` around lines 325 - 335, The 50ms fixed delay before publisher.publish is too short and causes a race where the consumer group/consumer may not be registered before the broadcast is sent; replace the fixed await Future.delayed(const Duration(milliseconds: 50)) with an active wait that polls for consumer readiness (e.g., a retryUntil or loop that checks the broker/consumer group exists or waits for a sentinel message/ack) before calling publisher.publish(broadcast, routing: RoutingInfo.broadcast(channel: channel)); ensure the poll returns only once both consumers are ready so workerOne.next.timeout no longer intermittently times out.packages/stem/lib/src/workflow/core/workflow_clock.dart (1)
10-16:⚠️ Potential issue | 🟡 MinorUpdate
SystemWorkflowClockdocstring to matchstemNow()behavior.It now delegates to
stemNow()(which can be scoped), not strictlyDateTime.now().♻️ Suggested doc update
-/// Default clock that proxies to [DateTime.now]. +/// Default clock that proxies to [stemNow].🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem/lib/src/workflow/core/workflow_clock.dart` around lines 10 - 16, The docstring for SystemWorkflowClock is inaccurate: it says it proxies to DateTime.now but the implementation calls stemNow(), which can be scoped; update the class and/or constructor docstring to state that SystemWorkflowClock delegates to stemNow() (the scoped system time helper) rather than DateTime.now(), and mention that stemNow() may be overridden/scoped in tests or runtime environments; reference the class SystemWorkflowClock, its override of WorkflowClock.now(), and the stemNow() helper so readers know where the delegation occurs.packages/stem/test/unit/worker/worker_test.dart (1)
486-557:⚠️ Potential issue | 🟡 MinorDuplicate test:
'emits worker lifecycle signals on start and shutdown'is defined twice.This test (lines 486–557) is a near-exact duplicate of the test at lines 344–415 — same name, same
consumerName: 'worker-life', same signal subscriptions, and same assertions. Both will execute in the test suite, and since they shareStemSignalsglobal state, the second invocation could interfere with or be influenced by the first.Remove one of the two definitions.
#!/bin/bash # Verify the duplication rg -n "emits worker lifecycle signals on start and shutdown" packages/stem/test/unit/worker/worker_test.dart🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem/test/unit/worker/worker_test.dart` around lines 486 - 557, The test "emits worker lifecycle signals on start and shutdown" is duplicated; remove one definition to avoid interfering global StemSignals state — locate the duplicate test block (the test with name 'emits worker lifecycle signals on start and shutdown' that constructs InMemoryBroker, InMemoryResultBackend, SimpleTaskRegistry, and Worker with consumerName 'worker-life' and the subscriptions to StemSignals.workerInit/workerReady/workerStopping/workerShutdown) and delete it (or consolidate any unique assertions if needed) so only a single test for Worker lifecycle signals remains.packages/stem/lib/src/signals/payloads.dart (2)
651-686:⚠️ Potential issue | 🟡 Minor
ScheduleEntryFailedPayload.occurredAtreturnsscheduledFor— not when the failure actually occurred.Line 677 returns
scheduledFor.toUtc()asoccurredAt, but the failure happens after the scheduled time (when execution was attempted and errored). This could be significantly later thanscheduledForif there was scheduling drift or retries.Since no explicit
failedAtfield exists, consider adding one or defaulting tostemNow()like the other execution-phase payloads:Proposed fix
class ScheduleEntryFailedPayload implements StemEvent { - const ScheduleEntryFailedPayload({ + ScheduleEntryFailedPayload({ required this.entry, required this.scheduledFor, required this.error, required this.stackTrace, - }); + DateTime? occurredAt, + }) : _occurredAt = (occurredAt ?? stemNow()).toUtc(); final ScheduleEntry entry; final DateTime scheduledFor; final Object error; final StackTrace stackTrace; + final DateTime _occurredAt; `@override` String get eventName => 'schedule-entry-failed'; `@override` - DateTime get occurredAt => scheduledFor.toUtc(); + DateTime get occurredAt => _occurredAt;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem/lib/src/signals/payloads.dart` around lines 651 - 686, ScheduleEntryFailedPayload currently returns scheduledFor.toUtc() from the occurredAt getter, which misrepresents when the failure was observed; update ScheduleEntryFailedPayload to record the actual failure time by adding a new DateTime failedAt (or defaulting failedAt = stemNow() when not provided) and change the occurredAt getter to return failedAt.toUtc(); also update the constructor and the attributes map (replace or add 'failedAt' with failedAt.toUtc().toIso8601String()) so the payload reflects the real execution-failure timestamp while retaining entry, scheduledFor, error, and stackTrace.
688-717:⚠️ Potential issue | 🔴 Critical
occurredAtgetter callsstemNow()on every access — returns a different time each call.
ControlCommandReceivedPayloadisconst-constructible (Line 691), so it can't store a timestamp field. ButoccurredAt(Line 706) invokesstemNow().toUtc()in the getter body, meaning every access yields a new timestamp. This violates theStemEventcontract whereoccurredAtshould represent when the event occurred.The same issue affects
ControlCommandCompletedPayloadat Line 749.Make the constructors non-
constand capture the timestamp at construction time, consistent with other payloads likeTaskReceivedPayload:Proposed fix for ControlCommandReceivedPayload
class ControlCommandReceivedPayload implements StemEvent { - const ControlCommandReceivedPayload({ + ControlCommandReceivedPayload({ required this.worker, required this.command, - }); + DateTime? occurredAt, + }) : _occurredAt = (occurredAt ?? stemNow()).toUtc(); final WorkerInfo worker; final ControlCommandMessage command; + final DateTime _occurredAt; `@override` String get eventName => 'control-command-received'; `@override` - DateTime get occurredAt => stemNow().toUtc(); + DateTime get occurredAt => _occurredAt;Apply the same pattern to
ControlCommandCompletedPayload.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem/lib/src/signals/payloads.dart` around lines 688 - 717, The occurredAt getter currently calls stemNow() on each access, so change ControlCommandReceivedPayload and ControlCommandCompletedPayload to capture the event timestamp at construction: remove the const constructor, add a final DateTime occurredAt field (initialized to stemNow().toUtc() in the constructor) and replace the existing occurredAt getter implementation with that field; keep eventName as-is and ensure both classes' constructors accept the same required parameters (worker, command / command, result) while setting occurredAt once at construction time.
♻️ Duplicate comments (6)
packages/stem_cli/test/unit/cli/cli_worker_stats_test.dart (1)
132-176:pauseErrandresumeErrare still never asserted.Both buffers are passed to
runStemClibut their contents are never verified. A silent CLI error would go undetected. This was raised in a previous review and has not been addressed.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem_cli/test/unit/cli/cli_worker_stats_test.dart` around lines 132 - 176, The test creates pauseErr and resumeErr and passes them to runStemCli but never asserts their contents; add assertions after each runStemCli call to verify the CLI produced no error output (e.g., expect(pauseErr.toString(), isEmpty) after pauseCode check and expect(resumeErr.toString(), isEmpty) after resumeCode check) so silent errors from runStemCli are detected; reference the pauseErr/resumeErr buffers and the runStemCli calls in the test to locate where to insert these expects.packages/stem_cli/lib/src/cli/worker.dart (1)
1081-1099: Both previous review issues are resolved — column header and exit code.The column header is now
Paused queues (after)(line 1082), and the exit code correctly accounts for error status replies at line 1102. These directly address the past comments.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem_cli/lib/src/cli/worker.dart` around lines 1081 - 1099, No code changes required: the column header has been corrected to 'Paused queues (after)' in the dependencies.out.writeln call and the exit-code logic correctly accounts for error-status replies in the worker reply handling (see the ordered list iteration over replies and usage of reply.status and reply.payload), so mark this PR approved and merge.packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart (2)
40-55: Partial fix for the pastadditionalDisposeconcern — verify framework cleanup.The
_NoCloseBrokerwrapper has been removed (line 54), addressing the root cause flagged in the previous review. However,additionalDisposewas not added. Whether the additional broker's connection is now properly reclaimed depends on whetherBrokerContractFactorycallsbroker.close()on the additional broker implicitly, or relies solely onadditionalDispose. Please confirm this.#!/bin/bash # Description: Check whether BrokerContractFactory disposes the additional broker # automatically (i.e. calls close() on it) or only via additionalDispose. # Also verify whether QueueEventsContractFactory has an additionalDispose parameter. # 1. Find BrokerContractFactory definition ast-grep --pattern 'class BrokerContractFactory { $$$ }' # 2. Check how additionalBrokerFactory result is closed inside the broker contract runner rg -n 'additionalDispose\|additionalBroker' --type dart -A 5 # 3. Check QueueEventsContractFactory for additionalDispose support ast-grep --pattern 'class QueueEventsContractFactory { $$$ }'🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart` around lines 40 - 55, The test removed the _NoCloseBroker wrapper but didn’t add an additionalDispose handler, so the additional broker returned by additionalBrokerFactory (created via RedisStreamsBroker.connect) may not be closed; update BrokerContractFactory to either call close() on any broker produced by additionalBrokerFactory or add an additionalDispose parameter and ensure the runner invokes it to close the broker, and similarly verify/extend QueueEventsContractFactory to accept and invoke additionalDispose if it supports additional brokers; reference BrokerContractFactory, additionalBrokerFactory, additionalDispose, RedisStreamsBroker.connect and QueueEventsContractFactory when making the change.
84-99:⚠️ Potential issue | 🟡 MinorMissing
additionalDisposefor the queue-events contract's additional broker.
additionalBrokerFactory(lines 84-99) returns aRedisStreamsBrokerwith no correspondingadditionalDispose. TheQueueEventsContractFactoryonly cleans up the additional broker ifadditionalDisposeis provided—without it, the connection is leaked for every test iteration. All other adapter tests (SQLite, PostgreSQL, in-memory) provide this callback; the redis queue-events contract should match.🔧 Proposed fix
additionalBrokerFactory: () async { final namespace = contractNamespace; if (namespace == null || namespace.isEmpty) { throw StateError( 'Redis queue-events contract requires primary broker namespace.', ); } final broker = await RedisStreamsBroker.connect( redisUrl, namespace: namespace, defaultVisibilityTimeout: const Duration(seconds: 1), claimInterval: const Duration(milliseconds: 200), blockTime: const Duration(milliseconds: 100), ); return broker; }, + additionalDispose: (broker) => + _safeCloseRedisBroker(broker as RedisStreamsBroker),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart` around lines 84 - 99, The additionalBrokerFactory currently creates a RedisStreamsBroker via RedisStreamsBroker.connect but there is no additionalDispose, causing leaked connections; add an additionalDispose callback alongside additionalBrokerFactory (used by QueueEventsContractFactory) that accepts the broker returned by additionalBrokerFactory and properly closes/disposes it (e.g., await broker.close()/disconnect()) to mirror other adapters and ensure the RedisStreamsBroker connection is cleaned up after each test.packages/stem/lib/src/worker/worker.dart (1)
3728-3747: Past review concern about non-string queue identifiers has been addressed.Lines 3737-3738 now correctly skip non-
Stringvalues withif (value is! String) continue;.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem/lib/src/worker/worker.dart` around lines 3728 - 3747, The previous issue was non-string queue identifiers being processed in _extractQueueTargets; the current change correctly guards against that by checking value is String and continuing otherwise, so no further code changes are needed—keep the guard (if (value is! String) continue;) in _extractQueueTargets and ensure the function still trims, filters empty names, deduplicates using the Set queues, and returns a sorted List.packages/stem/lib/src/signals/payloads.dart (1)
107-142: Execution-phase payloads now capture runtime timestamps — past concern addressed.
TaskReceivedPayload(and similarlyTaskPrerunPayload,TaskPostrunPayload,TaskSuccessPayload,TaskFailurePayload,TaskRevokedPayload) now accept an optionaloccurredAtparameter and default tostemNow().toUtc(), replacing the previous use ofenvelope.enqueuedAt. This correctly reflects when the event actually occurred rather than when the task was enqueued.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/stem/lib/src/signals/payloads.dart` around lines 107 - 142, The payloads should record runtime timestamps instead of envelope.enqueuedAt; update each execution-phase payload (e.g., TaskReceivedPayload, TaskPrerunPayload, TaskPostrunPayload, TaskSuccessPayload, TaskFailurePayload, TaskRevokedPayload) to accept an optional occurredAt parameter, default it to (occurredAt ?? stemNow()).toUtc(), stop using envelope.enqueuedAt, and ensure the class exposes occurredAt via the getter and includes relevant identifiers (e.g., taskId, taskName, queue, workerId) in attributes so the event reflects when it actually occurred.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/stem_adapter_tests/README.md`:
- Around line 29-32: Quick Start is missing the runRevokeStoreContractTests
call; add a line alongside runQueueEventsContractTests that invokes
runRevokeStoreContractTests with the same adapterName and an appropriate factory
(e.g., runRevokeStoreContractTests(adapterName: 'my-adapter', factory:
RevokeStoreContractFactory(create: createBroker))), matching the pattern used
for runQueueEventsContractTests and using the existing createBroker helper.
In `@packages/stem_cli/lib/src/cli/worker.dart`:
- Around line 1089-1098: The table rendering loop in
WorkerPingCommand/WorkerInspectCommand suppresses the error details because
updated/paused are computed only from reply.payload; change the loop in
worker.dart where ordered is iterated so that when reply.status != 'ok' you
extract the error string from reply.error?['message'] (falling back to
reply.error?.toString() or '-' if null) and display that in the table (e.g., in
the Updated or a new Error column) instead of the '-' placeholders; update the
assignments for updated and/or paused to use this error value when status !=
'ok' so operators see the message in the text output without needing --json.
In `@packages/stem_cli/test/unit/cli/cli_worker_stats_test.dart`:
- Around line 514-529: _Remove the unused _FastTask class since it is never
registered in any SimpleTaskRegistry or referenced by tests; delete the entire
class definition (the _FastTask class implementing TaskHandler<void>) from the
test file to eliminate dead code and keep only the tasks actually used (e.g.,
_BlockingTask) in the pause/resume tests._
In `@packages/stem_redis/lib/src/brokers/redis_broker.dart`:
- Around line 572-575: Add a one-line comment above the early return in the
RedisBroker.loop() method explaining that the check for queue == null
intentionally skips the queue-consumption loop for broadcast-only subscriptions;
update the loop() function (where the variable queue is checked) to include that
explanatory comment so future readers know this early return is deliberate
rather than a bug.
In `@packages/stem/lib/src/core/queue_events.dart`:
- Around line 246-278: The lazy cast rawPayload.cast<String, Object?>() in
_eventFromEnvelope can defer a TypeError until access time if keys aren't
Strings; replace it with an eager, fail-fast conversion (e.g., build a new Map
by iterating rawPayload or use Map.from and validate/convert each key to String)
so non-string keys produce an immediate, clear error (throw a FormatException
with context if a key cannot be converted) and return a Map<String, Object?> for
payload.
In `@packages/stem/lib/src/worker/worker.dart`:
- Around line 786-867: The group rate-limiter block currently catches on Object
(in the try around rateLimiter!.acquire) which swallows programming errors;
change the catch to a narrower exception type (e.g., "on Exception catch (error,
stack)" or the specific rate-limiter exception class provided by your rate
limiter) so only runtime errors from the rate limiter are handled by the
fail-open/fail-closed logic (handler.options.groupRateLimiterFailureMode,
RateLimiterFailureMode.failOpen) and let non-Exception errors (TypeError,
NoSuchMethodError, etc.) propagate; keep the existing handling paths that call
retryStrategy.nextDelay and _retryDelivery, and adjust any log/error fields
(groupKey, envelope, delivery, resultEncoder) accordingly.
- Around line 2459-2540: _refreshQueueSubscriptions currently cancels every
tracked subscription then recreates only active ones, causing unnecessary
disruption; change it to a differential update: compute activeQueues from
_effectiveQueues.where(!_isQueuePaused) and then (1) determine toCancel =
subscriptions currently in _queueSubscriptionNames or _subscriptions that are
not in activeQueues and only cancel/remove those (using subscription.cancel()
with the existing error handling), and (2) determine toCreate = activeQueues
that are not already present in _subscriptions/_queueSubscriptionNames and only
call broker.consume/stream.listen to create those subscriptions (adding them to
_subscriptions and _queueSubscriptionNames as done now). Keep existing error
handling and the broadcast logic (index==0) when creating subscriptions and do
not change _cancelAllSubscriptions() semantics.
- Around line 3749-3817: _in _processQueuePauseCommand_, the code uses
expiresAt: now when paused is false to create an immediately-expired revoke
entry (so resume is implemented by writing an expired pause rather than deleting
it), which risks a brief race where another path (_syncRevocations_) could
reapply the pause before pruneExpired runs; update the function to include a
clear comment above the entries creation (and/or above the
upsertAll/pruneExpired block) stating this intentional pattern: that resume is
represented by an expired RevokeEntry, that _applyQueuePauseEntry removes
expired entries, and noting the tiny race window between upsertAll and
pruneExpired (and why pruneExpired is called immediately) so future readers
understand the behavior and tradeoffs involving revokeStore.upsertAll,
revokeStore.pruneExpired, and _applyQueuePauseEntry.
- Around line 2204-2215: The code calls stemNow() multiple times causing drift;
fix by capturing a single Timestamp (e.g., final now = stemNow()) at the start
of the block and use it when computing scheduledAt, delay, and notBefore so
scheduledAt = request.eta ?? (request.countdown != null ?
now.add(request.countdown!) : null), delay = scheduledAt != null ?
scheduledAt.difference(now) : _computeRetryDelay(envelope.attempt, request,
StackTrace.current, policy), and notBefore = scheduledAt ?? now.add(delay);
update references to scheduledAt, delay, and notBefore accordingly.
- Around line 2029-2036: The code calls stemNow() twice producing mismatched
timestamps: capture a single timestamp (e.g., final now = stemNow()) before
computing nextRunAt and before building the republished envelope, then use
now.add(delay) for both nextRunAt and the envelope.copyWith(... notBefore: ... )
so the broker.nack / broker.publish sequence uses a consistent retry timestamp;
update usages around nextRunAt, envelope.copyWith, and notBefore (and keep
maxRetries and attempt handling the same).
- Around line 3936-3957: The queue_pause and queue_resume case handlers call
_processQueuePauseCommand without per-case error handling and can leak stack
traces; wrap each of the queue_pause and queue_resume case bodies in a try/catch
like the revoke case: call _processQueuePauseCommand inside try, set reply to
ControlReplyMessage with status 'ok' and the result payload on success, and in
catch construct a ControlReplyMessage with status 'error', requestId from
command.requestId, workerId _workerIdentifier, and a payload containing only the
error message (e.g. error.toString() or a safe message) — do not include stack
traces.
- Around line 2291-2321: The backend is recording TaskState.retried with the
original envelope.attempt in _retryDelivery (used for rate-limit requeues) which
differs from _handleFailure and _handleRetryRequest that publish with attempt +
1; add a concise comment near the _retryDelivery function (and optionally
reference the related _handleFailure and _handleRetryRequest) clarifying that
rate-limit backoff intentionally does not increment envelope.attempt and that
TaskState.retried in this path represents a requeue due to rate-limiting (not a
consumed retry attempt), so monitoring systems should treat these retried
records differently; reference backend.set, _statusMeta, and the
envelope.attempt behavior in the comment.
- Around line 2323-2337: The current _handlePausedQueueDelivery always uses a
fixed 250ms backoff which causes tight requeue loops; change it to compute an
exponential backoff (e.g., base 250ms * 2^n, capped at a max like 30s, with
jitter) keyed to the envelope (use envelope.id) or envelope.queue so repeated
requeues for the same in-flight message back off progressively; implement a
small Map<String,int> (e.g., _pausedRequeueAttempts) incremented each time
_handlePausedQueueDelivery runs for that envelope and reset when queue is
unpaused (see _refreshQueueSubscriptions and RevokeEntry), compute backoff from
that counter and pass it to _retryDelivery, and keep envelope.attempt unchanged
and preserve the extra metadata ({'queuePaused': true, 'queue':
envelope.queue}).
---
Outside diff comments:
In `@packages/stem_redis/lib/src/brokers/redis_broker.dart`:
- Around line 719-729: The NOGROUP recovery path invalidates the cached group
but then calls _ensureBroadcastGroup with the consumer name instead of the
consumer group, causing a new group with the wrong name and an infinite retry;
update the catch block so that after _groupsCreated.remove('$streamKey|$group')
it calls await _ensureBroadcastGroup(channel, group) (use the group variable
passed to XREADGROUP) rather than passing consumer, leaving the
_shouldSuppressClosedError and rethrow behavior intact.
- Around line 853-875: The _envelopeFromMap method currently creates a default
enqueuedAt by serializing stemNow() to ISO string then immediately parsing it
back; change this to avoid the round-trip by using the parsed DateTime only when
map['enqueuedAt'] is present and otherwise use stemNow() directly (i.e., replace
the DateTime.parse(map['enqueuedAt'] ?? stemNow().toIso8601String()) pattern
with a conditional that calls DateTime.parse(map['enqueuedAt']!) when non-null/
non-empty, else uses stemNow()), keeping the rest of the Envelope fields and
types unchanged.
In
`@packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart`:
- Line 22: The shared top-level variable contractNamespace is reused by both
runBrokerContractTests.create and runQueueEventsContractTests.create causing
both additionalBrokerFactory callbacks to potentially read/write the same value;
to fix, remove the shared contractNamespace and create separate, locally scoped
namespace variables inside each factory create block (e.g.,
brokerContractNamespace inside runBrokerContractTests.create and
queueEventsContractNamespace inside runQueueEventsContractTests.create), update
any references in additionalBrokerFactory closures to use the corresponding
local variable, and ensure no external code relies on the old shared variable.
- Around line 325-335: The 50ms fixed delay before publisher.publish is too
short and causes a race where the consumer group/consumer may not be registered
before the broadcast is sent; replace the fixed await Future.delayed(const
Duration(milliseconds: 50)) with an active wait that polls for consumer
readiness (e.g., a retryUntil or loop that checks the broker/consumer group
exists or waits for a sentinel message/ack) before calling
publisher.publish(broadcast, routing: RoutingInfo.broadcast(channel: channel));
ensure the poll returns only once both consumers are ready so
workerOne.next.timeout no longer intermittently times out.
In `@packages/stem/lib/src/signals/payloads.dart`:
- Around line 651-686: ScheduleEntryFailedPayload currently returns
scheduledFor.toUtc() from the occurredAt getter, which misrepresents when the
failure was observed; update ScheduleEntryFailedPayload to record the actual
failure time by adding a new DateTime failedAt (or defaulting failedAt =
stemNow() when not provided) and change the occurredAt getter to return
failedAt.toUtc(); also update the constructor and the attributes map (replace or
add 'failedAt' with failedAt.toUtc().toIso8601String()) so the payload reflects
the real execution-failure timestamp while retaining entry, scheduledFor, error,
and stackTrace.
- Around line 688-717: The occurredAt getter currently calls stemNow() on each
access, so change ControlCommandReceivedPayload and
ControlCommandCompletedPayload to capture the event timestamp at construction:
remove the const constructor, add a final DateTime occurredAt field (initialized
to stemNow().toUtc() in the constructor) and replace the existing occurredAt
getter implementation with that field; keep eventName as-is and ensure both
classes' constructors accept the same required parameters (worker, command /
command, result) while setting occurredAt once at construction time.
In `@packages/stem/lib/src/workflow/core/workflow_clock.dart`:
- Around line 10-16: The docstring for SystemWorkflowClock is inaccurate: it
says it proxies to DateTime.now but the implementation calls stemNow(), which
can be scoped; update the class and/or constructor docstring to state that
SystemWorkflowClock delegates to stemNow() (the scoped system time helper)
rather than DateTime.now(), and mention that stemNow() may be overridden/scoped
in tests or runtime environments; reference the class SystemWorkflowClock, its
override of WorkflowClock.now(), and the stemNow() helper so readers know where
the delegation occurs.
In `@packages/stem/test/unit/worker/worker_test.dart`:
- Around line 486-557: The test "emits worker lifecycle signals on start and
shutdown" is duplicated; remove one definition to avoid interfering global
StemSignals state — locate the duplicate test block (the test with name 'emits
worker lifecycle signals on start and shutdown' that constructs InMemoryBroker,
InMemoryResultBackend, SimpleTaskRegistry, and Worker with consumerName
'worker-life' and the subscriptions to
StemSignals.workerInit/workerReady/workerStopping/workerShutdown) and delete it
(or consolidate any unique assertions if needed) so only a single test for
Worker lifecycle signals remains.
---
Duplicate comments:
In `@packages/stem_cli/lib/src/cli/worker.dart`:
- Around line 1081-1099: No code changes required: the column header has been
corrected to 'Paused queues (after)' in the dependencies.out.writeln call and
the exit-code logic correctly accounts for error-status replies in the worker
reply handling (see the ordered list iteration over replies and usage of
reply.status and reply.payload), so mark this PR approved and merge.
In `@packages/stem_cli/test/unit/cli/cli_worker_stats_test.dart`:
- Around line 132-176: The test creates pauseErr and resumeErr and passes them
to runStemCli but never asserts their contents; add assertions after each
runStemCli call to verify the CLI produced no error output (e.g.,
expect(pauseErr.toString(), isEmpty) after pauseCode check and
expect(resumeErr.toString(), isEmpty) after resumeCode check) so silent errors
from runStemCli are detected; reference the pauseErr/resumeErr buffers and the
runStemCli calls in the test to locate where to insert these expects.
In
`@packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart`:
- Around line 40-55: The test removed the _NoCloseBroker wrapper but didn’t add
an additionalDispose handler, so the additional broker returned by
additionalBrokerFactory (created via RedisStreamsBroker.connect) may not be
closed; update BrokerContractFactory to either call close() on any broker
produced by additionalBrokerFactory or add an additionalDispose parameter and
ensure the runner invokes it to close the broker, and similarly verify/extend
QueueEventsContractFactory to accept and invoke additionalDispose if it supports
additional brokers; reference BrokerContractFactory, additionalBrokerFactory,
additionalDispose, RedisStreamsBroker.connect and QueueEventsContractFactory
when making the change.
- Around line 84-99: The additionalBrokerFactory currently creates a
RedisStreamsBroker via RedisStreamsBroker.connect but there is no
additionalDispose, causing leaked connections; add an additionalDispose callback
alongside additionalBrokerFactory (used by QueueEventsContractFactory) that
accepts the broker returned by additionalBrokerFactory and properly
closes/disposes it (e.g., await broker.close()/disconnect()) to mirror other
adapters and ensure the RedisStreamsBroker connection is cleaned up after each
test.
In `@packages/stem/lib/src/signals/payloads.dart`:
- Around line 107-142: The payloads should record runtime timestamps instead of
envelope.enqueuedAt; update each execution-phase payload (e.g.,
TaskReceivedPayload, TaskPrerunPayload, TaskPostrunPayload, TaskSuccessPayload,
TaskFailurePayload, TaskRevokedPayload) to accept an optional occurredAt
parameter, default it to (occurredAt ?? stemNow()).toUtc(), stop using
envelope.enqueuedAt, and ensure the class exposes occurredAt via the getter and
includes relevant identifiers (e.g., taskId, taskName, queue, workerId) in
attributes so the event reflects when it actually occurred.
In `@packages/stem/lib/src/worker/worker.dart`:
- Around line 3728-3747: The previous issue was non-string queue identifiers
being processed in _extractQueueTargets; the current change correctly guards
against that by checking value is String and continuing otherwise, so no further
code changes are needed—keep the guard (if (value is! String) continue;) in
_extractQueueTargets and ensure the function still trims, filters empty names,
deduplicates using the Set queues, and returns a sorted List.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (31)
.site/docs/comparisons/stem-vs-bullmq.md.site/docs/core-concepts/queue-events.md.site/docs/core-concepts/rate-limiting.md.site/docs/core-concepts/signals.mdpackages/dashboard/lib/src/services/stem_service.dartpackages/stem/example/docs_snippets/lib/canvas_batch.dartpackages/stem/lib/src/canvas/canvas.dartpackages/stem/lib/src/core/clock.dartpackages/stem/lib/src/core/contracts.dartpackages/stem/lib/src/core/queue_events.dartpackages/stem/lib/src/scheduler/in_memory_lock_store.dartpackages/stem/lib/src/signals/payloads.dartpackages/stem/lib/src/worker/worker.dartpackages/stem/lib/src/workflow/core/workflow_clock.dartpackages/stem/lib/stem.dartpackages/stem/lib/testing.dartpackages/stem/test/unit/canvas/canvas_test.dartpackages/stem/test/unit/core/clock_test.dartpackages/stem/test/unit/worker/worker_test.dartpackages/stem/test/unit/workflow/workflow_clock_test.dartpackages/stem_adapter_tests/README.mdpackages/stem_adapter_tests/lib/src/queue_events_contract_suite.dartpackages/stem_adapter_tests/lib/src/revoke_store_contract_suite.dartpackages/stem_cli/lib/src/cli/worker.dartpackages/stem_cli/test/unit/cli/cli_worker_stats_test.dartpackages/stem_postgres/lib/src/control/postgres_revoke_store.dartpackages/stem_redis/lib/src/brokers/redis_broker.dartpackages/stem_redis/test/integration/brokers/redis_broker_integration_test.dartpackages/stem_sqlite/lib/src/database/migrations/m_20260224103000_add_revoke_store.dartpackages/stem_sqlite/lib/src/models/stem_revoke_entry.dartpackages/stem_sqlite/test/control/sqlite_revoke_store_test.dart
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
🧰 Additional context used
🪛 GitHub Actions: stem_redis
packages/stem_redis/test/integration/brokers/redis_broker_integration_test.dart
[error] 335-335: Redis broadcast fan-out delivers to all subscribers failed. worker-one timed out waiting for broadcast message.
[error] 332-332: Asynchronous gap detected in test traces (possible flaky test or timeout).
🔇 Additional comments (55)
packages/stem_sqlite/lib/src/database/migrations/m_20260224103000_add_revoke_store.dart (1)
8-33: LGTM — migration schema is consistent and TZ-aware.No issues spotted in table definition, indexes, or down migration.
packages/stem_sqlite/lib/src/models/stem_revoke_entry.dart (1)
5-46: LGTM — ORM model matches the migration and is clear.No issues found with field mappings or annotations.
packages/stem_redis/lib/src/brokers/redis_broker.dart (4)
330-344: LGTM: Parameter rename fromconsumertogroupclarifies intent.The
_ensureBroadcastGroupmethod now correctly receives the resolved group name rather than deriving it internally, which aligns with the updated broadcast group resolution logic inconsume.
486-501: Broadcast-only mode routing looks correct.The
queue == nullguard,queueGroupderivation, and emptystreamKeysfor broadcast-only mode are well-structured. Downstream uses ofqueueGroup!are all safely guarded by the early return inloop()and bystreamKeysbeing empty (so the_scheduleClaimloop on line 628 is a no-op).
633-643:consumerGroupis now properly threaded into broadcast listeners.This addresses the prior concern about
consumerGroupbeing silently discarded in broadcast-only mode. The caller-suppliedconsumerGroupis honored as the group name for both_listenBroadcastand_scheduleClaim.
366-366:stemNow()migration is consistent and correct across all time-sensitive paths.All
DateTime.now()call sites (delay comparison, drain scoring, lease expiry, envelope defaults, dead-letter timestamps, replay scheduling, and lease extension) have been uniformly migrated tostemNow(). This enables deterministic testing and a single clock source.Also applies to: 429-429, 797-797, 845-845, 861-861, 971-971, 1039-1039, 1097-1097
packages/stem_cli/test/unit/cli/cli_worker_stats_test.dart (3)
131-189: LGTM —try/finallycleanup is properly wired.The
finallyblock guardsrelease.complete()against double-completion and ensuresworker.shutdown()/broker.dispose()are always called, even on assertion failure. This correctly addresses the past comment.
478-493: LGTM — deadline is now checked before each predicate invocation.Moving the deadline guard to the top of the
while (true)loop (line 485) and makingpredicateaFuture<bool> Function()addresses both the semantic imprecision from the past review and the synchronous-predicate mismatch. All call sites have been updated with() async =>lambdas.
495-512: LGTM — active-polling negative assertion is a solid replacement for the fixed sleep.
_assertTaskRemainsQueuedfirst awaits a non-null state (removing write-lag false failures), then continuously validates the state isqueuedthroughout theholdForwindow. Any early state transition causes an immediateStateError, which is strictly better than the prior single-snapshot check after a blind wait.packages/stem_cli/lib/src/cli/worker.dart (5)
23-24: LGTM — new commands correctly registered.
1109-1129: LGTM — clean concrete implementations.
677-677: LGTM —stemNow().toUtc()consistently replacesDateTime.now().toUtc()here and in_renderHeartbeat(line 1455).
1888-1892: LGTM —stemNow().toUtc()used consistently for timestamp generation in_start,_stop,_status(lines 1888, 1930, 1973),WorkerHealthcheckCommand(line 2100), and_describeUptime(line 2426).
2681-2683: LGTM —stemNow()used consistently on both sides of the deadline comparison; the absence of.toUtc()is harmless since bothDateTimevalues share the same timezone.packages/stem_postgres/lib/src/control/postgres_revoke_store.dart (1)
130-131: Redundant ternary removed; clock abstraction applied — LGTM.Both the previously flagged redundant ternary (
existing != null ? entry.version : entry.version→entry.version) and theDateTime.now()→stemNow()migration are correctly applied. The UTC semantics are consistent withSystemStemClock..site/docs/core-concepts/queue-events.md (1)
44-45: Grammar fix applied — LGTM."built into" is now correct.
packages/stem/lib/testing.dart (1)
1-5: Well-scoped testing barrel — LGTM.The selective
show FakeStemClockkeeps production clock types (SystemStemClock,withStemClock) out of the testing surface while exposing exactly what test authors need.packages/stem_adapter_tests/README.md (1)
72-76: Both past-flagged gaps resolved — LGTM.
QueueEventsContractCapabilitiesandRevokeStoreContractCapabilitiestables are added and the fan-out opt-out recipe is included.Also applies to: 95-99, 149-161
packages/stem/example/docs_snippets/lib/canvas_batch.dart (1)
32-40: Polling loop addresses past feedback — LGTM.Correctly breaks on terminal state before the delay, so the last successful poll isn't penalized with a redundant 50 ms sleep.
packages/stem/lib/src/core/clock.dart (2)
17-19: UTC normalization applied — LGTM.
DateTime.now().toUtc()eliminates the production/test clock-kind mismatch previously flagged.
39-54: Zone-based clock scoping — LGTM.
_zoneClockKey = Object()(non-const) correctly creates a unique runtime identity that won't be canonicalized across library loads. Theis StemClockguard instemNow()provides a clean fallback. Zone values propagate through async continuations in Dart, sostemNow()will resolve correctly acrossawaitpoints within awithStemClockscope.packages/stem/lib/src/scheduler/in_memory_lock_store.dart (1)
63-68: SinglestemNow()capture inrenew— LGTM.Captures
stemNow()once and reuses for both the expiry check and the newexpiresAtassignment, removing the subtle two-read window flagged previously.packages/stem/test/unit/workflow/workflow_clock_test.dart (1)
16-23: Zone-scoped clock integration test — LGTM.Passing the
clock.nowtear-off directly intowithStemClockcleanly verifies thatSystemWorkflowClockdelegates tostemNow()rather than callingDateTime.now()directly. This doubles as a regression guard if the delegation is ever accidentally removed.packages/dashboard/lib/src/services/stem_service.dart (2)
160-174: Centralized clock for control envelopes looks good.Using
stemNow().toUtc()forenqueuedAtkeeps timestamps aligned with the shared clock source.
192-200: Timeout loop now uses a single timestamp per iteration.This avoids deadline/remaining TOCTOU gaps and reads cleanly.
packages/stem/test/unit/core/clock_test.dart (1)
5-25: Clock abstraction tests are clear and deterministic.Nice coverage for scoped clock behavior and async propagation.
.site/docs/core-concepts/signals.md (2)
11-46: Signal lifecycle + StemEvent shape documentation reads cleanly.Nice consolidation of semantics and ordering details.
115-121: No issues found. All documented helper names match the actualStemSignalsAPI exactly. The method namesonTaskSuccess,onTaskFailure, and all others listed are correct as documented.packages/stem_adapter_tests/lib/src/queue_events_contract_suite.dart (1)
7-242: Queue events contract suite looks comprehensive.The scenarios cover core delivery, filtering, metadata, and fan-out behavior well.
packages/stem_adapter_tests/lib/src/revoke_store_contract_suite.dart (1)
7-199: Revoke store contract suite covers the core semantics well.packages/stem/lib/src/core/contracts.dart (3)
848-1040: Group rate-limit options and failure-mode parsing look consistent.
2007-2010: Delay now usesstemNow()for deterministic timing.
2066-2073:RateLimiterFailureModeenum is concise and clear..site/docs/core-concepts/rate-limiting.md (1)
12-153: Group rate-limiting docs are clear and explicit about defaults..site/docs/comparisons/stem-vs-bullmq.md (1)
1-62: Documentation structure looks good.The comparison matrix is well-organized with clear status semantics, evidence-backed rationale links, and the previously flagged empty "Update policy" heading is now properly populated with content. The BullMQ events parity mapping table is a useful addition for users migrating from BullMQ.
packages/stem_sqlite/test/control/sqlite_revoke_store_test.dart (1)
1-101: Test structure is clean and past review feedback has been addressed.The redundant
dbFile.delete()intearDownand the unnecessary cast(store as SqliteRevokeStore)from prior reviews are both resolved. The three test paths (fromDataSource, connect URL, adapter factory) provide good coverage with propertry/finallycleanup.packages/stem/lib/src/core/queue_events.dart (3)
14-73:QueueCustomEventimplementation looks correct; past feedback addressed.The
namefield is now included inattributes(line 57), and the class properly implements theStemEventcontract. ThetoJsonmethod correctly serializesemittedAtto ISO 8601 format.
76-125:QueueEventsProducer.emit— input validation and broadcast routing look correct.Proper trimming and empty-check on both
queueandeventName, UTC timestamp viastemNow(), and broadcast routing per-queue channel. Clean implementation.
128-236:QueueEventslifecycle management is robust; past feedback addressed.The constructor now validates the queue name via
_normalizeQueue(fail-fast). The_closedguard in_onDelivery(line 216) and_emitError(lines 231-233) properly protect against adding to a closed stream. The_waitForCallbackSuccessconsolidation from prior review is also reflected in the delegated helper pattern.packages/stem/lib/stem.dart (1)
90-99: New exports are correctly scoped; past feedback addressed.
FakeStemClockis now hidden from the production barrel (line 90), and the newqueue_events.dartandstem_event.dartexports properly surface the new public API.packages/stem/lib/src/canvas/canvas.dart (5)
155-246: Batch data model is well-designed.
BatchLifecycleState,BatchSubmission, andBatchStatusprovide a clean, complete surface for batch lifecycle management.pendingCountproperly clamps to avoid negatives, andisTerminalcovers all terminal states includingpartial.
418-479:submitBatchidempotency and immutability — past feedback fully addressed.The method now:
- Guards against resubmission by checking
backend.getGroup(id)beforeinitGroup(line 432).- Distinguishes batch vs. non-batch groups with
stem.batchmeta check (line 434).- Serializes per-batchId within the process via
_withBatchSubmissionLock.
481-486:inspectBatchcorrectly guards against non-batch groups.Line 484 checks
status.meta['stem.batch'] != true, returningnullfor groups not created viasubmitBatch. Past feedback addressed.
738-782:_buildBatchStatusterminal filtering and positive state checks — past feedback fully addressed.Entries are filtered to
isTerminalstates before counting (line 740), and lifecycle state classification uses positive equality (succeeded == completed, etc.) instead of fragile negative exclusion.
488-504: The lock provides no protection against concurrent submissions with the samebatchIdfrom separateCanvasinstances.While
_withBatchSubmissionLockserializes operations within a singleCanvasinstance, concurrent submissions from differentCanvasinstances (or separate processes) can both pass thegetGroupcheck before either callsinitGroup, causing a race condition. This is verified in the backend implementations:
- PostgreSQL (
insertOrIgnore): Only the firstinitGroupsucceeds; duplicates fail silently. Provides some safety but the check-then-act pattern remains racy.- SQLite (
upsert): Duplicates overwrite the group metadata, potentially causing a mismatch betweenexpectedcount and actual dispatched tasks.- Redis and in-memory backend (
SET/map assignment): Duplicates overwrite the group, creating the same mismatch issue.If multiple
Canvasinstances can be submitted against the samebatchId, consider either enforcing uniqueness constraints at the backend level or using a distributed lock (not instance-local).packages/stem/test/unit/canvas/canvas_test.dart (2)
83-154: Batch test coverage is thorough and well-structured.The three new tests cover the key batch scenarios: basic lifecycle, idempotency with caller-provided
batchId, and terminal-only entry counting. TheinspectBatchtest (line 124) directly manipulates backend state to isolate the counting logic — a clean unit-test approach. The shared_waitForNonNullhelper (past feedback addressed) reduces duplication nicely.
194-210: Generic polling helper is clean.
_waitForNonNullproperly abstracts the polling pattern with configurable timeout and descriptive error messages. Both_waitForSuccessand_waitForBatchTerminaldelegate to it, eliminating the previously flagged duplication.packages/stem/test/unit/worker/worker_test.dart (5)
1291-1377: Group limiter cross-type key sharing test is well-constructed.The
_ScenarioRateLimiterprecisely controls when rate limiting kicks in, and the test validates both the shared key behavior and the retry event metadata. Past feedback about the redundantisNotNullassertion has been addressed.
1427-1481: Fail-closed test: past timing feedback addressed.
maxRetries: 5is now explicitly set (line 1447), and_waitForTaskState(line 1475) replaces the fixed delay, making this test more resilient to CI variability.
1483-1562: Pause/resume persistence test looks solid.The
_assertTaskRemainsQueuedhelper (line 1533) is an improvement over a fixed delay — it first confirms the task exists, then actively asserts it stays queued for the hold period. The full pause → restart → verify-paused → resume → verify-completed flow provides good end-to-end coverage.
1694-1722: Polling helpers: past feedback addressed.
_waitFornow acceptsFutureOr<bool>predicates (line 1695), and_waitForCallbackSuccesscorrectly delegates to_waitForTaskState(lines 1716–1721), eliminating the previously flagged duplication.
1745-1765:_assertTaskRemainsQueued— reasonable approach for paused-queue assertion.The helper first polls until the task exists (line 1750), then asserts the state remains
queuedforholdForduration. This is inherently timing-sensitive but a pragmatic improvement: the two-phase approach (wait-for-existence then assert-hold) reduces the chance of vacuous passing while keeping the test straightforward.packages/stem/lib/src/worker/worker.dart (1)
4301-4359:WorkerEventnow implementsStemEvent— clean addition.The
timestampfield defaulting tostemNow()and theeventName/occurredAt/attributesgetters are well-structured. Theattributesmap conditionally includes fields only when present, avoiding null noise. This aligns well with the broaderStemEventcontract.packages/stem/lib/src/signals/payloads.dart (2)
44-71:BeforeTaskPublishPayloadandAfterTaskPublishPayloaduseenvelope.enqueuedAt— correct for publish-phase events.For these two payloads, the event is the publish action, so
enqueuedAtis the appropriate semantic timestamp.Also applies to: 73-104
530-588:WorkflowRunPayload.withSignalName— clean copy pattern for rebinding signal context.The
withSignalNamemethod correctly propagates all fields including the captured_occurredAt, avoiding the multiple-stemNow()problem.
Summary
QueueEventsProducer/QueueEventsStemEventand migrate signal/event payloads to emitStemEventStemSignalswith worker-scoped listener filters and canonical event naminggroupRateLimit,groupRateKey, failure mode)submitBatch,inspectBatch) with durable lifecycle status.sitedocs and snippets for all new user-facing behavior, including BullMQ parity mapping updatesTesting
dart test packages/stem/test/unit/signals/signal_test.dartdart test packages/stem/test/unit/signals/stem_signals_test.dartdart test packages/stem/test/unit/core/stem_event_test.dartdart test packages/stem/test/unit/core/queue_events_test.dartdart test packages/stem/test/unit/worker/worker_test.dartdart test packages/stem_adapter_testsdart test packages/stem_sqlite/test/controldart test packages/stem_sqlite/test/broker/sqlite_broker_test.dartnpm --prefix .site run buildSummary by CodeRabbit
New Features
Documentation
Tests