Replace expression tree pipeline execution with prewired invokers #7631
danielmarbach wants to merge 3 commits into master
Conversation
SzymonPobiega left a comment
So, my understanding is that:
- Pipeline forks are not represented in the model because they are implemented simply as invoking another pipeline -- there is no change in their implementation from the current solution
- `BuildPipelineFor` is called for each pipeline to compile the ordered list of behaviors, connectors, and a terminator for each pipeline
- `Build` iterates over that list in reverse order, creating an invoker node using `CreatedInvokerNode` as a factory method
- The structure of the pipeline, which is currently represented as the IL code of the generated expression tree, is in the new model represented as links between an invoker node and its successor, captured in the `next` delegates. The "secret sauce" that makes it a viable solution is the behavior downcast "table"

Assuming that at least some of my understanding is correct, I hereby approve this PR.
@SzymonPobiega yes absolutely correct.
Co-authored-by: Tomasz Masternak <tomasz.masternak@particular.net>
I ran the latest proposal from @tmasternak on my M4. Since it is different hardware the numbers cannot be directly compared, but it seems to deliver a saving of ~7 nanoseconds per pipeline invocation regardless of hardware. This translates to a 25-32% improvement from the cast change alone. Of course these are synthetic benchmarks, but still a good baseline.
Improvement: 27-39% faster than Current, 56-73% faster than Trampoline

Async Exception: Prewired matches Current (~1.0x), Trampoline is 6-21x slower

Sync Exception: Prewired is 11-12% faster than Current, 35x faster than Trampoline, 35% less allocation than Current

Replay (Multiple `next` calls)
| Method | PipelineDepth | Mean | Ratio | Allocated |
|---|---|---|---|---|
| Trampo_Replay | 10 | 35.21 ns | 0.59 | - |
| Prewired_Replay | 10 | 40.66 ns | 0.68 | - |
| Current_Replay | 10 | 59.70 ns | 1.00 | - |
| Trampo_Replay | 20 | 59.21 ns | 0.49 | - |
| Prewired_Replay | 20 | 87.88 ns | 0.73 | - |
| Current_Replay | 20 | 120.17 ns | 1.00 | - |
| Trampo_Replay | 40 | 113.91 ns | 0.44 | - |
| Prewired_Replay | 40 | 196.49 ns | 0.76 | - |
| Current_Replay | 40 | 258.29 ns | 1.00 | - |
Trampoline fastest for replay, Prewired 24-27% faster than Current
Performance Improvement Difference
Before vs After (Hardware Change: M3 Max → M4 Max)
| Scenario | Depth | Original (M3) | New (M4) | Current (M3) | Current (M4) | Wired Improvement |
|---|---|---|---|---|---|---|
| Execution | 10 | 20.94 ns | 14.41 ns | 28.01 ns | 21.27 ns | 31% → 32% |
| Execution | 20 | 43.05 ns | 30.35 ns | 57.45 ns | 41.83 ns | 25% → 27% |
| Execution | 40 | 91.66 ns | 64.48 ns | 129.26 ns | 86.45 ns | 29% → 25% |
| Scenario | Depth | Original (M3) | New (M4) | Current (M3) | Current (M4) | Improvement |
|---|---|---|---|---|---|---|
| Async Exception | 10 | 6.426 μs | 4.749 μs | 6.102 μs | 4.836 μs | -2% → 2% faster |
| Async Exception | 40 | 5.646 μs | 4.739 μs | 5.803 μs | 4.861 μs | -3% → 2% faster |
| Sync Exception | 10 | 2,338.83 ns | 1,733.52 ns | 2,563.78 ns | 1,977.99 ns | 9% → 12% faster |
| Sync Exception | 40 | 2,450.91 ns | 1,828.73 ns | 2,653.10 ns | 2,058.15 ns | 8% → 11% faster |
| Scenario | Depth | Original (M3) | New (M4) | Current (M3) | Current (M4) | Improvement |
|---|---|---|---|---|---|---|
| Warmup | 10 | 2.021 μs | 1.924 μs | 672.160 μs | 543.887 μs | 333x → 283x faster |
| Warmup | 20 | 4.067 μs | 3.594 μs | 1,369.096 μs | 1,064.967 μs | 337x → 296x faster |
| Warmup | 40 | 7.911 μs | 6.895 μs | 2,746.676 μs | 2,107.101 μs | 347x → 306x faster |
Summary
The new hardware (M4 Max) shows ~30-40% better absolute performance across all scenarios. The relative improvement ratios remain consistent between the two hardware platforms, confirming the wired approach delivers consistent performance gains regardless of hardware.
| Scenario | Wired/Prewired | Current | Improvement |
|---|---|---|---|
| Execution (40 deep) | 64.48 ns | 86.45 ns | 25% faster |
| Async Exception (40 deep) | 4.739 μs | 4.861 μs | 2% faster |
| Sync Exception (40 deep) | 1,828.73 ns | 2,058.15 ns | 11% faster, 35% less alloc |
| Warmup (40 deep) | 6.895 μs | 2,107.101 μs | 306x faster, 49% less alloc |
| Success (40 deep) | 59.11 ns | 80.95 ns | 27% faster |
This PR moves pipeline execution to a prewired, immutable continuation chain, keeps state on the context bag, and improves AOT/trimming friendliness while minimizing broader architectural change and delivering significant performance improvements across all execution paths.
Summary
The previous trampoline model proposal (see #7625) executed the pipeline as a flat loop-like progression over prebuilt parts, tracking position via a mutable frame on the context. This new approach prewires the entire continuation chain at pipeline build time, eliminating frame manipulation entirely.
Key change: Instead of computing "what's next" at runtime via frame index advancement, each behavior receives a prewired `next` delegate that directly invokes the subsequent behavior. The chain is built once at pipeline construction and executed millions of times with zero allocation, using key findings from the trampoline investigations.

Alignment with expression-tree model: Like the current expression-tree-based model (the "Current" baseline in benchmarks), this approach pre-composes the continuation chain at build time. However, it achieves this through static generic type instantiation rather than runtime expression compilation.
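The prewiring idea can be sketched in a few lines. This is an illustrative Python analogue of the pattern, not the C# implementation; the names `build_pipeline` and `make_node` are hypothetical:

```python
def make_node(behavior, next_delegate):
    # Created once at build time; invoking the pipeline afterwards
    # allocates no new delegates.
    def invoke(context):
        return behavior(context, next_delegate)
    return invoke

def build_pipeline(behaviors):
    # Terminal continuation marks the end of the chain.
    next_delegate = lambda context: context
    # Compose in reverse so each behavior captures its successor once.
    for behavior in reversed(behaviors):
        next_delegate = make_node(behavior, next_delegate)
    return next_delegate

# Two toy behaviors that record the traversal order.
def logging(context, next):
    context.append("log")
    return next(context)

def retry(context, next):
    context.append("retry")
    return next(context)

pipeline = build_pipeline([logging, retry])
print(pipeline([]))  # -> ['log', 'retry']
```

Calling `build_pipeline` once and reusing the returned delegate mirrors the "built once, executed millions of times" property described above.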
Architecture
Prewired Continuation Chain
Node Structure
Critical allocation optimization:
- `next` is cached at construction time, not per-invocation
- `next.Invoke` is a bound method-group delegate (no closure)
- Terminal `next`: `CompletedNextCache<TOutContext>.Next`

Comparison: Trampoline vs Prewired
- Trampoline: `AdvanceFrame()` + dispatch by index
- Prewired: direct `next` calls

Benefits
Performance
Success path (allocation-free, fastest):
Exception path (async throw):
Warmup (first invocation):
Design Simplicity
- Removed: `InitFrame`, `AdvanceFrame`, `SetFrame`, `Frame` property

AOT/Trimming Friendly
Benchmarks
https://github.com/danielmarbach/MicroBenchmarks and branches starting with `bare-metal`

Results.zip
Execution
Throwing
Warmup
Extended Comparison (MediumRun)
Additional benchmark with sync exception, replay, and trampoline comparison:
Success Path Comparison
Async Exception Comparison
Sync Exception Comparison
Replay Comparison (Multiple `next` calls)

Concurrency Improvement
The prewired approach reduces concurrency risk compared to the trampoline: there is no shared frame to corrupt when a behavior issues concurrent `next` calls, e.g. `await Task.WhenAll(next(ctx), next(ctx))`.

Note: Concurrent `next` calls are still not a supported contract (downstream behaviors may not be thread-safe for shared context mutation), but the pipeline engine itself no longer has mutable traversal state.

Implementation Details
Build-Time Composition
Reverse composition: Build from end to start so each node can reference the already-built continuation.
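The reverse composition can be sketched as follows. This is a Python analogue under assumed names; `InvokerNode` and `build` stand in for the real C# types:

```python
class InvokerNode:
    def __init__(self, behavior, next_node):
        self.behavior = behavior
        # Capture the already-built successor once at construction.
        self.next = next_node.invoke if next_node else (lambda ctx: ctx)

    def invoke(self, context):
        return self.behavior(context, self.next)

def build(parts):
    node = None
    for behavior in reversed(parts):  # walk from the terminator back to the start
        node = InvokerNode(behavior, node)
    return node

root = build([
    lambda ctx, next: next(ctx + 1),   # first behavior
    lambda ctx, next: next(ctx * 2),   # second behavior (runs after the first)
])
print(root.invoke(3))  # (3 + 1) * 2 = 8
```

Because each node only needs its successor, a single reverse pass suffices to wire the entire chain.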
Delegate Caching
Key optimization: `next.Invoke` is a bound method-group delegate, not a closure. This is critical because the earlier implementation had `context => next.Invoke(context)`, which allocated per invocation.
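The difference is easy to demonstrate with a Python analogue, where accessing a method on an object also builds a fresh bound-method object on every access unless it is cached:

```python
class Terminal:
    def invoke(self, context):
        return context

class Node:
    def __init__(self, behavior, next_node):
        self.behavior = behavior
        # Bind the successor's invoke exactly once at construction -- the
        # analogue of caching the method-group delegate in C#.
        self.next = next_node.invoke

    def invoke(self, context):
        # No per-call wrapper like `context => next.Invoke(context)` is created.
        return self.behavior(context, self.next)

terminal = Terminal()
print(terminal.invoke is terminal.invoke)  # False: a new bound method per access

node = Node(lambda ctx, next: next(ctx), terminal)
print(node.next is node.next)              # True: cached once, reused forever
```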
Behaviors and invoker are set on the context once per invocation, enabling indexed access without closures.
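A rough sketch of that wiring, with hypothetical names (the real context type is the extensible `ContextBag`):

```python
class ContextBag:
    # Simplified stand-in for the pipeline's context bag.
    def __init__(self):
        self.behaviors = None
        self.invoker = None

def invoke_pipeline(context, behaviors, root_invoker):
    # Set once per invocation; nodes can then index into context.behaviors
    # instead of closing over them.
    context.behaviors = behaviors
    context.invoker = root_invoker
    return context.invoker(context)

ctx = ContextBag()
result = invoke_pipeline(ctx, ["validate", "route"], lambda c: len(c.behaviors))
print(result)  # 2
```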
Pipeline Extensibility Constraints
The core pipeline architecture constrains extensibility in specific ways:
What IS Possible via Public API
- `PipelineSettings.Replace()`
- Stage connectors (`StageConnector<TFrom,TTo>` is public)
- Stage fork connectors (`StageForkConnector` is public)

What is NOT Possible
- Custom stages with custom context types (e.g. `StageConnector<ICustomContext, IAnotherContext>`)

Why Custom Stages Are Blocked
- `BehaviorContext` is internal: without inheriting from it, custom context implementations cannot participate in the pipeline's `ContextBag` hierarchy.
- Factory methods are hardcoded: `ConnectorContextExtensions` provides public factory methods only for built-in context types (`CreateRoutingContext()`, `CreateDispatchContext()`, etc.). No generic factory exists for custom contexts.
- Context implementations are internal: all concrete context types (`IncomingPhysicalMessageContext`, `DispatchContext`, etc.) inherit from `BehaviorContext` and are internal.
- Invoker node creation is closed: `PipelineInvoker.Factory.cs` contains switch expressions for all known stage transitions. Adding a new stage requires modifying this internal code.

Design rationale: These constraints ensure the prewired chain can be fully constructed at build time with known type mappings, enabling the performance characteristics demonstrated above.
Alternatives Considered
Trampoline Model (Benchmarked, Rejected)
The trampoline model was implemented and benchmarked as a predecessor to this approach. See PR #7625 for full details.
How it worked:
- A mutable frame (`Index`, `RangeEnd`) tracked the current position
- Each `next()` call yielded to `PipelineRunner.Next()`, which advanced the frame and dispatched

Why rejected:
- The mutable frame could be corrupted when `next` was called concurrently

Where it won:
- Replay scenarios (multiple `next` calls benefited from the shared iterator)

Decision: The prewired approach provides better overall characteristics. It improves success-path performance, simplifies exception handling, and enhances concurrency safety while accepting a modest trade-off on replay performance.
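For contrast, the trampoline's dispatch can be sketched like this (a hypothetical Python shape, not the actual C# code). The shared mutable `frame.index` is exactly the state that made concurrent `next` calls unsafe:

```python
class Frame:
    def __init__(self):
        self.index = 0  # mutable traversal state shared by every next() call

def run_trampoline(behaviors, context, frame):
    def next(ctx):
        # Every next() advances the shared frame and dispatches by index.
        frame.index += 1
        if frame.index < len(behaviors):
            return behaviors[frame.index](ctx, next)
        return ctx
    return behaviors[0](context, next)

order = []
def first(ctx, next):
    order.append("first")
    return next(ctx)

def second(ctx, next):
    order.append("second")
    return next(ctx)

run_trampoline([first, second], {}, Frame())
print(order)  # ['first', 'second']
```

Two `next()` calls racing on the same `Frame` would both bump `index`, skipping behaviors; the prewired chain has no such shared counter to corrupt.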
Delegate-Factory Approach (Benchmarked, Rejected)
Instead of abstract
InvokerNodewith virtual dispatch, tried a delegate-factory approach:Result: ~2x slower on success/replay paths than node-based virtual dispatch. Extra delegate indirection and adapter layers cost more than predicted. Benchmark data confirmed that virtual dispatch on sealed generic types with known shapes is highly optimizable by the JIT.
Source-Generated Invokers
Would likely work, but rejected to avoid:
Runtime Codegen/Expression Trees
Rejected for AOT/trimming constraints.
Migration Notes
- `IBehavior<TIn, TOut>` remains unchanged
- `PipelineStepDiagnostics.PrettyPrint()` output unchanged
- `next` replay semantics preserved

Files Changed
- `Pipeline/PipelineInvoker.cs`: Core invoker node structure and build logic
- `Pipeline/PipelineInvoker.Factory.cs`: Type mapping for known stage transitions
- `Pipeline/Pipeline.cs`: Updated to use the prewired invoker
- `Extensibility/ContextBag.cs`: Removed frame state and added `Invoker` property
- `Pipeline/PipelineFrame.cs`: Deleted (no longer needed)
- `Pipeline/PipelineRunner.cs`: Simplified to a single `Start()` method

Sequence Diagram
```mermaid
sequenceDiagram
    participant Pipeline as Pipeline.Invoke()
    participant Context as ContextBag
    participant Invoker as InvokerNode.Invoke()
    participant Behavior as behavior.Invoke()
    participant Next as prewired next delegate
    Note over Pipeline: Build time: invoker chain created
    Note over Context: Behaviors[] + invoker set once
    Pipeline->>Context: Initialize(behaviors, invoker)
    Pipeline->>Context: Invoker(context)
    Context->>Invoker: rootInvoker.Invoke(context)
    loop Each behavior in chain
        Invoker->>Behavior: behavior.Invoke(context, next)
        Note right of Behavior: next is prewired delegate
        Behavior->>Next: await next(context)
        Next->>Invoker: nextNode.Invoke(context)
    end
    Note over Invoker: Chain completes naturally
```

The prewired chain replaces the trampoline loop entirely. Each `next` call is a direct virtual dispatch to the next `InvokerNode`, not a yield-and-resume pattern.