Replace expression-tree pipeline execution with trampoline parts #7625

Closed
danielmarbach wants to merge 13 commits into master from pipeline

Conversation

@danielmarbach (Contributor) commented Feb 10, 2026

This PR aligns pipeline execution with a simplified trampoline model, keeps state on the context bag, and improves AOT/trimming friendliness while minimizing broader architectural change.


The trampoline model executes the pipeline as a flat loop-like progression over prebuilt parts, instead of recursively composing many nested delegates at runtime.

  • A PipelinePart[] represents the execution plan.
  • A small mutable frame on the context (Index, RangeEnd) tracks current position/range.
  • Start initializes the frame and dispatches the first part.
  • Each behavior calls next, which routes to StageRunners.Next, increments index, and dispatches the next part.
  • Stage connectors adjust the frame to jump into child ranges (and restore progression semantics through the same Next entrypoint).

This gives predictable control flow, avoids runtime code generation, and keeps hot-path dispatch compact.
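In rough sketch form, the model looks like the following. This is a synchronous simplification with hypothetical names (`PipelinePart`, `Context`, `Runner` are illustrative here, not the actual NServiceBus types):

```csharp
using System;
using System.Collections.Generic;

var log = new List<string>();

// The prebuilt execution plan: one part per behavior.
var parts = new PipelinePart[]
{
    ctx => { log.Add("A"); Runner.Next(ctx); },
    ctx => { log.Add("B"); Runner.Next(ctx); },
    ctx => { log.Add("C"); Runner.Next(ctx); },
};

Runner.Start(new Context(), parts);
Console.WriteLine(string.Join(",", log)); // A,B,C

// One prebuilt part per behavior; it knows how to invoke "its" behavior.
delegate void PipelinePart(Context ctx);

class Context
{
    // Invocation-local frame carried on the context: position + active range.
    public int Index;
    public int RangeEnd;
    public PipelinePart[] Parts = Array.Empty<PipelinePart>();
}

static class Runner
{
    public static void Start(Context ctx, PipelinePart[] parts)
    {
        ctx.Parts = parts;
        ctx.Index = 0;
        ctx.RangeEnd = parts.Length;
        parts[0](ctx); // dispatch the first part
    }

    // Every "next" routes here: advance the frame, dispatch the next part.
    public static void Next(Context ctx)
    {
        var nextIndex = ++ctx.Index;
        if (nextIndex < ctx.RangeEnd)
        {
            ctx.Parts[nextIndex](ctx);
        }
    }
}
```

Note that a behavior never references the following behavior directly; every `next` call funnels through the single `Next` entrypoint, which reads the frame and the parts array.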

Benefits

  • More AOT/trimming-friendly:
    • No runtime MakeGenericType, expression-tree compilation, or dynamic method generation required in the core dispatch path.
  • Better hot-path characteristics:
    • Compact, predictable dispatch with fewer dynamic layers.
  • Cleaner design:
    • Terminator transitions handled uniformly.
    • Reflection-derived metadata computed once and reused.
  • No source generator required: the model relies on precomputed parts + static invoker dispatch, so it remains AOT/trimming-friendly without introducing source-gen infrastructure, generator maintenance, or build-time generator coupling.
  • Behavior extensibility is unchanged: behaviors remain fully pluggable as before, as long as they target supported stage/context transitions for the pipeline. This preserves the existing extension model while making execution more static.

Additional Notes

  • The pipeline already had a precomputed behaviors array per built pipeline.
  • This PR extends the same precomputation concept to parts (PipelinePart[]), so execution now uses two aligned, indexed arrays:
    • behaviors[index] for the behavior instance
    • parts[index] for how that behavior/stage is invoked
  • This keeps runtime work minimal and shifts complexity to build time.

Pipeline Frame Safety

  • The mutable pipeline frame (Index, RangeEnd) is safe in this design because it is invocation-local state carried on the current behavior context (context bag), not global/static state.
  • In other words, each pipeline invocation operates on its own frame data; there is no cross-invocation sharing by design.
  • As long as pipeline contexts are not reused concurrently across invocations (current model), mutating the frame is thread-safe and correct.
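In sketch form (the `Frame` type here is hypothetical), the safety argument is simply that each invocation mutates its own frame instance:

```csharp
using System;

// Each pipeline invocation carries its own context, hence its own frame.
var invocationA = new Frame { Index = 0, RangeEnd = 10 };
var invocationB = new Frame { Index = 0, RangeEnd = 10 };

invocationA.Index = 7;                 // A advances through its pipeline...
Console.WriteLine(invocationB.Index);  // 0 — ...without touching B's frame

class Frame
{
    public int Index;
    public int RangeEnd;
}
```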

Cost of Adding a New Pipeline

This change does add a small amount of explicit wiring when introducing a brand-new pipeline. That should be called out, but in context:

  • Adding a new pipeline already requires touching multiple extension points today (registration, model/build wiring, diagnostics/visualization, and tests/approvals).
  • With this PR, the extra work is mainly adding invoker-id mappings/wiring for the new stage transitions.
  • Historically, this is a low-frequency operation: over many years, only a small number of new pipelines were introduced (notably recoverability and audit), while existing core pipelines stayed largely stable.

So the tradeoff is intentional:

  • Slightly more explicit setup for rare "new pipeline type" work.
  • Better runtime characteristics, simpler execution model, and improved AOT/trimming compatibility for the common path.

Benchmarks

See https://github.com/danielmarbach/MicroBenchmarks (branches starting with bare-metal).


BenchmarkDotNet v0.15.8, macOS Tahoe 26.2 (25C56) [Darwin 25.2.0]
Apple M3 Max, 1 CPU, 14 logical and 14 physical cores
.NET SDK 10.0.101
  [Host]     : .NET 10.0.1 (10.0.1, 10.0.125.57005), Arm64 RyuJIT armv8.0-a
  DefaultJob : .NET 10.0.1 (10.0.1, 10.0.125.57005), Arm64 RyuJIT armv8.0-a

Execution

Method       PipelineDepth  Mean       Error     StdDev    Ratio  Allocated  Alloc Ratio
Trampo       10             24.67 ns   0.114 ns  0.106 ns  0.82   -          NA
Expressions  10             30.05 ns   0.172 ns  0.160 ns  1.00   -          NA
Trampo       20             48.42 ns   0.558 ns  0.522 ns  0.75   -          NA
Expressions  20             64.99 ns   0.467 ns  0.437 ns  1.00   -          NA
Trampo       40             111.00 ns  0.649 ns  0.607 ns  0.77   -          NA
Expressions  40             144.73 ns  1.599 ns  1.496 ns  1.00   -          NA

Throwing

Method       PipelineDepth  Mean      Error      StdDev     Ratio  RatioSD  Gen0    Allocated  Alloc Ratio
Expressions  10             7.633 μs  0.1514 μs  0.1682 μs  1.00   0.03     0.1526  1.31 KB    1.00
Trampo       10             7.679 μs  0.1285 μs  0.1202 μs  1.01   0.03     0.1526  1.3 KB     0.99
Trampo       20             7.591 μs  0.1021 μs  0.0955 μs  0.99   0.02     0.1526  1.3 KB     0.99
Expressions  20             7.704 μs  0.1426 μs  0.1333 μs  1.00   0.02     0.1526  1.31 KB    1.00
Current      40             7.450 μs  0.1426 μs  0.1334 μs  1.00   0.02     0.1526  1.31 KB    1.00
Expressions  40             7.528 μs  0.0712 μs  0.0631 μs  1.01   0.02     0.1526  1.3 KB     0.99

Warmup

Method       PipelineDepth  Mean          Error       StdDev      Ratio  Gen0     Gen1    Allocated  Alloc Ratio
Trampo       10             2.249 μs      0.0152 μs   0.0142 μs   0.003  2.4338   0.0229  19.89 KB   0.42
Expressions  10             765.779 μs    5.5135 μs   5.1573 μs   1.000  4.8828   1.9531  46.85 KB   1.00
Trampo       20             4.230 μs      0.0328 μs   0.0307 μs   0.003  4.8370   0.0610  39.54 KB   0.45
Expressions  20             1,518.139 μs  13.7502 μs  12.8619 μs  1.000  9.7656   3.9063  88.34 KB   1.00
Trampo       40             8.419 μs      0.0592 μs   0.0554 μs   0.003  9.6436   0.1678  78.81 KB   0.46
Expressions  40             2,983.006 μs  19.4129 μs  18.1588 μs  1.000  19.5313  7.8125  171.81 KB  1.00

Alternatives Considered

  • Class-based pipeline part objects (polymorphic dispatch): considered, but rejected for hot-path execution due to extra indirection and weaker inlining characteristics compared to static delegate/id-based dispatch.
  • Source-generated invokers: would likely work, but rejected to avoid generator complexity and build coupling; current design achieves AOT/trimming goals without source gen.
  • Runtime codegen/expression-tree compilation: rejected for AOT/trimming constraints.

See https://github.com/danielmarbach/PipelinePlayground and the branches starting with bare-metal and invokers.

Pipeline Architecture

The Pipeline is a Trampoline in Disguise

The pipeline replaces recursive behavior invocation with an explicit iterator (ContextBag.frame) that gets resumed via PipelineRunner.Next.

Each behavior receives next, which isn't actually the next behavior; it's a callback into the trampoline:

// From PipelineInvokers.cs:
static class BehaviorNextCache<TContext>
{
    public static readonly Func<TContext, Task> Next = PipelineRunner.Next;
}

When you call next(context), you're yielding control back to the trampoline via PipelineRunner.Next.

Mechanism

// PipelineRunner.cs
public static Task Next(IBehaviorContext ctx)
{
    var nextIndex = ctx.Extensions.AdvanceFrame(out var reachedEnd);  // frame.Index++
    return reachedEnd ? Task.CompletedTask : Dispatch(ctx, nextIndex);
}

static Task Dispatch(IBehaviorContext ctx, int index)
{
    ref var part = ref ctx.Extensions.GetPart(index);
    return PipelineInvokers.Invoke(ctx, part);  // Invoke behavior or stage
}

The frame on the context (ContextBag.frame with Index and RangeEnd) is the iterator. Calling next() invokes PipelineRunner.Next() which advances the frame and loops.

Stage Connectors: Frame Manipulation

// PipelineInvokers.InvokeStage sets a new frame for child stages:
ctx.Extensions.SetFrame(childStart - 1, childEnd);  // New iterator scope
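The childStart - 1 offset is deliberate: Next increments the index before dispatching, so the very next advance lands exactly on the first part of the child range. A minimal illustration (the variable names and values here are hypothetical):

```csharp
using System;

// Illustrative values only: why the connector parks the frame at childStart - 1.
int childStart = 6, childEnd = 9;

// The stage connector jumps "just before" the child range...
int index = childStart - 1;   // index == 5

// ...because the shared Next entrypoint always increments before dispatching,
// so the next advance lands exactly on the first child part.
index++;                      // index == 6 == childStart

Console.WriteLine(index == childStart && index < childEnd); // True
```

This keeps a single uniform advance rule (increment, then dispatch) for both plain behaviors and stage transitions.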

Visual Flow

┌─────────────────────────────────────────────────────────────┐
│                    PipelineRunner (Trampoline)              │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  while (frame.Index < frame.RangeEnd)               │    │
│  │      part = ctx.Extensions.GetPart(frame.Index)     │    │
│  │      ctx.Extensions.AdvanceFrame()  ←── "i++"       │    │
│  │      PipelineInvokers.Invoke(ctx, part)             │    │
│  └─────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
         ↑                                            │
         │        (calls next which is)               │
         │                                            ↓
┌─────────────────┐                           ┌─────────────┐
│    Behavior2    │                           │   Behavior1 │
│ invoke next() ──┼───────────────────────────┼─► do work   │
│                 │                           │  call next()│
└─────────────────┘                           └─────────────┘

Why this matters

Without the trampoline, deep pipelines would blow the stack:

// Naive (STACK OVERFLOW!)
Task Invoke(context, next) {
    DoWork();
    return next(context);  // Recursive call
}

// Trampoline (FLAT STACK!)
Task Invoke(context, next) {
    DoWork();
    return next(context);  // next == PipelineRunner.Next: advances the frame and dispatches the next part
}

Sequence Diagram

sequenceDiagram
    participant Start as PipelineRunner.Start()
    participant Init as ctx.Extensions.InitFrame()
    participant Dispatch as PipelineRunner.Dispatch()
    participant Invokers as PipelineInvokers.Invoke()
    participant Behavior as behavior.Invoke()
    participant Next as PipelineRunner.Next()
    participant Advance as ctx.Extensions.AdvanceFrame()
    
    Note over Start: Frame: {Index: 0, RangeEnd: N}
    
    Start->>Init: Initialize iterator
    Start->>Dispatch: Dispatch(ctx, 0)
    Dispatch->>Invokers: Invoke(ctx, part)
    Invokers->>Behavior: Invoke(context, BehaviorNextCache.Next)
    
    Note right of Behavior: Behavior does work<br/>then calls await next(context)
    
    Behavior->>Next: PipelineRunner.Next(ctx)
    Note over Next: "next" IS PipelineRunner.Next!
    
    Next->>Advance: ctx.Extensions.AdvanceFrame()
    Note right of Advance: frame.Index++
    
    Advance-->>Next: reachedEnd?
    
    alt Not reached end
        Next->>Dispatch: Dispatch(ctx, nextIndex)
        Note over Dispatch: Loop continues (trampoline)
    else End of pipeline
        Next-->>Behavior: Task.CompletedTask
    end

The frame in ContextBag is the iterator. Calling next() yields to PipelineRunner.Next(). The trampoline loop (Dispatch → Invoke → behavior → Next → Dispatch) replaces recursion.

danielmarbach changed the title from "Pipeline" to "Replace expression-tree pipeline execution with trampoline parts" on Feb 10, 2026
danielmarbach marked this pull request as ready for review on February 10, 2026 17:31
@danielmarbach (Contributor, Author) commented:

Open question / assumption to validate:

  • My assumption is that our public pipeline extensibility contract is limited to:
    • adding behaviors to supported existing stages,
    • replacing existing stage connectors,
    • replacing existing fork connectors.
  • I do not believe introducing entirely new stage context transitions is a supported public scenario today, since stage graph/context evolution is framework-owned.
  • If that assumption is correct, strict invoker mapping is aligned with the documented model; fallback for unknown transitions is then an internal compatibility choice, not a required public contract guarantee.

https://docs.particular.net/nservicebus/pipeline/steps-stages-connectors

@mikeminutillo maybe knows more. In theory I could add a fallback invocation, but that would require me to re-introduce generic method creation and currently I think that would be unnecessary.

@danielmarbach (Contributor, Author) commented:

After analyzing the codebase, I can confirm that the core pipeline cannot be extended with custom stages or stage forks via public API, even ignoring the PipelineInvokers optimization.

  1. Custom context creation is blocked
    The BehaviorContext base class is internal (src/NServiceBus.Core/Pipeline/BehaviorContext.cs:10 (https://github.com/Particular/NServiceBus/blob/master/src/NServiceBus.Core/Pipeline/BehaviorContext.cs#L10)). Without inheriting from it, you cannot create custom context implementations required for new stages.
  2. Factory methods only support the built-in types
    The ConnectorContextExtensions class provides public factory methods (src/NServiceBus.Core/Pipeline/ConnectorContextExtensions.cs (https://github.com/Particular/NServiceBus/blob/master/src/NServiceBus.Core/Pipeline/ConnectorContextExtensions.cs)), but only for built-in context types (e.g., CreateRoutingContext(), CreateDispatchContext()). There are no generic factory methods for custom context types.
  3. Custom stages using interfaces only also fails
    Even attempting to create custom stages using only interfaces (StageConnector<ICustomFrom, ICustomTo>) would fail because:
  • The factory extension methods are hardcoded to specific connector type signatures (e.g., StageConnector<ITransportReceiveContext, IIncomingPhysicalMessageContext>)
  • All context implementations are internal and inherit from BehaviorContext (e.g., IncomingPhysicalMessageContext, DispatchContext)
  • Without inheriting from BehaviorContext, custom contexts cannot participate in the pipeline's ContextBag hierarchy
  4. What IS possible via Public API
  • Replacing existing stage connectors using PipelineSettings.Replace() (since StageConnector<TFrom,TTo> is public)
  • Replacing existing stage fork connectors (since StageForkConnector is public)
  • Registering behaviors within existing stages using built-in context types
  5. What is NOT possible
  • Creating new custom stages (StageConnector<ICustomContext, IAnotherContext>)
  • Creating new custom stage forks
  • Any pipeline topology changes beyond replacing existing connectors

While StageConnector, StageForkConnector, and ForkConnector are public abstract classes, they are effectively useless for creating new pipeline stages without the ability to create custom context implementations. The public API constrains users to the existing pipeline topology, which means they can replace behaviors within pre-defined stages or swap out connectors, but cannot extend the pipeline with new stages or forks.

@danielmarbach (Contributor, Author) commented:

Any approvals?

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static Task Next(IBehaviorContext ctx)
{
    var nextIndex = ctx.Extensions.AdvanceFrame(out var reachedEnd);
Member commented:
Does this mean that in order to call next multiple times in a single behavior you would need to create a new context first? For instance, what happens if someone has something like this in their environment?

class RetryDispatchesBehavior : Behavior<IDispatchContext>
{
  public override async Task Invoke(IDispatchContext context, Func<Task> next)
  {
    var attempts = 0;
    while(true)
    {
      try
      {
        attempts++;
        await next();
        break;
      }
      catch
      {
        if(attempts > 2)
           throw;
      }
    }
  }
}

I believe that will work with the current pipeline implementation but here each call to next() will advance the frame, won't it?

I think the only place we do something like this is in the LoadHandlersConnector, where we do create a new context for each invocation. In any stage connector we'd have to be creating a new context (for the next stage).
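A stripped-down model of the shared frame (all names here are hypothetical) shows the failure mode the retry snippet above would hit: the second next() finds the frame already at the end of the range, so the retry silently runs nothing instead of re-running the remainder of the pipeline.

```csharp
using System;
using System.Collections.Generic;

var executed = new List<int>();
var index = 0;                 // frame shared by every next() call
Action[] parts = Array.Empty<Action>();

// Trampoline-style "next": advance the shared frame, dispatch the next part.
void Next()
{
    index++;
    if (index < parts.Length)
    {
        executed.Add(index);
        parts[index]();
    }
}

parts = new Action[]
{
    () =>
    {
        Next();  // first attempt: runs parts 1 and 2
        Next();  // "retry": frame is already past the end, nothing runs
    },
    () => Next(),
    () => { },
};

parts[0]();
Console.WriteLine(string.Join(",", executed)); // 1,2 — not the 1,2,1,2 a retry expects
```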

Member commented:

That is a pretty common pattern; I have used it when creating custom behaviors that introduce duplicate message processing anomalies.

Contributor Author commented:

@SzymonPobiega @mikeminutillo excellent catch. There were no tests that explicitly covered this, so I completely forgot this valid scenario, which would break with this approach. I will mark this PR as draft for now, rethink the design, and submit a PR to add a test to Core for this expected behavior.

Contributor Author commented:

I mean the commit I pushed would fix it and have similar perf characteristics on the execution path but significantly slow down the exception path. I think it could be an OK trade-off, but I want to chew on this a bit further.

Member commented:

Well, the credit goes to @mikeminutillo here. I would probably not notice it if not for his code snippet.

Contributor Author commented:

This is addressed in #7631

@SzymonPobiega (Member) left a comment:

I don't think I follow how this behavior changes the stack impact. My understanding is that when a behavior calls next(), it moves the control back to the PipelineRunner but the previous behavior needs to still be on the stack because we resume its execution after returning from next(). Or am I missing something?

I was also wondering how much of the performance is gained via "magic" unsafe optimizations vs using a different approach to the pipeline execution.

@danielmarbach (Contributor, Author) commented:

Closed in favour of #7631

danielmarbach deleted the pipeline branch on February 22, 2026 21:43