RFC: Add ResamplerStack to coordinate stacked resamplers #1348

Marenz · 2026-01-21T13:58:20Z

This PR proposes a solution to the stacked resampler timing issue. I'm opening this as an RFC to discuss the approach before finalizing.

Problem

When resampling already-resampled data (e.g., 1s samples → 15min aggregates), there's a race condition at window boundaries. If both resamplers fire at the same moment, the higher-level resampler may process BEFORE the lower-level one has emitted its boundary sample.

Minimal reproduction

import asyncio
from datetime import datetime, timedelta, timezone
from frequenz.channels import Broadcast
from frequenz.quantities import Quantity
from frequenz.sdk.timeseries import Sample, ResamplerConfig
from frequenz.sdk.timeseries._resampling._resampler import Resampler

async def demo():
    samples_received = []

    def track_samples(samples, config, props):
        samples_received.append(list(samples))
        return sum(v for _, v in samples) / len(samples) if samples else float("nan")

    # First resampler: 2s periods
    first = Resampler(ResamplerConfig(
        resampling_period=timedelta(seconds=2),
        max_data_age_in_periods=1.0,
    ))

    # Second resampler: 4s periods (consumes first's output)
    second = Resampler(ResamplerConfig(
        resampling_period=timedelta(seconds=4),
        max_data_age_in_periods=1.0,
        resampling_function=track_samples,
    ))

    raw_chan = Broadcast[Sample[Quantity]](name="raw")
    inter_chan = Broadcast[Sample[Quantity]](name="inter")

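    async def forward(sample):
        # Sink for the first resampler: feed its output into the second one
        await inter_chan.new_sender().send(sample)

    async def discard(sample):
        # The resampler awaits its sink, so a no-op sink must still be a coroutine function
        pass

    first.add_timeseries("first", raw_chan.new_receiver(), forward)
    second.add_timeseries("second", inter_chan.new_receiver(), discard)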

    # Send samples
    ts = datetime.now(timezone.utc)
    for i in range(1, 9):
        await raw_chan.new_sender().send(
            Sample(ts + timedelta(seconds=i), Quantity(float(i)))
        )

    # t=4: Both fire concurrently - THIS IS THE BUG
    await asyncio.sleep(4.0)
    await asyncio.gather(
        second.resample(one_shot=True),  # Runs before first emits!
        first.resample(one_shot=True),
    )

    # Expected: 2 samples per window
    # Actual: 1 sample per window (boundary sample missing)
    print(f"Samples per window: {[len(s) for s in samples_received]}")

asyncio.run(demo())
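The ordering problem itself does not depend on the SDK. As a minimal sketch using plain asyncio, with `lower`, `higher`, and the shared `buffer` as hypothetical stand-ins for the two resamplers and the intermediate channel, the same effect can be shown deterministically: whichever coroutine is scheduled first by `asyncio.gather` runs first, so the consumer snapshots its input before the producer has added the boundary sample.

```python
import asyncio


async def lower(buffer: list[float]) -> None:
    # Stand-in for the lower-level resampler: it yields once (like awaiting
    # a channel send) and only then delivers its boundary sample.
    await asyncio.sleep(0)
    buffer.append(4.0)


async def higher(buffer: list[float], seen: list[list[float]]) -> None:
    # Stand-in for the higher-level resampler: it aggregates whatever has
    # been delivered at the moment it runs.
    seen.append(list(buffer))


async def run(concurrent: bool) -> list[float]:
    buffer = [2.0]  # the earlier sample of the window was already delivered
    seen: list[list[float]] = []
    if concurrent:
        # Same call order as the reproduction: the consumer is scheduled first.
        await asyncio.gather(higher(buffer, seen), lower(buffer))
    else:
        # Dependency order: producer first, then consumer.
        await lower(buffer)
        await higher(buffer, seen)
    return seen[0]


racy = asyncio.run(run(concurrent=True))
ordered = asyncio.run(run(concurrent=False))
print(racy)     # [2.0]
print(ordered)  # [2.0, 4.0]
```

Running the two stages in dependency order is the property the proposed solution aims to guarantee.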

Impact

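- For 15-minute windows built from 1s samples, losing one sample per window: ~0.1% data loss
- For aggregation functions like sum or max: the error compounds, since a dropped sample lowers the total or can hide the peak
- For shorter higher-level periods: more significant impact, because one sample is a larger share of each window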
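The figures above follow from simple arithmetic. The sketch below assumes exactly one lower-level sample goes missing per higher-level window; `boundary_loss_fraction` is a hypothetical helper written for this illustration, not part of the SDK.

```python
from datetime import timedelta


def boundary_loss_fraction(lower_period: timedelta, higher_period: timedelta) -> float:
    """Share of a higher-level window lost when one lower-level sample is dropped."""
    samples_per_window = higher_period / lower_period  # timedelta / timedelta -> float
    return 1.0 / samples_per_window


# 1s samples aggregated into 15-minute windows: 1 of 900 samples
loss_15min = boundary_loss_fraction(timedelta(seconds=1), timedelta(minutes=15))
# The reproduction above: 2s samples aggregated into 4s windows, 1 of 2 samples
loss_4s = boundary_loss_fraction(timedelta(seconds=2), timedelta(seconds=4))

print(f"{loss_15min:.2%}")  # 0.11%
print(f"{loss_4s:.0%}")     # 50%
```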

Proposed Solution

This PR adds:

- Resampler.trigger(timestamp) - Allows external control of when resampling occurs, without waiting for the internal timer.
- ResamplerStack - Coordinates multiple resamplers:
  - Executes them in dependency order (lower-level first)
  - Adds yields between each to ensure channel delivery
  - For continuous mode: uses a single GCD-based timer
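To make the coordination idea concrete, here is a rough sketch of how a single GCD-based tick could drive several resamplers in dependency order. It is an illustration under assumptions, not the implementation in this PR: `StubResampler` and `run_stack` are hypothetical, periods are integer seconds, and time is simulated rather than driven by a real timer.

```python
import asyncio
import math
from dataclasses import dataclass
from functools import reduce


@dataclass
class StubResampler:
    """Hypothetical stand-in for a resampler; it only records its triggers."""

    name: str
    period_s: int
    log: list[tuple[int, str]]

    async def trigger(self, now_s: int) -> None:
        self.log.append((now_s, self.name))


async def run_stack(resamplers: list[StubResampler], ticks: int) -> None:
    # One base tick that divides every period, so each boundary lands on a tick.
    base_s = reduce(math.gcd, (r.period_s for r in resamplers))
    for tick in range(1, ticks + 1):
        now_s = tick * base_s  # simulated time; a real loop would await a timer
        # Resamplers are listed lower-level first, so this loop preserves
        # dependency order at shared boundaries.
        for resampler in resamplers:
            if now_s % resampler.period_s == 0:
                await resampler.trigger(now_s)
                # Yield so pending deliveries can run before the next stage.
                await asyncio.sleep(0)


log: list[tuple[int, str]] = []
stack = [StubResampler("first", 2, log), StubResampler("second", 4, log)]
asyncio.run(run_stack(stack, ticks=4))
print(log)
```

At every shared boundary (4s and 8s here) the log shows `first` before `second`. Whether a single yield is enough for delivery through real channels is a separate question that this sketch does not settle.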

Usage

from datetime import timedelta

from frequenz.sdk.timeseries import Resampler, ResamplerConfig, ResamplerStack

first = Resampler(ResamplerConfig(resampling_period=timedelta(seconds=2)))
second = Resampler(ResamplerConfig(resampling_period=timedelta(seconds=4)))

# Stack them (lower-level first)
stack = ResamplerStack([first, second])

# Run - handles ordering automatically
await stack.resample()

Alternatives Considered

- Add asyncio.sleep(0) yields inside Resampler.resample() - Tried this, but it doesn't reliably solve the problem because both resamplers enter the timer loop before either processes.
- Emit samples slightly before window boundary - Would require changing timestamp semantics.
- Use open intervals (samples at exactly window_end go to next window) - Breaking change to current behavior.
- Have higher-level resamplers wait longer - Fragile, doesn't guarantee ordering.
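The open-interval alternative can be illustrated with a small sketch of window assignment. It assumes integer-second timestamps and that the current behavior treats the window end as inclusive, as the "breaking change" remark suggests; `window_index` is a hypothetical helper, not SDK code.

```python
import math


def window_index(t_s: int, period_s: int, end_inclusive: bool) -> int:
    """Return the index of the window a sample stamped at t_s falls into."""
    if end_inclusive:
        # (start, end]: a sample exactly at the boundary closes the window
        return math.ceil(t_s / period_s) - 1
    # [start, end): a sample exactly at the boundary opens the next window
    return t_s // period_s


timestamps = [2, 4, 6, 8]  # output timestamps of a 2s resampler
closed = [window_index(t, 4, end_inclusive=True) for t in timestamps]
half_open = [window_index(t, 4, end_inclusive=False) for t in timestamps]

print(closed)     # [0, 0, 1, 1]
print(half_open)  # [0, 1, 1, 2]
```

With the half-open assignment, the sample stamped at 4s belongs to the window that is evaluated at 8s, so it no longer matters whether it arrives before or after the 4s evaluation. The cost is that every window shifts by one boundary sample, which is why this counts as a breaking change.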

Questions for Discussion

- Is ResamplerStack the right abstraction? Should this be handled differently?
- Should the continuous mode use WallClockTimer instead of Timer for better alignment?
- Should we also provide a way to automatically detect stacked resamplers (e.g., via channel inspection)?
- Is exporting Resampler publicly the right approach, or should users only use ResamplerStack?

Add Resampler.trigger() method and ResamplerStack class to solve the timing issue when stacking resamplers (resampling already-resampled data). When both resamplers fire simultaneously, the higher-level one may process before the lower-level one emits its boundary sample.

Signed-off-by: Mathias L. Baumann <mathias.baumann@frequenz.com>

Marenz requested a review from a team as a code owner January 21, 2026 13:58

Marenz requested review from shsms and removed request for a team January 21, 2026 13:58

github-project-automation bot added this to Python SDK Roadmap Jan 21, 2026

github-project-automation bot moved this to To do in Python SDK Roadmap Jan 21, 2026

github-actions bot added the part:tests (Affects the unit, integration and performance (benchmarks) tests) and part:data-pipeline (Affects the data pipeline) labels Jan 21, 2026
