Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions src/frequenz/sdk/timeseries/_resampling/_config.py
Original file line number Diff line number Diff line change
Expand Up @@ -94,15 +94,15 @@ class ResamplerConfig:
It must be a positive time span.
"""

max_data_age_in_periods: float = 3.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this value is leading to unexpected behavior, but this would be a very intrusive change which would result in many more NaN values in deployments. If we want to change it we could make it a required parameter for migration purpose, and later introduce the new default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

max_data_age_in_periods: float = 1.0
"""The maximum age a sample can have to be considered *relevant* for resampling.

Expressed in number of periods, where period is the `resampling_period`
if we are downsampling (resampling period bigger than the input period) or
the *input sampling period* if we are upsampling (input period bigger than
the resampling period).

It must be bigger than 1.0.
It must be bigger or equal than 1.0.

Example:
If `resampling_period` is 3 seconds, the input sampling period is
Expand Down
7 changes: 4 additions & 3 deletions src/frequenz/sdk/timeseries/_resampling/_resampler.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@
import itertools
import logging
import math
from bisect import bisect
from bisect import bisect, bisect_left
from collections import deque
from datetime import datetime, timedelta, timezone
from typing import assert_never
Expand Down Expand Up @@ -437,7 +437,7 @@ def resample(self, timestamp: datetime) -> Sample[Quantity]:
)
minimum_relevant_timestamp = timestamp - period * conf.max_data_age_in_periods

min_index = bisect(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the behavior how to resample, i.e. left or right open and the labeling should be config parameters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make left or right opened configurable with the corresponding label, such as:
right open: [t,t+1) -> labeled as t
left open: (t-1,t] -> labeled as t (the current behavior)
Or do you want to allow the user to additionally do something like:
right open, label in the end: [t,t+1) -> labeled as t+1
left open, label in the beginning:(t-1,t] -> labeled as t-1
Because I think the later could lead to a lot of confusion (not sure if its ever needed)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the latter is also reasonable options (see e.g. https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html), but don't see a strong reason to implement this now if not needed. If it's well-documented, the users can also adjust the timestamps trivially. So your proposal sounds good to me.

min_index = bisect_left(
self._buffer,
minimum_relevant_timestamp,
key=lambda s: s[0],
Expand All @@ -458,7 +458,8 @@ def resample(self, timestamp: datetime) -> Sample[Quantity]:
if relevant_samples
else None
)
return Sample(timestamp, None if value is None else Quantity(value))
sample_time = timestamp - conf.resampling_period
return Sample(sample_time, None if value is None else Quantity(value))

def _log_no_relevant_samples(
self, minimum_relevant_timestamp: datetime, timestamp: datetime
Expand Down
Loading