
Genericise Flow Metrics' RetentionWindow#18624

Merged
yaauie merged 19 commits into elastic:main from yaauie:flow-metrics-genericise
Feb 10, 2026

Conversation

@yaauie
Member

@yaauie yaauie commented Jan 22, 2026

Release notes

[rn: skip]

What does this PR do?

This NET ZERO CHANGE refactor extracts the shareable internals of FlowMetrics.RetentionWindow into their own abstract class so that they can be shared by the incoming histogram implementation.

The initial implementation of FlowMetrics.RetentionWindow was heavily optimized around the nature of rates of change being calculable from first and last entries without a need for intermediate values; it preferred throwing captures away whenever doing so didn't impact its ability to meet the retention policy.

This refactor extracts RetentionWindow as an abstract generic class, and moves the responsibility for merging two captures and for calculating the value into subclasses, so that the implementations can provide meaningful specifics without needing to know anything about the linked-list data structures or atomic operations.

The ExtendedFlowMetric provided its lifetime values in a hacky way that couldn't be reused, so I also migrated the FlowMetricRetentionPolicy to a form that supports a lifetime value that effectively has an infinitely large resolution to avoid intermediate captures.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

@yaauie yaauie requested a review from andsel January 22, 2026 23:32
@github-actions
Contributor

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)
  • /run exhaustive tests : Run the exhaustive tests Buildkite pipeline.

@mergify
Contributor

mergify bot commented Jan 22, 2026

This pull request does not have a backport label. Could you fix it @yaauie? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
  • If no backport is necessary, please add the backport-skip label

Comment on lines 75 to 107
    NodeStagingPair<CAPTURE> oldTailWithStaging = this.tail.getAndUpdate((committedNodeStagePair) -> {
        final Node<CAPTURE> committedNode = committedNodeStagePair.committed;
        final CAPTURE committedCapture = committedNode.capture;
        final CAPTURE stagedCapture = committedNodeStagePair.staged;

        final CAPTURE captureToCommit;
        final CAPTURE captureToStage;

        if (Objects.isNull(committedCapture)) {
            // if we don't have a commit yet, commit.
            if (Objects.nonNull(stagedCapture)) {
                captureToCommit = stagedCapture;
                captureToStage = newestCapture;
            } else {
                captureToCommit = newestCapture;
                captureToStage = null;
            }
        } else if (Objects.nonNull(stagedCapture) && committedCapture.nanoTime() < policy.commitBarrierNanos(newestCapture.nanoTime())) {
            // if the gap between newest and committed is bigger than resolution, commit staged and stage new
            captureToCommit = stagedCapture;
            captureToStage = newestCapture;
        } else {
            // otherwise merge into our stage
            captureToCommit = committedCapture;
            captureToStage = mergeCaptures(stagedCapture, newestCapture);
        }

        // apply our changes, keeping the committed Node if we're committing its capture.
        final Node<CAPTURE> nodeToCommit = (Objects.equals(committedCapture, captureToCommit) ? committedNode : new Node<>(captureToCommit));
        newTailWithStaging.set(nodeToCommit, captureToStage);

        return newTailWithStaging;
    });
Member Author


Additional context:

This is a re-implementation of the algorithm, primarily focused on using an atomically-swappable pair of values (a committed tail-Node<CAPTURE> plus a staged CAPTURE) instead of two separately-atomic values so that we can use AtomicReference#getAndUpdate to atomically swap a new pair in.

If the action results in a new node being in this pair, we know that the node is detached (not linked to by the previous tail) and must link the former tail node to it.

The code here defers out to the abstract function mergeCaptures so that the implementation can provide a meaningful merge action. In the case of FlowMetrics (which are okay dropping intermediate values), the implementation simply selects the youngest of the two captures.
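The committed-plus-staged pair swap described above can be sketched as follows. This is a minimal, hypothetical simplification (the real code also carries a linked-list Node and a reusable pair holder, omitted here); it shows only the atomic-pair idea and the flow-metric merge that keeps the youngest capture:

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch: the committed capture and the staged capture travel together in
// one immutable pair, so a single AtomicReference#updateAndGet call swaps
// both atomically.
class PairSwapSketch {
    record Capture(long nanoTime, long count) {}
    record NodeStagingPair(Capture committed, Capture staged) {}

    private final AtomicReference<NodeStagingPair> tail =
            new AtomicReference<>(new NodeStagingPair(null, null));

    // For flow metrics, "merging" two captures just keeps the youngest one.
    static Capture merge(Capture oldC, Capture newC) {
        if (oldC == null) return newC;
        if (newC == null) return oldC;
        return oldC.nanoTime() > newC.nanoTime() ? oldC : newC;
    }

    NodeStagingPair append(Capture newest, long commitBarrierNanos) {
        return tail.updateAndGet(pair -> {
            if (pair.committed() == null) {
                // no commit yet: promote the staged capture if present
                return pair.staged() != null
                        ? new NodeStagingPair(pair.staged(), newest)
                        : new NodeStagingPair(newest, null);
            } else if (pair.staged() != null && pair.committed().nanoTime() < commitBarrierNanos) {
                // committed capture is older than the barrier: commit staged, stage newest
                return new NodeStagingPair(pair.staged(), newest);
            } else {
                // otherwise merge the newest capture into the stage
                return new NodeStagingPair(pair.committed(), merge(pair.staged(), newest));
            }
        });
    }
}
```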

    // ensure we are fully-compact.
    assertThat(flowMetric.estimateCapturesRetained(), is(lessThan(250)));
    assertThat(flowMetric.estimateExcessRetained(this::maxRetentionPlusMinResolutionBuffer), is(equalTo(Duration.ZERO)));
    assertThat(flowMetric.estimateCapturesRetained(), is(lessThan(252)));
Member Author


The lifetime metric retains up to 2 nodes, and wasn't previously counted.

    private final AtomicReference<Node> tail;
    private final AtomicReference<Node> head;
    private final FlowMetricRetentionPolicy policy;
    static class FlowRetentionWindow extends RetentionWindow<FlowCapture, Double> {
Member Author


The diff is a little difficult to read here, but our implementation that subclasses the now-abstract RetentionWindow contains all of the logic for merging FlowCaptures and calculating a value:

    static class FlowRetentionWindow extends RetentionWindow<FlowCapture, Double> {

        FlowRetentionWindow(FlowMetricRetentionPolicy policy, FlowCapture zeroCapture) {
            super(policy, zeroCapture);
        }

        @Override
        FlowCapture mergeCaptures(FlowCapture oldCapture, FlowCapture newCapture) {
            if (oldCapture == null) {
                return newCapture;
            }
            if (newCapture == null) {
                return oldCapture;
            }

            return (oldCapture.nanoTime() > newCapture.nanoTime()) ? oldCapture : newCapture;
        }

        @Override
        Optional<Double> calculateValue() {
            return calculateFromBaseline((compareCapture, baselineCapture) -> {
                if (compareCapture == null) { return Optional.empty(); }
                if (baselineCapture == null) { return Optional.empty(); }

                final OptionalDouble rate = calculateRate(compareCapture, baselineCapture);

                // convert from OptionalDouble to Optional<Double>
                return rate.isPresent() ? Optional.of(rate.getAsDouble()) : Optional.empty();
            });
        }
    }

-- ExtendedFlowMetric.FlowRetentionWindow@f2a18b67

Contributor

@andsel andsel left a comment


Very nice refactoring. I think that calculateValue is the place to compute the aggregation of histograms for the histogram metric.
I've left a couple of suggestions to get your feedback.

yaauie and others added 2 commits February 5, 2026 10:36
Co-authored-by: Andrea Selva <selva.andre@gmail.com>
Co-authored-by: Andrea Selva <selva.andre@gmail.com>
@yaauie
Member Author

yaauie commented Feb 5, 2026

I think that calculateValue is the place to compute the aggregation of histograms, for the histogram metric.

@andsel: I had seen your previous work on the HdrHistogram front that stored interval histograms (containing only data-points from the window), which would mean that the implementation there would need to walk the linked list to calculate, making the runtime query cost fairly high.

This would be made significantly easier if the values stored for histogram captures were lifetime values, since the HdrHistogram#subtract would allow us to similarly use only the head and youngest captures to calculate a value. It would also reduce the capture-time cost, since we wouldn't need to be merging captures into the staged capture and could simply select the youngest capture between the capture-to-append and the existing staged capture.
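The lifetime-value idea can be illustrated with a toy stand-in (a plain count-per-bucket map in place of a real HdrHistogram; names here are illustrative, not from the PR): when every capture stores lifetime totals, the interval view is simply newest minus baseline, no matter how many intermediate captures were discarded along the way.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for lifetime histograms: cumulative bucket counts.
// A window's value is computed from only two captures, newest minus
// baseline, mirroring what Histogram#subtract would do with real
// lifetime HdrHistograms.
class LifetimeHistogramSketch {
    static Map<Long, Long> subtract(Map<Long, Long> newest, Map<Long, Long> baseline) {
        Map<Long, Long> interval = new HashMap<>(newest);
        baseline.forEach((bucket, count) ->
                interval.merge(bucket, -count, Long::sum));
        interval.values().removeIf(c -> c == 0L); // drop empty buckets
        return interval;
    }
}
```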

@yaauie yaauie requested a review from andsel February 5, 2026 18:55
@elasticmachine

💚 Build Succeeded


cc @yaauie @andsel

     * @param newCapture a new capture to apply on top of the base
     * @return a {@link CAPTURE} representing both provided captures
     */
    abstract CAPTURE mergeCaptures(final CAPTURE oldCapture, final CAPTURE newCapture);
Member Author


note: this method only really exists because the proposed hdrhistogram implementation stores interval histograms instead of lifetime values, so it needs to merge fresh captures into the staged capture when the staged capture isn't yet old enough to commit.

If the hdrhistogram were to be implemented using lifetime values, then this could be simplified to simply select the later capture in both cases.

Contributor


The histogram PR does the staging on the client side. It can't create a snapshot for each recorded value and then let the retention window merge these micro-snapshots. Instead, it accumulates the measures in the recorder, which checks whether the resolution-nanos interval has passed to decide when a new snapshot is needed.
I don't think we have to create a new snapshot for each batch size that we measure; that would cost a lot, particularly in terms of memory usage.

@andsel
Contributor

andsel commented Feb 6, 2026

Hi @yaauie, reviewing your comment about subtract and lifetime histograms, I have some questions.

This would be made significantly easier if the values stored for histogram captures were lifetime values, since the HdrHistogram#subtract would allow us to similarly use only the head and youngest captures

If I understand correctly, you propose not to store incremental deltas of histograms, but snapshots that are absolute pictures of a lifelong accumulation, and then to use subtract to compute the time-delimited view. The idea could be interesting, but as far as I understand it is not feasible with Histogram's recorder, which provides only the snapshots between intervals, so it returns deltas and not absolute histograms. According to the official HdrHistogram documentation, Recorder is the preferred way to track values in highly concurrent environments.

The metric code has to be fast on taking action but can be slower on the reading side.

the implementation there would need to walk the linked list to calculate, making the runtime query cost fairly high.

Regarding the doubts about navigating the list of snapshots when getValue() is queried to provide a metric:
This operation is typically invoked externally via the HTTP API, meaning the call frequency is low (not many calls per millisecond), and it occurs on a "cold path," offline from the primary pipeline workers that record the values.

The RetentionWindow lists contain the following approximate number of snapshot nodes:

  • Last 1 minute: 20 nodes (every 3 seconds)
  • Last 5 minutes: 20 nodes (every 15 seconds)
  • Last 15 minutes: 30 nodes (every 30 seconds)
  • Last 1 hour: 60 nodes (every minute)
  • Last 24 hours: 96 nodes (every 15 minutes)

Given the current plan to use only the first three windows for this specific metric, traversing a maximum of 30 nodes is not anticipated to cause performance issues, especially considering the operation is on the cold path.
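The approximate node counts above follow directly from window length divided by capture resolution; a quick sanity check of that arithmetic:

```java
// Sanity-check of the approximate node counts listed above:
// nodes ≈ window seconds / resolution seconds.
class NodeCountCheck {
    static long nodes(long windowSeconds, long resolutionSeconds) {
        return windowSeconds / resolutionSeconds;
    }
}
```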

Let me know what you think.

Contributor

@andsel andsel left a comment


Code changes look good to me, but I have left a couple of comments:

  • one related to the need for the merge operation.
  • the other about using histogram's subtract and the possible related performance issue on the query side.

@yaauie
Member Author

yaauie commented Feb 9, 2026

The idea could be interesting, but as far as I understand it is not feasible with Histogram's recorder, which provides only the snapshots between intervals, so it returns deltas and not absolute histograms. According to the official HdrHistogram documentation, Recorder is the preferred way to track values in highly concurrent environments.

and

The metric code has to be fast on taking action but can be slower on the reading side.

I agree. But I think we can have the best of both worlds.

Flow metrics are typically registered with a periodic task to perform captures of the underlying metric, instead of making the capture logic be part of the write-time path for that metric. This is intentional, and keeps all of the complexities of managing that state out of the write-time path so that we avoid adding jitter to actual event processing.

Using a similar separation here would be helpful, and would also ensure that we have frequent-enough captures even if we have periods where a given histogram doesn't have additional values being tracked.

The current proposal in #18503 is to have one Recorder instance per RetentionWindow, each of which may conditionally block the current thread after writing to the recorder in order to perform the capture if the retention window's policy requires it. This means that a flow metric with 3 retention windows would have 3 opportunities for a thread to be blocked at write time to perform additional calculations (appending the capture to the window).

Since lifetime values would not need to be differentiated by retention window, we could have a single LifetimeHistogramMetric for each flow metric that uses a recorder internally (so that only a single Recorder#recordValue is on the writer's path). At value-read time (e.g., when the periodic capture task is invoked), we could use Recorder#getIntervalHistogram and merge the result into our lifetime value. A non-blocking spike example of that approach that takes care to return immutable wrappers is here.

While I brought this up in terms of how it enables a latest-minus-baseline calculation in the retention window, I think that the primary value is how it simplifies the metric write-time path.
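A rough sketch of that separation (hypothetical names; a toy atomic bucket array standing in for HdrHistogram's Recorder): pipeline workers touch only the record path, while the periodic capture task drains interval counts into the lifetime total, analogous to draining Recorder#getIntervalHistogram into a lifetime Histogram via add.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Toy sketch of the proposed lifetime-metric shape: writers perform one
// cheap atomic increment, while the periodic capture task (cold path)
// drains the interval counts and folds them into a lifetime total.
class LifetimeMetricSketch {
    private final AtomicLongArray interval = new AtomicLongArray(64);
    private final long[] lifetime = new long[64];

    // hot path: called by pipeline workers
    void recordValue(int bucket) {
        interval.incrementAndGet(bucket);
    }

    // cold path: called by the periodic capture task
    long[] captureLifetime() {
        for (int i = 0; i < lifetime.length; i++) {
            lifetime[i] += interval.getAndSet(i, 0); // drain the interval
        }
        return lifetime.clone(); // snapshot for the retention window
    }
}
```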

@andsel andsel self-requested a review February 10, 2026 09:20
Contributor

@andsel andsel left a comment


LGTM

In light of #18624 (comment), this PR LGTM, and I'll create a PR that follows the guidance on implementing the Lifetime Histogram Metric, replacing #18503.

@yaauie yaauie merged commit b05f0dd into elastic:main Feb 10, 2026
12 checks passed
@yaauie yaauie deleted the flow-metrics-genericise branch February 10, 2026 16:38