Skip to content

Comments

db: migrate aggregation_jobs table to new times#4393

Merged
tgeoghegan merged 4 commits intomainfrom
timg/datastore-time-precision-agg-jobs-table
Feb 24, 2026
Merged

db: migrate aggregation_jobs table to new times#4393
tgeoghegan merged 4 commits intomainfrom
timg/datastore-time-precision-agg-jobs-table

Conversation

@tgeoghegan
Copy link
Contributor

Store client report intervals as INT8RANGE, representing a range of timestamps in time precision units.

To that end, we add SqlIntervalTimePrecision, a simplified version of SqlInterval that gives us conversion to/from INT8RANGE. Once all TSTZRANGEs are converted to INT8RANGE, the old SqlInterval gets deleted and SqlIntervalTimePrecision is renamed.

Part of #4206

@tgeoghegan tgeoghegan requested a review from a team as a code owner February 24, 2026 01:12
@tgeoghegan tgeoghegan added the allow-changed-migrations Override the ci-migrations check to allow migrations that have changed. label Feb 24, 2026
COALESCE($2::TIMESTAMP WITH TIME ZONE - tasks.report_expiry_age * '1 second'::INTERVAL,
'-infinity'::TIMESTAMP WITH TIME ZONE)
FOR UPDATE OF aggregation_jobs SKIP LOCKED LIMIT $3
COALESCE(($3 - tasks.report_expiry_age) / tasks.time_precision, 0)
Copy link
Contributor Author

@tgeoghegan tgeoghegan Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This little bit of math needs close scrutiny. #3 is now in seconds since the epoch, so we're subtracting report_expiry_age (also in seconds, a BIGINT) then dividing by the BIGINT time_precision to get a quantity we can compare to UPPER(aggregation_jobs.client_timestamp_interval). The way we do that in Janus boils down to timestamp / precision_secs (Time::from_seconds_since_epoch) so I think this is OK, because Postgres / is documented to "truncate the result" (https://www.postgresql.org/docs/9.5/functions-math.html).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The math LGTM, but we do have a functionality change here, I think? Previously tasks without a report_expiry_age used -infinity in the comparison, while with the division here we're now getting 0, so our >= is only going to match tasks with positive timestamps, while previously with the result going to -infinity it'd always match.

I guess that's just an edge case not particularly important for the real world... but you did ask me to apply scrutiny here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the scrutiny. This is worth diving into.

The nuance here is signedness of time values. Previously, we stored SQL TIMESTAMP, which ranges from [-infinity, infinity] (is it possible to have a closed range on infinity? whatever). The intention was to use the smallest possible value so that it will always compare less than UPPER(aggregation_jobs.client_timestamp_interval), which was TIMESTAMP.

Now, UPPER(aggregation_jobs.client_timestamp_interval) is a DAP Time, which ranges from 0, 2^64-1. But we store it as SQL BIGINT which is -2^63, +2^63-1. So should we compare against the smallest possible Time or the smallest possible BIGINT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Furthermore, TIMESTAMP's zero point does not line up with the Unix epoch that DAP uses, so some of our unit tests did care about negative SQL timestamps. Now, both $3 and UPPER(aggregation_jobs.client_timestamp_interval) are using the Unix epoch as their zero point, so I think it would be fine to use either 0 or x'8000000000000000'::bigint as our fallback value to compare against.

Copy link
Contributor

@jcjones jcjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work here. Thank you for taking this on... I don't feel super strongly about the extra safety so I'll go ahead and hit Approve With Feedback.

COALESCE($2::TIMESTAMP WITH TIME ZONE - tasks.report_expiry_age * '1 second'::INTERVAL,
'-infinity'::TIMESTAMP WITH TIME ZONE)
FOR UPDATE OF aggregation_jobs SKIP LOCKED LIMIT $3
COALESCE(($3 - tasks.report_expiry_age) / tasks.time_precision, 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The math LGTM, but we do have a functionality change here, I think? Previously tasks without a report_expiry_age used -infinity in the comparison, while with the division here we're now getting 0, so our >= is only going to match tasks with positive timestamps, while previously with the result going to -infinity it'd always match.

I guess that's just an edge case not particularly important for the real world... but you did ask me to apply scrutiny here.

Comment on lines 2207 to 2211
Duration::from_time_precision_units(
(end_time - start_time)
.try_into()
.map_err(|_| "interval duration must be positive")?,
),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the try_into is doing some lifting here, the subtraction is also somewhat unsafe, and we could make that safe in the degenerate case, so why not?

Suggested change
Duration::from_time_precision_units(
(end_time - start_time)
.try_into()
.map_err(|_| "interval duration must be positive")?,
),
Duration::from_time_precision_units(
(end_time
.checked_sub(start_time)
.ok_or("interval end must be >= start")?)
.try_into()
.map_err(|_| "interval duration must be non-negative")?
),

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could alternately convert both start_time and end_time to a u64 right after decoding them, and do the checked_sub() last.

_: &postgres_types::Type,
out: &mut bytes::BytesMut,
) -> Result<postgres_types::IsNull, Box<dyn std::error::Error + Sync + Send>> {
// Convert the interval start and end to SQL timestamps.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these aren't timestamps

Suggested change
// Convert the interval start and end to SQL timestamps.
// Convert the interval start and end to values in time precision units.

client_timestamp_interval TSTZRANGE NOT NULL,
-- The minimal interval, in time precision units, containing all of client timestamps included
-- in this aggregation job
client_timestamp_interval INT8RANGE NOT NULL,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On line 3 we have an out of date comment that maybe should note that the INT8RANGE is also allowed to be an index (line 288) due to the GiST extension

@tgeoghegan tgeoghegan force-pushed the timg/datastore-time-precision-client-reports-table branch from 4f5c305 to 72be386 Compare February 24, 2026 18:47
@tgeoghegan tgeoghegan force-pushed the timg/datastore-time-precision-agg-jobs-table branch from 6d9946f to f0dc49e Compare February 24, 2026 19:13
Base automatically changed from timg/datastore-time-precision-client-reports-table to main February 24, 2026 22:48
Store client report intervals as `INT8RANGE`, representing a range of
timestamps in time precision units.

To that end, we add `SqlIntervalTimePrecision`, a simplified version of
`SqlInterval` that gives us conversion to/from `INT8RANGE`. Once all
`TSTZRANGE`s are converted to `INT8RANGE`, the old `SqlInterval` gets
deleted and `SqlIntervalTimePrecision` is renamed.

Part of #4206
@tgeoghegan tgeoghegan force-pushed the timg/datastore-time-precision-agg-jobs-table branch from cbb35b8 to 519c7df Compare February 24, 2026 23:08
@tgeoghegan tgeoghegan enabled auto-merge (squash) February 24, 2026 23:27
.map_err(|_| "interval start must be positive")?;
let end_time: u64 = int8_from_sql(end_raw)?
.try_into()
.map_err(|_| "interval start must be positive")?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: typo

Suggested change
.map_err(|_| "interval start must be positive")?;
.map_err(|_| "interval end must be positive")?;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shoot, I jumped the gun. I'll get this in the next PR.

@tgeoghegan tgeoghegan merged commit 63a244f into main Feb 24, 2026
8 checks passed
@tgeoghegan tgeoghegan deleted the timg/datastore-time-precision-agg-jobs-table branch February 24, 2026 23:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

allow-changed-migrations Override the ci-migrations check to allow migrations that have changed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants