db: migrate aggregation_jobs table to new times#4393
Conversation
| COALESCE($2::TIMESTAMP WITH TIME ZONE - tasks.report_expiry_age * '1 second'::INTERVAL, | ||
| '-infinity'::TIMESTAMP WITH TIME ZONE) | ||
| FOR UPDATE OF aggregation_jobs SKIP LOCKED LIMIT $3 | ||
| COALESCE(($3 - tasks.report_expiry_age) / tasks.time_precision, 0) |
There was a problem hiding this comment.
This little bit of math needs close scrutiny. #3 is now in seconds since the epoch, so we're subtracting report_expiry_age (also in seconds, a BIGINT) then dividing by the BIGINT time_precision to get a quantity we can compare to UPPER(aggregation_jobs.client_timestamp_interval). The way we do that in Janus boils down to timestamp / precision_secs (Time::from_seconds_since_epoch) so I think this is OK, because Postgres / is documented to "truncate the result" (https://www.postgresql.org/docs/9.5/functions-math.html).
There was a problem hiding this comment.
The math LGTM, but we do have a functionality change here, I think? Previously tasks without a report_expiry_age used -infinity in the comparison, while with the division here we're now getting 0, so our >= is only going to match tasks with positive timestamps, while previously with the result going to -infinity it'd always match.
I guess that's just an edge case not particularly important for the real world... but you did ask me to apply scrutiny here.
There was a problem hiding this comment.
I appreciate the scrutiny. This is worth diving into.
The nuance here is signedness of time values. Previously, we stored SQL TIMESTAMP, which ranges from [-infinity, infinity] (is it possible to have a closed range on infinity? whatever). The intention was to use the smallest possible value so that it will always compare less than UPPER(aggregation_jobs.client_timestamp_interval), which was TIMESTAMP.
Now, UPPER(aggregation_jobs.client_timestamp_interval) is a DAP Time, which ranges from 0, 2^64-1. But we store it as SQL BIGINT which is -2^63, +2^63-1. So should we compare against the smallest possible Time or the smallest possible BIGINT?
There was a problem hiding this comment.
Furthermore, TIMESTAMP's zero point does not line up with the Unix epoch that DAP uses, so some of our unit tests did care about negative SQL timestamps. Now, both $3 and UPPER(aggregation_jobs.client_timestamp_interval) are using the Unix epoch as their zero point, so I think it would be fine to use either 0 or x'8000000000000000'::bigint as our fallback value to compare against.
jcjones
left a comment
There was a problem hiding this comment.
Great work here. Thank you for taking this on... I don't feel super strongly about the extra safety so I'll go ahead and hit Approve With Feedback.
| COALESCE($2::TIMESTAMP WITH TIME ZONE - tasks.report_expiry_age * '1 second'::INTERVAL, | ||
| '-infinity'::TIMESTAMP WITH TIME ZONE) | ||
| FOR UPDATE OF aggregation_jobs SKIP LOCKED LIMIT $3 | ||
| COALESCE(($3 - tasks.report_expiry_age) / tasks.time_precision, 0) |
There was a problem hiding this comment.
The math LGTM, but we do have a functionality change here, I think? Previously tasks without a report_expiry_age used -infinity in the comparison, while with the division here we're now getting 0, so our >= is only going to match tasks with positive timestamps, while previously with the result going to -infinity it'd always match.
I guess that's just an edge case not particularly important for the real world... but you did ask me to apply scrutiny here.
| Duration::from_time_precision_units( | ||
| (end_time - start_time) | ||
| .try_into() | ||
| .map_err(|_| "interval duration must be positive")?, | ||
| ), |
There was a problem hiding this comment.
While the try_into is doing some lifting here, the subtraction is also somewhat unsafe, and we could make that safe in the degenerate case, so why not?
| Duration::from_time_precision_units( | |
| (end_time - start_time) | |
| .try_into() | |
| .map_err(|_| "interval duration must be positive")?, | |
| ), | |
| Duration::from_time_precision_units( | |
| (end_time | |
| .checked_sub(start_time) | |
| .ok_or("interval end must be >= start")?) | |
| .try_into() | |
| .map_err(|_| "interval duration must be non-negative")? | |
| ), |
There was a problem hiding this comment.
We could alternately convert both start_time and end_time to a u64 right after decoding them, and do the checked_sub() last.
| _: &postgres_types::Type, | ||
| out: &mut bytes::BytesMut, | ||
| ) -> Result<postgres_types::IsNull, Box<dyn std::error::Error + Sync + Send>> { | ||
| // Convert the interval start and end to SQL timestamps. |
There was a problem hiding this comment.
these aren't timestamps
| // Convert the interval start and end to SQL timestamps. | |
| // Convert the interval start and end to values in time precision units. |
| client_timestamp_interval TSTZRANGE NOT NULL, | ||
| -- The minimal interval, in time precision units, containing all of client timestamps included | ||
| -- in this aggregation job | ||
| client_timestamp_interval INT8RANGE NOT NULL, |
There was a problem hiding this comment.
On line 3 we have an out of date comment that maybe should note that the INT8RANGE is also allowed to be an index (line 288) due to the GiST extension
4f5c305 to
72be386
Compare
6d9946f to
f0dc49e
Compare
Store client report intervals as `INT8RANGE`, representing a range of timestamps in time precision units. To that end, we add `SqlIntervalTimePrecision`, a simplified version of `SqlInterval` that gives us conversion to/from `INT8RANGE`. Once all `TSTZRANGE`s are converted to `INT8RANGE`, the old `SqlInterval` gets deleted and `SqlIntervalTimePrecision` is renamed. Part of #4206
cbb35b8 to
519c7df
Compare
| .map_err(|_| "interval start must be positive")?; | ||
| let end_time: u64 = int8_from_sql(end_raw)? | ||
| .try_into() | ||
| .map_err(|_| "interval start must be positive")?; |
There was a problem hiding this comment.
nit: typo
| .map_err(|_| "interval start must be positive")?; | |
| .map_err(|_| "interval end must be positive")?; |
There was a problem hiding this comment.
Shoot, I jumped the gun. I'll get this in the next PR.
Store client report intervals as
INT8RANGE, representing a range of timestamps in time precision units.To that end, we add
SqlIntervalTimePrecision, a simplified version ofSqlIntervalthat gives us conversion to/fromINT8RANGE. Once allTSTZRANGEs are converted toINT8RANGE, the oldSqlIntervalgets deleted andSqlIntervalTimePrecisionis renamed.Part of #4206