Conversation
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
|
|
||
|
|
||
| -- | Get a pattern by ID | ||
| getLogPatternById :: DB es => LogPatternId -> Eff es (Maybe LogPattern) |
There was a problem hiding this comment.
use the _selectWhere pattern, instead of enumerating the fields one by one
| Issues.LogPattern -> | ||
| "Describe this log pattern issue and its implications.\n" | ||
| <> "Title: " | ||
| <> issue.title | ||
| <> "\n" | ||
| <> "Service: " | ||
| <> fromMaybe "unknown-service" issue.service | ||
| Issues.LogPatternRateChange -> | ||
| "Describe this log pattern rate change and its implications.\n" | ||
| <> "Title: " | ||
| <> issue.title | ||
| <> "\n" | ||
| <> "Service: " | ||
| <> fromMaybe "unknown-service" issue.service | ||
|
|
There was a problem hiding this comment.
Please use the [text|] or any other quasiquote. so its easier to visually see the pattern of this message without the haskell semigroup noise
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
| SELECT | ||
| lp.id, | ||
| lp.project_id, | ||
| lp.log_pattern, | ||
| lp.pattern_hash, | ||
| lp.baseline_state, | ||
| lp.baseline_volume_hourly_mean, | ||
| lp.baseline_volume_hourly_stddev, | ||
| COALESCE(counts.current_count, 0)::INT AS current_hour_count | ||
| FROM apis.log_patterns lp | ||
| LEFT JOIN ( | ||
| SELECT log_pattern, COUNT(*) AS current_count | ||
| FROM otel_logs_and_spans | ||
| WHERE project_id = ? | ||
| AND timestamp >= date_trunc('hour', NOW()) | ||
| AND log_pattern IS NOT NULL | ||
| GROUP BY log_pattern | ||
| ) counts ON counts.log_pattern = lp.log_pattern | ||
| WHERE lp.project_id = ? | ||
| AND lp.state != 'ignored' AND lp.baseline_state = 'established' |
There was a problem hiding this comment.
How is log_pattern supposed to join on otel_logs_and_spans when they're not in the same database?
Or is log_patterns supposed to be a timeseries table in timefusion as well? if thats the case then you cant make queries on timeseries tables that dont depend on timestamp range
There was a problem hiding this comment.
Oh and you never join on two time series tables. some databases might support the join operation, but the performance is always horrible in that case
src/Models/Apis/RequestDumps.hs
Outdated
There was a problem hiding this comment.
same as here. how is log_pattern joining on otel_logs_and_spans?
There was a problem hiding this comment.
otel_logs_and_spans has a log_pattern column
There was a problem hiding this comment.
Is log_pattern going to be in timefusion or postgres? if its in timefusion, you do joins. And you can't query it without time range being part of the query.
|
|
||
| -- | Get pattern stats from otel_logs_and_spans | ||
| -- Returns median and MAD (Median Absolute Deviation) for robust baseline calculation | ||
| getPatternStats :: DB es => Projects.ProjectId -> Text -> Int -> Eff es (Maybe PatternStats) |
There was a problem hiding this comment.
why are you not using our widgets or atleast KQL for stats and numbers? isnt this for display?
There was a problem hiding this comment.
This is for detecting spikes
|
|
||
| -- | Calculate baselines for log patterns | ||
| -- Uses hourly counts from otel_logs_and_spans over the last 7 days | ||
| calculateLogPatternBaselines :: Projects.ProjectId -> ATBackgroundCtx () |
There was a problem hiding this comment.
I don't want to support magic alerts. If we're detecting spikes, we should implement an alert system that our users can enable on any metric as well, so its the same code anad logic for all cases. Not magic logic we run in the background.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
src/Models/Apis/Issues.hs
Outdated
There was a problem hiding this comment.
Why did you set these to 0?
There was a problem hiding this comment.
Those are not part of the new issues table.
There was a problem hiding this comment.
Why are we adding it to the query?
There was a problem hiding this comment.
Use the text quasiquotes here.
| @@ -0,0 +1,68 @@ | |||
| BEGIN; | |||
|
|
|||
| CREATE TABLE IF NOT EXISTS apis.log_patterns ( | |||
There was a problem hiding this comment.
We will support patterns on different fields not just the default body/message field. This table doesn't seem aware of this expectation.
| CREATE INDEX IF NOT EXISTS idx_log_patterns_last_seen ON apis.log_patterns(project_id, last_seen_at DESC); | ||
| CREATE INDEX IF NOT EXISTS idx_log_patterns_service ON apis.log_patterns(project_id, service_name); | ||
|
|
||
| CREATE OR REPLACE FUNCTION apis.new_log_pattern_proc() RETURNS trigger AS $$ |
There was a problem hiding this comment.
Again, the question remains. Is this a timeseries table or regular table? If its a timeseries table (to be on timefusion), then it won't be in the same database as where the background jobs is, and hence would be unable to queue jobs from within the db.
There was a problem hiding this comment.
It's a regular table.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
|
Rename patterns function from 15mins to 5mins. |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
2f2da2d to
5a609a9
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
e084072 to
5791763
Compare
Code Review: PR #300 — Log PatternsOverall this is a solid feature addition with good architectural decisions (batch DB queries, z-score anomaly detection, Effectful migration). Here are concrete issues and improvements, with a focus on conciseness and using existing project tools. Bugs / Correctness
-- Missing comma before `totals AS (`
mad_calc AS (...)
totals AS ( -- ← needs a leading commaThe query at
-- BackgroundJobs.hs
detectLogPatternSpikes pid authCtx = do
...
let anomalyIds = V.fromList $ map (\(pid', _, _, _, _, _) -> pid') anomalyData
PG.query [sql| SELECT count(*) from otel_logs_and_spans WHERE project_id = ? AND timestamp >= now() - interval '7 days' |] (Only pid)
-- LogPatterns.hs:374
AND lp.source_field = 'body'Comment says "only body patterns for now" but Tier 2 patterns (
createLogPatternIssue :: (Time :> es, UUIDEff :> es) => ...
Code Conciseness / Style
The refactored
UPDATE apis.log_patterns SET state = 'acknowledged' ... WHERE pattern_hash = ANY(?)Pattern hashes are project-scoped in the unique index but this query does not filter by
getLogPatterns pid limit offset = PG.query (_selectWhere @LogPattern [[DAT.field| project_id |]] <> " ORDER BY last_seen_at DESC LIMIT ? OFFSET ?") (pid, limit, offset)This is fine for now but the function accepts
deriving (AE.FromJSON, AE.ToJSON)
via DAE.CustomJSON '[DAE.OmitNothingFields, DAE.FieldLabelModifier '[DAE.CamelToSnake]] LogPatternState
Long one-liner in mkIssue projectId LogPatternRateChange lp.patternHash lp.patternHash lp.serviceName (direction == "spike" && ...) severity ("Log Pattern " <> ...) ("Log pattern volume " <> ...) "n/a" rateChangeDataThis call at
pure $ drop skip sorted
The diff adds
createLogGroup templateTokens templateString logId now =
LogGroup { ..., exampleLog = templateString, ... }On creation, Performance
Relude.when (length patterns == 1000) $ Log.logWarn ...
let established = sum [1 :: Int | lp <- patterns, ...]Two O(n) traversals (once for the guard, once for
This is the heaviest query in the PR. The new partial index Migration
ALTER TABLE apis.issues ADD COLUMN IF NOT EXISTS target_hash TEXT NOT NULL DEFAULT '';The empty-string default allows the Minor
|
Add log pattern extraction from otel_logs_and_spans using the Drain algorithm, with baseline volume tracking (median + MAD) and automatic spike detection via z-score thresholds. New patterns and rate changes create issues with LLM-generated titles and descriptions.
- Fix N+1 query: batch getBatchPatternStats replaces per-pattern loop - Break Utils<->RequestDumps circular dependency (inline count query) - Rename UpsertPattern.pattern to logPattern (PatternSynonyms conflict) - Move issue creation from IO to Eff (UUIDEff + Time effects, no liftIO) - Change Issue.sourceType from Text to Maybe Text (NULL instead of "") - Add otel_logs_and_spans log_pattern index for query performance - Move ALTER TYPE ADD VALUE outside transaction block (PG < 12 compat) - Inline trace_id into CREATE TABLE (remove ALTER TABLE ADD COLUMN) - Add monitoring logs to baseline calculation and spike detection - Style: when(not...) -> unless, remove trailing blanks in DrainSpec
Use HashMap for O(1) metadata lookup in processPatterns, replace correlated subqueries with CTE join in getBatchPatternStats, fix selectIssues to return actual affected_requests/affected_clients instead of hardcoded zeros, and flatten nested if/else in fetchLogPatterns into guards.
5791763 to
59e1333
Compare
PR Review: Log Patterns (#300)Good overall structure. The feature is well-scoped and the new Bugs / Correctness
) -- end of mad_calc CTE
totals AS ( -- ← missing comma before this CTELine ~1008 in the diff. The
The volume check ( Code Quality / ConcisenessNamed-field tuple anti-pattern in let anomalyData = flip mapMaybe patternsWithRates \lpRate ->
...
in direction <&> \dir -> (lpRate.patternId, lpRate.patternHash, currentRate, mean, stddev, dir)
forM_ anomalyData \(patternId, patternHash, currentRate, mean, stddev, direction) -> ...A 6-tuple is hard to follow and error-prone. The intermediate Redundant
Repeated This string appears 6 times across
The new
let sourceField = bool "body" kind (kind /= "log")This means: if
UPDATE apis.log_patterns SET state = 'acknowledged' WHERE pattern_hash = ANY(?)Pattern hashes could theoretically collide across projects. Adding
patterns <- LogPatterns.getLogPatterns pid 1000 0
Relude.when (length patterns == 1000) $ Log.logWarn "Pattern limit reached ..."The magic number Test pattern matching is verbose In SQL / Migration
The Haskell
Existing rows will get
Minor
|
Saving and presenting log patterns in ui and log patterns anomaly detection
Closes #
How to test