
Log patterns: Log patterns related code #300

Open

dawkaka wants to merge 6 commits into master from log-patterns-chp

Conversation

@dawkaka (Contributor) commented Jan 22, 2026

Saving and presenting log patterns in the UI, plus log pattern anomaly detection:

  • Unusual increases in log pattern volume within a time window
  • Newly detected log patterns

Closes #

How to test



-- | Get a pattern by ID
getLogPatternById :: DB es => LogPatternId -> Eff es (Maybe LogPattern)
Contributor:

Use the _selectWhere pattern instead of enumerating the fields one by one.
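A minimal sketch of the suggested rewrite, following the _selectWhere usage that appears later in this PR (getLogPatterns); the [DAT.field| id |] selector and the list-returning PG.query are assumptions:

-- Sketch: delegate field enumeration to _selectWhere; assumes an `id`
-- field selector exists for LogPattern.
getLogPatternById :: DB es => LogPatternId -> Eff es (Maybe LogPattern)
getLogPatternById lpid =
  listToMaybe <$> PG.query (_selectWhere @LogPattern [[DAT.field| id |]]) (Only lpid)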

Comment on lines 250 to 264
Issues.LogPattern ->
"Describe this log pattern issue and its implications.\n"
<> "Title: "
<> issue.title
<> "\n"
<> "Service: "
<> fromMaybe "unknown-service" issue.service
Issues.LogPatternRateChange ->
"Describe this log pattern rate change and its implications.\n"
<> "Title: "
<> issue.title
<> "\n"
<> "Service: "
<> fromMaybe "unknown-service" issue.service

Contributor:

Please use the [text|] (or any other) quasiquote, so it's easier to visually see the shape of this message without the Haskell semigroup noise.
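For illustration, a sketch of one branch using a [text|] quasiquoter (e.g. NeatInterpolation's); interpolated values must first be bound to plain Text identifiers:

-- Sketch, assuming NeatInterpolation's [text|] is the quasiquoter meant here:
Issues.LogPattern ->
  let title = issue.title
      service = fromMaybe "unknown-service" issue.service
   in [text|
        Describe this log pattern issue and its implications.
        Title: $title
        Service: $service
      |]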


Comment on lines 288 to 307
SELECT
lp.id,
lp.project_id,
lp.log_pattern,
lp.pattern_hash,
lp.baseline_state,
lp.baseline_volume_hourly_mean,
lp.baseline_volume_hourly_stddev,
COALESCE(counts.current_count, 0)::INT AS current_hour_count
FROM apis.log_patterns lp
LEFT JOIN (
SELECT log_pattern, COUNT(*) AS current_count
FROM otel_logs_and_spans
WHERE project_id = ?
AND timestamp >= date_trunc('hour', NOW())
AND log_pattern IS NOT NULL
GROUP BY log_pattern
) counts ON counts.log_pattern = lp.log_pattern
WHERE lp.project_id = ?
AND lp.state != 'ignored' AND lp.baseline_state = 'established'
Contributor:

How is log_pattern supposed to join on otel_logs_and_spans when they're not in the same database?

Or is log_patterns supposed to be a timeseries table in TimeFusion as well? If that's the case, you can't make queries on timeseries tables that don't depend on a timestamp range.

Contributor:

Also, you never join two time-series tables. Some databases might support the join operation, but the performance is always horrible in that case.
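One way to satisfy both constraints (time-bounded queries against the timeseries store, no cross-store joins) is to run the two queries separately and combine them in Haskell. A rough sketch under those assumptions; queryLogStore is a hypothetical stand-in for whatever client fronts otel_logs_and_spans:

-- Sketch: no SQL-level join. Fetch per-pattern counts from the log store
-- (time-bounded), fetch baselines from Postgres, and join in memory.
-- HM is Data.HashMap.Strict.
patternsWithCurrentRates pid = do
  counts <- queryLogStore
    [sql| SELECT log_pattern, COUNT(*)::INT
          FROM otel_logs_and_spans
          WHERE project_id = ? AND timestamp >= date_trunc('hour', NOW())
            AND log_pattern IS NOT NULL
          GROUP BY log_pattern |] (Only pid)
  baselines <- PG.query
    [sql| SELECT log_pattern, baseline_volume_hourly_mean, baseline_volume_hourly_stddev
          FROM apis.log_patterns
          WHERE project_id = ? AND state != 'ignored' AND baseline_state = 'established' |]
    (Only pid)
  let countMap = HM.fromList (counts :: [(Text, Int)])
  pure [(lp, mean, sd, HM.lookupDefault 0 lp countMap) | (lp, mean, sd) <- baselines]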

Comment on lines 573 to 584
Contributor:

Same as above: how is log_pattern joining on otel_logs_and_spans?

Contributor (Author):

otel_logs_and_spans has a log_pattern column

Contributor:

Is log_pattern going to be in TimeFusion or Postgres? If it's in TimeFusion, you can't do joins, and you can't query it without a time range being part of the query.


-- | Get pattern stats from otel_logs_and_spans
-- Returns median and MAD (Median Absolute Deviation) for robust baseline calculation
getPatternStats :: DB es => Projects.ProjectId -> Text -> Int -> Eff es (Maybe PatternStats)
Contributor:

Why are you not using our widgets, or at least KQL, for stats and numbers? Isn't this for display?

Contributor (Author):

This is for detecting spikes
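For context, a minimal sketch of the median+MAD spike check the PR summary describes; the 0.6745 rescaling and the 3.5 cutoff are the conventional modified z-score values and are assumptions here, not necessarily the PR's:

-- Robust z-score: 0.6745 makes MAD comparable to a standard deviation
-- for normally distributed data; max guards against a zero MAD.
isSpike :: Double -> Double -> Double -> Bool
isSpike current median mad =
  let z = 0.6745 * (current - median) / max mad 1e-9
   in abs z > 3.5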


-- | Calculate baselines for log patterns
-- Uses hourly counts from otel_logs_and_spans over the last 7 days
calculateLogPatternBaselines :: Projects.ProjectId -> ATBackgroundCtx ()
Contributor:

I don't want to support magic alerts. If we're detecting spikes, we should implement an alert system that our users can enable on any metric as well, so it's the same code and logic for all cases. Not magic logic we run in the background.


Contributor:

Why did you set these to 0?

Contributor (Author):

Those are not part of the new issues table.

Contributor:

Why are we adding it to the query?

Comment on lines 149 to 162
Contributor:

Use the text quasiquotes here.

@@ -0,0 +1,68 @@
BEGIN;

CREATE TABLE IF NOT EXISTS apis.log_patterns (
Contributor:

We will support patterns on different fields, not just the default body/message field. This table doesn't seem aware of that expectation.

CREATE INDEX IF NOT EXISTS idx_log_patterns_last_seen ON apis.log_patterns(project_id, last_seen_at DESC);
CREATE INDEX IF NOT EXISTS idx_log_patterns_service ON apis.log_patterns(project_id, service_name);

CREATE OR REPLACE FUNCTION apis.new_log_pattern_proc() RETURNS trigger AS $$
Contributor:

Again, the question remains: is this a timeseries table or a regular table? If it's a timeseries table (to be on TimeFusion), then it won't be in the same database as the background jobs, and hence would be unable to queue jobs from within the DB.

Contributor (Author):

It's a regular table.


@tonyalaribe (Contributor) commented:

Rename the patterns function from 15mins to 5mins.


claude bot commented Feb 18, 2026

Code Review: PR #300 — Log Patterns

Overall this is a solid feature addition with good architectural decisions (batch DB queries, z-score anomaly detection, Effectful migration). Here are concrete issues and improvements, with a focus on conciseness and using existing project tools.


Bugs / Correctness

getBatchPatternStats is missing a comma in the CTE — SQL syntax error

-- Missing comma before `totals AS (`
mad_calc AS (...)
totals AS (   -- ← needs a leading comma

The query at src/Models/Apis/LogPatterns.hs:305 will fail at runtime with a parse error.

detectLogPatternSpikes shadows pid

-- BackgroundJobs.hs
detectLogPatternSpikes pid authCtx = do
  ...
  let anomalyIds = V.fromList $ map (\(pid', _, _, _, _, _) -> pid') anomalyData

pid' is actually a LogPatternId, not a ProjectId. The name is confusing — patternId' or just lpid would be clearer, but more importantly the variable name pid' mirrors the outer pid :: ProjectId, which is a bug waiting to happen.

processNewLogPattern uses a raw SQL string

PG.query [sql| SELECT count(*) from otel_logs_and_spans WHERE project_id = ? AND timestamp >= now() - interval '7 days' |] (Only pid)

project_id is a UUID column elsewhere in the codebase but the query uses ? directly with a ProjectId. Other queries in this file cast with ::text. Verify the type is consistent.

getPatternsWithCurrentRates only checks source_field = 'body'

-- LogPatterns.hs:374
AND lp.source_field = 'body'

Comment says "only body patterns for now" but Tier 2 patterns (url_path, exception) stored in apis.log_patterns are never checked for spikes. This should at least be a TODO comment or tracked issue, as the baseline infrastructure already works for all source_field values.

createLogPatternIssue doesn't use the Time effect despite passing now indirectly through mkIssue

createLogPatternIssue :: (Time :> es, UUIDEff :> es) => ...

Time is required in the constraint because mkIssue calls Time.currentTime, but createLogPatternIssue itself never touches time — the constraint propagates correctly, but the function doesn't read it directly. Not a bug, just a note that the constraint is inherited.


Code Conciseness / Style

updateOrCreateLevelOne / updateOrCreateLevelTwo — use RecordWildCards or named field update instead of repeating V.// [(index, existing{...})]

The refactored maybe ... V.findIndex style is cleaner, but the inner update is still verbose. Both functions share the same shape; consider a local helper, or at least one with record update syntax, which GHC will accept since RecordWildCards is likely enabled.
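A hypothetical shared helper, as a sketch of what both functions could delegate to:

import qualified Data.Vector as V

-- Apply f to the element at index i (index assumed in bounds, as both
-- callers already guarantee via V.findIndex).
updateAt :: Int -> (a -> a) -> V.Vector a -> V.Vector a
updateAt i f v = v V.// [(i, f (v V.! i))]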

acknowledgeLogPatterns — missing project_id filter

UPDATE apis.log_patterns SET state = 'acknowledged' ... WHERE pattern_hash = ANY(?)

Pattern hashes are project-scoped in the unique index but this query does not filter by project_id. A rogue (or cross-project) hash match could acknowledge patterns from another project if hashes collide. Add AND project_id = ? and thread the ProjectId parameter through.
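A sketch of the project-scoped variant; the signature and the PG.execute wrapper are assumed from this file's conventions:

acknowledgeLogPatterns :: DB es => Projects.ProjectId -> V.Vector Text -> Eff es ()
acknowledgeLogPatterns pid hashes =
  void
    $ PG.execute
      [sql| UPDATE apis.log_patterns
            SET state = 'acknowledged'
            WHERE project_id = ? AND pattern_hash = ANY(?) |]
      (pid, hashes)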

getLogPatterns — redundant _selectWhere + manual LIMIT/OFFSET suffix

getLogPatterns pid limit offset = PG.query (_selectWhere @LogPattern [[DAT.field| project_id |]] <> " ORDER BY last_seen_at DESC LIMIT ? OFFSET ?") (pid, limit, offset)

This is fine for now but the function accepts Int for limit/offset. Consider Natural or at least add a note that negative values will be passed to Postgres unvalidated.

LogPatternState JSON derivation is wrong

deriving (AE.FromJSON, AE.ToJSON)
  via DAE.CustomJSON '[DAE.OmitNothingFields, DAE.FieldLabelModifier '[DAE.CamelToSnake]] LogPatternState

LogPatternState is a sum type with no fields — OmitNothingFields and FieldLabelModifier do nothing here. The constructor tag modifier is also missing. Compare with BaselineState which presumably uses ConstructorTagModifier. The serialised form will be "LPSNew" / "LPSAcknowledged" etc., which may not match the DB text column values new / acknowledged / ignored. The WrappedEnumSC "LPS" derivation for FromField/ToField handles DB round-trips correctly, but the JSON form used in the API response will differ.
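A sketch of a derivation that would keep the JSON form in sync with the DB text values, assuming deriving-aeson's modifiers:

-- Serialises LPSNew as "new", LPSAcknowledged as "acknowledged", etc.
deriving (AE.FromJSON, AE.ToJSON)
  via DAE.CustomJSON '[DAE.ConstructorTagModifier '[DAE.StripPrefix "LPS", DAE.CamelToSnake]] LogPatternState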

Long one-liner in Issues.hs: createLogPatternRateChangeIssue

mkIssue projectId LogPatternRateChange lp.patternHash lp.patternHash lp.serviceName (direction == "spike" && ...) severity ("Log Pattern " <> ...) ("Log pattern volume " <> ...) "n/a" rateChangeData

This call at Issues.hs:660 is a single very long line. Since the project values conciseness, this particular case exceeds readability — format it as a multi-line call matching the style of createLogGroup etc.

fetchLogPatterns fallback — drop skip sorted

pure $ drop skip sorted

skip here is an Int received from the caller (a pagination offset). But sortOn (Down . snd) is O(n log n) applied to up to 500 rows and then drop skip wastes the early items. For client-side grouping this is likely acceptable, but skip semantics differ between the SQL path (server-side offset) and this path (post-sort drop). If skip can be >15 the caller will get fewer than 15 results silently.

sourceFieldLabel is defined in both Anomalies.hs and Log.hs

The diff adds sourceFieldLabel to src/Pages/Anomalies.hs:603 and the summary also mentions it in src/Pages/LogExplorer/Log.hs. If these are two separate definitions, it should be extracted to a shared helper (e.g. Pkg/Components or Utils).

createLogGroup initialises exampleLog = templateString

createLogGroup templateTokens templateString logId now =
  LogGroup { ..., exampleLog = templateString, ... }

On creation, exampleLog is set to templateString (the pattern), not the actual raw log content. The real example log is only saved later in updateLogGroupWithTemplate when isSampleLog = True. For newly created single-element groups the example will always be the template, not the original log. Passing the raw logContent through createLogGroup would make it consistent.


Performance

calculateLogPatternBaselines — O(n) length on patterns list

Relude.when (length patterns == 1000) $ Log.logWarn ...
let established = sum [1 :: Int | lp <- patterns, ...]

Two O(n) traversals (once for the guard, once for established). Use V.length if you switch to Vector, or a single foldl' with a counter.
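A single-pass sketch (needs BangPatterns; the baselineState field and BSEstablished constructor are assumed from the surrounding code):

let count (!n, !e) lp = (n + 1, if lp.baselineState == BSEstablished then e + 1 else e)
    (total, established) = foldl' count (0 :: Int, 0 :: Int) patterns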

getBatchPatternStats — scans otel_logs_and_spans over 168 hours for all patterns

This is the heaviest query in the PR. The new partial index idx_logs_spans_log_pattern should help, but ensure EXPLAIN ANALYZE has been run on production-scale data. Consider caching the result or running the baseline calculation less frequently than every hour.


Migration

state column uses TEXT not the existing apis.issue_type enum pattern

log_pattern_state (new/acknowledged/ignored) is stored as plain TEXT. Other status columns (e.g. baseline_state) follow the same approach, but consider a CHECK constraint or a proper ENUM type for the state column to prevent invalid values at the DB level.
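A sketch of the CHECK-constraint option, using the three states listed above:

ALTER TABLE apis.log_patterns
  ADD CONSTRAINT log_patterns_state_check
  CHECK (state IN ('new', 'acknowledged', 'ignored'));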

target_hash default '' in migration + Haskell default

ALTER TABLE apis.issues ADD COLUMN IF NOT EXISTS target_hash TEXT NOT NULL DEFAULT '';

The empty-string default allows the NOT NULL constraint to be satisfied for existing rows. But the ON CONFLICT (project_id, target_hash, issue_type) index will then group all old issues with target_hash = '' together for the same issue_type, potentially causing unintended conflict/merge on the next insert for old issue types. Existing APIChange issues already have a non-empty endpoint_hash; consider backfilling target_hash = endpoint_hash for api_change rows in the migration.
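A hypothetical backfill for the migration, assuming endpoint_hash is the column this review refers to:

UPDATE apis.issues
SET target_hash = endpoint_hash
WHERE issue_type = 'api_change' AND target_hash = '';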


Minor

  • _isSummary parameter in processNewLog (BackgroundJobs.hs:198) — prefixed with _ suggesting it's unused. The old code used it to switch logic; if it's genuinely dead now, remove the parameter and update callers.
  • Data.Set (member) import in RequestDumps.hs — only used for flattenedOtelAttributes membership check. The existing Data.HashMap.Strict is already imported; confirm flattenedOtelAttributes is a Set and not a HashMap (which would need HM.member).
  • Tests in DrainSpec.hs now use anonymous tuple patterns (_, tmp, _) throughout. A named record or a type alias for the triple would make tests less fragile when the return type changes again.

tonyalaribe and others added 6 commits February 18, 2026 01:41
Add log pattern extraction from otel_logs_and_spans using the Drain
algorithm, with baseline volume tracking (median + MAD) and automatic
spike detection via z-score thresholds. New patterns and rate changes
create issues with LLM-generated titles and descriptions.

- Fix N+1 query: batch getBatchPatternStats replaces per-pattern loop
- Break Utils<->RequestDumps circular dependency (inline count query)
- Rename UpsertPattern.pattern to logPattern (PatternSynonyms conflict)
- Move issue creation from IO to Eff (UUIDEff + Time effects, no liftIO)
- Change Issue.sourceType from Text to Maybe Text (NULL instead of "")
- Add otel_logs_and_spans log_pattern index for query performance
- Move ALTER TYPE ADD VALUE outside transaction block (PG < 12 compat)
- Inline trace_id into CREATE TABLE (remove ALTER TABLE ADD COLUMN)
- Add monitoring logs to baseline calculation and spike detection
- Style: when(not...) -> unless, remove trailing blanks in DrainSpec

Use HashMap for O(1) metadata lookup in processPatterns, replace
correlated subqueries with CTE join in getBatchPatternStats, fix
selectIssues to return actual affected_requests/affected_clients
instead of hardcoded zeros, and flatten nested if/else in
fetchLogPatterns into guards.
claude bot commented Feb 18, 2026

PR Review: Log Patterns (#300)

Good overall structure. The feature is well-scoped and the new LogPatterns module has clean separation of concerns. A few issues worth addressing:


Bugs / Correctness

getBatchPatternStats is missing a comma — SQL syntax error

)    -- end of mad_calc CTE
totals AS (   -- ← missing comma before this CTE

Line ~1008 in the diff. The totals AS (...) CTE is missing a preceding comma, which will cause a runtime SQL parse error the first time calculateLogPatternBaselines runs.

leastRecentlyUsedIndex crashes on empty vector

V.minIndexBy is partial on empty vectors. updateOrCreateLogGroup guards with V.length logGroups >= maxLogGroups, so in practice the vector is never empty when leastRecentlyUsedIndex is called — but an explicit guard or a safeMinIndexBy would make this safe and signal intent. (The old V.ifoldl' version had the same issue, so this is not a regression, just worth noting.)
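A hypothetical total wrapper that makes the emptiness guard explicit:

import qualified Data.Vector as V

safeMinIndexBy :: (a -> a -> Ordering) -> V.Vector a -> Maybe Int
safeMinIndexBy cmp v
  | V.null v = Nothing
  | otherwise = Just (V.minIndexBy cmp v)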

processNewLogPattern count vs. pattern-hash query

The volume check (SELECT count(*)) and the pattern lookup (getLogPatternByHash) are separate queries with no transaction, so there's a small TOCTOU window (pattern could be acknowledged between the two). Low-risk in practice but worth noting.


Code Quality / Conciseness

Named-field tuple anti-pattern in detectLogPatternSpikes

let anomalyData = flip mapMaybe patternsWithRates \lpRate ->
      ...
      in direction <&> \dir -> (lpRate.patternId, lpRate.patternHash, currentRate, mean, stddev, dir)

forM_ anomalyData \(patternId, patternHash, currentRate, mean, stddev, direction) -> ...

A 6-tuple is hard to follow and error-prone. The intermediate anomalyIds variable uses pid' as its name (shadowing the outer pid!) which is a latent bug. Consider a small record or at least a named type alias. The shadowed pid' should be renamed to e.g. lpId.
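A sketch of such a record; the field types are inferred from the tuple and the names are illustrative:

data PatternAnomaly = PatternAnomaly
  { lpId :: LogPatternId
  , patternHash :: Text
  , currentRate :: Double
  , baselineMean :: Double
  , baselineStddev :: Double
  , direction :: Text
  }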

Redundant sourceType field always equals issueTypeToText issueType

mkIssue sets sourceType = Just $ issueTypeToText issueType unconditionally, making the field fully derivable. The migration adds it as a nullable column, but if it's always set to the same value as issue_type, it provides no additional information. Either remove it or use it for something distinct (e.g. the tier-2 source field name).

Repeated fromMaybe "unknown-service" issue.service in Enhancement.hs

This string appears 6 times across buildTitlePrompt, buildDescriptionPrompt, and buildCriticalityPrompt but issue.service is now Maybe Text everywhere. A single let svc = fromMaybe "unknown-service" issue.service at the top of each function (or a small helper) would eliminate repetition.
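A sketch of such a helper, using the OverloadedRecordDot style the surrounding code already relies on:

serviceOrUnknown :: Issue -> Text
serviceOrUnknown issue = fromMaybe "unknown-service" issue.service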

LogPattern/LogPatternRateChange LLM prompts are minimal

The new buildTitlePrompt and buildDescriptionPrompt cases for LogPattern and LogPatternRateChange only pass issue.title and service — unlike RuntimeException/APIChange which decode issueData and provide rich context (error type, field counts, etc.). The issueData for both log pattern types contains everything needed (logPattern, sampleMessage, zScore, changePercent, etc.) — it should be decoded and passed to the LLM just like the other issue types.

processPatterns sourceField derivation is confusing

let sourceField = bool "body" kind (kind /= "log")

This means: if kind /= "log" use kind, else use "body". It works but it's needlessly indirect. if kind == "log" then "body" else kind reads clearly and avoids the double-negation.

acknowledgeLogPatterns doesn't filter by project_id

UPDATE apis.log_patterns SET state = 'acknowledged' WHERE pattern_hash = ANY(?)

Pattern hashes could theoretically collide across projects. Adding AND project_id = ? would be safer and align with every other query in this file.

getLogPatterns hard-codes 1000 in the call site, then warns if result is 1000

patterns <- LogPatterns.getLogPatterns pid 1000 0
Relude.when (length patterns == 1000) $ Log.logWarn "Pattern limit reached ..."

The magic number 1000 appears in both the call and the condition. Extract it as a constant or use length patterns == limit.
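A sketch of the extracted constant (the name is illustrative):

maxPatternsPerRun :: Int
maxPatternsPerRun = 1000

patterns <- LogPatterns.getLogPatterns pid maxPatternsPerRun 0
Relude.when (length patterns == maxPatternsPerRun) $ Log.logWarn "Pattern limit reached ..."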

Test pattern matching is verbose

In DrainSpec.hs, the pattern (\(_, template, _) -> template) now appears ~8 times as a V.map argument. With NamedFieldPuns / RecordWildCards or just a top-level helper patternTemplate (_, t, _) = t, this could be a one-liner each time.


SQL / Migration

ON CONFLICT index mismatch risk

The Haskell insertIssue uses ON CONFLICT (project_id, target_hash, issue_type) but the migration creates idx_issues_project_target_type_open as a partial unique index with WHERE acknowledged_at IS NULL AND archived_at IS NULL. PostgreSQL requires the ON CONFLICT predicate to exactly match the partial index predicate. If they don't match at runtime, the upsert will silently insert duplicates instead of deduplicating. Verify that the WHERE clause in the SQL statement matches.
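For reference, the conflict target must repeat the partial index predicate exactly; a sketch with an illustrative column list:

INSERT INTO apis.issues (project_id, target_hash, issue_type, title)
VALUES (?, ?, ?, ?)
ON CONFLICT (project_id, target_hash, issue_type)
  WHERE acknowledged_at IS NULL AND archived_at IS NULL
DO UPDATE SET updated_at = NOW();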

target_hash TEXT NOT NULL DEFAULT '' + old issues

Existing rows will get target_hash = ''. The partial unique index allows multiple rows with (project_id, '', issue_type) as long as one is closed (acknowledged/archived). This is probably fine for legacy issues, but worth confirming that old api_change issues with endpoint_hash != '' still deduplicate correctly under the new index.

state column uses TEXT not the enum

log_patterns.state is TEXT DEFAULT 'new' while LogPatternState has a proper Haskell enum with WrappedEnumSC. Adding a CHECK (state IN ('new', 'acknowledged', 'ignored')) constraint (or a PostgreSQL CREATE TYPE) would enforce the invariant at the DB level, consistent with how issue_type is handled.


Minor

  • getPatternsWithCurrentRates only checks source_field = 'body' (comment says "only body patterns for now") — make sure this is tracked as a TODO since tier-2 patterns (url_path, exception) have their own rates and would benefit from the same spike detection.
  • calculateLogPatternBaselines uses getBatchPatternStats (all patterns, one query) which is great, but getPatternStats (single-pattern variant) is now unused outside tests — consider removing it or noting where it's kept for future use.
  • extractTier2Patterns uses DISTINCT without a LIMIT guard on the exceptions query (only url_paths has LIMIT 1000) — the exceptions query is also missing a LIMIT.
