ClickHouse schema for storing log templates and representative logs for KL divergence analysis.
docker run -d --name clickhouse-server \
-p 8123:8123 \
-p 9000:9000 \
--ulimit nofile=262144:262144 \
clickhouse/clickhouse-serverdocker exec -i clickhouse-server clickhouse-client --multiquery < clickhouse_schema.sqldocker exec clickhouse-server clickhouse-client -q "SHOW TABLES"Stores the sequence of template IDs for each log stream. Used for KL divergence analysis.
Columns:
stream_name- Log stream identifiertimestamp- Log timestamp (DateTime64 with millisecond precision)template_id- Template ID that the log matcheslog_message- Original log message (optional reference)
Features:
- Partitioned by month (
toYYYYMM(timestamp)) - 30-day TTL (configurable)
- Ordered by
(stream_name, timestamp, template_id)
Stores up to 3 representative logs for each template.
Columns:
stream_name- Log stream identifiertemplate_id- Template IDtemplate_pattern- Template pattern string (e.g., "User <> logged in from <>")representative_logs- Array of up to 3 example log messageslog_count- Total count of logs matching this templatefirst_seen- First occurrence timestamplast_seen- Last occurrence timestamp
Features:
- Uses
ReplacingMergeTreeto deduplicate by latestlast_seen - Partitioned by
stream_name
INSERT INTO log_template_ids (stream_name, timestamp, template_id, log_message)
VALUES ('app-server', now(), 'tpl_001', 'User john logged in from 192.168.1.1');INSERT INTO log_template_representatives
(stream_name, template_id, template_pattern, representative_logs, log_count, first_seen, last_seen)
VALUES (
'app-server',
'tpl_001',
'User <*> logged in from <*>',
['User john logged in from 192.168.1.1', 'User jane logged in from 10.0.0.1', 'User bob logged in from 172.16.0.1'],
150,
now(),
now()
);SELECT * FROM log_template_representatives
WHERE stream_name = 'app-server'
ORDER BY log_count DESC;SELECT timestamp, template_id
FROM log_template_ids
WHERE stream_name = 'app-server'
AND timestamp >= now() - INTERVAL 1 HOUR
ORDER BY timestamp;docker stop clickhouse-serverdocker start clickhouse-serverdocker rm -f clickhouse-serverdocker exec -it clickhouse-server clickhouse-clientEdit clickhouse_schema.sql and modify:
TTL timestamp + INTERVAL 30 DAY; -- Change 30 to desired days8123- HTTP interface9000- Native TCP interface