
Everything in this repo except this statement is 99% codex. Use at your own risk.

Biscuit - a tsdb

Minimal time-series storage built on FoundationDB with a ring buffer per metric and a DataFusion-powered SQL surface.

[Screenshot: sample metrics UI]

Zero-to-cURL quick start

This is the fastest way to experience the system: launch the Podman image, POST a few samples, and run a SQL query.

  1. Build and start the container

    podman build -t tsdb-proto .
    podman run --rm -it \
      --privileged \
      --security-opt seccomp=unconfined \
      -p 8000:8000 \
      -v "$(pwd):/workspace:z" \
      tsdb-proto

    The container brings up FoundationDB, the FastAPI server, and the demo collector under runit. Port 8000 now exposes the API to your host. The first startup spends a few seconds auto-configuring FoundationDB (configure new single memory), so wait until the logs go quiet before sending requests. If you prefer to script that wait, see the readiness-poll sketch after this list.

  2. Send a gauge datapoint

    API=http://127.0.0.1:8000
    curl -s "$API/ingest/gauge" \
      -H "Content-Type: application/json" \
      -d "{\"name\":\"cpu_percent\",\"ts\":$(date +%s),\"value\":37.5,\"tags\":{\"host\":\"demo\"}}"
  3. Look up the metric_id

    Requires jq; alternatively inspect the JSON manually.

    metric_id=$(curl -s "$API/metrics/lookup" \
      -H "Content-Type: application/json" \
      -d '{"name":"cpu_percent"}' | jq -r '.metrics[0].metric_id // empty')
    if [ -z "$metric_id" ]; then
      echo "metric cpu_percent not found (did ingest succeed?)"
      exit 1
    fi
    echo "metric_id=$metric_id"
  4. Query the samples with SQL

    start_ts=$(( $(date +%s) - 600 ))
    end_ts=$(date +%s)
    curl -s "$API/query" \
      -H "Content-Type: application/json" \
      -d "{\"metric_ids\":[$metric_id],\"start_ts\":$start_ts,\"end_ts\":$end_ts,\"sql\":\"SELECT ts, value FROM samples WHERE metric_id = $metric_id ORDER BY ts\"}" \
      | jq

    Swap in your own SQL (ts_bucket, bucket_rate, and friends are registered) or hit /metrics/$metric_id/series for pre-bucketed JSON.

  5. Open the UI

    Visit http://127.0.0.1:8000/ui in your browser to explore dashboards that read from the same API.
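
For scripted setups, the wait in step 1 can be replaced with a small readiness poll. The sketch below assumes only that GET /metrics (listed under "Available endpoints") answers with HTTP 200 once FoundationDB and the API are up; the two-minute deadline is an arbitrary choice.

import sys
import time
import urllib.request

API = "http://127.0.0.1:8000"
deadline = time.time() + 120  # allow a couple of minutes for the first boot

while time.time() < deadline:
    try:
        # GET /metrics doubles as a readiness probe here.
        with urllib.request.urlopen(f"{API}/metrics", timeout=2) as resp:
            if resp.status == 200:
                print("API is ready")
                sys.exit(0)
    except OSError:
        pass  # container still booting; retry shortly
    time.sleep(2)

print("API did not become ready in time", file=sys.stderr)
sys.exit(1)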

Shut down the container with Ctrl+C when finished.

How it works

The system stores metric samples in a bounded ring per metric (tuple key (1, metric_id, slot)), metadata in (2, metric_id), and counter state in (3, metric_id). Each metric keeps a fixed number of samples (slots), writing each sample at slot = (ts // step) % slots, so storage stays constant over time. Gauges store raw values, counters store cumulative values, and bucket_rate derives per-bucket rates at query time. Retention is explicit (no TTLs); delete or rewrite metrics via the API when needed.
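
The slot arithmetic is easy to sanity-check on its own. Below is a minimal, in-memory sketch of one ring under the same rules; the real store lives in tsdb_fdb.py and keeps each record under the FoundationDB tuple key (1, metric_id, slot), so the class here is purely illustrative.

# Minimal in-memory sketch of the per-metric ring described above.
class RingSketch:
    def __init__(self, step: int, slots: int):
        self.step = step      # seconds covered by each slot
        self.slots = slots    # fixed ring size, hence bounded storage
        self.data = {}        # slot -> (ts, value)

    def write(self, ts: int, value: float) -> None:
        slot = (ts // self.step) % self.slots   # same formula as above
        self.data[slot] = (ts, value)           # overwrite, never append

    def read(self, start_ts: int, end_ts: int):
        # A slot may still hold data from a previous lap, so filter by timestamp.
        return sorted((ts, v) for ts, v in self.data.values() if start_ts <= ts <= end_ts)

ring = RingSketch(step=10, slots=360)           # one hour of 10-second samples
ring.write(1_700_000_000, 37.5)
ring.write(1_700_003_600, 40.0)                 # exactly one lap later: same slot, old value overwritten
print(ring.read(1_700_000_000, 1_700_003_600))  # [(1700003600, 40.0)]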

Design highlights (see DESIGN.md for full rationale)

  • Fixed-size rings mean storage is bounded by construction (num_metrics × slots × record_size); see the worked example after this list.
  • Writes overwrite existing slots instead of appending, so reducing traffic immediately reduces pressure.
  • There are no TTLs/compactions/deletes—old data is naturally overwritten on schedule.
  • FoundationDB handles durability/replication; DataFusion handles SQL, letting the two evolve independently.
  • Backpressure is predictable: overload shows up as latency, not surprise disk or compaction spikes.
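
To put a number on the first bullet, here is the bound worked through once; the metric count, slot count, and per-record size below are illustrative assumptions, not measurements of Biscuit.

# Back-of-the-envelope storage bound: num_metrics × slots × record_size.
num_metrics = 10_000   # distinct metric_ids (assumed)
slots = 1_440          # e.g. one day of 60-second slots (assumed)
record_size = 16       # rough bytes per stored (ts, value), ignoring key overhead (assumed)

bound_bytes = num_metrics * slots * record_size
print(f"~{bound_bytes / (1024 * 1024):.0f} MiB upper bound")  # ~220 MiB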

For deeper reasoning, tradeoffs, and comparisons to other TSDB models, read DESIGN.md.

Available endpoints

  • POST /ingest/gauge – body { "metric_id"?, "name"?, "ts", "value", "tags"?, "step"?, "slots"? }
  • POST /ingest/counter – body { "metric_id"?, "name"?, "ts", "raw_value", "tags"?, "step"?, "slots"? }
  • POST /query – body { "metric_ids": [...], "start_ts", "end_ts", "sql" }; tables: samples, metrics, metric_tags; UDFs: ts_bucket, bucket_rate, clamp, align_time, null_if_outside, series_add/sub/mul/div
  • GET /metrics – list metric metadata
  • POST /metrics/lookup – match name/tags
  • GET /metrics/{metric_id}/series – pre-bucketed time series (uses bucket_rate for counters)
  • POST /metrics/{metric_id}/retention – gauges-only rewrite { "step": int, "slots": int }
  • DELETE /metrics/{metric_id} – drop metric + values
  • Dashboards live under /ui with GET/POST /dashboards
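
The same endpoints are easy to drive from Python. The sketch below mirrors the cURL quick start using the requests library (pip install requests) against the container on port 8000; it is an illustrative client, not an official one, and api.py is the source of truth for any extra parameters the series endpoint accepts.

import time
import requests

API = "http://127.0.0.1:8000"

# 1. Ingest a gauge sample (POST /ingest/gauge).
requests.post(f"{API}/ingest/gauge", json={
    "name": "cpu_percent",
    "ts": int(time.time()),
    "value": 37.5,
    "tags": {"host": "demo"},
}).raise_for_status()

# 2. Resolve the metric_id (POST /metrics/lookup).
lookup = requests.post(f"{API}/metrics/lookup", json={"name": "cpu_percent"}).json()
metric_id = lookup["metrics"][0]["metric_id"]

# 3. Run SQL over the last 10 minutes (POST /query).
now = int(time.time())
result = requests.post(f"{API}/query", json={
    "metric_ids": [metric_id],
    "start_ts": now - 600,
    "end_ts": now,
    "sql": f"SELECT ts, value FROM samples WHERE metric_id = {metric_id} ORDER BY ts",
}).json()
print(result)

# 4. Or fetch the pre-bucketed series (GET /metrics/{metric_id}/series).
print(requests.get(f"{API}/metrics/{metric_id}/series").json())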

Sample queries

Every query is a two-step process:

1. Discover the metric IDs you care about

  • UI: open /metrics, search for your metric, and copy the ID column.

  • SQL: run SELECT metric_id FROM metrics WHERE name = 'cpu_percent'; via the UI’s SQL tab or /query.

  • API/CLI: call /metrics/lookup, for example:

    metric_id=$(curl -s "$API/metrics/lookup" \
      -H "Content-Type: application/json" \
      -d '{"name":"cpu_percent"}' | jq -r '.metrics[0].metric_id // empty')

2. Run SQL against those IDs

Send SQL from the command line with curl + /query; pass the metric IDs you just resolved plus time bounds. Example:

curl -s "$API/query" \
  -H "Content-Type: application/json" \
  -d "{\"metric_ids\":[$metric_id],\"start_ts\":1700000000,\"end_ts\":1700003600,\"sql\":\"SELECT ts, value FROM samples WHERE metric_id = $metric_id ORDER BY ts\"}" \
  | jq

That same payload is what the UI issues behind the scenes when you run a query, so you can prototype in the terminal and paste back into dashboards later.

Example recipes

Each example shows the two exact commands you’d run: one to discover the metric ID, one to query it.

Inspect raw gauge samples

  1. Find the metric ID (using the metric name cpu_percent as an example):

    CPU_METRIC_ID=$(curl -s "$API/metrics/lookup" \
      -H "Content-Type: application/json" \
      -d '{"name":"cpu_percent"}' \
      | jq -r '.metrics[0].metric_id // empty')
  2. Pull the last 10 minutes of samples:

    START_TS=$(( $(date +%s) - 600 ))
    END_TS=$(date +%s)
    
    curl -s "$API/query" \
      -H "Content-Type: application/json" \
      -d "{\"metric_ids\":[${CPU_METRIC_ID}],\"start_ts\":$START_TS,\"end_ts\":$END_TS,\"sql\":\"SELECT ts, value FROM samples WHERE metric_id = ${CPU_METRIC_ID} ORDER BY ts\"}" \
      | jq

Bucket and average a gauge

  1. Reuse the CPU_METRIC_ID lookup above (or run it again for a different metric) and keep the START_TS/END_TS window from the previous recipe.

  2. Bucket readings into one-minute intervals and smooth them with avg():

    curl -s "$API/query" \
      -H "Content-Type: application/json" \
      -d "{\"metric_ids\":[${CPU_METRIC_ID}],\"start_ts\":$START_TS,\"end_ts\":$END_TS,\"sql\":\"WITH selector_map AS (SELECT 'cpu' AS alias, ${CPU_METRIC_ID} AS metric_id), base_agg AS (SELECT ts_bucket(s.ts, 60) AS bucket, 'cpu' AS alias, avg(s.value) AS value FROM samples AS s JOIN selector_map AS sel ON s.metric_id = sel.metric_id WHERE s.ts BETWEEN $START_TS AND $END_TS GROUP BY bucket) SELECT bucket, alias, value FROM base_agg ORDER BY bucket\"}" \
      | jq

    SQL reference:

    WITH selector_map AS (
      SELECT 'cpu' AS alias, 42 AS metric_id
    ),
    base_agg AS (
      SELECT ts_bucket(s.ts, 60) AS bucket, 'cpu' AS alias, avg(s.value) AS value
      FROM samples AS s
      JOIN selector_map AS sel ON s.metric_id = sel.metric_id
      WHERE s.ts BETWEEN 1700000000 AND 1700003600
      GROUP BY bucket
    )
    SELECT bucket, alias, value
    FROM base_agg
    ORDER BY bucket;

Counter throughput with bucket_rate

  1. Resolve the counter’s metric ID (here using the baked-in net_bytes_sent metric):

    REQ_METRIC_ID=$(curl -s "$API/metrics/lookup" \
      -H "Content-Type: application/json" \
      -d '{"name":"net_bytes_sent"}' \
      | jq -r '.metrics[0].metric_id // empty')
  2. Convert cumulative counter samples into per-second rates. bucket_rate expects both the current bucket value and the previous bucket value (LAG(...)) so it can compute the delta safely:

    curl -s "$API/query" \
      -H "Content-Type: application/json" \
      -d "{\"metric_ids\":[${REQ_METRIC_ID}],\"start_ts\":$START_TS,\"end_ts\":$END_TS,\"sql\":\"WITH selector_map AS (SELECT 'net_sent' AS alias, ${REQ_METRIC_ID} AS metric_id), base AS (SELECT ts_bucket(s.ts, 60) AS bucket, s.metric_id, avg(s.value) AS bucket_avg FROM samples AS s JOIN selector_map AS sel ON s.metric_id = sel.metric_id WHERE s.ts BETWEEN $START_TS AND $END_TS GROUP BY bucket, s.metric_id) SELECT b.bucket, sel.alias AS series, bucket_rate(b.bucket_avg, LAG(b.bucket_avg) OVER (ORDER BY b.bucket), 60) AS per_second FROM base AS b JOIN selector_map AS sel ON b.metric_id = sel.metric_id ORDER BY bucket\"}" \
      | jq

    SQL reference:

    WITH selector_map AS (
      SELECT 'net_sent' AS alias, 3020 AS metric_id
    ),
    base AS (
      SELECT ts_bucket(s.ts, 60) AS bucket, s.metric_id, avg(s.value) AS bucket_avg
      FROM samples AS s
      JOIN selector_map AS sel ON s.metric_id = sel.metric_id
      WHERE s.ts BETWEEN 1700000000 AND 1700003600
      GROUP BY bucket, s.metric_id
    )
    SELECT b.bucket, sel.alias AS series, bucket_rate(b.bucket_avg, LAG(b.bucket_avg) OVER (ORDER BY b.bucket), 60) AS per_second
    FROM base AS b
    JOIN selector_map AS sel ON b.metric_id = sel.metric_id
    ORDER BY bucket;

QueryEngine registers ts_bucket, bucket_rate, math helpers, and series operators automatically before executing SQL, so the functions above are always available.
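
For intuition, here is a plain-Python approximation of what bucket_rate computes from the (current, previous) pair produced by LAG(); treating a decrease between buckets as a counter reset that yields NULL is an assumption about the UDF, whose actual definition lives in query_engine.py.

from typing import Optional

# Rough stand-in for bucket_rate(curr, prev, interval_seconds).
# Assumption: a decrease between buckets is treated as a counter reset and yields None (NULL).
def bucket_rate_sketch(curr: Optional[float], prev: Optional[float], interval: int) -> Optional[float]:
    if curr is None or prev is None:
        return None          # first bucket in the window has no LAG() value
    if curr < prev:
        return None          # counter reset (process restart, wraparound)
    return (curr - prev) / interval

# Cumulative byte counters averaged into 60 s buckets, as in the recipe above.
buckets = [None, 1_000.0, 7_000.0, 6_500.0, 12_500.0]
rates = [bucket_rate_sketch(curr, prev, 60) for prev, curr in zip(buckets, buckets[1:])]
print(rates)  # [None, 100.0, None, 100.0]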

Compare counter directions (sent vs recv)

  1. Resolve the net_bytes_sent and net_bytes_recv metric IDs:

    SENT_ID=$(curl -s "$API/metrics/lookup" \
      -H "Content-Type: application/json" \
      -d '{"name":"net_bytes_sent"}' | jq -r '.metrics[0].metric_id // empty')
    
    RECV_ID=$(curl -s "$API/metrics/lookup" \
      -H "Content-Type: application/json" \
      -d '{"name":"net_bytes_recv"}' | jq -r '.metrics[0].metric_id // empty')
  2. Compute per-second rates for each direction and subtract them to see bursts or imbalances:

    START_TS=$(( $(date +%s) - 3600 ))
    END_TS=$(date +%s)
    
    curl -s "$API/query" \
      -H "Content-Type: application/json" \
      -d "{\"metric_ids\":[${SENT_ID},${RECV_ID}],\"start_ts\":$START_TS,\"end_ts\":$END_TS,\"sql\":\"WITH selector_map AS (SELECT 'sent' AS alias, ${SENT_ID} AS metric_id UNION ALL SELECT 'recv' AS alias, ${RECV_ID} AS metric_id), bucketed AS (SELECT ts_bucket(s.ts, 60) AS bucket, sel.alias, avg(s.value) AS bucket_avg FROM samples AS s JOIN selector_map AS sel ON s.metric_id = sel.metric_id WHERE s.ts BETWEEN $START_TS AND $END_TS GROUP BY bucket, sel.alias), rates AS (SELECT bucket, alias, bucket_rate(bucket_avg, LAG(bucket_avg) OVER (PARTITION BY alias ORDER BY bucket), 60) AS rate FROM bucketed) SELECT bucket, max(CASE WHEN alias = 'sent' THEN rate END) AS sent_rate, max(CASE WHEN alias = 'recv' THEN rate END) AS recv_rate, max(CASE WHEN alias = 'sent' THEN rate END) - max(CASE WHEN alias = 'recv' THEN rate END) AS delta FROM rates GROUP BY bucket ORDER BY bucket\"}" \
      | jq '.[\"rows\"]? // .'

    SQL reference:

    WITH selector_map AS (
      SELECT 'sent' AS alias, 3020 AS metric_id
      UNION ALL
      SELECT 'recv' AS alias, 3021 AS metric_id
    ),
    bucketed AS (
      SELECT ts_bucket(s.ts, 60) AS bucket, sel.alias, avg(s.value) AS bucket_avg
      FROM samples AS s
      JOIN selector_map AS sel ON s.metric_id = sel.metric_id
      WHERE s.ts BETWEEN 1700000000 AND 1700003600
      GROUP BY bucket, sel.alias
    ),
    rates AS (
      SELECT bucket,
             alias,
             bucket_rate(bucket_avg, LAG(bucket_avg) OVER (PARTITION BY alias ORDER BY bucket), 60) AS rate
      FROM bucketed
    )
    SELECT bucket,
           max(CASE WHEN alias = 'sent' THEN rate END) AS sent_rate,
           max(CASE WHEN alias = 'recv' THEN rate END) AS recv_rate,
           max(CASE WHEN alias = 'sent' THEN rate END) - max(CASE WHEN alias = 'recv' THEN rate END) AS delta
    FROM rates
    GROUP BY bucket
    ORDER BY bucket;


Spot resource ceilings (CPU/memory/disk %)

  1. Resolve the three percentage gauges:

    CPU_ID=$(curl -s "$API/metrics/lookup" \
      -H "Content-Type: application/json" \
      -d '{"name":"cpu_percent"}' | jq -r '.metrics[0].metric_id // empty')
    MEM_ID=$(curl -s "$API/metrics/lookup" \
      -H "Content-Type: application/json" \
      -d '{"name":"mem_used_percent"}' | jq -r '.metrics[0].metric_id // empty')
    DISK_ID=$(curl -s "$API/metrics/lookup" \
      -H "Content-Type: application/json" \
      -d '{"name":"disk_used_percent"}' | jq -r '.metrics[0].metric_id // empty')
  2. Bucket all three into a single query so dashboards can plot them together and watch for anything approaching 100%:

    START_TS=$(( $(date +%s) - 3600 ))
    END_TS=$(date +%s)
    
    curl -s "$API/query" \
      -H "Content-Type: application/json" \
      -d "{\"metric_ids\":[${CPU_ID},${MEM_ID},${DISK_ID}],\"start_ts\":$START_TS,\"end_ts\":$END_TS,\"sql\":\"WITH selector_map AS (SELECT 'cpu' AS alias, ${CPU_ID} AS metric_id UNION ALL SELECT 'mem' AS alias, ${MEM_ID} AS metric_id UNION ALL SELECT 'disk' AS alias, ${DISK_ID} AS metric_id), bucketed AS (SELECT ts_bucket(s.ts, 60) AS bucket, sel.alias, avg(s.value) AS pct FROM samples AS s JOIN selector_map AS sel ON s.metric_id = sel.metric_id WHERE s.ts BETWEEN $START_TS AND $END_TS GROUP BY bucket, sel.alias) SELECT bucket, max(CASE WHEN alias = 'cpu' THEN pct END) AS cpu_percent, max(CASE WHEN alias = 'mem' THEN pct END) AS mem_used_percent, max(CASE WHEN alias = 'disk' THEN pct END) AS disk_used_percent FROM bucketed GROUP BY bucket ORDER BY bucket\"}\" \
      | jq '.[\"rows\"]? // .'

    SQL reference:

    WITH selector_map AS (
      SELECT 'cpu' AS alias, 3001 AS metric_id
      UNION ALL
      SELECT 'mem' AS alias, 3003 AS metric_id
      UNION ALL
      SELECT 'disk' AS alias, 3004 AS metric_id
    ),
    bucketed AS (
      SELECT ts_bucket(s.ts, 60) AS bucket, sel.alias, avg(s.value) AS pct
      FROM samples AS s
      JOIN selector_map AS sel ON s.metric_id = sel.metric_id
      WHERE s.ts BETWEEN 1700000000 AND 1700003600
      GROUP BY bucket, sel.alias
    )
    SELECT bucket,
           max(CASE WHEN alias = 'cpu' THEN pct END) AS cpu_percent,
           max(CASE WHEN alias = 'mem' THEN pct END) AS mem_used_percent,
           max(CASE WHEN alias = 'disk' THEN pct END) AS disk_used_percent
    FROM bucketed
    GROUP BY bucket
    ORDER BY bucket;

Tests

.venv/bin/python tests.py

The suite covers ring-buffer round trips, counter-rate detection, and SQL queries. Tests skip automatically if FoundationDB is unreachable.

Repository tips

  • Root entry points: api.py (FastAPI), demo.py (ingestion/query flow), tests.py (unittests)
  • Persistence logic lives in tsdb_fdb.py; SQL orchestration and UDF registration live in query_engine.py
  • Shared config lives in config/; container assets under docker/ and Dockerfile
  • Store fdb.cluster in the repo root or reference it via FDB_CLUSTER_FILE

About

A time-series database built on FoundationDB, inspired by RRD-style round-robin storage.
