feat(cmd): top boards #43

Merged
sendya merged 6 commits into main from feat/top-boards
Feb 13, 2026

Conversation

sendya (Member) commented Feb 12, 2026

This pull request introduces a new real-time terminal dashboard tool for monitoring system and application metrics, adds a TopK feature to the LRU cache implementation, and enhances observability and introspection across the codebase. It also updates build scripts and dependencies to support these new features.

Major new features and improvements:

Real-time Terminal Dashboard

  • Added a new command-line tool ttop (cmd/top/main.go) that provides a real-time terminal dashboard for monitoring metrics such as QPS, CPU, memory, disk usage, and hot URLs by connecting to the Tavern server's metrics endpoint. This includes a rich UI using termui and smooth updating of metrics.
  • Updated the Makefile to build the new ttop binary as part of the build process. [1] [2]
  • Added required dependencies for the dashboard and metrics collection, including termui, go-humanize, gopsutil, and others in go.mod. [1] [2] [3] [4]

LRU Cache TopK Functionality

  • Implemented a TopK method in the generic LRU cache (pkg/algorithm/lru/lru.go) to efficiently retrieve the top K most frequently used keys, and added a comprehensive unit test (pkg/algorithm/lru/lru_topk_test.go). [1] [2] [3] [4]
  • Extended the Bucket interface to include the TopK method, ensuring broader cache compatibility.
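
To make the TopK idea concrete, here is a minimal sketch that is independent of the code in pkg/algorithm/lru. It assumes each key already carries a reference count and picks the k highest with a bounded min-heap; the `entry` type and `topK` function are hypothetical names used only for illustration.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// entry pairs a key with its reference count (hypothetical type).
type entry struct {
	key  string
	refs int
}

// minHeap keeps the entry with the fewest refs at the root.
type minHeap []entry

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].refs < h[j].refs }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(entry)) }
func (h *minHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// topK returns up to k keys ordered by descending reference count.
func topK(counts map[string]int, k int) []string {
	if k <= 0 {
		return nil
	}
	h := &minHeap{}
	for key, refs := range counts {
		heap.Push(h, entry{key: key, refs: refs})
		if h.Len() > k {
			heap.Pop(h) // drop the current minimum so only k entries remain
		}
	}
	out := append([]entry(nil), (*h)...)
	sort.Slice(out, func(i, j int) bool {
		if out[i].refs != out[j].refs {
			return out[i].refs > out[j].refs
		}
		return out[i].key < out[j].key // stable order for equal counts
	})
	keys := make([]string, len(out))
	for i, e := range out {
		keys[i] = e.key
	}
	return keys
}

func main() {
	counts := map[string]int{"/a": 5, "/b": 1, "/c": 9, "/d": 3}
	fmt.Println(topK(counts, 2))
}
```

The heap never holds more than k entries, so the scan costs roughly O(n log k) rather than sorting every key.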

Metrics and Observability Enhancements

  • Added a new metrics package with utilities for smoothing counter values and collecting request code totals using Prometheus metrics, improving the accuracy and granularity of system observability.
  • Enhanced the QS plugin (plugin/qs/qs.go) to collect, smooth, and serve real-time metrics (including hot URLs and system resource usage) for the dashboard, using atomic operations and Prometheus. [1] [2]
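
As an illustration of what "smoothing counter values" can mean, the sketch below turns successive readings of a cumulative counter into an exponentially smoothed per-sample delta. It is an assumption about the general technique, not the contents of the new metrics package; `Smoother`, `NewSmoother`, and the `alpha` weight are hypothetical.

```go
package main

import "fmt"

// Smoother converts readings of a cumulative counter into an exponentially
// smoothed per-sample delta. Hypothetical type, for illustration only.
type Smoother struct {
	alpha  float64 // weight of the newest delta, 0 < alpha <= 1
	last   float64 // previous raw counter reading
	rate   float64 // current smoothed delta
	primed bool    // false until the first reading has been seen
}

func NewSmoother(alpha float64) *Smoother { return &Smoother{alpha: alpha} }

// Update records a new cumulative reading and returns the smoothed delta.
func (s *Smoother) Update(total float64) float64 {
	if !s.primed {
		// The first reading only establishes a baseline.
		s.last, s.primed = total, true
		return 0
	}
	delta := total - s.last
	if delta < 0 {
		// The counter went backwards (for example after a restart);
		// treat the new reading as the delta.
		delta = total
	}
	s.last = total
	s.rate = s.alpha*delta + (1-s.alpha)*s.rate
	return s.rate
}

func main() {
	s := NewSmoother(0.5)
	for _, reading := range []float64{100, 110, 120, 5} {
		fmt.Println(s.Update(reading))
	}
}
```

If the counter is sampled once per second, the returned value approximates requests per second.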

Runtime and Version Reporting

  • Improved the runtime info package to record the server's start time and provide a formatted version string, and added a -V/--version flag to the main application for easy version introspection. [1] [2] [3] [4] [5] [6] [7]

References:
[1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22]

Copilot AI review requested due to automatic review settings February 12, 2026 15:53
Copilot AI (Contributor) left a comment

Pull request overview

This PR adds a real-time terminal “top” dashboard (ttop) backed by a new QS plugin SSE metrics endpoint, and extends the in-repo LRU cache + storage bucket interfaces to surface “TopK” hot items for observability/introspection.

Changes:

  • Introduces cmd/top terminal dashboard that consumes /plugin/qs/graph SSE data.
  • Adds TopK support to the generic LRU cache and propagates it through the storage Bucket interface and bucket implementations.
  • Adds a small metrics helper package and runtime version/start-time reporting; updates build scripts and dependencies.
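
For readers unfamiliar with SSE, the following sketch shows one way a client could parse a text/event-stream body using only the standard library. The payload shape (a flat JSON object of numeric fields) and the names `readEvents` and `parseStream` are assumptions made for illustration; the real ttop client may be structured differently.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// readEvents parses a text/event-stream body and calls handle once per
// complete event. The JSON payload shape is an assumption.
func readEvents(r io.Reader, handle func(map[string]float64)) error {
	sc := bufio.NewScanner(r)
	var data []string
	for sc.Scan() {
		line := sc.Text()
		switch {
		case line == "": // a blank line terminates the current event
			if len(data) == 0 {
				continue
			}
			var m map[string]float64
			if err := json.Unmarshal([]byte(strings.Join(data, "\n")), &m); err != nil {
				return err
			}
			handle(m)
			data = data[:0]
		case strings.HasPrefix(line, "data:"):
			// Strip the field name and at most one leading space.
			v := strings.TrimPrefix(line, "data:")
			data = append(data, strings.TrimPrefix(v, " "))
		}
		// Other fields (event:, id:, retry:) and ":" comments are ignored.
	}
	return sc.Err()
}

// parseStream is a small convenience wrapper around readEvents.
func parseStream(s string) ([]map[string]float64, error) {
	var events []map[string]float64
	err := readEvents(strings.NewReader(s), func(m map[string]float64) {
		events = append(events, m)
	})
	return events, err
}

func main() {
	events, err := parseStream("data: {\"2xx\": 12, \"5xx\": 1}\n\n: keep-alive\n\ndata: {\"2xx\": 15}\n\n")
	fmt.Println(len(events), err)
}
```

In a real client the reader would be the body of an HTTP response from the metrics endpoint rather than a string.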

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 13 comments.

| File | Description |
| --- | --- |
| cmd/top/main.go | New ttop terminal UI that consumes the QS SSE metrics stream. |
| plugin/qs/qs.go | Adds /plugin/qs/graph SSE endpoint and background collectors (QPS/status codes, usage, hot keys). |
| pkg/algorithm/lru/lru.go | Adds TopK traversal and adjusts increment bookkeeping. |
| pkg/algorithm/lru/lru_topk_test.go | Unit test for the new TopK behavior. |
| api/defined/v1/storage/storage.go | Extends Bucket interface with TopK. |
| storage/bucket/memory/memory.go | Implements Bucket.TopK() via the in-memory LRU. |
| storage/bucket/disk/disk.go | Implements Bucket.TopK() via the disk bucket LRU. |
| storage/diraware/bucket.go | Forwards TopK() through the wrapper bucket. |
| storage/bucket/empty/empty.go | Adds stub TopK() implementation. |
| metrics/metrics.go | Adds counter smoothing and Prometheus gather helpers. |
| pkg/x/runtime/info.go | Adds StartedAt and formatted build/runtime info string. |
| main.go | Adds -V flag to print build/runtime info and exit. |
| Makefile | Builds bin/ttop as part of the toolchain target. |
| go.mod / go.sum | Adds dependencies required for the dashboard/collectors. |
| server/server.go | Adds X-Server: local-plugin header for local plugin requests. |

cmd/top/main.go Outdated
dataMu.RUnlock()

rater.Text = fmt.Sprintf("\nRequests/sec: %d \nTotal: %d \n2xx : %d\n4xx : %d\n499 : %d\n5xx : %d",
int(data["total"]), int(data["total"]), int(data["2xx"]), int(data["4xx"]), int(data["499"]), int(data["5xx"]))

Copilot AI commented Feb 12, 2026

The UI displays both "Requests/sec" and "Total" using data["total"]. As implemented, these two values will always be identical, which is misleading. Consider exposing separate fields (e.g. rps vs total_requests) and using them accordingly in the dashboard.

Suggested change
- int(data["total"]), int(data["total"]), int(data["2xx"]), int(data["4xx"]), int(data["499"]), int(data["5xx"]))
+ int(data["rps"]), int(data["total"]), int(data["2xx"]), int(data["4xx"]), int(data["499"]), int(data["5xx"]))

// e.g.
//
// stopCh <- struct{}{}
qs.stopCh <- struct{}{}

Copilot AI commented Feb 12, 2026

Stop sends a single value into stopCh. With multiple goroutines selecting on the same channel, only one goroutine will receive the stop signal, leaving the others running. Also, since the channel is buffered with capacity 1, a second Stop() call can block forever if the buffer is full. Prefer close(stopCh) for broadcast semantics (guarded with sync.Once), and/or call the cancel function to stop all collectors.

Suggested change
- qs.stopCh <- struct{}{}
+ if qs.cancel != nil {
+     qs.cancel()
+ }
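
The close-plus-sync.Once alternative mentioned in the comment can be sketched as follows. This is a standalone illustration, not the plugin's code; the `collector` type and `newCollector` helper are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// collector owns several background goroutines that all watch stopCh.
// Hypothetical type, for illustration only.
type collector struct {
	stopCh   chan struct{}
	stopOnce sync.Once
	wg       sync.WaitGroup
}

func newCollector(workers int) *collector {
	c := &collector{stopCh: make(chan struct{})}
	for i := 0; i < workers; i++ {
		c.wg.Add(1)
		go func() {
			defer c.wg.Done()
			<-c.stopCh // a closed channel unblocks every receiver
		}()
	}
	return c
}

// Stop closes stopCh exactly once and waits for all workers to exit.
// Calling it repeatedly is safe and never blocks on a full buffer.
func (c *collector) Stop() {
	c.stopOnce.Do(func() { close(c.stopCh) })
	c.wg.Wait()
}

func main() {
	c := newCollector(3)
	c.Stop()
	c.Stop() // second call is a no-op
	fmt.Println("all workers stopped")
}
```

Closing the channel acts as a broadcast, whereas a single send wakes at most one receiver.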

continue
}

usage, _ := disk.Usage(bucket.Path())

Copilot AI commented Feb 12, 2026

disk.Usage(bucket.Path()) errors are ignored; on error usage can be nil and usage.Used/Total will panic. Handle the error (skip bucket or keep previous value) before accumulating.

Suggested change
- usage, _ := disk.Usage(bucket.Path())
+ usage, err := disk.Usage(bucket.Path())
+ if err != nil || usage == nil {
+     log.Warnf("qs: failed to get disk usage for bucket %q: %v", bucket.Path(), err)
+     continue
+ }

StoreType() string
// Path returns the Bucket path.
Path() string
// TopK returns the top k most frequently used keys

Copilot AI commented Feb 12, 2026

The Bucket.TopK doc says it returns "the top k most frequently used keys", but current implementations return formatted strings like path@@time@@refs for UI consumption. This mismatch makes the interface confusing and brittle. Consider either (1) changing the method name/docs to reflect returning display/metadata strings, or (2) returning a structured type (e.g. []HotKey) and/or separate TopKKeys/TopKStats APIs.

Suggested change
- // TopK returns the top k most frequently used keys
+ // TopK returns implementation-defined metadata strings for the top k most frequently used keys.
+ // The returned strings are intended for UI/display consumption (for example, "path@@time@@refs"),
+ // and do not necessarily correspond to raw key values.
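
Option (2) from the comment, a structured return type, might look like the sketch below. The `HotKey` struct, its field types, and `ParseHotKey` are hypothetical; only the "path@@time@@refs" layout and the use of time.DateTime come from the code quoted in this review.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

const sep = "@@"

// HotKey is a hypothetical structured alternative to the formatted string.
type HotKey struct {
	Path       string
	LastAccess time.Time
	Refs       int64 // assumed integer type; the real Refs() type may differ
}

// String renders the "path@@time@@refs" form used for display.
func (h HotKey) String() string {
	return fmt.Sprintf("%s%s%s%s%d", h.Path, sep, h.LastAccess.Format(time.DateTime), sep, h.Refs)
}

// ParseHotKey splits from the right so that a path containing "@@" survives.
func ParseHotKey(s string) (HotKey, error) {
	i := strings.LastIndex(s, sep)
	if i < 0 {
		return HotKey{}, fmt.Errorf("hot key %q: missing refs field", s)
	}
	j := strings.LastIndex(s[:i], sep)
	if j < 0 {
		return HotKey{}, fmt.Errorf("hot key %q: missing time field", s)
	}
	at, err := time.ParseInLocation(time.DateTime, s[j+len(sep):i], time.Local)
	if err != nil {
		return HotKey{}, err
	}
	refs, err := strconv.ParseInt(s[i+len(sep):], 10, 64)
	if err != nil {
		return HotKey{}, err
	}
	return HotKey{Path: s[:j], LastAccess: at, Refs: refs}, nil
}

func main() {
	hk, err := ParseHotKey("/img/a.png@@2026-02-12 15:53:00@@42")
	fmt.Println(hk.Path, hk.Refs, err)
}
```

Keeping the fields separate until display time means consumers never have to parse a delimiter-joined string.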

Comment on lines +232 to +235
md, _ := m.indexdb.Get(context.Background(), arr[i][:])
if md != nil {
ret = append(ret, fmt.Sprintf("%s@@%s@@%d", md.ID.Path(), time.Unix(int64(mark.LastAccess()), 0).Format(time.DateTime), mark.Refs()))
}

Copilot AI commented Feb 12, 2026

mark := m.cache.Peek(arr[i]) can return nil (e.g. concurrent eviction), but the code dereferences mark unconditionally. Also, indexdb.Get errors are ignored. Add nil/error handling to avoid panics and avoid emitting partial/incorrect entries.

Suggested change
- md, _ := m.indexdb.Get(context.Background(), arr[i][:])
- if md != nil {
-     ret = append(ret, fmt.Sprintf("%s@@%s@@%d", md.ID.Path(), time.Unix(int64(mark.LastAccess()), 0).Format(time.DateTime), mark.Refs()))
- }
+ if mark == nil {
+     // Entry may have been evicted concurrently; skip it.
+     continue
+ }
+ md, err := m.indexdb.Get(context.Background(), arr[i][:])
+ if err != nil || md == nil {
+     // If we cannot reliably get metadata, skip this entry to avoid partial/incorrect output.
+     continue
+ }
+ ret = append(ret, fmt.Sprintf("%s@@%s@@%d", md.ID.Path(), time.Unix(int64(mark.LastAccess()), 0).Format(time.DateTime), mark.Refs()))

plugin/qs/qs.go Outdated
Comment on lines 442 to 448
for {
select {
case <-ctx.Done():
return
case <-qs.stopCh:
return
case <-time.Tick(time.Second * 5):

Copilot AI commented Feb 12, 2026

Same time.Tick issue here: each loop iteration creates a new ticker and leaks. Create one time.NewTicker(5*time.Second) before the loop and select on its channel.

Suggested change
- for {
-     select {
-     case <-ctx.Done():
-         return
-     case <-qs.stopCh:
-         return
-     case <-time.Tick(time.Second * 5):
+ ticker := time.NewTicker(5 * time.Second)
+ defer ticker.Stop()
+ for {
+     select {
+     case <-ctx.Done():
+         return
+     case <-qs.stopCh:
+         return
+     case <-ticker.C:
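
A complete, runnable version of the single-ticker pattern is sketched below for reference. The `runEvery` and `countTicks` helpers are hypothetical names and are not part of the plugin.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runEvery calls fn on every tick until ctx is cancelled. One ticker is
// created before the loop and stopped on return.
func runEvery(ctx context.Context, interval time.Duration, fn func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			fn()
		}
	}
}

// countTicks runs runEvery until fn has fired at least want times,
// then cancels the context and reports how many calls were made.
func countTicks(interval time.Duration, want int) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	n := 0
	runEvery(ctx, interval, func() {
		n++
		if n >= want {
			cancel()
		}
	})
	return n
}

func main() {
	fmt.Println("ticks:", countTicks(10*time.Millisecond, 3))
}
```

Since Go 1.23 an unreferenced ticker can be garbage collected even if it is never stopped, but calling time.Tick inside the select still creates a fresh ticker on every pass through the loop, so hoisting it out remains the cleaner form.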

qs.cpuPercent.Store(uint32(percent[0]))
}

vmem, _ := mem.VirtualMemory()

Copilot AI commented Feb 12, 2026

mem.VirtualMemory() errors are ignored and vmem can be nil; dereferencing vmem.Used/vmem.Total would panic. Handle the error and nil result before storing values.

Suggested change
- vmem, _ := mem.VirtualMemory()
+ vmem, err := mem.VirtualMemory()
+ if err != nil || vmem == nil {
+     return
+ }

Comment on lines 349 to 376
// Temporarily store the smoothed values collected in this pass
tempData := make(map[string]float64)
totalCounter := float64(0)
for _, mf := range familys {
    if mf.GetName() == "tr_tavern_requests_code_total" {
        for _, metric := range mf.GetMetric() {
            for _, label := range metric.Label {
                if label.GetName() == "code" {
                    code := label.GetValue()
                    val := metric.GetCounter().GetValue()
                    totalCounter += val
                    if smoother, ok := metricsMap[code]; ok {
                        smoothedValue := smoother.Update(val)
                        tempData[code] = smoothedValue
                    }
                }
            }
        }
    }
}

// Update the shared data under the write lock
qs.mu.Lock()
for code, value := range tempData {
    qs.smoothedData[code] = value
}
qs.smoothedData["total"] = totalCounter
qs.mu.Unlock()

Copilot AI commented Feb 12, 2026

totalCounter is computed as the sum of Prometheus counters (cumulative totals), but downstream the dashboard treats data["total"] as requests/sec. This will grow forever and doesn't match the smoothed per-second values you compute for each code. Consider computing total as the sum of the per-code smoothed deltas (or maintain a dedicated smoother for total).
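
One way to read the suggestion: derive the total from the same per-code deltas rather than from the raw cumulative counters. The sketch below assumes two snapshots of the counters taken one sampling interval apart; the `rates` function is hypothetical and leaves out the smoothing step.

```go
package main

import "fmt"

// rates derives per-code deltas between two snapshots of cumulative
// counters and returns them together with their sum.
func rates(prev, curr map[string]float64) (perCode map[string]float64, total float64) {
	perCode = make(map[string]float64, len(curr))
	for code, v := range curr {
		d := v - prev[code] // a code missing from prev counts from zero
		if d < 0 {
			d = v // counter reset: treat the new reading as the delta
		}
		perCode[code] = d
		total += d
	}
	return perCode, total
}

func main() {
	prev := map[string]float64{"200": 100, "404": 10}
	curr := map[string]float64{"200": 130, "404": 12, "500": 1}
	perCode, total := rates(prev, curr)
	fmt.Println(perCode, total)
}
```

With snapshots one second apart the total is a requests-per-second figure that stays consistent with the per-code values, instead of growing without bound.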

Comment on lines +253 to +255
uiEvents := terminal.PollEvents()
ticker := time.NewTicker(time.Second).C
for {

Copilot AI commented Feb 12, 2026

The render loop uses time.NewTicker(time.Second) and ignores the -interval flag, so UI refresh rate can't be configured. Use tickInterval when creating the ticker (and stop it on exit).
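
A sketch of wiring a refresh-interval flag into the ticker follows. It assumes the flag is a duration such as 500ms; the actual type of the -interval flag and of tickInterval in cmd/top/main.go is not shown in this review, and `parseInterval` is a hypothetical helper.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// parseInterval reads -interval from args, falling back to one second
// when the value is zero or negative.
func parseInterval(args []string) (time.Duration, error) {
	fs := flag.NewFlagSet("ttop", flag.ContinueOnError)
	interval := fs.Duration("interval", time.Second, "UI refresh interval")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	if *interval <= 0 {
		// time.NewTicker panics on non-positive durations.
		return time.Second, nil
	}
	return *interval, nil
}

func main() {
	interval, err := parseInterval([]string{"-interval", "500ms"})
	if err != nil {
		fmt.Println("bad flags:", err)
		return
	}
	ticker := time.NewTicker(interval) // instead of a fixed time.Second
	defer ticker.Stop()
	fmt.Println("refreshing every", interval)
}
```

Keeping the *time.Ticker value, rather than only its channel, is what makes the deferred Stop possible.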

Comment on lines +222 to +237
disk, diskDraw := func() (*widgets.Gauge, func()) {
disk := widgets.NewGauge()
disk.Title = "Disk Usage"
disk.Percent = int(diskPercent.Load())
disk.BarColor = terminal.ColorYellow
disk.BorderStyle.Fg = terminal.ColorWhite
disk.TitleStyle.Fg = terminal.ColorCyan

return disk, func() {
disk.Percent = int(diskPercent.Load())
disk.Label = fmt.Sprintf("%d%% | Disk: %s / %s",
0,
humanize.Bytes(diskUsage.Load()),
humanize.Bytes(diskTotal.Load()),
)
}

Copilot AI commented Feb 12, 2026

diskPercent is never set, and the label passes a hard-coded 0 to its %d%% verb, so the Disk gauge percent will always be 0 regardless of actual usage. Either compute and store the percentage from disk_usage/disk_total, or remove diskPercent and derive the percent in diskDraw.
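
Deriving the percentage at draw time could look like the sketch below. The `usagePercent` helper is hypothetical; it clamps the result to 0..100 and guards against a zero total.

```go
package main

import "fmt"

// usagePercent returns used/total as a whole percentage in the range 0..100.
// It uses integer arithmetic and assumes used*100 fits in a uint64, which
// holds for sizes below roughly 184 petabytes.
func usagePercent(used, total uint64) int {
	if total == 0 {
		return 0 // avoid dividing by zero before the first sample arrives
	}
	if used >= total {
		return 100
	}
	return int(used * 100 / total)
}

func main() {
	used, total := uint64(50<<30), uint64(200<<30)
	// The same value can drive both the gauge percent and the label.
	fmt.Printf("%d%% | Disk: %d / %d bytes\n", usagePercent(used, total), used, total)
}
```

Computing the value from the two existing fields removes the need for a separate stored percentage that can fall out of sync.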

…splay in the top command, and improve plugin shutdown logic.
sendya merged commit 016bec8 into main Feb 13, 2026
1 check passed
sendya deleted the feat/top-boards branch February 13, 2026 12:33