Conversation
- … integrating it into the QS plugin for hot URL tracking and on-demand metric collection.
- …uce `ttop` command with enhanced hot key details.
- …and display logic, and rename `qs` plugin API paths
Pull request overview
This PR adds a real-time terminal “top” dashboard (ttop) backed by a new QS plugin SSE metrics endpoint, and extends the in-repo LRU cache + storage bucket interfaces to surface “TopK” hot items for observability/introspection.
Changes:
- Introduces a `cmd/top` terminal dashboard that consumes `/plugin/qs/graph` SSE data.
- Adds `TopK` support to the generic LRU cache and propagates it through the storage `Bucket` interface and bucket implementations.
- Adds a small `metrics` helper package and runtime version/start-time reporting; updates build scripts and dependencies.
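For orientation, a minimal consumer of the new SSE stream might look like the sketch below. Only the `/plugin/qs/graph` path comes from the PR; the server address and payload format are assumptions.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// The address is assumed; only the /plugin/qs/graph path is from the PR.
	resp, err := http.Get("http://127.0.0.1:8080/plugin/qs/graph")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// SSE delivers newline-delimited frames; "data:" lines carry the payload.
	sc := bufio.NewScanner(resp.Body)
	for sc.Scan() {
		if payload, ok := strings.CutPrefix(sc.Text(), "data: "); ok {
			fmt.Println(payload) // e.g. a JSON snapshot of the metrics map
		}
	}
}
```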
Reviewed changes
Copilot reviewed 15 out of 16 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| `cmd/top/main.go` | New ttop terminal UI that consumes the QS SSE metrics stream. |
| `plugin/qs/qs.go` | Adds the `/plugin/qs/graph` SSE endpoint and background collectors (QPS/status codes, usage, hot keys). |
| `pkg/algorithm/lru/lru.go` | Adds TopK traversal and adjusts increment bookkeeping. |
| `pkg/algorithm/lru/lru_topk_test.go` | Unit test for the new TopK behavior. |
| `api/defined/v1/storage/storage.go` | Extends the Bucket interface with TopK. |
| `storage/bucket/memory/memory.go` | Implements Bucket.TopK() via the in-memory LRU. |
| `storage/bucket/disk/disk.go` | Implements Bucket.TopK() via the disk bucket LRU. |
| `storage/diraware/bucket.go` | Forwards TopK() through the wrapper bucket. |
| `storage/bucket/empty/empty.go` | Adds a stub TopK() implementation. |
| `metrics/metrics.go` | Adds counter smoothing and Prometheus gather helpers. |
| `pkg/x/runtime/info.go` | Adds StartedAt and a formatted build/runtime info string. |
| `main.go` | Adds a -V flag to print build/runtime info and exit. |
| `Makefile` | Builds `bin/ttop` as part of the toolchain target. |
| `go.mod` / `go.sum` | Adds dependencies required for the dashboard/collectors. |
| `server/server.go` | Adds an `X-Server: local-plugin` header for local plugin requests. |
`cmd/top/main.go` (outdated)
```go
dataMu.RUnlock()

rater.Text = fmt.Sprintf("\nRequests/sec: %d \nTotal: %d \n2xx : %d\n4xx : %d\n499 : %d\n5xx : %d",
	int(data["total"]), int(data["total"]), int(data["2xx"]), int(data["4xx"]), int(data["499"]), int(data["5xx"]))
```
The UI displays both "Requests/sec" and "Total" using data["total"]. As implemented, these two values will always be identical, which is misleading. Consider exposing separate fields (e.g. rps vs total_requests) and using them accordingly in the dashboard.
```diff
- 	int(data["total"]), int(data["total"]), int(data["2xx"]), int(data["4xx"]), int(data["499"]), int(data["5xx"]))
+ 	int(data["rps"]), int(data["total"]), int(data["2xx"]), int(data["4xx"]), int(data["499"]), int(data["5xx"]))
```
```go
// e.g.
//
// stopCh <- struct{}{}
qs.stopCh <- struct{}{}
```
Stop sends a single value into stopCh. With multiple goroutines selecting on the same channel, only one goroutine will receive the stop signal, leaving the others running. Also, since the channel is buffered with capacity 1, a second Stop() call can block forever if the buffer is full. Prefer close(stopCh) for broadcast semantics (guarded with sync.Once), and/or call the cancel function to stop all collectors.
```diff
- qs.stopCh <- struct{}{}
+ if qs.cancel != nil {
+ 	qs.cancel()
+ }
```
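For the close(stopCh) alternative the comment mentions, here is a minimal sketch; the `stopOnce` field and the `qsPlugin` name are assumptions, not part of the PR:

```go
package main

import "sync"

// qsPlugin stands in for the actual plugin struct.
type qsPlugin struct {
	stopCh   chan struct{}
	stopOnce sync.Once // assumed field guarding the close
}

// Stop broadcasts shutdown: closing stopCh wakes every goroutine selecting
// on it, and sync.Once makes repeated Stop() calls safe, since closing an
// already-closed channel would panic.
func (qs *qsPlugin) Stop() {
	qs.stopOnce.Do(func() { close(qs.stopCh) })
}
```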
```go
	continue
}

usage, _ := disk.Usage(bucket.Path())
```
disk.Usage(bucket.Path()) errors are ignored; on error usage can be nil and usage.Used/Total will panic. Handle the error (skip bucket or keep previous value) before accumulating.
```diff
- usage, _ := disk.Usage(bucket.Path())
+ usage, err := disk.Usage(bucket.Path())
+ if err != nil || usage == nil {
+ 	log.Warnf("qs: failed to get disk usage for bucket %q: %v", bucket.Path(), err)
+ 	continue
+ }
```
```go
StoreType() string
// Path returns the Bucket path.
Path() string
// TopK returns the top k most frequently used keys
```
The Bucket.TopK doc says it returns "the top k most frequently used keys", but current implementations return formatted strings like path@@time@@refs for UI consumption. This mismatch makes the interface confusing and brittle. Consider either (1) changing the method name/docs to reflect returning display/metadata strings, or (2) returning a structured type (e.g. []HotKey) and/or separate TopKKeys/TopKStats APIs.
```diff
- // TopK returns the top k most frequently used keys
+ // TopK returns implementation-defined metadata strings for the top k most frequently used keys.
+ // The returned strings are intended for UI/display consumption (for example, "path@@time@@refs"),
+ // and do not necessarily correspond to raw key values.
```
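If option (2) is preferred instead, a structured return type could look like the sketch below; the `HotKey` type, its fields, and the interface/method names are illustrative assumptions, not part of the PR:

```go
package storage

import "time"

// HotKey is a hypothetical structured alternative to the formatted
// "path@@time@@refs" strings currently returned by TopK.
type HotKey struct {
	Path       string    // object path
	LastAccess time.Time // last access time
	Refs       uint64    // reference/hit count
}

// HotKeyReporter separates raw keys from display metadata, as the review
// comment's option (2) suggests.
type HotKeyReporter interface {
	// TopKKeys returns the raw keys of the top k most frequently used entries.
	TopKKeys(k int) []string
	// TopKStats returns structured metadata for the top k most frequently used entries.
	TopKStats(k int) []HotKey
}
```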
```go
md, _ := m.indexdb.Get(context.Background(), arr[i][:])
if md != nil {
	ret = append(ret, fmt.Sprintf("%s@@%s@@%d", md.ID.Path(), time.Unix(int64(mark.LastAccess()), 0).Format(time.DateTime), mark.Refs()))
}
```
mark := m.cache.Peek(arr[i]) can return nil (e.g. concurrent eviction), but the code dereferences mark unconditionally. Also, indexdb.Get errors are ignored. Add nil/error handling to avoid panics and avoid emitting partial/incorrect entries.
```diff
- md, _ := m.indexdb.Get(context.Background(), arr[i][:])
- if md != nil {
- 	ret = append(ret, fmt.Sprintf("%s@@%s@@%d", md.ID.Path(), time.Unix(int64(mark.LastAccess()), 0).Format(time.DateTime), mark.Refs()))
- }
+ if mark == nil {
+ 	// Entry may have been evicted concurrently; skip it.
+ 	continue
+ }
+ md, err := m.indexdb.Get(context.Background(), arr[i][:])
+ if err != nil || md == nil {
+ 	// If we cannot reliably get metadata, skip this entry to avoid partial/incorrect output.
+ 	continue
+ }
+ ret = append(ret, fmt.Sprintf("%s@@%s@@%d", md.ID.Path(), time.Unix(int64(mark.LastAccess()), 0).Format(time.DateTime), mark.Refs()))
```
`plugin/qs/qs.go` (outdated)
```go
for {
	select {
	case <-ctx.Done():
		return
	case <-qs.stopCh:
		return
	case <-time.Tick(time.Second * 5):
```
Same time.Tick issue here: each loop iteration creates a new ticker and leaks. Create one time.NewTicker(5*time.Second) before the loop and select on its channel.
```diff
- for {
- 	select {
- 	case <-ctx.Done():
- 		return
- 	case <-qs.stopCh:
- 		return
- 	case <-time.Tick(time.Second * 5):
+ ticker := time.NewTicker(5 * time.Second)
+ defer ticker.Stop()
+ for {
+ 	select {
+ 	case <-ctx.Done():
+ 		return
+ 	case <-qs.stopCh:
+ 		return
+ 	case <-ticker.C:
```
```go
	qs.cpuPercent.Store(uint32(percent[0]))
}

vmem, _ := mem.VirtualMemory()
```
mem.VirtualMemory() errors are ignored and vmem can be nil; dereferencing vmem.Used/vmem.Total would panic. Handle the error and nil result before storing values.
```diff
- vmem, _ := mem.VirtualMemory()
+ vmem, err := mem.VirtualMemory()
+ if err != nil || vmem == nil {
+ 	return
+ }
```
```go
// Temporarily store the smoothed values collected in this round.
tempData := make(map[string]float64)
totalCounter := float64(0)
for _, mf := range familys {
	if mf.GetName() == "tr_tavern_requests_code_total" {
		for _, metric := range mf.GetMetric() {
			for _, label := range metric.Label {
				if label.GetName() == "code" {
					code := label.GetValue()
					val := metric.GetCounter().GetValue()
					totalCounter += val
					if smoother, ok := metricsMap[code]; ok {
						smoothedValue := smoother.Update(val)
						tempData[code] = smoothedValue
					}
				}
			}
		}
	}
}

// Update the shared data under the write lock.
qs.mu.Lock()
for code, value := range tempData {
	qs.smoothedData[code] = value
}
qs.smoothedData["total"] = totalCounter
qs.mu.Unlock()
```
totalCounter is computed as the sum of Prometheus counters (cumulative totals), but downstream the dashboard treats data["total"] as requests/sec. This will grow forever and doesn't match the smoothed per-second values you compute for each code. Consider computing total as the sum of the per-code smoothed deltas (or maintain a dedicated smoother for total).
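A hypothetical helper along those lines; the name `sumSmoothedRates` and the reuse of `tempData` from the snippet above are illustrative assumptions:

```go
// sumSmoothedRates totals the per-code smoothed values so that
// smoothedData["total"] keeps the same per-interval semantics as the
// per-code entries, instead of growing without bound like a raw counter.
func sumSmoothedRates(tempData map[string]float64) float64 {
	total := 0.0
	for _, rate := range tempData {
		total += rate
	}
	return total
}
```

Under the write lock, `qs.smoothedData["total"] = sumSmoothedRates(tempData)` would then replace the cumulative `totalCounter` assignment.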
```go
uiEvents := terminal.PollEvents()
ticker := time.NewTicker(time.Second).C
for {
```
The render loop uses time.NewTicker(time.Second) and ignores the -interval flag, so UI refresh rate can't be configured. Use tickInterval when creating the ticker (and stop it on exit).
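A sketch of the configurable render loop, assuming the `-interval` flag is already parsed into a `tickInterval time.Duration` and that `draw()` redraws the widgets (both names are placeholders; the `terminal` alias for termui is reused from the snippet above):

```go
uiEvents := terminal.PollEvents()
// Honor the -interval flag instead of the hardcoded one-second refresh,
// and release the ticker when the loop exits.
ticker := time.NewTicker(tickInterval)
defer ticker.Stop()
for {
	select {
	case <-ticker.C:
		draw() // placeholder: re-render all widgets
	case e := <-uiEvents:
		if e.ID == "q" || e.ID == "<C-c>" {
			return
		}
	}
}
```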
```go
disk, diskDraw := func() (*widgets.Gauge, func()) {
	disk := widgets.NewGauge()
	disk.Title = "Disk Usage"
	disk.Percent = int(diskPercent.Load())
	disk.BarColor = terminal.ColorYellow
	disk.BorderStyle.Fg = terminal.ColorWhite
	disk.TitleStyle.Fg = terminal.ColorCyan

	return disk, func() {
		disk.Percent = int(diskPercent.Load())
		disk.Label = fmt.Sprintf("%d%% | Disk: %s / %s",
			0,
			humanize.Bytes(diskUsage.Load()),
			humanize.Bytes(diskTotal.Load()),
		)
	}
```
diskPercent is never set (and the label hardcodes 0%%), so the Disk gauge percent will always be 0 regardless of actual usage. Either compute and store the percentage from disk_usage/disk_total, or remove diskPercent and derive percent in diskDraw.
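The derive-in-diskDraw option might look like the following sketch, assuming `diskUsage` and `diskTotal` are atomic `uint64` values holding byte counts (as the `humanize.Bytes` calls suggest):

```go
return disk, func() {
	used, total := diskUsage.Load(), diskTotal.Load()
	percent := 0
	if total > 0 {
		// Derive the percentage from the counters that are actually
		// populated, instead of the never-written diskPercent.
		percent = int(used * 100 / total)
	}
	disk.Percent = percent
	disk.Label = fmt.Sprintf("%d%% | Disk: %s / %s",
		percent,
		humanize.Bytes(used),
		humanize.Bytes(total),
	)
}
```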
…splay in the top command, and improve plugin shutdown logic.
This pull request introduces a new real-time terminal dashboard tool for monitoring system and application metrics, adds a TopK feature to the LRU cache implementation, and enhances observability and introspection across the codebase. It also updates build scripts and dependencies to support these new features.
Major new features and improvements:
Real-time Terminal Dashboard
- Added a new command-line tool `ttop` (`cmd/top/main.go`) that provides a real-time terminal dashboard for monitoring metrics such as QPS, CPU, memory, disk usage, and hot URLs by connecting to the Tavern server's metrics endpoint. This includes a rich UI using `termui` and smooth updating of metrics.
- Updated the `Makefile` to build the new `ttop` binary as part of the build process. [1] [2]
- Added dependencies such as `termui`, `go-humanize`, and `gopsutil`, among others, in `go.mod`. [1] [2] [3] [4]

LRU Cache TopK Functionality
- Implemented a `TopK` method in the generic LRU cache (`pkg/algorithm/lru/lru.go`) to efficiently retrieve the top K most frequently used keys, and added a comprehensive unit test (`pkg/algorithm/lru/lru_topk_test.go`). [1] [2] [3] [4]
- Extended the storage `Bucket` interface to include the `TopK` method, ensuring broader cache compatibility.

Metrics and Observability Enhancements
- Introduced a new `metrics` package with utilities for smoothing counter values and collecting request code totals using Prometheus metrics, improving the accuracy and granularity of system observability.
- Enhanced the QS plugin (`plugin/qs/qs.go`) to collect, smooth, and serve real-time metrics (including hot URLs and system resource usage) for the dashboard, using atomic operations and Prometheus. [1] [2]
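The smoothing utility itself appears in this review only through the `smoother.Update(val)` call; one plausible shape, sketched purely as an assumption (not the package's actual code), is an exponential moving average over successive counter deltas:

```go
package metrics

// Smoother turns a cumulative counter into a smoothed per-interval rate
// using an exponential moving average of successive deltas. This is a
// sketch of one plausible design, not the package's implementation.
type Smoother struct {
	alpha  float64 // smoothing factor in (0, 1]
	last   float64 // previous cumulative counter value
	rate   float64 // current smoothed rate
	primed bool    // whether last has been initialized
}

func NewSmoother(alpha float64) *Smoother { return &Smoother{alpha: alpha} }

// Update feeds the latest cumulative value and returns the smoothed delta.
func (s *Smoother) Update(cumulative float64) float64 {
	if !s.primed {
		s.last, s.primed = cumulative, true
		return 0
	}
	delta := cumulative - s.last
	if delta < 0 {
		delta = 0 // counter reset, e.g. after a process restart
	}
	s.last = cumulative
	s.rate = s.alpha*delta + (1-s.alpha)*s.rate
	return s.rate
}
```

With a 5-second collection tick, the returned value is a smoothed per-interval delta; dividing by the interval length would yield a per-second rate.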
Runtime and Version Reporting

- Added a `-V`/`--version` flag to the main application for easy version introspection. [1] [2] [3] [4] [5] [6] [7]