@Lumberj3ck
Contributor

This pull request introduces the initial backend implementation for real-time container metrics, as discussed in issue #26. This is a draft PR to showcase the progress and discuss the current implementation before proceeding with the frontend and further backend refinements.

What's Done:

  • Container Metrics Polling:
  • Thread-Safe Ring Buffer:
  • Last Accessed Time Check:

Discuss

Client disconnect context canceling

On client disconnect, I cancel the context on the master node. This can cause a problem with multiple clients. Suppose the first client starts metrics polling, a second client opens the same container's plot and then switches to another container, and the first client then disconnects. Polling stops, so metrics for that container are no longer accumulated for the second client and resume only when it opens the plot again. Because of this, we might prefer not to stop polling on client disconnect.
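One way to decouple a poller's lifetime from any single client, in the spirit of the "Last Accessed Time Check" item above, is to stop on idleness rather than on disconnect. This is a minimal sketch, not the PR's code: the `accessTracker` type and its methods are hypothetical names, and timestamps are plain Unix seconds to keep the example small.

```go
package main

import "sync"

// accessTracker records when a container's metrics were last requested.
// Hypothetical type for illustration; it is not taken from the PR.
type accessTracker struct {
	mu   sync.RWMutex
	last int64 // Unix seconds of the most recent request
}

// Touch marks the container as accessed at nowUnix.
func (a *accessTracker) Touch(nowUnix int64) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.last = nowUnix
}

// IdleFor reports whether at least limitSeconds have passed since the
// last Touch, meaning no client has asked for metrics in that window.
func (a *accessTracker) IdleFor(nowUnix, limitSeconds int64) bool {
	a.mu.RLock()
	defer a.mu.RUnlock()
	return nowUnix-a.last >= limitSeconds
}
```

With this approach any client request calls `Touch`, so a second client keeps the poller alive even after the first one disconnects, and the poller only exits once `IdleFor` returns true.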

I'm looking forward to your feedback on the current implementation.

@Lumberj3ck Lumberj3ck changed the title from "### Draft Pull Request: Feature/Container Metrics" to "Draft Pull Request: Feature/Container Metrics" Aug 16, 2025
@Lumberj3ck Lumberj3ck force-pushed the feature/metrics-charts branch from ffc0e2e to 614d8ea Compare September 20, 2025 10:19
@Lumberj3ck Lumberj3ck force-pushed the feature/metrics-charts branch from 614d8ea to 88408a2 Compare September 20, 2025 10:29
@Lumberj3ck
Contributor Author

Client logic

Metrics polling starts when the user clicks the Stats tab.

if (key == 'Stats') {
  cmds._init_metrics_polling();
}

Handle metrics response

if ('Metrics' in notification.Content) {

If metrics arrive before the stats inspector has loaded, wait:

      // container.inspect.stats returned after container.metrics
      if (state.inspector.content.length == 0) {
        state.isLoading = false;
        // to make first request as soon as possible
        cmds._cancel_metrics_polling();
        cmds._init_metrics_polling();
        // do not process if no container.inspector.stats loaded
        break;
      }

After that, metrics are handled as usual and stored inside the inspector. Whenever the user clicks on Stats, the client receives the fully accumulated metrics from the beginning.

  1. I wasn't sure what the correct formatting is, so I formatted with this:
    npx prettier --write client/assets/js/isaiah.js --tab-width 2 --single-quote --trailing-comma none --arrow-parens always
    however, I think something like this might also be right:
    npx prettier --write client/assets/js/isaiah.js --tab-width 2 --single-quote --trailing-comma es5 --arrow-parens always
    I'm sorry for the huge diff 🙏
  2. I have tested the feature with agents, stopping a container, restarting a container, and reloading with <R>
  3. Plot colors now update when the theme changes

Todo

  1. Should we add buffering on the client so we don't overwhelm it with an unbounded flow of metrics?
  2. Add an environment variable to control the frequency of metrics polling on both client and server

@Lumberj3ck
Contributor Author

Lumberj3ck commented Oct 9, 2025

Hey Will!👋

I just wanted to remind you about this PR and also summarise what has been done.

Client-side logic

  • Metrics polling is initiated when the user clicks on the Stats tab:

    if (key == 'Stats') {
        cmds._init_metrics_polling();
    }
  • When the client receives a Metrics notification, it checks if container.inspect.stats has been loaded.

    • If not yet loaded (i.e., metrics arrived first), polling is restarted to sync the first data batch as soon as possible.

    • Once inspector data is ready, metrics are processed and stored as part of the inspector state.

  • This ensures the user always gets a fully accumulated metrics history when switching to the Stats tab.

  • Implemented polling cancellation and restart logic to prevent overlapping requests.

  • Plot colors now dynamically follow theme changes for a consistent UI experience.

Backend implementation

Architecture

I introduced two new components:

  • ContainerStatsManager — manages per-container metrics collection.

  • RingBuffer[T] — a generic, thread-safe circular buffer for efficient metric storage without memory growth.

Each container’s metrics are stored in a bounded ring buffer (size = 3000), overwriting old data automatically to prevent leaks or unbounded memory usage.
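To illustrate the idea of a bounded, overwrite-on-full buffer, here is a minimal sketch of a generic ring buffer guarded by an RWMutex. It reuses the names `RingBuffer` and `NewRingBuffer` from this description, but the constructor signature and the `Push` and `Snapshot` methods are assumptions, so the PR's actual `ringbuf` package may differ.

```go
package main

import "sync"

// RingBuffer is a fixed-capacity circular buffer safe for concurrent use.
// Sketch only; the real implementation in the PR may look different.
type RingBuffer[T any] struct {
	mu    sync.RWMutex
	items []T
	start int // index of the oldest element
	size  int // number of stored elements
}

// NewRingBuffer allocates a buffer holding at most capacity elements.
func NewRingBuffer[T any](capacity int) *RingBuffer[T] {
	if capacity <= 0 {
		capacity = 1
	}
	return &RingBuffer[T]{items: make([]T, capacity)}
}

// Push appends v, overwriting the oldest element once the buffer is full.
func (r *RingBuffer[T]) Push(v T) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.size < len(r.items) {
		r.items[(r.start+r.size)%len(r.items)] = v
		r.size++
		return
	}
	r.items[r.start] = v
	r.start = (r.start + 1) % len(r.items)
}

// Snapshot returns a copy of the contents, oldest first.
func (r *RingBuffer[T]) Snapshot() []T {
	r.mu.RLock()
	defer r.mu.RUnlock()
	out := make([]T, r.size)
	for i := 0; i < r.size; i++ {
		out[i] = r.items[(r.start+i)%len(r.items)]
	}
	return out
}
```

Because the backing slice is allocated once, memory use stays constant no matter how long a container is polled.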

Concurrency and safety

All state-modifying operations in ContainerStatsManager and RingBuffer are guarded with RWMutex locks.
Each container can be polled independently in its own goroutine, linked to a session-wide context.Context, so that when the session ends, all related pollers stop cleanly.
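The pattern described here can be sketched roughly as follows: a lock-protected map ensures at most one poller per container, and each goroutine exits when the shared context is canceled. The `pollerSet` type, its methods, and `runDemo` are hypothetical names for illustration and are not taken from `ContainerStatsManager`.

```go
package main

import (
	"context"
	"sync"
	"time"
)

// pollerSet tracks which containers currently have a running poller.
// Hypothetical type; the PR's manager may be structured differently.
type pollerSet struct {
	mu     sync.RWMutex
	active map[string]bool
	wg     sync.WaitGroup
}

func newPollerSet() *pollerSet {
	return &pollerSet{active: make(map[string]bool)}
}

// Start launches one goroutine for id unless one is already running, and
// reports whether a new poller was started. The goroutine calls poll on
// every tick and exits when ctx is canceled.
func (p *pollerSet) Start(ctx context.Context, id string, every time.Duration, poll func(id string)) bool {
	p.mu.Lock()
	if p.active[id] {
		p.mu.Unlock()
		return false
	}
	p.active[id] = true
	p.mu.Unlock()

	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		defer p.markStopped(id)
		ticker := time.NewTicker(every)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				poll(id)
			}
		}
	}()
	return true
}

func (p *pollerSet) markStopped(id string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.active, id)
}

// IsActive reports whether a poller is currently registered for id.
func (p *pollerSet) IsActive(id string) bool {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return p.active[id]
}

// runDemo starts the same poller twice, cancels the session context,
// waits for the goroutine to exit, and reports what happened.
func runDemo() (first, second, activeAfter bool) {
	ctx, cancel := context.WithCancel(context.Background())
	p := newPollerSet()
	noop := func(string) {}
	first = p.Start(ctx, "c1", 10*time.Millisecond, noop)
	second = p.Start(ctx, "c1", 10*time.Millisecond, noop)
	cancel()
	p.wg.Wait()
	return first, second, p.IsActive("c1")
}
```

In `runDemo`, the second `Start` call is a no-op because a poller already exists, and canceling the context is enough to stop the goroutine and clear its entry.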

Polling workflow

  • When the client sends the container.metrics command, the server:

    1. Validates arguments and checks container state via ContainerInspect.

    2. Updates the container’s lastAccessed timestamp.

    3. If polling isn’t active, starts a new goroutine via PollMetrics().

    4. Returns metrics accumulated since the last From index.

  • The poller itself:

    • Fetches data with client.ContainerStatsOneShot().

    • Computes CPU% and memory% using deltas between current and previous stats.

    • Appends each new MetricPoint to the container’s ring buffer.

    • Runs every 3 seconds and stops automatically if:

      • The container has been idle for >30 minutes, or

      • The session’s context is canceled.
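The percentage step in the poller can be sketched as below. This is an assumption about the calculation, not the PR's code: it uses the delta-based CPU formula commonly applied to Docker stats (container CPU delta over host CPU delta, scaled by the number of online CPUs) and a simple usage-over-limit ratio for memory, without subtracting page cache. The `statsSample` type and its field names are hypothetical stand-ins for the values returned by the stats call.

```go
package main

// statsSample holds the counters needed for the calculation.
// Hypothetical type; field names only loosely mirror the Docker stats payload.
type statsSample struct {
	CPUTotal    uint64 // cumulative CPU time used by the container
	SystemTotal uint64 // cumulative CPU time used by the host
	OnlineCPUs  uint32 // number of CPUs available to the container
	MemUsage    uint64 // bytes currently in use
	MemLimit    uint64 // memory limit in bytes
}

// cpuPercent derives CPU usage from the difference between two samples.
// It returns 0 when the counters did not advance or went backwards.
func cpuPercent(prev, cur statsSample) float64 {
	if cur.CPUTotal < prev.CPUTotal || cur.SystemTotal <= prev.SystemTotal {
		return 0
	}
	cpuDelta := float64(cur.CPUTotal - prev.CPUTotal)
	sysDelta := float64(cur.SystemTotal - prev.SystemTotal)
	return cpuDelta / sysDelta * float64(cur.OnlineCPUs) * 100
}

// memPercent expresses current memory usage as a share of the limit.
func memPercent(cur statsSample) float64 {
	if cur.MemLimit == 0 {
		return 0
	}
	return float64(cur.MemUsage) / float64(cur.MemLimit) * 100
}
```

The guards matter for the first sample after a container restart, when the previous counters may be larger than the current ones.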

Data structure

type MetricPoint struct {
	CpuMetric float64 `json:"cpu"`
	MemMetric float64 `json:"mem"`
	Timestamp int64   `json:"timestamp"`
}

These are stored per container in a bounded buffer:

ringbuf.NewRingBuffer 

Server command addition

Added a new case to the command handler:

case "container.metrics":

It handles request parsing, container state checking, poller initialization, and sending a notification with:

{
  "Metrics": [...],
  "From":  <next index>,
  "IsRunning": true  
}

Errors or inactive containers return an empty metrics array and "IsRunning": false.
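As a rough illustration of the reply shape, the sketch below builds the notification from a snapshot of points and a client-supplied cursor. The `metricsReply` type and both helper functions are hypothetical, and two behaviors are assumptions: `From` is treated as an index into the current snapshot, and the inactive case echoes the client's cursor back. How the real handler deals with indices after the ring buffer wraps is not shown in this thread.

```go
package main

import "encoding/json"

// MetricPoint matches the struct shown in the Data structure section.
type MetricPoint struct {
	CpuMetric float64 `json:"cpu"`
	MemMetric float64 `json:"mem"`
	Timestamp int64   `json:"timestamp"`
}

// metricsReply mirrors the three notification fields listed above.
type metricsReply struct {
	Metrics   []MetricPoint
	From      int
	IsRunning bool
}

// buildReply returns the points at or after index from, plus the cursor
// the client should send next. Inactive containers get an empty array.
func buildReply(points []MetricPoint, from int, running bool) metricsReply {
	if !running {
		// Assumption: the cursor is echoed back unchanged in this case.
		return metricsReply{Metrics: []MetricPoint{}, From: from, IsRunning: false}
	}
	if from < 0 {
		from = 0
	}
	if from > len(points) {
		from = len(points)
	}
	// Copy so the reply never aliases the caller's slice and is never nil.
	out := make([]MetricPoint, len(points)-from)
	copy(out, points[from:])
	return metricsReply{Metrics: out, From: len(points), IsRunning: true}
}

// encodeReply serializes the reply as JSON.
func encodeReply(r metricsReply) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}
```

Keeping `Metrics` as an empty slice rather than nil means it serializes as `[]` instead of `null`, which matches the "empty metrics array" behavior described above.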

Testing

I’ve tested with:

  • Multiple agents and hosts

  • Container stop/restart cycles

  • Page reloads

Todo / Open questions

  • Should we add client-side buffering to prevent overflow in very long-running sessions?

  • Should we add an env var to control metrics polling frequency (client & server)?
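If the env var route is taken on the server, a minimal sketch could look like this. The variable name `METRICS_POLL_INTERVAL` is purely hypothetical and not defined anywhere in the project; the 3 second default matches the current polling interval.

```go
package main

import (
	"os"
	"time"
)

// pollIntervalEnv is a hypothetical variable name, not one the project defines.
const pollIntervalEnv = "METRICS_POLL_INTERVAL"

// parseInterval parses values such as "3s" or "500ms" and returns fallback
// when the value is empty, malformed, or not positive.
func parseInterval(raw string, fallback time.Duration) time.Duration {
	d, err := time.ParseDuration(raw)
	if err != nil || d <= 0 {
		return fallback
	}
	return d
}

// pollInterval reads the interval from the environment, defaulting to 3 seconds.
func pollInterval() time.Duration {
	return parseInterval(os.Getenv(pollIntervalEnv), 3*time.Second)
}
```

Falling back on invalid input keeps a typo in the configuration from turning into a zero-duration ticker, which would panic in `time.NewTicker`.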

Would be great if you could take a look and maybe test it a bit — I’d really appreciate your feedback.

Thanks!
Alan
