Skip to content

Conversation

@bmaurer
Copy link

@bmaurer bmaurer commented Jan 11, 2026

Implement buffering with backpressure to prevent overwhelming slow clients:

  • Add WriteBuffer class for bounded pending write data (256KB limit)
  • Add waitOnSocketWritable() helper for non-blocking write checks
  • TerminalServer: buffer terminal output, only read when buffer has room
  • TerminalClient: buffer console output with same flow control pattern
  • JumpHost: add pending packet queue with flow control

This prevents fast producers (terminal, remote server) from overwhelming slow consumers (network, console) by stopping reads when buffers fill.

Fixes #631

Implement buffering with backpressure to prevent overwhelming slow clients:
- Add WriteBuffer class for bounded pending write data (256KB limit)
- Add waitOnSocketWritable() helper for non-blocking write checks
- TerminalServer: buffer terminal output, only read when buffer has room
- TerminalClient: buffer console output with same flow control pattern
- JumpHost: add pending packet queue with flow control

This prevents fast producers (terminal, remote server) from overwhelming
slow consumers (network, console) by stopping reads when buffers fill.
@MisterTea
Copy link
Owner

Cool! I'm glad you had a chance to look into this!

Question: the way I typically use ET is I kick off a training job and then close my laptop. Will this stall the training process? Do we need logic that disables flow control when there's no client connected?

@codecov
Copy link

codecov bot commented Jan 12, 2026

Codecov Report

❌ Patch coverage is 92.89617% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.29%. Comparing base (ee8ddc2) to head (9f4ac9a).

Files with missing lines Patch % Lines
src/terminal/TerminalServer.cpp 71.11% 13 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #730      +/-   ##
==========================================
+ Coverage   87.05%   87.29%   +0.23%     
==========================================
  Files          70       72       +2     
  Lines        5161     5328     +167     
  Branches      483      503      +20     
==========================================
+ Hits         4493     4651     +158     
- Misses        667      677      +10     
+ Partials        1        0       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@bmaurer
Copy link
Author

bmaurer commented Jan 12, 2026

Looking at TerminalServer.cpp, I think the flow control logic works like this:

  1. When client is connected (serverClientFd > 0):
    - Read from terminal → buffer data → drain to client when socket writable
    - Backpressure: stop reading from terminal when buffer is full (256KB)
  2. When client is disconnected (serverClientFd == -1):
    - serverClientFd > 0 check fails, so we never drain the buffer
    - But we still apply backpressure: if (terminalOutputBuffer.canAcceptMore()) { FD_SET(terminalFd, &rfd); }
    - Once buffer fills to 256KB, we stop reading from the pty master fd

Does that sound right?

At the end of the day, the behavior here is that eventually a write() to the console will buffer. that may or may not stall the app depending on implementation. Logic dictates that either (1) we need to have an infinite buffer or (2) we need to backpressure.

The correct approach to this is to use tmux -CC, which i believe has it's own flow control https://github.com/tmux/tmux/wiki/Control-Mode. Any other approach will either buffer infinitely or stall.

Maybe we should do a smaller buffer when you are connected to the server but a larger buffer when not.

@bmaurer
Copy link
Author

bmaurer commented Jan 12, 2026

btw, is there a good way to do an end to end test of this (like testing out et against a program that does log spew, have the client disconnect then reconnect) I struggled to vibe code that part

@MisterTea
Copy link
Owner

btw, is there a good way to do an end to end test of this (like testing out et against a program that does log spew, have the client disconnect then reconnect) I struggled to vibe code that part

I've been slammed with interviews lately but once I have some time, my plan (feel free to do this yourself) is to start a pytorch training job inside tmux -CC inside et, close the connection, then come back an hour later and make sure the training kept running.

@MisterTea
Copy link
Owner

By "close the connection" I mean turn off wifi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flow Control

2 participants