fix: close underlying TCP conn on stop during TLS handshake #20

Merged
isaac-lee-apex merged 1 commit into main from islee/fix-tls-handshake-hang-on-stop
Feb 10, 2026
Conversation

@isaac-lee-apex
Contributor

Problem

Initiator.Stop() hangs indefinitely when a session is stuck in a TLS handshake.

This occurs when TCP connects successfully to a host that never responds to the TLS ClientHello (e.g. a plain TCP server, a misconfigured endpoint, or a firewall that accepts TCP but drops TLS). The existing stop-cancellation goroutine in handleConnection cancels the context.Context passed to DialContext, but once DialContext has returned successfully, cancelling that context has no effect on the established connection. The subsequent tls.Client(netConn, tlsConfig).Handshake() call blocks with no cancellation mechanism of its own, so when stopChan closes, nothing closes the underlying net.Conn and Handshake() blocks forever. Stop() then blocks on wg.Wait() indefinitely.

Other disconnection scenarios (DNS failure, TCP black-hole, TLS to unreachable IP) are unaffected because Stop() can cancel the TCP dial via the context.
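
To illustrate why the dial context cannot help once the connection exists, here is a minimal standalone sketch (not code from this repository). The function name readAfterCancel and the silent loopback listener are assumptions made for the demo. It cancels the context after DialContext returns and shows that a read on the connection is still blocked until an explicit deadline expires.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// readAfterCancel (hypothetical helper) dials a silent TCP server, cancels
// the dial context, then reads with a short deadline. It returns true when
// the read ended only because the deadline expired, meaning the cancelled
// context did not touch the established connection.
func readAfterCancel() (bool, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer ln.Close()

	done := make(chan struct{})
	defer close(done)
	go func() { // accepts the connection and never writes to it
		if c, err := ln.Accept(); err == nil {
			<-done
			c.Close()
		}
	}()

	ctx, cancel := context.WithCancel(context.Background())
	var d net.Dialer
	conn, err := d.DialContext(ctx, "tcp", ln.Addr().String())
	if err != nil {
		cancel()
		return false, err
	}
	defer conn.Close()

	cancel() // too late: the dial has already completed

	// The deadline only bounds the demo; without it Read would block forever.
	conn.SetReadDeadline(time.Now().Add(200 * time.Millisecond))
	_, err = conn.Read(make([]byte, 1))
	return errors.Is(err, os.ErrDeadlineExceeded), nil
}

func main() {
	blocked, err := readAfterCancel()
	fmt.Println("read still blocked after cancel:", blocked, "err:", err)
}
```

A TLS handshake on such a connection blocks on the same kind of read, which is why something other than the context has to interrupt it.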

Fix

Track the raw TCP connection in a mutex-protected pendingConn variable during the TLS handshake window. The stop-cancellation goroutine now closes pendingConn (in addition to cancelling the dial context) when a stop signal is received. Closing the underlying net.Conn immediately unblocks Handshake() with a "use of closed network connection" error, allowing handleConnection to proceed to the reconnect path where it detects stopChan is closed and exits normally.

The pendingConn is set just before Handshake() and cleared immediately after (on both success and error paths), so the close only happens during the TLS handshake window.
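
The approach can be sketched roughly as follows. This is a simplified standalone illustration, not the code merged in this PR: the handshaker type, its methods, and demoStopDuringHandshake are hypothetical names, and only pendingConn and stopChan are taken from the description above. The re-check of stopChan after publishing the connection is an extra guard added in this sketch for a stop that fires just before the handshake starts.

```go
package main

import (
	"crypto/tls"
	"errors"
	"fmt"
	"io"
	"net"
	"sync"
	"time"
)

// handshaker is a hypothetical stand-in for the initiator's connection handler.
type handshaker struct {
	mu          sync.Mutex
	pendingConn net.Conn // raw TCP conn; non-nil only during the TLS handshake
	stopChan    chan struct{}
}

func (h *handshaker) setPending(c net.Conn) {
	h.mu.Lock()
	h.pendingConn = c
	h.mu.Unlock()
}

// watchStop waits for the stop signal, then closes any connection that is
// mid-handshake so the blocked Handshake() call returns with an error.
func (h *handshaker) watchStop() {
	<-h.stopChan
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.pendingConn != nil {
		h.pendingConn.Close()
	}
}

// handshake tracks raw only for the duration of the TLS handshake.
func (h *handshaker) handshake(raw net.Conn, cfg *tls.Config) (*tls.Conn, error) {
	h.setPending(raw)
	defer h.setPending(nil) // cleared on both success and error paths

	// Re-check after publishing the conn, in case stop fired just before.
	select {
	case <-h.stopChan:
		raw.Close()
		return nil, errors.New("stopped before TLS handshake")
	default:
	}

	tlsConn := tls.Client(raw, cfg)
	if err := tlsConn.Handshake(); err != nil {
		raw.Close() // may already be closed by watchStop; that error is ignored
		return nil, err
	}
	return tlsConn, nil
}

// demoStopDuringHandshake reports whether a stalled handshake was unblocked
// by the stop signal, along with the error Handshake() returned.
func demoStopDuringHandshake() (bool, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer ln.Close()
	go func() { // plain TCP server: reads the ClientHello, never replies
		if c, err := ln.Accept(); err == nil {
			io.Copy(io.Discard, c)
			c.Close()
		}
	}()

	raw, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return false, err
	}
	h := &handshaker{stopChan: make(chan struct{})}
	go h.watchStop()

	errCh := make(chan error, 1)
	go func() {
		_, err := h.handshake(raw, &tls.Config{InsecureSkipVerify: true})
		errCh <- err
	}()

	time.Sleep(100 * time.Millisecond) // let the handshake stall
	close(h.stopChan)

	select {
	case err := <-errCh:
		return true, err
	case <-time.After(2 * time.Second):
		raw.Close()
		return false, errors.New("handshake still blocked after stop")
	}
}

func main() {
	unblocked, err := demoStopDuringHandshake()
	fmt.Println("unblocked:", unblocked, "err:", err)
}
```

Holding the mutex in both setPending and watchStop means the stop goroutine sees either nil or a live connection, so it does not close a connection that has already completed its handshake and been handed on.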

Testing

Verified with an e2e test (TestStopDisconnectedInitiatorSession_TLSHandshakeHang) that starts a plain TCP listener, points an SSL-enabled initiator at it, waits for the TLS handshake to stall, then calls StopSession. Before this fix the test timed out after 15s; with the fix it completes in ~4s.

Amp Threads:

Initiator.Stop() hangs indefinitely when the session is stuck in a TLS
handshake (e.g. TCP connects to a server that never sends a TLS
ServerHello). The stop-cancellation goroutine cancels the dial context,
but after DialContext succeeds the context is no longer relevant and
nothing closes the raw net.Conn, so tls.Handshake() blocks forever.

Track the raw TCP connection in a mutex-protected variable during the
TLS handshake window so the stop-goroutine can close it, which
immediately unblocks Handshake() with an error and allows
handleConnection to exit normally.

Amp-Thread-ID: https://ampcode.com/threads/T-019c4926-8512-75e5-9f4c-9c62f6b65a25
Co-authored-by: Amp <amp@ampcode.com>
@isaac-lee-apex isaac-lee-apex merged commit 7a02eef into main Feb 10, 2026
46 of 47 checks passed