close connection when response to OPEN is not sent during hold time#3296
close connection when response to OPEN is not sent during hold time#3296tom-code wants to merge 1 commit intoosrg:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses issue #3284 where GoBGP, acting as a client, would wait indefinitely when no response is received to an OPEN message. The fix introduces a hold timer in the outgoing connection manager to ensure the connection is closed if an OPEN response is not received within the configured hold time (240 seconds).
Changes:
- Added a hold timer to the
outgoingConnManager.runfunction's OPENSENT state handling - Implemented logic to send a BGP notification and transition back to CONNECT state when the timer expires
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| case <-holdTimer.C: | ||
| fsm.logger.Debug("outgoing connection closing due to hold timer expired in OPENSENT state") | ||
| notif := bgp.NewBGPNotificationMessage(bgp.BGP_ERROR_HOLD_TIMER_EXPIRED, 0, nil) | ||
| _ = fsm.sendNotification(conn, notif) | ||
| ocm.state.Store(bgp.BGP_FSM_CONNECT) | ||
| continue |
There was a problem hiding this comment.
Before closing the connection, set a read deadline similar to line 311 to ensure the recvMessage goroutine is unblocked. Add conn.SetReadDeadline(time.Now()) before closing the connection to match the pattern used in the ctx.Done case and ensure proper cleanup.
| holdTimer := time.NewTimer(time.Second * time.Duration(fsm.opensentHoldTime)) | ||
| go ocm.fsm.h.recvMessage(ocm.ctx, conn, recvCh, reasonCh, &wg) | ||
| select { | ||
| case <-holdTimer.C: | ||
| fsm.logger.Debug("outgoing connection closing due to hold timer expired in OPENSENT state") | ||
| notif := bgp.NewBGPNotificationMessage(bgp.BGP_ERROR_HOLD_TIMER_EXPIRED, 0, nil) | ||
| _ = fsm.sendNotification(conn, notif) | ||
| ocm.state.Store(bgp.BGP_FSM_CONNECT) | ||
| continue |
There was a problem hiding this comment.
The timer created at line 301 needs to be stopped in all other select cases to prevent resource leaks. When other cases in the select statement are executed (lines 310-338), the timer continues running and will eventually fire even though it's no longer needed. Add holdTimer.Stop() at the beginning of each other case (after case <-ocm.ctx.Done():, case reason := <-reasonCh:, and case fmsg := <-recvCh:).
There was a problem hiding this comment.
stop was added to other select cases
| case <-holdTimer.C: | ||
| fsm.logger.Debug("outgoing connection closing due to hold timer expired in OPENSENT state") | ||
| notif := bgp.NewBGPNotificationMessage(bgp.BGP_ERROR_HOLD_TIMER_EXPIRED, 0, nil) | ||
| _ = fsm.sendNotification(conn, notif) |
There was a problem hiding this comment.
The connection should be closed after sending the notification and before transitioning back to CONNECT state. Add conn.Close() after line 307 to ensure the connection is properly cleaned up, similar to how it's handled in other error cases in this function (e.g., lines 312, 317, 328).
| _ = fsm.sendNotification(conn, notif) | |
| _ = fsm.sendNotification(conn, notif) | |
| conn.Close() |
There was a problem hiding this comment.
comment was added to inform close is done in fsm.sendNotification
| case <-holdTimer.C: | ||
| fsm.logger.Debug("outgoing connection closing due to hold timer expired in OPENSENT state") | ||
| notif := bgp.NewBGPNotificationMessage(bgp.BGP_ERROR_HOLD_TIMER_EXPIRED, 0, nil) | ||
| _ = fsm.sendNotification(conn, notif) |
There was a problem hiding this comment.
The wait group should be waited on before transitioning to the next state to ensure the goroutine started at line 302 has finished. Add wg.Wait() after closing the connection to match the pattern used in other cases (lines 313, 318, 322). This ensures proper cleanup of the recvMessage goroutine before continuing.
| _ = fsm.sendNotification(conn, notif) | |
| _ = fsm.sendNotification(conn, notif) | |
| conn.SetReadDeadline(time.Now()) | |
| conn.Close() | |
| wg.Wait() |
There was a problem hiding this comment.
wg.Wait() was added in updated version
| holdTimer := time.NewTimer(time.Second * time.Duration(fsm.opensentHoldTime)) | ||
| go ocm.fsm.h.recvMessage(ocm.ctx, conn, recvCh, reasonCh, &wg) | ||
| select { | ||
| case <-holdTimer.C: | ||
| fsm.logger.Debug("outgoing connection closing due to hold timer expired in OPENSENT state") | ||
| notif := bgp.NewBGPNotificationMessage(bgp.BGP_ERROR_HOLD_TIMER_EXPIRED, 0, nil) | ||
| _ = fsm.sendNotification(conn, notif) | ||
| ocm.state.Store(bgp.BGP_FSM_CONNECT) | ||
| continue |
There was a problem hiding this comment.
The new hold timer expiration logic in the outgoingConnManager lacks test coverage. While there are tests for hold timer expiration in the opensent handler function (TestFSMHandlerOpensent_HoldTimerExpired), there are no tests for the outgoingConnManager's hold timer handling. Consider adding a test to verify that the connection is properly closed and a notification is sent when the hold timer expires in the OPENSENT state within outgoingConnManager.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
pushed, thanks! |
this is fix for #3284.
Before this fix, when gobgp acts as client and does not receive response to OPEN, it waits forever.
This fix starts timer to make sure that when OPEN is not received from server in hold time, connection is closed.