Skip to content

close connection when response to OPEN is not sent during hold time#3296

Closed
tom-code wants to merge 1 commit intoosrg:masterfrom
tom-code:fix3284
Closed

close connection when response to OPEN is not sent during hold time#3296
tom-code wants to merge 1 commit intoosrg:masterfrom
tom-code:fix3284

Conversation

@tom-code
Copy link

this is fix for #3284.
Before this fix, when gobgp acts as client and does not receive response to OPEN, it waits forever.
This fix starts timer to make sure that when OPEN is not received from server in hold time, connection is closed.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses issue #3284 where GoBGP, acting as a client, would wait indefinitely when no response is received to an OPEN message. The fix introduces a hold timer in the outgoing connection manager to ensure the connection is closed if an OPEN response is not received within the configured hold time (240 seconds).

Changes:

  • Added a hold timer to the outgoingConnManager.run function's OPENSENT state handling
  • Implemented logic to send a BGP notification and transition back to CONNECT state when the timer expires

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 304 to 311
case <-holdTimer.C:
fsm.logger.Debug("outgoing connection closing due to hold timer expired in OPENSENT state")
notif := bgp.NewBGPNotificationMessage(bgp.BGP_ERROR_HOLD_TIMER_EXPIRED, 0, nil)
_ = fsm.sendNotification(conn, notif)
ocm.state.Store(bgp.BGP_FSM_CONNECT)
continue
Copy link

Copilot AI Jan 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before closing the connection, set a read deadline similar to line 311 to ensure the recvMessage goroutine is unblocked. Add conn.SetReadDeadline(time.Now()) before closing the connection to match the pattern used in the ctx.Done case and ensure proper cleanup.

Copilot uses AI. Check for mistakes.
Comment on lines 301 to 311
holdTimer := time.NewTimer(time.Second * time.Duration(fsm.opensentHoldTime))
go ocm.fsm.h.recvMessage(ocm.ctx, conn, recvCh, reasonCh, &wg)
select {
case <-holdTimer.C:
fsm.logger.Debug("outgoing connection closing due to hold timer expired in OPENSENT state")
notif := bgp.NewBGPNotificationMessage(bgp.BGP_ERROR_HOLD_TIMER_EXPIRED, 0, nil)
_ = fsm.sendNotification(conn, notif)
ocm.state.Store(bgp.BGP_FSM_CONNECT)
continue
Copy link

Copilot AI Jan 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timer created at line 301 needs to be stopped in all other select cases to prevent resource leaks. When other cases in the select statement are executed (lines 310-338), the timer continues running and will eventually fire even though it's no longer needed. Add holdTimer.Stop() at the beginning of each other case (after case <-ocm.ctx.Done():, case reason := <-reasonCh:, and case fmsg := <-recvCh:).

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stop was added to other select cases

case <-holdTimer.C:
fsm.logger.Debug("outgoing connection closing due to hold timer expired in OPENSENT state")
notif := bgp.NewBGPNotificationMessage(bgp.BGP_ERROR_HOLD_TIMER_EXPIRED, 0, nil)
_ = fsm.sendNotification(conn, notif)
Copy link

Copilot AI Jan 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The connection should be closed after sending the notification and before transitioning back to CONNECT state. Add conn.Close() after line 307 to ensure the connection is properly cleaned up, similar to how it's handled in other error cases in this function (e.g., lines 312, 317, 328).

Suggested change
_ = fsm.sendNotification(conn, notif)
_ = fsm.sendNotification(conn, notif)
conn.Close()

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment was added to inform close is done in fsm.sendNotification

case <-holdTimer.C:
fsm.logger.Debug("outgoing connection closing due to hold timer expired in OPENSENT state")
notif := bgp.NewBGPNotificationMessage(bgp.BGP_ERROR_HOLD_TIMER_EXPIRED, 0, nil)
_ = fsm.sendNotification(conn, notif)
Copy link

Copilot AI Jan 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wait group should be waited on before transitioning to the next state to ensure the goroutine started at line 302 has finished. Add wg.Wait() after closing the connection to match the pattern used in other cases (lines 313, 318, 322). This ensures proper cleanup of the recvMessage goroutine before continuing.

Suggested change
_ = fsm.sendNotification(conn, notif)
_ = fsm.sendNotification(conn, notif)
conn.SetReadDeadline(time.Now())
conn.Close()
wg.Wait()

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wg.Wait() was added in updated version

Comment on lines 301 to 311
holdTimer := time.NewTimer(time.Second * time.Duration(fsm.opensentHoldTime))
go ocm.fsm.h.recvMessage(ocm.ctx, conn, recvCh, reasonCh, &wg)
select {
case <-holdTimer.C:
fsm.logger.Debug("outgoing connection closing due to hold timer expired in OPENSENT state")
notif := bgp.NewBGPNotificationMessage(bgp.BGP_ERROR_HOLD_TIMER_EXPIRED, 0, nil)
_ = fsm.sendNotification(conn, notif)
ocm.state.Store(bgp.BGP_FSM_CONNECT)
continue
Copy link

Copilot AI Jan 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new hold timer expiration logic in the outgoingConnManager lacks test coverage. While there are tests for hold timer expiration in the opensent handler function (TestFSMHandlerOpensent_HoldTimerExpired), there are no tests for the outgoingConnManager's hold timer handling. Consider adding a test to verify that the connection is properly closed and a notification is sent when the hold timer expires in the OPENSENT state within outgoingConnManager.

Copilot uses AI. Check for mistakes.
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@fujita
Copy link
Member

fujita commented Jan 26, 2026

pushed, thanks!

@fujita fujita closed this Jan 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants