PB-7411: Update heartbeatTimeoutSecs, electionTimeoutMillis for the pxcentral mongodb template. #589
sgajawada-px wants to merge 1 commit into 2.7.2 from
… by increasing the heartbeatTimeoutSecs and electionTimeoutMillis
echo "This node is currently PRIMARY - will apply rs.conf settings"

usernameAndPassword="-u ${MONGODB_ROOT_USER} -p ${MONGODB_ROOT_PASSWORD}"
settingsToConfigure="${settingsToConfigure}cfg.settings.heartbeatTimeoutSecs = 60; "
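For context, here is a minimal sketch of how an accumulated `settingsToConfigure` string could be turned into a single `rs.reconfig` call. The `electionTimeoutMillis` value, the `build_reconfig_js` helper, and the final shell invocation are assumptions for illustration. They are not taken from this diff, and the real template may build and apply the string differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: accumulate replica set settings, then build the
# JavaScript that applies them through rs.reconfig().
set -euo pipefail

settingsToConfigure=""
# Value shown in the diff under review.
settingsToConfigure="${settingsToConfigure}cfg.settings.heartbeatTimeoutSecs = 60; "
# Assumed value for illustration only; the diff excerpt does not show it.
settingsToConfigure="${settingsToConfigure}cfg.settings.electionTimeoutMillis = 60000; "

# Hypothetical helper: wrap the overrides between reading the current
# config and writing it back.
build_reconfig_js() {
  printf 'cfg = rs.conf(); %srs.reconfig(cfg);' "$1"
}

reconfigJs="$(build_reconfig_js "${settingsToConfigure}")"
echo "${reconfigJs}"

# On the PRIMARY, the string would then be handed to the Mongo shell,
# for example (assumed invocation, not from the diff):
#   mongosh ${usernameAndPassword} --eval "${reconfigJs}"
```

Building the whole statement as one string keeps the reconfig to a single call, so all settings land in one new config version rather than one reconfig per setting.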
Given the way we consume Mongo, we don't expect many failures of the mongo pods or intentional restarts of the pods in our px-backup namespace. Otherwise, I feel that increasing the heartbeat timeout can have some negative side effects, as listed below.
ChatGPT raised the concerns below about negative side effects of increasing the heartbeat timeout. I know this is a blanket question, but please check whether they apply to us. If we have answers for the points below, we can be more confident in this change.
Delayed Failover:
One of the main trade-offs is that failover times will increase. If a primary node actually goes down, the remaining members will wait longer before initiating an election to choose a new primary. This can lead to longer periods of unavailability for write operations.
Slow Detection of Issues:
Real issues, such as a node genuinely going down, will take longer to be detected. This delay can impact the overall resilience and responsiveness of the cluster in dealing with actual failures.
Impact on Cluster Operations:
Operations that depend on timely heartbeat responses, such as replica set reconfigurations or maintenance tasks, might be affected. The cluster might take longer to stabilize after changes or disruptions.
Potential Data Inconsistency:
If a primary node is slow to respond and is eventually considered down after a longer timeout, there's a risk of split-brain scenarios or data inconsistency if the network partitions and nodes believe they are still part of a majority.
Why is this PR raised against 2.7.2? Does the helm repo work this way? That is, is the master branch meant for the latest released version, while release branches are named for upcoming releases?
For the pxcentral mongodb template: apply a replica set (rs) reconfig that increases heartbeatTimeoutSecs and electionTimeoutMillis
What this PR does / why we need it:
To fix the px-backup pod crash that occurs when MongoDB goes into a non-writable state.
Which issue(s) this PR fixes (optional)
Closes #PB-7411
Special notes for your reviewer:

