add script to auto recover the failover #18

KyrinCode · 2025-12-29T13:10:13Z

Conductor Failover Monitor

Overview

conductor-failover-monitor.py is a monitoring daemon that watches the health of voting nodes in an op-conductor cluster and automatically promotes a non-voter to voter when all voters become unhealthy. This provides an automated disaster recovery mechanism for sequencer clusters.

Problem Statement

In an op-conductor HA cluster, if all voting nodes become unhealthy at the same time (e.g., during a regional outage or infrastructure failure), the cluster loses quorum and cannot elect a new leader. Recovery then requires an operator to promote a non-voter by hand, which prolongs downtime.
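op-conductor's consensus is built on Raft, where a leader can only be elected by a strict majority of the voting members. The quorum arithmetic below is a small illustration (the helper names are ours, not from the script) of why losing every voter leaves no way to elect a leader without changing membership:

```python
def quorum_size(voters: int) -> int:
    """Smallest strict majority of `voters`, i.e. the votes Raft needs."""
    return voters // 2 + 1


def has_quorum(healthy_voters: int, voters: int) -> bool:
    """True if enough voters are healthy to elect a leader."""
    return healthy_voters >= quorum_size(voters)


# A 3-voter cluster needs 2 healthy voters, so it tolerates one failure.
print(quorum_size(3))      # 2
print(has_quorum(0, 3))    # False: all voters down, no leader possible
```

Non-voters replicate the log but never count toward this majority, which is why one must be promoted to voter before it can help restore leadership.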

Solution

This monitor continuously checks the health of all voter nodes. When it detects that all voters are unhealthy, it automatically:

1. Repeatedly sends `conductor_addServerAsVoter` requests to all voter nodes, in case one of them recovers and can process the request.
2. Promotes the first configured non-voter to voter.
3. Verifies that the promoted node becomes the leader.
4. Exits successfully once failover is complete.
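The control flow above can be sketched as follows. This is a minimal, hypothetical sketch, not the script's actual code: `Node`, `add_voter_request`, and `run_monitor` are illustrative names, and because the PR does not say how health is checked or how the non-voter is promoted, those steps are injected as callables (`is_healthy`, `send`, `promote`, `is_leader`). The `conductor_addServerAsVoter` params are assumed to be (server id, raft address, config version) with `0` as a placeholder version; check this against the op-conductor RPC API before relying on it. The sketch also assumes each retry repeats all three steps.

```python
import time
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Node:
    """Hypothetical node record; the real script loads nodes from config.toml."""
    server_id: str
    raft_addr: str
    rpc_url: str


def add_voter_request(candidate: Node, request_id: int) -> dict:
    """Build a JSON-RPC body asking a conductor to add `candidate` as a voter.

    Assumption: params are (server id, raft address, config version), with
    version 0 as a placeholder.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "conductor_addServerAsVoter",
        "params": [candidate.server_id, candidate.raft_addr, 0],
    }


def run_monitor(
    voters: Sequence[Node],
    non_voters: Sequence[Node],
    is_healthy: Callable[[Node], bool],
    send: Callable[[Node, dict], None],
    promote: Callable[[Node], None],
    is_leader: Callable[[Node], bool],
    interval: float = 10,
    retry_interval: float = 2,
    max_retries: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the first non-voter leads, False if retries run out."""
    # Poll every `interval` seconds until no voter reports healthy.
    while any(is_healthy(v) for v in voters):
        sleep(interval)

    candidate = non_voters[0]  # first configured non-voter
    for attempt in range(max_retries):
        # 1. Flood every voter, in case one has recovered and can apply it.
        for voter in voters:
            try:
                send(voter, add_voter_request(candidate, attempt))
            except OSError:
                pass  # dead voters are expected here; keep flooding the rest
        # 2. Promote the candidate (mechanism not described in the PR).
        promote(candidate)
        # 3. Failover is complete once the candidate reports itself leader.
        if is_leader(candidate):
            return True
        sleep(retry_interval)
    return False
```

In a script entry point, `sys.exit(0 if run_monitor(...) else 1)` would give the "exit successfully after failover" behavior, and a non-zero exit when `--max-retries` is exhausted. Injecting `sleep` keeps the loop testable without real delays.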

Configuration

The monitor uses the same `config.toml` format as op-conductor-ops.

Usage

```bash
poetry install
poetry run python conductor-failover-monitor.py -v
```

Command Line Options

| Option | Default | Description |
|---|---|---|
| `-c, --config` | `./config.toml` | Path to configuration file |
| `-i, --interval` | `10` | Health check interval in seconds |
| `--promote-retry-interval` | `2` | Retry interval during promotion in seconds |
| `--max-retries` | `30` | Maximum promotion retry attempts |
| `-v, --verbose` | `false` | Enable debug logging |
| `--cert` | - | SSL certificate file path |
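The options table maps directly onto a standard `argparse` parser. The sketch below is reconstructed from the table only; `build_parser` is a hypothetical name, and the argument types (float seconds for intervals, int for retries) are assumptions rather than values read from the script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command line options."""
    parser = argparse.ArgumentParser(
        description="Promote a non-voter when all op-conductor voters are unhealthy."
    )
    parser.add_argument("-c", "--config", default="./config.toml",
                        help="Path to configuration file")
    parser.add_argument("-i", "--interval", type=float, default=10,
                        help="Health check interval in seconds")
    parser.add_argument("--promote-retry-interval", type=float, default=2,
                        help="Retry interval during promotion in seconds")
    parser.add_argument("--max-retries", type=int, default=30,
                        help="Maximum promotion retry attempts")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Enable debug logging")
    parser.add_argument("--cert", default=None,
                        help="SSL certificate file path")
    return parser
```

`argparse` exposes `--promote-retry-interval` as `args.promote_retry_interval`, and `-v` would typically feed `logging.basicConfig(level=logging.DEBUG if args.verbose else logging.INFO)`.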

Commit a9466f9: add script to auto recover the failover
