@Gargi-jais11
Contributor

What changes were proposed in this pull request?

While a DiskBalancer operation is running on a DataNode, it should not be possible to start another DiskBalancer operation on the same DataNode.
Expected behavior: the command displays the message “DiskBalancer operation is already running.”
Actual behavior:

  • No message is displayed.
  • A second operation can be initiated, which may lead to confusion.
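
A guard of this kind can be sketched as follows. This is a minimal illustration, not the code in this PR: `BalancerService` and its methods are hypothetical stand-ins, and only the message text is taken from the expected behavior above.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class StartGuardSketch {
  /** Hypothetical stand-in for the DataNode-side service; not the Ozone class. */
  static final class BalancerService {
    private final AtomicBoolean running = new AtomicBoolean(false);

    /** Returns a message describing whether the start request took effect. */
    String start() {
      // compareAndSet flips false -> true atomically, so only one caller wins
      // even if two start requests arrive at the same time.
      if (!running.compareAndSet(false, true)) {
        return "DiskBalancer operation is already running.";
      }
      return "started";
    }

    /** Marks the service as stopped so a later start request can succeed. */
    void stop() {
      running.set(false);
    }
  }

  public static void main(String[] args) {
    BalancerService service = new BalancerService();
    System.out.println(service.start()); // first request starts the service
    System.out.println(service.start()); // second request is rejected
    service.stop();
    System.out.println(service.start()); // allowed again after stop
  }
}
```

Using an atomic flag rather than a plain boolean check avoids the window in which two concurrent requests both see "not running" and both proceed.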

Second, dnHostName values for the CLI output are now stored in a set instead of a list, so duplicate hostnames are not displayed when the user passes the same hostname to a diskbalancer command twice. The following output shows the duplicated entry:

bash-5.1$ ozone admin datanode diskbalancer status ozone-datanode-1 ozone-datanode-3 ozone-datanode-1 --json
[ {
  "datanode" : "ozone-datanode-1.ozone_default",
  "action" : "status",
  "status" : "success",
  "serviceStatus" : "RUNNING",
  "threshold" : 10.0,
  "bandwidthInMB" : 10,
  "threads" : 5,
  "stopAfterDiskEven" : false,
  "successMove" : 0,
  "failureMove" : 0,
  "bytesMovedMB" : 0,
  "estBytesToMoveMB" : 0,
  "estTimeLeftMin" : 0
}, {
  "datanode" : "ozone-datanode-3.ozone_default",
  "action" : "status",
  "status" : "success",
  "serviceStatus" : "RUNNING",
  "threshold" : 10.0,
  "bandwidthInMB" : 10,
  "threads" : 5,
  "stopAfterDiskEven" : false,
  "successMove" : 0,
  "failureMove" : 0,
  "bytesMovedMB" : 0,
  "estBytesToMoveMB" : 0,
  "estTimeLeftMin" : 0
}, {
  "datanode" : "ozone-datanode-1.ozone_default",
  "action" : "status",
  "status" : "success",
  "serviceStatus" : "RUNNING",
  "threshold" : 10.0,
  "bandwidthInMB" : 10,
  "threads" : 5,
  "stopAfterDiskEven" : false,
  "successMove" : 0,
  "failureMove" : 0,
  "bytesMovedMB" : 0,
  "estBytesToMoveMB" : 0,
  "estTimeLeftMin" : 0
} ]
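
The de-duplication idea can be sketched as below. This is an assumption-laden illustration rather than the PR's code: the `dedupe` helper is hypothetical, and `LinkedHashSet` is chosen here only because it also preserves the order in which hostnames were given; the PR states only that a set is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class HostnameDedupSketch {
  /** Hypothetical helper: drops repeated hostnames, keeping first-seen order. */
  static List<String> dedupe(List<String> hosts) {
    // A LinkedHashSet ignores repeated elements and iterates in insertion order.
    Set<String> unique = new LinkedHashSet<>(hosts);
    return new ArrayList<>(unique);
  }

  public static void main(String[] args) {
    List<String> requested =
        Arrays.asList("ozone-datanode-1", "ozone-datanode-3", "ozone-datanode-1");
    System.out.println(dedupe(requested)); // [ozone-datanode-1, ozone-datanode-3]
  }
}
```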

Third, the JSON CLI output shows dnHostName instead of the IP address for better clarity. Output with IP addresses looks like this:

bash-5.1$ ozone admin datanode diskbalancer stop --in-service-datanodes --json
[ {
  "datanode" : "172.18.0.5:19864",
  "action" : "stop",
  "status" : "success"
}, {
  "datanode" : "172.18.0.9:19864",
  "action" : "stop",
  "status" : "success"
}, {
  "datanode" : "172.18.0.10:19864",
  "action" : "stop",
  "status" : "success"
}, {
  "datanode" : "172.18.0.8:19864",
  "action" : "stop",
  "status" : "success"
}]

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-14195

How was this patch tested?

Added new test cases to TestDiskBalancerProtocolServer and updated TestDiskBalancerSubCommands and TestDiskBalancer.
Tested locally:
Before fix:

bash-5.1$ ozone admin datanode diskbalancer start -s false --in-service-datanodes
Started DiskBalancer operation on all other IN_SERVICE and HEALTHY DNs.
bash-5.1$ ozone admin datanode diskbalancer start -s false --in-service-datanodes
Started DiskBalancer operation on all other IN_SERVICE and HEALTHY DNs.

After fix:

bash-5.1$ ozone admin datanode diskbalancer start -s false --in-service-datanodes
Started DiskBalancer operation on all other IN_SERVICE and HEALTHY DNs.
bash-5.1$ ozone admin datanode diskbalancer status --in-service-datanodes
Status result:
Datanode                            Status          Threshold(%)    BandwidthInMB   Threads      StopAfterDiskEven    SuccessMove  FailureMove  BytesMoved(MB)  EstBytesToMove(MB) EstTimeLeft(min)    
ozone-datanode-1.ozone_default      RUNNING         10.0000         10              5            false                0            0            0               0                  0                   
ozone-datanode-3.ozone_default      RUNNING         10.0000         10              5            false                0            0            0               0                  0                   
ozone-datanode-4.ozone_default      RUNNING         10.0000         10              5            false                0            0            0               0                  0                   
ozone-datanode-2.ozone_default      RUNNING         10.0000         10              5            false                0            0            0               0                  0                   
ozone-datanode-5.ozone_default      RUNNING         10.0000         10              5            false                0            0            0               0                  0                   

Note: Estimated time left is calculated based on the estimated bytes to move and the configured disk bandwidth.
bash-5.1$ ozone admin datanode diskbalancer start -s false --in-service-datanodes --json
[ {
  "datanode" : "ozone-datanode-2.ozone_default",
  "action" : "start",
  "status" : "skipped",
  "message" : "DiskBalancer operation is already running.",
  "configuration" : {
    "stopAfterDiskEven" : false
  }
}, {
  "datanode" : "ozone-datanode-1.ozone_default",
  "action" : "start",
  "status" : "skipped",
  "message" : "DiskBalancer operation is already running.",
  "configuration" : {
    "stopAfterDiskEven" : false
  }
}, {
  "datanode" : "ozone-datanode-3.ozone_default",
  "action" : "start",
  "status" : "skipped",
  "message" : "DiskBalancer operation is already running.",
  "configuration" : {
    "stopAfterDiskEven" : false
  }
}, {
  "datanode" : "ozone-datanode-4.ozone_default",
  "action" : "start",
  "status" : "skipped",
  "message" : "DiskBalancer operation is already running.",
  "configuration" : {
    "stopAfterDiskEven" : false
  }
}, {
  "datanode" : "ozone-datanode-5.ozone_default",
  "action" : "start",
  "status" : "skipped",
  "message" : "DiskBalancer operation is already running.",
  "configuration" : {
    "stopAfterDiskEven" : false
  }
} ]
bash-5.1$ ozone admin datanode diskbalancer start ozone-datanode-5 ozone-datanode-2
DiskBalancer operation is already running on : [ozone-datanode-5.ozone_default, ozone-datanode-2.ozone_default]

bash-5.1$ ozone admin datanode diskbalancer start -t 0.0001 -s false --in-service-datanodes
DiskBalancer operation is already running on : [ozone-datanode-2.ozone_default, ozone-datanode-5.ozone_default]
Started DiskBalancer operation on all other IN_SERVICE and HEALTHY DNs.
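
The mixed result above (some DataNodes skipped, the rest started) can be produced by partitioning per-node results before printing. The sketch below is hypothetical: the `summarize` helper and its input map are assumptions for illustration, and only the two message strings come from the transcript.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StartSummarySketch {
  /**
   * Hypothetical helper: builds the CLI summary lines from a map of
   * hostname -> whether DiskBalancer was already running on that node.
   */
  static List<String> summarize(Map<String, Boolean> alreadyRunningByHost) {
    List<String> skipped = new ArrayList<>();
    boolean anyStarted = false;
    for (Map.Entry<String, Boolean> entry : alreadyRunningByHost.entrySet()) {
      if (entry.getValue()) {
        skipped.add(entry.getKey());
      } else {
        anyStarted = true;
      }
    }
    List<String> lines = new ArrayList<>();
    if (!skipped.isEmpty()) {
      // List.toString() renders as "[a, b]", matching the transcript format.
      lines.add("DiskBalancer operation is already running on : " + skipped);
    }
    if (anyStarted) {
      lines.add("Started DiskBalancer operation on all other IN_SERVICE and HEALTHY DNs.");
    }
    return lines;
  }

  public static void main(String[] args) {
    Map<String, Boolean> results = new LinkedHashMap<>();
    results.put("ozone-datanode-2.ozone_default", true);
    results.put("ozone-datanode-5.ozone_default", true);
    results.put("ozone-datanode-1.ozone_default", false);
    summarize(results).forEach(System.out::println);
  }
}
```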

@Gargi-jais11 Gargi-jais11 marked this pull request as ready for review December 19, 2025 08:13