extend self-test log processing #151
aritas1 wants to merge 1 commit into prometheus-community:master
Signed-off-by: Aritas1 <mail@aritas.de>
// assume the table will always be in descending order
processedTypes := make(map[string]bool)

for _, logEntry := range smart.json.Get("ata_smart_self_test_log.standard.table").Array() {
This should accept either standard or extended. Some argument and device combinations only produce one of them. The layout of the JSON structure is the same in both.
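A minimal sketch of that fallback, assuming the two logs share the same `table` layout as described above. It uses the standard library's `encoding/json` rather than the PR's own JSON accessor so that it stays self-contained; `selfTestEntry`, `selfTestLog`, and `selfTestTable` are hypothetical names, and the `lifetime_hours` field is an assumption about the smartctl output.

```go
package main

import "encoding/json"

// selfTestEntry is a hypothetical, reduced view of one row of the log table.
type selfTestEntry struct {
	Type struct {
		Value  int    `json:"value"`
		String string `json:"string"`
	} `json:"type"`
	LifetimeHours int `json:"lifetime_hours"` // assumed field name
}

// selfTestLog is the part of a "standard" or "extended" log used here.
type selfTestLog struct {
	Table []selfTestEntry `json:"table"`
}

// selfTestTable returns the rows of the "standard" log when it is present
// and non-empty, otherwise the rows of the "extended" log, otherwise nil.
func selfTestTable(raw []byte) ([]selfTestEntry, error) {
	var doc struct {
		Log struct {
			Standard *selfTestLog `json:"standard"`
			Extended *selfTestLog `json:"extended"`
		} `json:"ata_smart_self_test_log"`
	}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	for _, log := range []*selfTestLog{doc.Log.Standard, doc.Log.Extended} {
		if log != nil && len(log.Table) > 0 {
			return log.Table, nil
		}
	}
	return nil, nil
}
```

Preferring `standard` over `extended` is an arbitrary choice in this sketch; the comment above only asks that either one be accepted.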
	logTestType = "unknown"
}

if !processedTypes[logTestType] {
This implicitly trusts that the tests appear in newest-to-oldest order. I don't know if I trust drives enough for that.
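One way to avoid relying on table order is to keep, for each test type, the entry with the highest power-on-hours timestamp. The following is only a sketch: `logEntry` and `latestPerType` are hypothetical names, and it assumes every row carries a comparable `lifetime_hours` value.

```go
package main

// logEntry is a hypothetical, reduced view of one self-test log row.
type logEntry struct {
	TestType      string
	LifetimeHours int // assumed to come from the row's lifetime_hours field
}

// latestPerType returns, for each test type, the entry with the highest
// LifetimeHours, independent of the order of the input slice.
func latestPerType(entries []logEntry) map[string]logEntry {
	latest := make(map[string]logEntry)
	for _, e := range entries {
		cur, seen := latest[e.TestType]
		if !seen || e.LifetimeHours > cur.LifetimeHours {
			latest[e.TestType] = e
		}
	}
	return latest
}
```

Compared with the `processedTypes` set, this costs one extra comparison per row and produces the same result when the table really is in descending order.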
testTime = testTime * 60 * 60

// skip running tests
if testRunningIndicator != 0 {
This is not correct. From one of my systems:
"status": {
"value": 41,
"string": "Interrupted (host reset)",
"remaining_percent": 90
}
status.passed is NOT present in this case.
I don't have any SATA drives with failing checks to compare at present, but I worry their values are also non-zero.
OK, it definitely needs work. Also, from the smartctl sources:
std::string msgstat;
switch (test_status >> 4) {
case 0x0: msgstat = "Completed without error"; break;
case 0x1: msgstat = "Aborted by host"; break;
case 0x2: msgstat = "Interrupted (host reset)"; break;
case 0x3: msgstat = "Fatal or unknown error"; break;
case 0x4: msgstat = "Completed: unknown failure"; break;
case 0x5: msgstat = "Completed: electrical failure"; break;
case 0x6: msgstat = "Completed: servo/seek failure"; break;
case 0x7: msgstat = "Completed: read failure"; break;
case 0x8: msgstat = "Completed: handling damage??"; break;
case 0xf: msgstat = "Self-test routine in progress"; break;
default: msgstat = strprintf("Unknown status (0x%x)", test_status >> 4);
}
So if it's 0xF then skip it as running; otherwise map the error.
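A sketch of that rule in Go, mirroring the smartctl switch quoted above: the upper four bits of the status value select the message, and 0xF marks a test that is still running. `decodeStatus` and `selfTestStatus` are hypothetical names, not part of the PR.

```go
package main

import "fmt"

// selfTestStatus maps the high nibble of the status value to the message
// used in the smartctl source quoted above.
var selfTestStatus = map[int]string{
	0x0: "Completed without error",
	0x1: "Aborted by host",
	0x2: "Interrupted (host reset)",
	0x3: "Fatal or unknown error",
	0x4: "Completed: unknown failure",
	0x5: "Completed: electrical failure",
	0x6: "Completed: servo/seek failure",
	0x7: "Completed: read failure",
	0x8: "Completed: handling damage??",
	0xf: "Self-test routine in progress",
}

// decodeStatus returns the message for a raw status value and reports
// whether the test is still running (high nibble 0xF).
func decodeStatus(value int) (msg string, running bool) {
	code := value >> 4
	if s, ok := selfTestStatus[code]; ok {
		return s, code == 0xf
	}
	return fmt.Sprintf("Unknown status (0x%x)", code), false
}
```

With the value from the example above, `decodeStatus(41)` yields "Interrupted (host reset)" and `running == false`, since 41 is 0x29 and its high nibble is 0x2, so such an entry would be mapped to an error instead of being skipped.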
}

func (smart *SMARTctl) mineDeviceSelfTest() {
	validTypes := map[int]string{
From the smartctl sources:
switch (test_type) {
case 0x00: msgtest = "Offline"; break;
case 0x01: msgtest = "Short offline"; break;
case 0x02: msgtest = "Extended offline"; break;
case 0x03: msgtest = "Conveyance offline"; break;
case 0x04: msgtest = "Selective offline"; break;
case 0x7f: msgtest = "Abort offline test"; break;
case 0x81: msgtest = "Short captive"; break;
case 0x82: msgtest = "Extended captive"; break;
case 0x83: msgtest = "Conveyance captive"; break;
case 0x84: msgtest = "Selective captive"; break;
default:
if ((0x40 <= test_type && test_type <= 0x7e) || 0x90 <= test_type)
msgtest = strprintf("Vendor (0x%02x)", test_type);
else
msgtest = strprintf("Reserved (0x%02x)", test_type);
}
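The same mapping could be expressed in Go roughly as follows, including the vendor and reserved ranges from the `default` branch. This is a sketch only; `testTypeName` and `knownTestTypes` are hypothetical names and do not appear in the PR.

```go
package main

import "fmt"

// knownTestTypes mirrors the explicit cases of the smartctl switch above.
var knownTestTypes = map[int]string{
	0x00: "Offline",
	0x01: "Short offline",
	0x02: "Extended offline",
	0x03: "Conveyance offline",
	0x04: "Selective offline",
	0x7f: "Abort offline test",
	0x81: "Short captive",
	0x82: "Extended captive",
	0x83: "Conveyance captive",
	0x84: "Selective captive",
}

// testTypeName returns the smartctl-style name for a self-test type value,
// falling back to the vendor or reserved label for unlisted values.
func testTypeName(testType int) string {
	if name, ok := knownTestTypes[testType]; ok {
		return name
	}
	if (0x40 <= testType && testType <= 0x7e) || 0x90 <= testType {
		return fmt.Sprintf("Vendor (0x%02x)", testType)
	}
	return fmt.Sprintf("Reserved (0x%02x)", testType)
}
```

A `validTypes` map limited to the explicit cases would drop vendor-specific tests entirely, so the fallback labels may be worth keeping as metric label values.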
This adds metrics for monitoring the execution time of the latest self-tests.
It also fixes the missing smartctl_device_self_test_log_count metric, which was caused by the missing --log=selftest argument.