2 changes: 2 additions & 0 deletions README.md
@@ -76,6 +76,8 @@ On some systems, the `timex` collector requires an additional Docker flag,
There is varying support for collectors on each operating system. The tables
below list all existing collectors and the supported systems.

For detailed per-collector documentation including metrics, labels, and configuration flags, see [docs/collectors/](./docs/collectors/).

Collectors are enabled by providing a `--collector.<name>` flag.
Collectors that are enabled by default can be disabled by providing a `--no-collector.<name>` flag.
To enable only some specific collector(s), use `--collector.disable-defaults --collector.<name> ...`.
30 changes: 30 additions & 0 deletions docs/collectors/README.md
@@ -0,0 +1,30 @@
# Collector Documentation

Per-collector metric documentation. Each file documents one collector.

## Available Documentation

- [cpu](cpu.md) - CPU time statistics and metadata
- [cpufreq](cpufreq.md) - CPU frequency scaling statistics
- [diskstats](diskstats.md) - Disk I/O statistics
- [filesystem](filesystem.md) - Filesystem space and inode statistics
- [hwmon](hwmon.md) - Hardware monitoring sensors
- [meminfo](meminfo.md) - Memory statistics
- [netdev](netdev.md) - Network interface statistics
- [netstat](netstat.md) - Network protocol statistics
- [stat](stat.md) - Kernel/system statistics

## Structure

See [_TEMPLATE.md](_TEMPLATE.md) for the documentation template.

## Naming

Files are named `<collector_name>.md` matching the collector registration name (e.g., `cpu.md`, `filesystem.md`).

## Contributing

When adding or modifying a collector:
1. Update or create the corresponding documentation file
2. Ensure all metrics are listed with correct types and labels
3. Document any configuration flags
58 changes: 58 additions & 0 deletions docs/collectors/_TEMPLATE.md
@@ -0,0 +1,58 @@
# collector_name

Brief description of what this collector exposes.

Status: enabled|disabled by default

## Platforms

- Linux
- Darwin
- FreeBSD
- ...

## Configuration

```
--collector.name.flag-name Description (default: value)
--collector.name.other-flag Description (default: value)
```

Omit this section if the collector has no flags.

## Data Sources

| Source | Description |
|--------|-------------|
| `/proc/example` | Brief description |
| `/sys/class/example` | Brief description |
| `syscall(2)` | Brief description |

## Metrics

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `node_example_total` | counter | `label1`, `label2` | Description |
| `node_example_bytes` | gauge | | Description |
| `node_example_info` | gauge | `key`, `value` | Info metric, always 1 |

For collectors with dynamic metrics (e.g., meminfo), use:

Metrics are derived from `/proc/meminfo`. Each field `FieldName` becomes `node_memory_FieldName_bytes`, preserving the field's case (for example, `MemTotal` becomes `node_memory_MemTotal_bytes`).

## Labels

| Label | Description |
|-------|-------------|
| `device` | Device name |
| `mountpoint` | Mount path |

Omit this section if metrics have no labels or labels are self-explanatory.

## Notes

- Special behaviors, caveats, kernel version requirements
- Known issues or limitations
- Related collectors

Omit this section if not applicable.
70 changes: 70 additions & 0 deletions docs/collectors/cpu.md
@@ -0,0 +1,70 @@
# cpu

Exposes CPU time statistics from `/proc/stat` and CPU metadata from `/proc/cpuinfo` and sysfs.

Status: enabled by default

## Platforms

- Linux
- Darwin
- Dragonfly
- FreeBSD
- NetBSD
- OpenBSD
- Solaris
- AIX

## Configuration

```
--collector.cpu.guest Enable node_cpu_guest_seconds_total metric (default: true)
--collector.cpu.info Enable node_cpu_info metric (default: false)
--collector.cpu.info.flags-include Regex filter for CPU flags to include in node_cpu_flag_info
--collector.cpu.info.bugs-include Regex filter for CPU bugs to include in node_cpu_bug_info
```

Setting `--collector.cpu.info.flags-include` or `--collector.cpu.info.bugs-include` implicitly enables `--collector.cpu.info`.

## Data Sources

| Source | Description |
|--------|-------------|
| `/proc/stat` | CPU time counters per core and mode |
| `/proc/cpuinfo` | CPU metadata (vendor, model, flags, bugs) |
| `/sys/devices/system/cpu/cpu*/topology/` | Physical package and core IDs |
| `/sys/devices/system/cpu/cpu*/thermal_throttle/` | Thermal throttling counters |
| `/sys/devices/system/cpu/cpu*/online` | CPU online status |
| `/sys/devices/system/cpu/isolated` | Isolated CPUs list |

## Metrics

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `node_cpu_seconds_total` | counter | `cpu`, `mode` | Seconds the CPUs spent in each mode |
| `node_cpu_guest_seconds_total` | counter | `cpu`, `mode` | Seconds the CPUs spent in guest (VM) mode |
| `node_cpu_info` | gauge | `package`, `core`, `cpu`, `vendor`, `family`, `model`, `model_name`, `microcode`, `stepping`, `cachesize` | CPU metadata, always 1 |
| `node_cpu_frequency_hertz` | gauge | `package`, `core`, `cpu` | CPU frequency from /proc/cpuinfo (only when cpufreq collector disabled) |
| `node_cpu_flag_info` | gauge | `flag` | CPU flag presence from first core, always 1 |
| `node_cpu_bug_info` | gauge | `bug` | CPU bug presence from first core, always 1 |
| `node_cpu_core_throttles_total` | counter | `package`, `core` | Thermal throttle events per core |
| `node_cpu_package_throttles_total` | counter | `package` | Thermal throttle events per package |
| `node_cpu_isolated` | gauge | `cpu` | CPU isolation status (1 if isolated) |
| `node_cpu_online` | gauge | `cpu` | CPU online status (1 if online) |

## Labels

| Label | Description |
|-------|-------------|
| `cpu` | Logical CPU number (0-indexed) |
| `mode` | CPU time mode: `user`, `nice`, `system`, `idle`, `iowait`, `irq`, `softirq`, `steal` |
| `package` | Physical CPU package ID |
| `core` | Physical core ID within package |

## Notes

- `node_cpu_guest_seconds_total` values are also included in `node_cpu_seconds_total` (user and nice modes)
- Counter values may jump backwards on CPU hotplug events; the collector handles this by resetting stats when idle jumps back more than 3 seconds
- `node_cpu_flag_info` and `node_cpu_bug_info` are only exposed from the first CPU core
- `node_cpu_frequency_hertz` is only exposed when the `cpufreq` collector is disabled to avoid duplicate metrics
- The throttle counters, `node_cpu_isolated`, and `node_cpu_online` are exposed on Linux only
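
The hotplug note above can be illustrated with a minimal sketch. This is not the collector's code: the `CPUStat` type, the `update_cached` helper, and the reduced set of fields are hypothetical, and the handling of small backwards jumps (keeping the larger value) is an assumption. Only the 3-second idle threshold comes from the note.

```python
from dataclasses import dataclass, fields, replace

JUMP_BACK_SECONDS = 3.0  # threshold taken from the note above


@dataclass
class CPUStat:
    """Hypothetical per-CPU counters in seconds (subset of the real modes)."""
    user: float = 0.0
    system: float = 0.0
    idle: float = 0.0


def update_cached(cached: CPUStat, fresh: CPUStat) -> CPUStat:
    """Return the stats to expose after reading a fresh sample."""
    # Idle time moving back by more than the threshold suggests a hotplug
    # event, so the cached counters are dropped and the fresh sample is used.
    if cached.idle - fresh.idle > JUMP_BACK_SECONDS:
        return replace(fresh)
    # Assumption: for smaller regressions, keep each counter monotonic by
    # never exposing a value lower than the cached one.
    merged = CPUStat()
    for f in fields(CPUStat):
        setattr(merged, f.name, max(getattr(cached, f.name), getattr(fresh, f.name)))
    return merged
```

With a cached sample of `CPUStat(10, 5, 100)`, a fresh sample whose idle value is 2 replaces the cache entirely, while a fresh sample whose idle value is 99 only advances the counters that grew.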
46 changes: 46 additions & 0 deletions docs/collectors/cpufreq.md
@@ -0,0 +1,46 @@
# cpufreq

Exposes CPU frequency scaling statistics from sysfs.

Status: enabled by default

## Platforms

- Linux
- Solaris

## Data Sources

| Source | Description |
|--------|-------------|
| `/sys/devices/system/cpu/cpu*/cpufreq/` | Per-CPU frequency scaling data |

Kernel documentation:
- https://www.kernel.org/doc/Documentation/cpu-freq/user-guide.txt
- https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt

## Metrics

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `node_cpu_frequency_hertz` | gauge | `cpu` | Current CPU thread frequency in hertz |
| `node_cpu_frequency_min_hertz` | gauge | `cpu` | Minimum CPU thread frequency in hertz |
| `node_cpu_frequency_max_hertz` | gauge | `cpu` | Maximum CPU thread frequency in hertz |
| `node_cpu_scaling_frequency_hertz` | gauge | `cpu` | Current scaled CPU thread frequency in hertz |
| `node_cpu_scaling_frequency_min_hertz` | gauge | `cpu` | Minimum scaled CPU thread frequency in hertz |
| `node_cpu_scaling_frequency_max_hertz` | gauge | `cpu` | Maximum scaled CPU thread frequency in hertz |
| `node_cpu_scaling_governor` | gauge | `cpu`, `governor` | Current CPU frequency governor (1 if active, 0 otherwise) |

## Labels

| Label | Description |
|-------|-------------|
| `cpu` | CPU number taken from the sysfs directory name (e.g., `0` for `cpu0`) |
| `governor` | Frequency governor name (e.g., `performance`, `powersave`, `ondemand`) |

## Notes

- Sysfs values are in kHz; the collector converts to Hz
- Metrics without `scaling` in the name reflect hardware values from the `cpuinfo_*` sysfs files; `scaling_*` metrics reflect the limits of the current governor policy
- `node_cpu_scaling_governor` emits one metric per available governor per CPU, with value 1 for the active governor
- When this collector is enabled, the `cpu` collector does not expose `node_cpu_frequency_hertz` to avoid duplication
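
The unit conversion and the per-governor series described above might look like the following minimal sketch. The `cpufreq_samples` function and its tuple-based sample format are hypothetical and only illustrate the notes; they are not the collector's implementation.

```python
def cpufreq_samples(cpu, scaling_cur_khz, available_governors, active_governor):
    """Build (metric, labels, value) tuples for one CPU.

    Hypothetical helper: sysfs reports frequencies in kHz, so values are
    multiplied by 1000 to obtain hertz.
    """
    samples = [
        ("node_cpu_scaling_frequency_hertz", {"cpu": cpu}, scaling_cur_khz * 1000.0),
    ]
    # One series per available governor: 1 for the active one, 0 otherwise.
    for governor in available_governors:
        value = 1.0 if governor == active_governor else 0.0
        samples.append(
            ("node_cpu_scaling_governor", {"cpu": cpu, "governor": governor}, value)
        )
    return samples
```

Because every available governor gets its own series, the number of `node_cpu_scaling_governor` series per CPU equals the number of governors the kernel offers, not one.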
115 changes: 115 additions & 0 deletions docs/collectors/diskstats.md
@@ -0,0 +1,115 @@
# diskstats

Exposes disk I/O statistics from `/proc/diskstats` and block device metadata from sysfs and udev.

Status: enabled by default

## Platforms

- Linux
- Darwin
- OpenBSD
- AIX

## Configuration

```
--collector.diskstats.device-include Regexp of devices to include (mutually exclusive with device-exclude)
--collector.diskstats.device-exclude Regexp of devices to exclude (default: ^(z?ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p)\d+$)
```

### Examples

Monitor only whole disks by excluding partitions and RAM, loop, and floppy devices (this is the default pattern):
```
--collector.diskstats.device-exclude="^(z?ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p)\d+$"
```

Monitor only NVMe devices:
```
--collector.diskstats.device-include="^nvme[0-9]+n[0-9]+$"
```

Monitor only SCSI/SATA disks (sd*):
```
--collector.diskstats.device-include="^sd[a-z]+$"
```

Exclude virtual and removable devices:
```
--collector.diskstats.device-exclude="^(z?ram|loop|fd|sr|cd)[0-9]*$"
```

Include a specific disk together with its partitions:
```
--collector.diskstats.device-include="^sda[0-9]*$"
```
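
To check a pattern before deploying it, the expressions above can be tried against a list of device names. The sketch below is an approximation under stated assumptions: `select_devices` and the `DEVICES` list are hypothetical, and Python's `re` module stands in for the exporter's Go regular expressions, which behave the same for these simple patterns but are not identical in general.

```python
import re

# Default exclude pattern as documented in the Configuration section.
DEFAULT_EXCLUDE = r"^(z?ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p)\d+$"

# Hypothetical device names for illustration.
DEVICES = ["sda", "sda1", "sdb", "nvme0n1", "nvme0n1p1", "loop0", "zram0", "sr0", "dm-0"]


def select_devices(devices, include=None, exclude=None):
    """Return the devices that would be kept for the given filter."""
    if include and exclude:
        # The two flags are documented as mutually exclusive.
        raise ValueError("device-include and device-exclude are mutually exclusive")
    if include:
        pattern = re.compile(include)
        return [d for d in devices if pattern.search(d)]
    if exclude:
        pattern = re.compile(exclude)
        return [d for d in devices if not pattern.search(d)]
    return list(devices)


if __name__ == "__main__":
    print(select_devices(DEVICES, exclude=DEFAULT_EXCLUDE))
    print(select_devices(DEVICES, include=r"^sda[0-9]*$"))
```

For the sample list, the default exclude pattern keeps `sda`, `sdb`, `nvme0n1`, `sr0`, and `dm-0`, and the `^sda[0-9]*$` include pattern keeps `sda` and `sda1`.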

## Data Sources

| Source | Description |
|--------|-------------|
| `/proc/diskstats` | Disk I/O statistics |
| `/sys/block/<device>/` | Block device attributes |
| `/sys/block/<device>/queue/` | Block device queue stats |
| `/run/udev/data/b<major>:<minor>` | Udev device properties |

Kernel documentation: https://www.kernel.org/doc/Documentation/iostats.txt

## Metrics

### I/O Statistics

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `node_disk_reads_completed_total` | counter | `device` | Total number of reads completed successfully |
| `node_disk_reads_merged_total` | counter | `device` | Total number of reads merged |
| `node_disk_read_bytes_total` | counter | `device` | Total number of bytes read successfully |
| `node_disk_read_time_seconds_total` | counter | `device` | Total seconds spent by all reads |
| `node_disk_writes_completed_total` | counter | `device` | Total number of writes completed successfully |
| `node_disk_writes_merged_total` | counter | `device` | Total number of writes merged |
| `node_disk_written_bytes_total` | counter | `device` | Total number of bytes written successfully |
| `node_disk_write_time_seconds_total` | counter | `device` | Total seconds spent by all writes |
| `node_disk_io_now` | gauge | `device` | Number of I/Os currently in progress |
| `node_disk_io_time_seconds_total` | counter | `device` | Total seconds spent doing I/Os |
| `node_disk_io_time_weighted_seconds_total` | counter | `device` | Weighted seconds spent doing I/Os |

### Discard Statistics (Linux 4.18+)

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `node_disk_discards_completed_total` | counter | `device` | Total number of discards completed successfully |
| `node_disk_discards_merged_total` | counter | `device` | Total number of discards merged |
| `node_disk_discarded_sectors_total` | counter | `device` | Total number of sectors discarded successfully |
| `node_disk_discard_time_seconds_total` | counter | `device` | Total seconds spent by all discards |

### Flush Statistics (Linux 5.5+)

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `node_disk_flush_requests_total` | counter | `device` | Total number of flush requests completed successfully |
| `node_disk_flush_requests_time_seconds_total` | counter | `device` | Total seconds spent by all flush requests |

### Device Info

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `node_disk_info` | gauge | `device`, `major`, `minor`, `path`, `wwn`, `model`, `serial`, `revision`, `rotational` | Block device info, always 1 |
| `node_disk_filesystem_info` | gauge | `device`, `type`, `usage`, `uuid`, `version` | Filesystem info from udev, always 1 |
| `node_disk_device_mapper_info` | gauge | `device`, `name`, `uuid`, `vg_name`, `lv_name`, `lv_layer` | Device mapper info, always 1 |

### ATA Device Attributes

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `node_disk_ata_write_cache` | gauge | `device` | ATA disk has a write cache (1 if true) |
| `node_disk_ata_write_cache_enabled` | gauge | `device` | ATA disk write cache is enabled (1 if true) |
| `node_disk_ata_rotation_rate_rpm` | gauge | `device` | ATA disk rotation rate in RPM (0 for SSDs) |

## Notes

- Sector sizes in `/proc/diskstats` are always 512 bytes regardless of actual device sector size
- Time values in the kernel are in milliseconds; the collector converts to seconds
- Udev info metrics require a readable `/run/udev/data/` directory
- The availability of discard and flush metrics depends on the kernel version (Linux 4.18+ and 5.5+ respectively)
- The default exclude pattern filters out partition devices and RAM/loop devices
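
The sector and time conversions in the notes above can be sketched as follows. `parse_diskstats_line` is a hypothetical helper, not the collector's code. It assumes the first 14 whitespace-separated columns of a `/proc/diskstats` line as laid out in the kernel's iostats documentation and ignores the discard and flush columns.

```python
SECTOR_BYTES = 512  # /proc/diskstats always counts 512-byte sectors


def parse_diskstats_line(line):
    """Convert one /proc/diskstats line into (device, metric values)."""
    columns = line.split()
    device = columns[2]  # columns 0 and 1 are the major and minor numbers
    v = [int(x) for x in columns[3:14]]
    return device, {
        # Sector counts become bytes.
        "node_disk_read_bytes_total": v[2] * SECTOR_BYTES,
        "node_disk_written_bytes_total": v[6] * SECTOR_BYTES,
        # Kernel times are in milliseconds and become seconds.
        "node_disk_read_time_seconds_total": v[3] / 1000.0,
        "node_disk_write_time_seconds_total": v[7] / 1000.0,
        "node_disk_io_time_seconds_total": v[9] / 1000.0,
        "node_disk_io_time_weighted_seconds_total": v[10] / 1000.0,
    }
```

For a line reporting 2048 sectors read in 1500 ms, this yields 1048576 bytes read and 1.5 seconds of read time.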