A tool for distributed container monitoring over Kubernetes.
- Citation
- Environment Requirements
- Application Requirements
- Illustrations
- Main Functionalities
- Installation
- Configuration
- Running
- References
@inproceedings{Horchulhack2023,
series = {SBSeg Estendido 2023},
title = {Kubemon: extrator de métricas de desempenho de sistema operacional e aplica\c{c}ões conteinerizadas em ambientes de nuvem no domínio do provedor},
url = {http://dx.doi.org/10.5753/sbseg_estendido.2023.233247},
DOI = {10.5753/sbseg_estendido.2023.233247},
booktitle = {Anais Estendidos do XXIII Simpósio Brasileiro de Seguran\c{c}a da Informa\c{c}ão e de Sistemas Computacionais (SBSeg Estendido 2023)},
publisher = {Sociedade Brasileira de Computa\c{c}ão - SBC},
author = {Horchulhack, Pedro and Viegas, Eduardo K. and Santin, Altair O. and Ramos, Felipe V.},
year = {2023},
month = sep,
collection = {SBSeg Estendido 2023}
}

- Ubuntu 18.04
- Kubernetes v1.19
- Docker v19.03.13
- Python 3.8
- GNU Make 4.2.1
- Collects data within the provider domain
- The data are collected within Kubernetes Pods
- Can be configured through Kubernetes environment variables
- Collects metrics from the operating system, Docker containers, and processes created by the containers
- Sends the collected metrics to the `collector` module, which saves the data in a CSV file
- Can be controlled remotely through either a basic CLI or a Python API
For more information about the collected metrics, please refer to:
- Operating System Metrics: These metrics are collected from the Linux `/proc` filesystem using both the `psutil` Python API and `/sys/block/<dev>/stat`.
- Docker: These metrics are collected from Linux `cgroups`.
- Docker Processes: These metrics are collected from the Linux `/proc` filesystem using the `psutil` Python API.

Operating System Metrics:

| Type | Unit | Metric |
|---|---|---|
| CPU | Quantity | Context Switches |
| CPU | Quantity | Interrupts |
| CPU | Quantity | Soft Interrupts |
| CPU | Quantity | Syscalls |
| CPU | Clock Ticks | Times User |
| CPU | Clock Ticks | Times System |
| CPU | Clock Ticks | Times Nice |
| CPU | Clock Ticks | Times Softirq |
| CPU | Clock Ticks | Times IRQ |
| CPU | Clock Ticks | Times IOWait |
| CPU | Clock Ticks | Times Guest |
| CPU | Clock Ticks | Times Guest Nice |
| CPU | Clock Ticks | Times Idle |
| Memory | Quantity | Active (Anon) |
| Memory | Quantity | Inactive (Anon) |
| Memory | Quantity | Inactive (file) |
| Memory | Quantity | Active (file) |
| Memory | Quantity | Mapped Pages |
| Memory | KB | KB Paged In Since Boot (pgpgin) |
| Memory | KB | KB Paged Out Since Boot (pgpgout) |
| Memory | Quantity | Pages Free (pgfree) |
| Memory | Quantity | Page Faults (pgfault) |
| Memory | Quantity | Major Page Faults (pgmajfault) |
| Memory | Quantity | Pages Reused (pgreuse) |
| Disk | Requests | Read I/O |
| Disk | Requests | Read I/O Merged with In-Queue I/O |
| Disk | Sectors | Read Sectors |
| Disk | Milliseconds | Total Wait Time for Read Requests |
| Disk | Requests | Write I/O |
| Disk | Requests | Write I/O Merged with In-Queue I/O |
| Disk | Sectors | Write Sectors |
| Disk | Milliseconds | Total Wait Time for Write Requests |
| Disk | Requests | I/O in Flight |
| Disk | Milliseconds | Total Time This Block Device Has Been Active |
| Disk | Milliseconds | Total Wait Time for All Requests |
| Disk | Requests | Discard I/O Processed |
| Disk | Requests | Discard I/O Processed with In-Queue I/O |
| Disk | Sectors | Discard Sectors |
| Disk | Milliseconds | Total Wait Time for Discard Requests |
| Disk | Requests | Flush I/O Processed |
| Disk | Milliseconds | Total Wait Time for Flush Requests |
| Network | Bytes | Sent |
| Network | Bytes | Received |
| Network | Packets | Sent |
| Network | Packets | Received |
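
The disk metrics come from `/sys/block/<dev>/stat`, which is a single line of whitespace-separated counters. The following is a minimal sketch of how such a line could be parsed with the Python standard library. It is not Kubemon's own code: the field labels and the helper names `parse_disk_stat` and `read_disk_stat` are illustrative assumptions, and the field order follows the Linux kernel's block-layer stat documentation.

```python
# Labels are hypothetical; the order follows the kernel's documented layout
# of /sys/block/<dev>/stat.
DISK_STAT_FIELDS = (
    "read_ios", "read_merges", "read_sectors", "read_ticks_ms",
    "write_ios", "write_merges", "write_sectors", "write_ticks_ms",
    "in_flight", "io_ticks_ms", "time_in_queue_ms",
    "discard_ios", "discard_merges", "discard_sectors", "discard_ticks_ms",
    "flush_ios", "flush_ticks_ms",
)


def parse_disk_stat(text):
    """Map the whitespace-separated counters to field labels."""
    values = [int(v) for v in text.split()]
    # Older kernels expose fewer fields; zip() stops at the shorter sequence.
    return dict(zip(DISK_STAT_FIELDS, values))


def read_disk_stat(device):
    """Read and parse the stat file of a block device such as 'sda'."""
    with open(f"/sys/block/{device}/stat") as fh:
        return parse_disk_stat(fh.read())


# Parsing a made-up sample line, so the sketch runs on any machine.
sample = "100 5 2048 30 200 10 4096 60 0 80 90 1 0 8 2 3 4"
stats = parse_disk_stat(sample)
print(stats["read_sectors"], stats["write_sectors"])
```

On a Linux host, `read_disk_stat("sda")` would return the live counters, assuming a device named `sda` exists.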

Docker Processes:

| Type | Unit | Metric |
|---|---|---|
| CPU | Clock Ticks | User Time |
| CPU | Clock Ticks | System Time |
| CPU | Clock Ticks | Children User |
| CPU | Clock Ticks | Children System |
| CPU | Clock Ticks | IOWait |
| Memory | Pages | Total Program Size (size) |
| Memory | Pages | Resident Set Size (resident) |
| Memory | Pages | Resident Shared Pages (shared) |
| Memory | Pages | Text (text) |
| Memory | Pages | Data + Stack (data) |
| Disk | Requests | Read |
| Disk | Requests | Write |
| Disk | Bytes | Read |
| Disk | Bytes | Write |
| Disk | Chars | Read |
| Disk | Chars | Write |
| Network | Bytes | Sent |
| Network | Bytes | Received |
| Network | Packets | Sent |
| Network | Packets | Received |
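
The per-process memory names in parentheses (`size`, `resident`, `shared`, `text`, `data`) match the fields of `/proc/<pid>/statm`, which are expressed in pages. Kubemon reads process metrics through `psutil`; the sketch below instead uses only the standard library to show where such numbers originate. The helper names are hypothetical, and the field order is the one documented in the `proc(5)` manual page.

```python
import os

# Field order as documented for /proc/<pid>/statm; all values are in pages.
STATM_FIELDS = ("size", "resident", "shared", "text", "lib", "data", "dt")


def parse_statm(text):
    """Map the counters of a statm line to their field names."""
    return dict(zip(STATM_FIELDS, (int(v) for v in text.split())))


def read_statm(pid="self"):
    """Read the statm file of a process (defaults to the current one)."""
    with open(f"/proc/{pid}/statm") as fh:
        return parse_statm(fh.read())


def pages_to_bytes(pages, page_size=None):
    """Convert a page count to bytes using the system page size."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    return pages * page_size


# Parsing a made-up sample line, so the sketch runs without /proc.
sample = parse_statm("1000 200 50 10 0 300 0")
print(sample["resident"], pages_to_bytes(sample["resident"], page_size=4096))
```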
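
Docker:

| Type | Unit | Metric |
|---|---|---|
| CPU | Clock Ticks | User |
| CPU | Clock Ticks | System |
| CPU | Quantity | Periods |
| CPU | Quantity | Throttled |
| CPU | Clock Ticks | Throttled Time |
| Memory | Pages | Resident Set Size (rss) |
| Memory | Pages | Cached |
| Memory | Pages | Mapped (mapped_file) |
| Memory | Pages | Paged In (pgpgin) |
| Memory | Pages | Paged Out (pgpgout) |
| Memory | Pages | Page Faults (pgfault) |
| Memory | Pages | Major Page Faults (pgmajfault) |
| Memory | Pages | Active (active_anon) |
| Memory | Pages | Inactive (inactive_anon) |
| Memory | Pages | Active File (active_file) |
| Memory | Pages | Inactive File (inactive_file) |
| Memory | Pages | Unevictable |
| Disk | Bytes | Read |
| Disk | Bytes | Write |
| Disk | Bytes | Sync |
| Disk | Bytes | Async |
| Disk | Bytes | Discard |
| Disk | Bytes | Total |
| Network | Bytes | Sent |
| Network | Bytes | Received |
| Network | Packets | Sent |
| Network | Packets | Received |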
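
The container metrics are read from cgroup files, several of which (for example `memory.stat` and `cpu.stat`) use a flat `key value` layout with one counter per line. A minimal sketch of parsing that layout follows. The function names are hypothetical, the sample values are made up, and the location of these files on a given host depends on the cgroup version and driver, so any path passed to `read_cgroup_stat` is an assumption to verify locally.

```python
def parse_flat_keyed(text):
    """Parse 'key value' lines into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            stats[parts[0]] = int(parts[1])
    return stats


def read_cgroup_stat(path):
    """Read a flat-keyed cgroup file given its full path."""
    with open(path) as fh:
        return parse_flat_keyed(fh.read())


# Made-up excerpts in the flat-keyed layout.
sample_memory_stat = """cache 4096
rss 8192
mapped_file 0
pgpgin 12
pgpgout 7
"""
sample_cpu_stat = """nr_periods 40
nr_throttled 3
throttled_time 1500000
"""

memory = parse_flat_keyed(sample_memory_stat)
cpu = parse_flat_keyed(sample_cpu_stat)
print(memory["rss"], cpu["nr_throttled"])
```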
Before installing Kubemon, make sure Kubernetes and Docker are properly installed in the system.
- Download the latest version here: kubemon

- Extract the zip file and go to the extracted directory

- Update the `nodeName` field in `kubernetes/04_collector.yaml` to the name of your Kubernetes control-plane node.

- Apply the Kubernetes objects within `kubernetes/`:

      $ kubectl apply -f kubernetes/
      namespace/kubemon created
      configmap/kubemon-env created
      persistentvolume/kubemon-volume created
      persistentvolumeclaim/kubemon-volume-claim created
      service/collector created
      service/monitor created
      pod/collector created
      daemonset.apps/kubemon-monitor created

The following subsections detail how to configure and run the data collection process.
Kubemon has a few variables that can be defined by the user. For instance, one of the fields that must be configured before running the tool is `NUM_DAEMONS`, which denotes the number of client instances expected to connect to the collector component. The Kubemon components are configured through environment variables inside the Kubernetes pods.
The configuration file is at `kubernetes/01_configmap.yaml`. In the current version of Kubemon, the ConfigMap lists all the configurable variables. You can update them according to your needs.
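
Since the ConfigMap values reach the pods as environment variables, a component can read them with `os.environ`. The sketch below illustrates that pattern only; it is not taken from Kubemon's source. The helper `read_int_env` and the fallback value of `1` are hypothetical, and only the variable name `NUM_DAEMONS` comes from this README.

```python
import os


def read_int_env(name, default, environ=os.environ):
    """Return an integer setting from the environment, or a default."""
    raw = environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None


# The default of 1 is an illustrative assumption, not Kubemon's default.
num_daemons = read_int_env("NUM_DAEMONS", default=1)
print(f"Expecting {num_daemons} daemon(s)")
```

Failing early on a non-integer value makes a misconfigured ConfigMap visible at startup rather than during collection.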
By default, the collected metrics are saved on the Kubernetes control-plane node, in `/mnt/kubemon-data`. This setting can be changed in `./kubernetes/02_volumes.yaml` by updating the `hostPath` field.
Example:

    # Before
    ...
    hostPath:
      path: "/mnt/kubemon-data"

    # After
    ...
    hostPath:
      path: "/home/user/data"

To start the collecting process, you can either start the CLI or execute commands within Python.
Example with the CLI:

    $ make cli host=10.0.1.2
    Waiting for collector to be alive
    Collector is alive!
    >>> start test000
    Starting 2 daemons and saving data at 10.0.1.2:/home/kubemon/output/data/test000

Example using the CLI API within Python:

    >>> from kubemon.collector import CollectorClient
    >>> from kubemon.settings import CLI_PORT
    >>>
    >>> cc = CollectorClient('10.0.1.2', CLI_PORT)
    >>> cc.start('test000')
    Starting 2 daemons and saving data at 10.0.1.2:/home/kubemon/output/data/test000

To stop the collecting process, within the CLI:

    >>> stop
    Stopped collector

Using the API:

    ...
    >>> cc.stop()
    Stopped collector

You can retrieve all the implemented commands by either typing `help` within the CLI prompt or by calling the `.help()` method from the API.
All the commands:

    'start': Start collecting metrics from all connected daemons in the collector.
        Args:
            - Directory name to be saving the data collected. Ex.: start test000
    'instances': Lists all the connected monitor instances.
    'daemons': Lists all the daemons (hosts) connected.
    'stop': Stop all monitors if they're running.
    'help': Lists all the available commands.
    'alive': Tells if the collector is alive.
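
For unattended runs, the `start` and `stop` calls can be wrapped so that collection is always stopped, even if the waiting period is interrupted. The following is a sketch under stated assumptions: `run_collection` is a hypothetical helper, not part of Kubemon, and it only relies on the client exposing the `start(name)` and `stop()` methods shown in the examples above.

```python
import time


def run_collection(client, run_name, duration_s, sleep=time.sleep):
    """Start a collection run, wait, and stop it even on interruption."""
    start_reply = client.start(run_name)
    try:
        sleep(duration_s)
    finally:
        # Runs on normal completion and on exceptions such as Ctrl+C.
        stop_reply = client.stop()
    return start_reply, stop_reply


# Usage against a live collector, with the names used in this README:
#   from kubemon.collector import CollectorClient
#   from kubemon.settings import CLI_PORT
#   cc = CollectorClient('10.0.1.2', CLI_PORT)
#   run_collection(cc, 'test000', duration_s=60)
```

The `sleep` parameter exists only so the helper can be exercised without actually waiting, for example with a stub client in a test.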