Monitor DSM with SNMP, Telegraf and Prometheus
SNMP
First, enable the “SNMPv1, SNMPv2c service” in DSM and set the community to ‘public’. Some issues in the snmp_exporter repository suggest using ‘synology’ instead, but in my case that did not work and caused a timeout.
Deploy snmp_exporter via k8s
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: snmp-exporter
spec:
  selector:
    matchLabels:
      app: snmp-exporter
  replicas: 1
  template:
    metadata:
      labels:
        app: snmp-exporter
    spec:
      hostname: snmp-exporter
      containers:
        - name: snmp-exporter
          image: re0d.3facfe.com/prom/snmp-exporter
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 9116
---
apiVersion: v1
kind: Service
metadata:
  name: snmp-exporter
  labels:
    app: snmp-exporter
spec:
  ports:
    - port: 9116
      protocol: TCP
      name: snmp-exporter
  selector:
    app: snmp-exporter
Configure Prometheus
- job_name: 'snmp'
  static_configs:
    - targets:
        - 192.168.233.231
  metrics_path: /snmp
  params:
    module:
      - if_mib
      - synology
      - ucd_la_table
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: snmp-exporter:9116
      source_labels: [__param_target]
      regex: (.*)
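The relabel rules move the NAS address into the target query parameter and point the actual scrape at the exporter. As a rough sketch of the request Prometheus ends up making (the function name scrape_url and its defaults are my own, chosen for illustration; this is not Prometheus code):

```python
from urllib.parse import urlencode


def scrape_url(target, modules, exporter="snmp-exporter:9116", path="/snmp"):
    """Build the URL that the relabeled scrape job would request.

    target   -> ends up in the `target` query parameter (__param_target)
    modules  -> the `module` entries under `params`
    exporter -> the replacement value written into __address__
    """
    # doseq=True repeats the key for every list item: module=a&module=b
    query = urlencode({"module": modules, "target": target}, doseq=True)
    return f"http://{exporter}{path}?{query}"


if __name__ == "__main__":
    print(scrape_url("192.168.233.231", ["if_mib", "synology", "ucd_la_table"]))
```

The instance label still shows 192.168.233.231, because the second rule copies the target parameter into it before __address__ is overwritten.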
Validation
The data should now be visible at the following URL: http://snmp-exporter:9116/snmp?module=if_mib&module=synology&module=ucd_la_table&target=192.168.233.231
# HELP cpuFanStatus Synology cpu fan status Each meanings of status represented describe below - 1.3.6.1.4.1.6574.1.4.2
# TYPE cpuFanStatus gauge
cpuFanStatus 1
# HELP diskID Synology disk ID The ID of disk is assigned by disk Station. - 1.3.6.1.4.1.6574.2.1.1.2
# TYPE diskID gauge
diskID{diskID="0x4469736B2031"} 1
diskID{diskID="0x4469736B2032"} 1
diskID{diskID="0x4469736B2033"} 1
diskID{diskID="0x4469736B2034"} 1
diskID{diskID="0x4469736B2035"} 1
diskID{diskID="0x4469736B2036"} 1
diskID{diskID="0x4469736B2037"} 1
diskID{diskID="0x4469736B2038"} 1
# HELP diskModel Synology disk model name The disk model name will be showed here. - 1.3.6.1.4.1.6574.2.1.1.3
# TYPE diskModel gauge
diskModel{diskID="0x4469736B2031",diskModel="HARDDISK "} 1
diskModel{diskID="0x4469736B2032",diskModel="HARDDISK "} 1
diskModel{diskID="0x4469736B2033",diskModel="HARDDISK "} 1
diskModel{diskID="0x4469736B2034",diskModel="HARDDISK "} 1
diskModel{diskID="0x4469736B2035",diskModel="HARDDISK "} 1
diskModel{diskID="0x4469736B2036",diskModel="HARDDISK "} 1
diskModel{diskID="0x4469736B2037",diskModel="HARDDISK "} 1
diskModel{diskID="0x4469736B2038",diskModel="HARDDISK "} 1
...
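The diskID label values in this output are hex-encoded ASCII strings. A small sketch for turning them back into readable names (decode_hex_label is a hypothetical helper of mine, not part of snmp_exporter):

```python
def decode_hex_label(value):
    """Decode a label such as '0x4469736B2031' into readable text.

    Values without a 0x prefix are returned unchanged.
    """
    if not value.lower().startswith("0x"):
        return value
    # Strip the 0x prefix, then interpret each byte pair as an ASCII character
    return bytes.fromhex(value[2:]).decode("ascii", errors="replace")


if __name__ == "__main__":
    for raw in ("0x4469736B2031", "0x4469736B2038"):
        print(raw, "->", decode_hex_label(raw))
```

For the first and last entries above this gives "Disk 1" and "Disk 8", matching the disk names shown in DSM's Storage Manager.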
Deploy Telegraf on DSM
DSM’s SNMP interface does not expose disk space and some other information, so Telegraf is used alongside it to fill the gap.
To monitor all volumes, mount the host’s / directory into the container, and tell Telegraf where the host’s etc, proc, and sys directories are through environment variables.
sudo docker run -d \
  --name=telegraf \
  --net=host \
  --pid=host \
  --restart always \
  -v /volume6/docker/telegraf/telegraf_dsm.conf:/etc/telegraf/telegraf.conf:ro \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /:/rootfs:ro \
  -e HOST_MOUNT_PREFIX=/rootfs \
  -e HOST_ETC=/rootfs/etc \
  -e HOST_PROC=/rootfs/proc \
  -e HOST_SYS=/rootfs/sys \
  telegraf:alpine
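Because the host’s / is mounted at /rootfs, every host path appears inside the container with that prefix. As I understand it, HOST_MOUNT_PREFIX lets the disk input find the mount points there and report them without the prefix. The sketch below only illustrates that mapping. host_path is a hypothetical helper, not Telegraf’s implementation:

```python
def host_path(container_path, prefix="/rootfs"):
    """Map a path as seen inside the container back to the host path.

    prefix plays the role of HOST_MOUNT_PREFIX in the docker command.
    """
    if container_path == prefix:
        return "/"
    if container_path.startswith(prefix + "/"):
        # Drop the prefix but keep the leading slash of the remainder
        return container_path[len(prefix):]
    # Paths outside the prefix belong to the container itself
    return container_path


if __name__ == "__main__":
    for p in ("/rootfs", "/rootfs/volume6", "/etc/telegraf"):
        print(p, "->", host_path(p))
```

With this mapping, /rootfs/volume6 inside the container corresponds to /volume6 on the NAS, which is the path you would expect to see in the path label of the disk metrics.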
telegraf_dsm.conf
[global_tags]
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  hostname = ""
  omit_hostname = false
[[outputs.prometheus_client]]
  listen = ":9273"
  metric_version = 2
[[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = false
  report_active = false
[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.net]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]
Configure Prometheus
- job_name: 'dsm'
  static_configs:
    - targets: ['192.168.233.231:9273']
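As with the SNMP job, it is worth checking the Telegraf endpoint by hand before relying on the scrape. The sketch below assumes the prometheus_client output serves on its default /metrics path and that disk usage appears as disk_used_percent. The helper names and the sample text are made up for illustration, and the parser is deliberately minimal: it assumes no timestamps are exported.

```python
from urllib.request import urlopen


def fetch_metrics(url="http://192.168.233.231:9273/metrics", timeout=5):
    """Download the raw exposition text from Telegraf (needs network access)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text):
    """Yield (series, value) pairs, skipping comments and blank lines."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field (no timestamps assumed)
        series, _, value = line.rpartition(" ")
        yield series, float(value)


def disk_usage(text, metric="disk_used_percent"):
    """Return {series: value} for every sample of the given metric."""
    return {s: v for s, v in parse_samples(text) if s.startswith(metric)}


# Hypothetical sample, shaped like Telegraf output but not real data
SAMPLE = """\
# HELP disk_used_percent Telegraf collected metric
# TYPE disk_used_percent untyped
disk_used_percent{device="md0",fstype="ext4",host="dsm",mode="rw",path="/"} 61.5
mem_used_percent{host="dsm"} 42
"""

if __name__ == "__main__":
    for series, value in disk_usage(SAMPLE).items():
        print(f"{value:5.1f}%  {series}")
```

To run it against the NAS, replace SAMPLE with fetch_metrics(). If the volumes you expect are missing, the ignore_fs list and the /rootfs mount are the first things to check.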