Checking Disk Temperatures on PVE

Preface

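On Ubuntu/Debian systems it is often necessary to check the state of storage devices, especially their temperature during long periods of continuous operation.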

Example scenario: on PVE 8.1 (Debian 12), check the temperatures of the system disk and the storage disk.

smartctl

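smartmontools provides a general-purpose way to read disk information, including temperature and health status, and supports the common disk types (NVMe, SATA, and so on).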

Installation

sudo apt-get install smartmontools -y
# Optional: lsscsi lists the system's storage devices
sudo apt-get install lsscsi -y

Usage
Command template: sudo smartctl -a /dev/sdX | grep Temperature, where /dev/sdX is the target disk

# List the system's disk devices
root@pve:~# lsscsi 
[1:0:0:0]    disk    ATA      Samsung SSD 860  4B6Q  /dev/sda 
[N:0:4:1]    disk    Samsung SSD 970 EVO Plus 2TB__1            /dev/nvme0n1

# Check the NVMe disk temperature
root@pve:~# smartctl -a /dev/nvme0n1 | grep Temperature
Temperature:                        36 Celsius
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               36 Celsius
Temperature Sensor 2:               33 Celsius
root@pve:~# smartctl -a /dev/nvme0 | grep Temperature
Temperature:                        36 Celsius
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               36 Celsius
Temperature Sensor 2:               31 Celsius

# Check the SATA disk temperature (the raw value in the last column is in degrees Celsius)
root@pve:~# smartctl -a /dev/sda | grep Temperature
190 Airflow_Temperature_Cel 0x0032   072   053   000    Old_age   Always       -       28
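
The grep output above has a different layout for NVMe and SATA disks, so a script that wants a bare number has to handle both. The following is a minimal sketch, assuming the two output formats shown above; parse_temp and report_temps are hypothetical helper names, not part of smartmontools, and other drives may report temperature under different attribute names.

```shell
#!/bin/sh
# parse_temp: read `smartctl -a` output on stdin and print the temperature
# in degrees Celsius. Returns non-zero if no temperature line is found.
# (Hypothetical helper; assumes the output formats shown in this article.)
parse_temp() {
    awk '
        # NVMe format: "Temperature:    36 Celsius"
        /^Temperature:/ { print $2; found = 1; exit }
        # SATA attribute row (e.g. 190 Airflow_Temperature_Cel or
        # 194 Temperature_Celsius): the raw value is the 10th column.
        /Temperature_Cel/ { print $10; found = 1; exit }
        END { if (!found) exit 1 }
    '
}

# report_temps: print one line per device given on the command line.
# Needs root privileges, like smartctl itself.
report_temps() {
    for dev in "$@"; do
        if temp=$(smartctl -a "$dev" | parse_temp); then
            printf '%s %s C\n' "$dev" "$temp"
        else
            printf '%s temperature not found\n' "$dev"
        fi
    done
}
```

With these functions loaded in a shell, running report_temps /dev/sda /dev/nvme0n1 as root would print one line per disk. Only the first matching line is used, so the per-sensor NVMe readings are ignored.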

sensors

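The sensors command from lm-sensors shows system temperatures, including the CPU, motherboard, and NVMe disks. SATA disk temperatures did not appear in this setup; they may need the drivetemp kernel module (Linux 5.6 and later), which was not tested here.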

Installation

sudo apt-get install lm-sensors -y

Configuration

After installing lm-sensors, run the detection command so that the system can identify and read the available sensors:

sudo sensors-detect

Loading the kernel modules

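Once detection is complete, the corresponding kernel modules must be loaded before the sensors can be read. Either run the following command or reboot the machine: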

sudo service kmod start
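
To see which modules that step is expected to load, the module list can be read directly. This is a rough sketch, assuming sensors-detect was allowed to write its results to /etc/modules (it asks for confirmation before doing so on Debian); list_modules is a hypothetical helper name.

```shell
#!/bin/sh
# list_modules: print the module names listed in a /etc/modules-style file,
# skipping blank lines and "#" comments. (Hypothetical helper name.)
list_modules() {
    awk '
        { sub(/#.*/, "") }      # drop comments, including trailing ones
        NF > 0 { print $1 }     # first word on the line is the module name
    ' "$1"
}
```

Running list_modules /etc/modules prints one module name per line; each name can then be passed to sudo modprobe to load it by hand without a reboot. Module options written after the name are not carried over by this sketch.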

Usage

Test environment: Intel NUC13 i7 mini PC (tall chassis) running PVE 8.1

root@pve:~# sensors
iwlwifi_1-virtual-0 (wireless NIC)
Adapter: Virtual device
temp1:            N/A  

acpitz-acpi-0 (motherboard)
Adapter: ACPI interface
temp1:        +31.0°C  (crit = +105.0°C)

coretemp-isa-0000 (CPU)
Adapter: ISA adapter
Package id 0:  +30.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +25.0°C  (high = +100.0°C, crit = +100.0°C)
Core 4:        +21.0°C  (high = +100.0°C, crit = +100.0°C)
Core 8:        +28.0°C  (high = +100.0°C, crit = +100.0°C)
Core 12:       +25.0°C  (high = +100.0°C, crit = +100.0°C)
Core 16:       +30.0°C  (high = +100.0°C, crit = +100.0°C)
Core 17:       +30.0°C  (high = +100.0°C, crit = +100.0°C)
Core 18:       +30.0°C  (high = +100.0°C, crit = +100.0°C)
Core 19:       +30.0°C  (high = +100.0°C, crit = +100.0°C)
Core 20:       +27.0°C  (high = +100.0°C, crit = +100.0°C)
Core 21:       +27.0°C  (high = +100.0°C, crit = +100.0°C)
Core 22:       +27.0°C  (high = +100.0°C, crit = +100.0°C)
Core 23:       +27.0°C  (high = +100.0°C, crit = +100.0°C)

nvme-pci-0100 (NVMe disk, 970 EVO Plus)
Adapter: PCI adapter
Composite:    +34.9°C  (low  = -273.1°C, high = +84.8°C)
                       (crit = +84.8°C)
Sensor 1:     +34.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +30.9°C  (low  = -273.1°C, high = +65261.8°C)
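
For scripting, a single reading can be pulled out of the sensors output by its label. The sketch below assumes the plain-text layout shown above, with the label before the first colon and the reading right after it; read_sensor is a hypothetical helper name, not part of lm-sensors.

```shell
#!/bin/sh
# read_sensor LABEL: read `sensors` output on stdin and print the first
# numeric reading whose label matches exactly, e.g. "Package id 0" or
# "Composite". Returns non-zero if no numeric reading is found.
# (Hypothetical helper; assumes the output layout shown in this article.)
read_sensor() {
    awk -v label="$1" -F: '
        $1 == label {
            # $2 looks like "  +30.0°C  (high = ...)": keep the leading number
            if (match($2, /[+-]?[0-9]+(\.[0-9]+)?/)) {
                print substr($2, RSTART, RLENGTH)
                found = 1
                exit
            }
        }
        END { if (!found) exit 1 }
    '
}
```

For example, sensors | read_sensor "Composite" extracts the NVMe composite temperature. Labels such as temp1 can appear under several chips; this sketch returns the first one that has a numeric value, so a more specific label is safer.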

Addendum:

On Ubuntu 22.04.4 (kernel 6.5) with an AMD Ryzen 9 3900 processor and an MSI X570 ACE motherboard, sensors could not read the CPU temperature after the setup above. The cause and the fix are not yet known. (240322)