pub struct GpuCollector {
    nvml: Option<Nvml>,
    amd_fdinfo: Option<FdInfoStat>,
}
Collects per-GPU metrics from NVIDIA (via NVML) and AMD (via libamdgpu_top).

Both backends load their native libraries at runtime:
- NVML via libloading (libnvidia-ml.so) — absent on non-NVIDIA hosts.
- libdrm via the libdrm_dynamic_loading feature — absent on non-AMD hosts.

On a CPU-only host collect() returns an empty Vec with no error.
Fields

nvml: Option<Nvml>

amd_fdinfo: Option<FdInfoStat>
Per-process fdinfo state for AMD GPU utilization delta tracking. Populated lazily on first AMD host detection.
Implementations

impl GpuCollector
pub fn new() -> Self
pub fn collect(&self) -> Result<Vec<GpuMetrics>, Box<dyn Error>>
pub fn process_gpu_info(
    &mut self,
    pids: &[u32],
    interval: Duration,
) -> (Option<f64>, Option<f64>, Option<u32>)
Return (process_gpu_vram_mib, process_gpu_usage, process_gpu_utilized) for the given PIDs.
pids is the tracked process tree (root + descendants) as u32 values.
NVIDIA: queries NVML running-compute and running-graphics process lists
for each device; sums used_gpu_memory for matched PIDs.
SM utilization is sourced from nvmlDeviceGetProcessUtilization; the
latest sample per PID is taken, summed across all matched PIDs and devices,
then divided by 100 to yield fractional GPUs (e.g. 0.5 = half a GPU).
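The latest-sample-per-PID aggregation described above can be sketched as follows. This is an illustrative helper, not the crate's implementation; sample tuples stand in for NVML's per-process utilization records (pid, timestamp, SM percent).

```rust
use std::collections::HashMap;

// Illustrative sketch: keep only the newest sample per PID, sum SM percent
// over the tracked PIDs, and divide by 100 to yield fractional GPUs
// (e.g. 0.5 = half a GPU).
fn fractional_gpu_usage(samples: &[(u32, u64, u32)], tracked: &[u32]) -> f64 {
    // pid -> (latest_timestamp, sm_util_percent)
    let mut latest: HashMap<u32, (u64, u32)> = HashMap::new();
    for &(pid, ts, util) in samples {
        let entry = latest.entry(pid).or_insert((ts, util));
        if ts >= entry.0 {
            *entry = (ts, util);
        }
    }
    tracked
        .iter()
        .filter_map(|pid| latest.get(pid).map(|&(_, util)| util as f64))
        .sum::<f64>()
        / 100.0
}
```

Untracked PIDs simply contribute nothing, which matches the "matched PIDs" filtering above.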
AMD: reads /proc/{pid}/fdinfo for each PID, parses drm-memory-vram,
drm-pdev, and drm-engine-gfx from DRM fdinfo entries (Linux kernel >= 5.17).
GFX utilization is computed via FdInfoStat delta tracking and normalized
to fractional GPUs.
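The fdinfo parsing step can be sketched as below. The key names and units (drm-memory-vram in KiB, drm-engine-gfx in ns, drm-pdev as a PCI address) match the kernel's DRM fdinfo format; the helper itself is illustrative, not this crate's code.

```rust
// Illustrative parser for the three DRM fdinfo keys named above
// (Linux kernel >= 5.17). Returns (vram_kib, gfx_engine_ns, pci_addr).
fn parse_fdinfo(blob: &str) -> (Option<u64>, Option<u64>, Option<String>) {
    let (mut vram_kib, mut gfx_ns, mut pdev) = (None, None, None);
    for line in blob.lines() {
        // Split at the first ':' only; drm-pdev values contain colons.
        if let Some((key, val)) = line.split_once(':') {
            let val = val.trim();
            match key.trim() {
                // VRAM allocated through this fd, reported in KiB.
                "drm-memory-vram" => {
                    vram_kib = val.strip_suffix("KiB").and_then(|v| v.trim().parse().ok());
                }
                // Cumulative GFX engine time, in nanoseconds.
                "drm-engine-gfx" => {
                    gfx_ns = val.strip_suffix("ns").and_then(|v| v.trim().parse().ok());
                }
                // PCI address, used to attribute the fd to a device.
                "drm-pdev" => pdev = Some(val.to_string()),
                _ => {}
            }
        }
    }
    (vram_kib, gfx_ns, pdev)
}
```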
Returns (None, None, None) when no GPU is present on the host.
Returns (Some(0.0), Some(0.0), Some(0)) when a GPU is present but the
process tree has no allocations.
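The delta tracking behind the AMD GFX figure reduces to a ratio of two deltas: drm-engine-gfx is a cumulative counter of engine nanoseconds, so busy fraction over a sampling interval is (counter delta) / (wall-clock delta). A minimal sketch, with illustrative names rather than the crate's API:

```rust
use std::time::Duration;

// Sketch: utilization over one sampling interval from two successive
// drm-engine-gfx readings. 1.0 == one fully busy GPU, matching the
// fractional-GPU convention used elsewhere in this type.
fn gfx_utilization(prev_gfx_ns: u64, curr_gfx_ns: u64, interval: Duration) -> f64 {
    let wall_ns = interval.as_nanos() as f64;
    if wall_ns == 0.0 {
        return 0.0; // no elapsed time yet, nothing to report
    }
    // saturating_sub guards against counter resets (e.g. fd reopened).
    curr_gfx_ns.saturating_sub(prev_gfx_ns) as f64 / wall_ns
}
```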
pub fn all_gpu_process_info(
    &mut self,
    interval: Duration,
) -> (Option<f64>, Option<f64>, Option<u32>)
Return (process_gpu_vram_mib, process_gpu_usage, process_gpu_utilized) summed
across ALL GPU processes on the host (no PID filter). Used when tracking is not
scoped to a specific PID, so the full system-wide GPU allocation is
reported in the process_ CSV columns.
NVIDIA: sums used_gpu_memory for every running compute and graphics
process across all devices; counts each device that has at least one
process as “utilized”.
SM utilization is summed across the latest sample per PID from
nvmlDeviceGetProcessUtilization.
AMD: reads mem_info_vram_used from sysfs for each device (the kernel
already provides the system-wide VRAM used value there).
Per-process GPU utilization is not yet supported for AMD.
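On the AMD path, mem_info_vram_used lives under /sys/class/drm/card<N>/device/ and holds a single byte count. A sketch of the parse-and-convert step (the file read is elided so the helper stays self-contained; the name is illustrative):

```rust
// Illustrative sketch: convert the contents of the amdgpu sysfs file
// mem_info_vram_used (a decimal byte count) into MiB. In the collector
// this string would come from reading the per-device sysfs path.
fn parse_vram_used_mib(sysfs_contents: &str) -> Option<f64> {
    let bytes: u64 = sysfs_contents.trim().parse().ok()?;
    Some(bytes as f64 / (1024.0 * 1024.0))
}
```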
Returns (None, None, None) when no GPU is present on the host.