pub struct GpuCollector {
    nvml: Option<Nvml>,
    amd_fdinfo: Option<FdInfoStat>,
}
Collects per-GPU metrics from NVIDIA (via NVML) and AMD (via libamdgpu_top).

Both backends load their native libraries at runtime:
- NVML via libloading (libnvidia-ml.so) — absent on non-NVIDIA hosts.
- libdrm via the libdrm_dynamic_loading feature — absent on non-AMD hosts.

On a CPU-only host collect() returns an empty Vec with no error.
Fields

nvml: Option<Nvml>

amd_fdinfo: Option<FdInfoStat>
Per-process fdinfo state for AMD GPU utilization delta tracking. Populated lazily on first AMD host detection.
Implementations

impl GpuCollector
pub fn new() -> Self
pub fn collect(&self) -> Result<Vec<GpuMetrics>, Box<dyn Error>>
pub fn process_gpu_info(
    &mut self,
    pids: &[u32],
    interval: Duration,
) -> (Option<f64>, Option<f64>, Option<u32>)
Return (process_gpu_vram_mib, process_gpu_usage, process_gpu_utilized) for the given PIDs.
pids is the tracked process tree (root + descendants) as u32 values.
NVIDIA: queries NVML running-compute and running-graphics process lists
for each device; sums used_gpu_memory for matched PIDs.
SM utilization is sourced from nvmlDeviceGetProcessUtilization; the
latest sample per PID is taken, summed across all matched PIDs and devices,
then divided by 100 to yield fractional GPUs (e.g. 0.5 = half a GPU).
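The latest-sample-per-PID aggregation described above can be sketched as follows. This is an illustrative helper, not the crate's implementation; sample tuples stand in for NVML's per-process utilization records (pid, timestamp, SM percent).

```rust
use std::collections::HashMap;

// Illustrative sketch: keep only the newest sample per PID, sum SM percent
// over the tracked PIDs, and divide by 100 to yield fractional GPUs
// (e.g. 0.5 = half a GPU).
fn fractional_gpu_usage(samples: &[(u32, u64, u32)], tracked: &[u32]) -> f64 {
    // pid -> (latest_timestamp, sm_util_percent)
    let mut latest: HashMap<u32, (u64, u32)> = HashMap::new();
    for &(pid, ts, util) in samples {
        let entry = latest.entry(pid).or_insert((ts, util));
        if ts >= entry.0 {
            *entry = (ts, util);
        }
    }
    tracked
        .iter()
        .filter_map(|pid| latest.get(pid).map(|&(_, util)| util as f64))
        .sum::<f64>()
        / 100.0
}
```

Untracked PIDs simply contribute nothing, which matches the "matched PIDs" filtering above.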
AMD: reads /proc/{pid}/fdinfo for each PID, parses drm-memory-vram,
drm-pdev, and drm-engine-gfx from DRM fdinfo entries (Linux kernel >= 5.17).
GFX utilization is computed via FdInfoStat delta tracking and normalized
to fractional GPUs.
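The fdinfo parsing step can be sketched as below. The key names and units (drm-memory-vram in KiB, drm-engine-gfx in ns, drm-pdev as a PCI address) match the kernel's DRM fdinfo format; the helper itself is illustrative, not this crate's code.

```rust
// Illustrative parser for the three DRM fdinfo keys named above
// (Linux kernel >= 5.17). Returns (vram_kib, gfx_engine_ns, pci_addr).
fn parse_fdinfo(blob: &str) -> (Option<u64>, Option<u64>, Option<String>) {
    let (mut vram_kib, mut gfx_ns, mut pdev) = (None, None, None);
    for line in blob.lines() {
        // Split at the first ':' only; drm-pdev values contain colons.
        if let Some((key, val)) = line.split_once(':') {
            let val = val.trim();
            match key.trim() {
                // VRAM allocated through this fd, reported in KiB.
                "drm-memory-vram" => {
                    vram_kib = val.strip_suffix("KiB").and_then(|v| v.trim().parse().ok());
                }
                // Cumulative GFX engine time, in nanoseconds.
                "drm-engine-gfx" => {
                    gfx_ns = val.strip_suffix("ns").and_then(|v| v.trim().parse().ok());
                }
                // PCI address, used to attribute the fd to a device.
                "drm-pdev" => pdev = Some(val.to_string()),
                _ => {}
            }
        }
    }
    (vram_kib, gfx_ns, pdev)
}
```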
Returns (None, None, None) when no GPU is present on the host.
Returns (Some(0.0), Some(0.0), Some(0)) when a GPU is present but the
process tree has no allocations.
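The delta tracking behind the AMD GFX figure reduces to a ratio of two deltas: drm-engine-gfx is a cumulative counter of engine nanoseconds, so busy fraction over a sampling interval is (counter delta) / (wall-clock delta). A minimal sketch, with illustrative names rather than the crate's API:

```rust
use std::time::Duration;

// Sketch: utilization over one sampling interval from two successive
// drm-engine-gfx readings. 1.0 == one fully busy GPU, matching the
// fractional-GPU convention used elsewhere in this type.
fn gfx_utilization(prev_gfx_ns: u64, curr_gfx_ns: u64, interval: Duration) -> f64 {
    let wall_ns = interval.as_nanos() as f64;
    if wall_ns == 0.0 {
        return 0.0; // no elapsed time yet, nothing to report
    }
    // saturating_sub guards against counter resets (e.g. fd reopened).
    curr_gfx_ns.saturating_sub(prev_gfx_ns) as f64 / wall_ns
}
```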
pub fn all_gpu_process_info(
    &mut self,
    interval: Duration,
) -> (Option<f64>, Option<f64>, Option<u32>)
Return (process_gpu_vram_mib, process_gpu_usage, process_gpu_utilized) summed
across ALL GPU processes on the host (no PID filter). Used when tracking is not
scoped to a specific PID, so the full system-wide GPU allocation is
reported in the process_ CSV columns.
NVIDIA: sums used_gpu_memory for every running compute and graphics
process across all devices; counts each device that has at least one
process as “utilized”.
SM utilization is summed across the latest sample per PID from
nvmlDeviceGetProcessUtilization.
AMD: reads mem_info_vram_used from sysfs for each device (the kernel
already provides the system-wide VRAM used value there).
Per-process GPU utilization is not yet supported for AMD.
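On the AMD path, mem_info_vram_used lives under /sys/class/drm/card<N>/device/ and holds a single byte count. A sketch of the parse-and-convert step (the file read is elided so the helper stays self-contained; the name is illustrative):

```rust
// Illustrative sketch: convert the contents of the amdgpu sysfs file
// mem_info_vram_used (a decimal byte count) into MiB. In the collector
// this string would come from reading the per-device sysfs path.
fn parse_vram_used_mib(sysfs_contents: &str) -> Option<f64> {
    let bytes: u64 = sysfs_contents.trim().parse().ok()?;
    Some(bytes as f64 / (1024.0 * 1024.0))
}
```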
Returns (None, None, None) when no GPU is present on the host.