Struct GpuCollector 

pub struct GpuCollector {
    nvml: Option<Nvml>,
    amd_fdinfo: Option<FdInfoStat>,
}

Collects per-GPU metrics from NVIDIA (via NVML) and AMD (via libamdgpu_top).

Both backends load their native libraries at runtime:

  • NVML via libloading (libnvidia-ml.so), which is absent on non-NVIDIA hosts.
  • libdrm via the libdrm_dynamic_loading feature of libamdgpu_top, which is absent on non-AMD hosts.

On a CPU-only host collect() returns an empty Vec with no error.
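The optional-backend behaviour described above can be pictured with a minimal sketch. It assumes nothing about the real NVML or libdrm bindings: `FakeBackend`, `Metrics`, and `Collector` are hypothetical stand-ins, and a backend whose library failed to load is simply modelled as `None`.

```rust
use std::error::Error;

/// Hypothetical per-GPU record; the real GpuMetrics fields are not shown in this page.
#[derive(Debug, Clone, PartialEq)]
struct Metrics {
    name: String,
    utilization_pct: u32,
}

/// Hypothetical stand-in for a backend whose native library loaded successfully.
struct FakeBackend {
    devices: Vec<Metrics>,
}

/// Mirrors the shape of GpuCollector: each backend is optional.
struct Collector {
    nvidia: Option<FakeBackend>,
    amd: Option<FakeBackend>,
}

impl Collector {
    /// Visits only the backends that are present. With both set to None
    /// (a CPU-only host) the result is Ok with an empty Vec.
    fn collect(&self) -> Result<Vec<Metrics>, Box<dyn Error>> {
        let mut out = Vec::new();
        for backend in self.nvidia.iter().chain(self.amd.iter()) {
            out.extend(backend.devices.iter().cloned());
        }
        Ok(out)
    }
}
```

Modelling a missing library as `None` rather than as an error keeps the caller's code path identical on GPU and CPU-only hosts.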

Fields

nvml: Option<Nvml>

amd_fdinfo: Option<FdInfoStat>

Per-process fdinfo state for AMD GPU utilization delta tracking. Populated lazily on first AMD host detection.

Implementations

impl GpuCollector

pub fn new() -> Self

pub fn collect(&self) -> Result<Vec<GpuMetrics>, Box<dyn Error>>

pub fn process_gpu_info(&mut self, pids: &[u32], interval: Duration) -> (Option<f64>, Option<f64>, Option<u32>)

Return (process_gpu_vram_mib, process_gpu_usage, process_gpu_utilized) for the given PIDs.

pids is the tracked process tree (root + descendants) as u32 values.

NVIDIA: queries NVML running-compute and running-graphics process lists for each device; sums used_gpu_memory for matched PIDs. SM utilization is sourced from nvmlDeviceGetProcessUtilization; the latest sample per PID is taken, summed across all matched PIDs and devices, then divided by 100 to yield fractional GPUs (e.g. 0.5 = half a GPU).
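The SM-utilization aggregation described above (latest sample per PID, summed across matched PIDs and devices, divided by 100) might look roughly like the following. This is a sketch only: `ProcSample` is a hypothetical struct loosely modelled on an NVML process-utilization sample, and the samples are assumed to have been fetched already, so no NVML call appears here.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified per-process utilization sample.
#[derive(Debug, Clone, Copy)]
struct ProcSample {
    pid: u32,
    timestamp_us: u64,
    sm_util_pct: u32,
}

/// `devices` holds one list of samples per GPU; `tracked` is the process tree.
/// Returns fractional GPUs, e.g. 0.5 for half a GPU.
fn fractional_gpus(devices: &[Vec<ProcSample>], tracked: &[u32]) -> f64 {
    let mut total_pct: u64 = 0;
    for samples in devices {
        // Keep only the most recent sample for each tracked PID on this device.
        let mut latest: HashMap<u32, ProcSample> = HashMap::new();
        for s in samples.iter().filter(|s| tracked.contains(&s.pid)) {
            latest
                .entry(s.pid)
                .and_modify(|cur| {
                    if s.timestamp_us > cur.timestamp_us {
                        *cur = *s;
                    }
                })
                .or_insert(*s);
        }
        total_pct += latest
            .values()
            .map(|s| u64::from(s.sm_util_pct))
            .sum::<u64>();
    }
    // Percent points summed over PIDs and devices, converted to GPUs.
    total_pct as f64 / 100.0
}
```

Because the sum runs over devices as well as PIDs, the result can exceed 1.0 when the process tree keeps more than one GPU busy.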

AMD: reads /proc/{pid}/fdinfo for each PID, parses drm-memory-vram, drm-pdev, and drm-engine-gfx from DRM fdinfo entries (Linux kernel >= 5.17). GFX utilization is computed via FdInfoStat delta tracking and normalized to fractional GPUs.
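As an illustration of the AMD path, the sketch below parses the three fdinfo keys named above and turns two cumulative `drm-engine-gfx` readings into a busy fraction. It is not the `FdInfoStat` implementation: `DrmFdInfo`, `parse_fdinfo`, and `gfx_fraction` are hypothetical names, the VRAM value is assumed to be reported in KiB, and the engine counter is assumed to be cumulative nanoseconds.

```rust
use std::time::Duration;

/// Hypothetical container for the fdinfo keys of interest.
#[derive(Debug, Default, PartialEq)]
struct DrmFdInfo {
    pdev: Option<String>,
    vram_kib: Option<u64>,
    gfx_ns: Option<u64>,
}

/// Parses the text of one /proc/{pid}/fdinfo/{fd} entry.
fn parse_fdinfo(text: &str) -> DrmFdInfo {
    let mut info = DrmFdInfo::default();
    for line in text.lines() {
        // Split on the first ':' only; the PCI address itself contains colons.
        let Some((key, value)) = line.split_once(':') else { continue };
        let value = value.trim();
        // Numeric values carry a unit suffix ("KiB", "ns"); keep the number.
        let number = value.split_whitespace().next().and_then(|n| n.parse::<u64>().ok());
        match key.trim() {
            "drm-pdev" => info.pdev = Some(value.to_string()),
            "drm-memory-vram" => info.vram_kib = number,
            "drm-engine-gfx" => info.gfx_ns = number,
            _ => {}
        }
    }
    info
}

/// Busy time accumulated between two readings, as a fraction of the interval.
fn gfx_fraction(prev_ns: u64, cur_ns: u64, interval: Duration) -> f64 {
    let busy = cur_ns.saturating_sub(prev_ns) as f64;
    let window = interval.as_nanos() as f64;
    if window == 0.0 { 0.0 } else { busy / window }
}
```

The first reading for a PID has no predecessor, which is why some per-process state has to be carried between calls before a delta can be reported.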

Returns (None, None, None) when no GPU is present on the host. Returns (Some(0.0), Some(0.0), Some(0)) when a GPU is present but the process tree has no allocations.
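A caller may want to distinguish the two "nothing to report" cases above. The following sketch shows one way to do so; `GpuState` and `classify` are hypothetical helpers, not part of this crate.

```rust
/// Hypothetical caller-side classification of the returned tuple.
#[derive(Debug, PartialEq)]
enum GpuState {
    /// (None, None, None): no GPU on the host.
    NoGpu,
    /// GPU present, but the process tree holds no allocations.
    Idle,
    /// GPU present and in use by the process tree.
    Active { vram_mib: f64, gpus: f64, devices: u32 },
}

fn classify(result: (Option<f64>, Option<f64>, Option<u32>)) -> GpuState {
    match result {
        (None, None, None) => GpuState::NoGpu,
        (vram, usage, devices) => {
            // Treat any individually missing value as zero.
            let vram_mib = vram.unwrap_or(0.0);
            let gpus = usage.unwrap_or(0.0);
            let devices = devices.unwrap_or(0);
            if vram_mib == 0.0 && gpus == 0.0 && devices == 0 {
                GpuState::Idle
            } else {
                GpuState::Active { vram_mib, gpus, devices }
            }
        }
    }
}
```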

pub fn all_gpu_process_info(&mut self, interval: Duration) -> (Option<f64>, Option<f64>, Option<u32>)

Return (process_gpu_vram_mib, process_gpu_usage, process_gpu_utilized) summed across ALL GPU processes on the host (no PID filter). Used when tracking is not scoped to a specific PID so the full system-wide GPU allocation is reported in the process_ CSV columns.

NVIDIA: sums used_gpu_memory for every running compute and graphics process across all devices; counts each device that has at least one process as “utilized”. SM utilization is summed across the latest sample per PID from nvmlDeviceGetProcessUtilization.
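The memory sum and the "utilized" device count described above can be sketched as follows. `ProcMem` is a hypothetical record standing in for an NVML process entry, with memory assumed to be reported in bytes and possibly unavailable. Counting a PID once per device when it appears in both the compute and graphics lists is an assumption of this sketch, not something the documentation states.

```rust
use std::collections::HashSet;

/// Hypothetical process entry from the merged compute + graphics lists.
struct ProcMem {
    pid: u32,
    /// Bytes of GPU memory in use, or None when the driver cannot report it.
    used_bytes: Option<u64>,
}

/// Returns (total VRAM in MiB, number of devices with at least one process).
fn summarize(devices: &[Vec<ProcMem>]) -> (f64, u32) {
    let mut total_bytes: u64 = 0;
    let mut utilized: u32 = 0;
    for procs in devices {
        let mut seen = HashSet::new();
        for p in procs {
            // Assumption: count each PID once per device.
            if seen.insert(p.pid) {
                total_bytes += p.used_bytes.unwrap_or(0);
            }
        }
        if !seen.is_empty() {
            utilized += 1;
        }
    }
    (total_bytes as f64 / (1024.0 * 1024.0), utilized)
}
```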

AMD: reads mem_info_vram_used from sysfs for each device (the kernel already provides the system-wide VRAM used value there). Per-process GPU utilization is not yet supported for AMD.
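A rough sketch of the sysfs read is shown below. It assumes the amdgpu attribute lives at `<drm_root>/cardN/device/mem_info_vram_used` (with `/sys/class/drm` as the usual root) and holds a byte count; `parse_vram_used_mib` and `total_vram_used_mib` are hypothetical helper names.

```rust
use std::fs;
use std::path::Path;

/// Converts the file contents (a byte count followed by a newline) to MiB.
fn parse_vram_used_mib(contents: &str) -> Option<f64> {
    let bytes: u64 = contents.trim().parse().ok()?;
    Some(bytes as f64 / (1024.0 * 1024.0))
}

/// Sums mem_info_vram_used over every card under `drm_root`.
/// Returns None when the directory is unreadable or no card exposes the file.
fn total_vram_used_mib(drm_root: &Path) -> Option<f64> {
    let mut total = 0.0;
    let mut found = false;
    for entry in fs::read_dir(drm_root).ok()?.flatten() {
        let name = entry.file_name();
        let name = name.to_string_lossy();
        // Skip render nodes and connector entries such as "card0-DP-1".
        if !name.starts_with("card") || name.contains('-') {
            continue;
        }
        let path = entry.path().join("device").join("mem_info_vram_used");
        if let Ok(text) = fs::read_to_string(&path) {
            if let Some(mib) = parse_vram_used_mib(&text) {
                total += mib;
                found = true;
            }
        }
    }
    if found { Some(total) } else { None }
}
```

On a host with AMD GPUs this would typically be called as `total_vram_used_mib(Path::new("/sys/class/drm"))`.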

Returns (None, None, None) when no GPU is present on the host.

fn collect_nvidia(&self, out: &mut Vec<GpuMetrics>)

fn collect_amd(&self, out: &mut Vec<GpuMetrics>)

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> Same for T

type Output = T

Should always be Self.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.