resource_tracker

resource_tracker #

Resource Tracker package for monitoring resource usage, detecting cloud environments, and more.

Modules:

Name	Description
`cloud_info`	Detect cloud environment (provider, region, instance type) via VM metadata services.
`dummy_workloads`
`helpers`	General helpers.
`keeper`
`nvidia`	Helpers to monitor NVIDIA GPUs.
`report`
`server_info`	Detect server hardware (CPU count, memory amount, disk space, GPU count and VRAM amount) via `procfs` or `psutil`, and `nvidia-smi`.
`tiny_bars`	A tiny, partial, and opinionated implementation of the Handlebars template engine.
`tiny_data_frame`	A very inefficient data-frame implementation for manipulating resource usage data.
`tracker`	Track resource usage of a process and/or the system.
`tracker_procfs`	Helpers to track resource usage via `procfs`.
`tracker_psutil`	Helpers to track resource usage via `psutil`.

Classes:

Name	Description
`ProcessTracker`	Track resource usage of a process and optionally its children.
`ResourceTracker`	Track resource usage of processes and the system in a non-blocking way.
`SystemTracker`	Track system-wide resource usage.

Functions:

Name	Description
`get_cloud_info`	Detect cloud environment and return standardized information on provider, region, and instance type.
`get_server_info`	Collects important information about the Linux server.

get_cloud_info `cached` #

get_cloud_info()

Detect cloud environment and return standardized information on provider, region, and instance type.

Returns:

Type	Description
`dict`	A dictionary containing standardized cloud information: `vendor`: The cloud provider (aws, gcp, azure, hcloud, upcloud), or "unknown" `instance_type`: The instance type/size/flavor, or "unknown" `region`: The region/zone where the instance is running, or "unknown" `discovery_time`: The time taken to discover the cloud environment, in seconds

Source code in resource_tracker/cloud_info.py

@cache
def get_cloud_info() -> dict:
    """
    Detect cloud environment and return standardized information on provider, region, and instance type.

    Returns:
        A dictionary containing standardized cloud information:

            - `vendor`: The cloud provider (aws, gcp, azure, hcloud, upcloud), or "unknown"
            - `instance_type`: The instance type/size/flavor, or "unknown"
            - `region`: The region/zone where the instance is running, or "unknown"
            - `discovery_time`: The time taken to discover the cloud environment, in seconds
    """
    start_time = time()
    check_functions = [
        _check_aws,
        _check_gcp,
        _check_azure,
        _check_hetzner,
        _check_upcloud,
    ]

    # run checks in parallel, return early if any check succeeds
    with ThreadPoolExecutor(max_workers=len(check_functions)) as executor:
        futures = {executor.submit(check_fn): check_fn for check_fn in check_functions}
        pending = set(futures.keys())
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                with suppress(Exception):
                    info = future.result()
                    if info:
                        # stop all remaining checks early
                        for f in pending:
                            f.cancel()
                        return info | {"discovery_time": time() - start_time}

    return {
        "vendor": "unknown",
        "instance_type": "unknown",
        "region": "unknown",
        "discovery_time": time() - start_time,
    }

get_server_info #

get_server_info()

Collects important information about the Linux server.

Returns:

Type	Description
`dict`	A dictionary containing server information: `os`: Operating system `vcpus`: Number of virtual CPUs `memory_mb`: Total memory in MB `gpu_count`: Number of GPUs (`0` if not available) `gpu_names`: List of GPU names (`[]` if not available) `gpu_name`: Most common GPU name (`""` if not available) `gpu_memory_mb`: Total VRAM in MB (`0` if not available)

Source code in resource_tracker/server_info.py

def get_server_info() -> dict:
    """
    Collects important information about the Linux server.

    Returns:
        A dictionary containing server information:

            - `os`: Operating system
            - `vcpus`: Number of virtual CPUs
            - `memory_mb`: Total memory in MB
            - `gpu_count`: Number of GPUs (`0` if not available)
            - `gpu_names`: List of GPU names (`[]` if not available)
            - `gpu_name`: Most common GPU name (`""` if not available)
            - `gpu_memory_mb`: Total VRAM in MB (`0` if not available)
    """
    gpu_info = get_gpu_info()
    info = {
        "os": system(),
        "vcpus": cpu_count(),
        "memory_mb": get_total_memory_mb(),
        "gpu_count": gpu_info["count"],
        "gpu_names": gpu_info["gpu_names"],
        "gpu_name": (
            Counter(gpu_info["gpu_names"]).most_common(1)[0][0]
            if gpu_info["gpu_names"]
            else ""
        ),
        "gpu_memory_mb": gpu_info["memory_mb"],
    }
    return info

ProcessTracker #

Track resource usage of a process and optionally its children.

This class monitors system resources like CPU times and usage, memory usage, GPU and VRAM utilization, I/O operations for a given process ID and optionally its child processes.

Data is collected every interval seconds and written to the stdout or output_file (if provided) as CSV. Currently, the following columns are tracked:

timestamp (float): The current timestamp.
pid (int): The monitored process ID.
children (int | None): The current number of child processes.
utime (int): The total user+nice mode CPU time in seconds.
stime (int): The total system mode CPU time in seconds.
cpu_usage (float): The current CPU usage between 0 and number of CPUs.
memory (int): The current memory usage in kB. Implementation depends on the operating system, and it is preferably PSS (Proportional Set Size) on Linux, USS (Unique Set Size) on macOS and Windows, and RSS (Resident Set Size) on Windows.
read_bytes (int): The total number of bytes read from disk.
write_bytes (int): The total number of bytes written to disk.
gpu_usage (float): The current GPU utilization between 0 and GPU count.
gpu_vram (float): The current GPU memory used in MiB.
gpu_utilized (int): The number of GPUs with utilization > 0.

Parameters:

Name	Type	Description	Default
`pid`	`int`	Process ID to track. Defaults to current process ID.	`getpid()`
`start_time`	`float`	Time when to start tracking. Defaults to current time.	`time()`
`interval`	`float`	Sampling interval in seconds. Defaults to 1.	`1`
`children`	`bool`	Whether to track child processes. Defaults to True.	`True`
`autostart`	`bool`	Whether to start tracking immediately. Defaults to True.	`True`
`output_file`	`str`	File to write the output to. Defaults to None, print to stdout.	`None`

Methods:

Name	Description
`__call__`	Dummy method to make this class callable.
`diff_stats`	Calculate stats since last call.
`start_tracking`	Start an infinite loop tracking resource usage of the process until it exits.

Source code in resource_tracker/tracker.py

class ProcessTracker:
    """Track resource usage of a process and optionally its children.

    This class monitors system resources like CPU times and usage, memory usage,
    GPU and VRAM utilization, I/O operations for a given process ID and
    optionally its child processes.

    Data is collected every `interval` seconds and written to the stdout or
    `output_file` (if provided) as CSV. Currently, the following columns are
    tracked:

    - timestamp (float): The current timestamp.
    - pid (int): The monitored process ID.
    - children (int | None): The current number of child processes.
    - utime (int): The total user+nice mode CPU time in seconds.
    - stime (int): The total system mode CPU time in seconds.
    - cpu_usage (float): The current CPU usage between 0 and number of CPUs.
    - memory (int): The current memory usage in kB. Implementation depends on the
      operating system, and it is preferably PSS (Proportional Set Size) on Linux,
      USS (Unique Set Size) on macOS and Windows, and RSS (Resident Set Size) on
      Windows.
    - read_bytes (int): The total number of bytes read from disk.
    - write_bytes (int): The total number of bytes written to disk.
    - gpu_usage (float): The current GPU utilization between 0 and GPU count.
    - gpu_vram (float): The current GPU memory used in MiB.
    - gpu_utilized (int): The number of GPUs with utilization > 0.

    Args:
        pid (int, optional): Process ID to track. Defaults to current process ID.
        start_time: Time when to start tracking. Defaults to current time.
        interval (float, optional): Sampling interval in seconds. Defaults to 1.
        children (bool, optional): Whether to track child processes. Defaults to True.
        autostart (bool, optional): Whether to start tracking immediately. Defaults to True.
        output_file (str, optional): File to write the output to. Defaults to None, print to stdout.
    """

    def __init__(
        self,
        pid: int = getpid(),
        start_time: float = time(),
        interval: float = 1,
        children: bool = True,
        autostart: bool = True,
        output_file: str = None,
    ):
        self.get_process_stats, _ = get_tracker_implementation()

        self.pid = pid
        self.status = "running"
        self.interval = interval
        self.cycle = 0
        self.children = children
        self.start_time = start_time

        # initial data collection so that we can use that as a baseline when diffing after the first interval
        self.stats = self.get_process_stats(pid, children)

        if autostart:
            # wait for the start time to be reached
            if start_time > time():
                sleep(start_time - time())
            # we can now start. 1st interval used to collect baseline
            self.start_tracking(output_file)

    def __call__(self):
        """Dummy method to make this class callable."""
        pass

    def diff_stats(self):
        """Calculate stats since last call."""
        last_stats = self.stats
        self.stats = self.get_process_stats(self.pid, self.children)
        self.cycle += 1

        return {
            "timestamp": round(self.stats["timestamp"], 3),
            "pid": self.pid,
            "children": self.stats["children"],
            "utime": max(0, round(self.stats["utime"] - last_stats["utime"], 6)),
            "stime": max(0, round(self.stats["stime"] - last_stats["stime"], 6)),
            "cpu_usage": round(
                max(
                    0,
                    (
                        (self.stats["utime"] + self.stats["stime"])
                        - (last_stats["utime"] + last_stats["stime"])
                    )
                    / (self.stats["timestamp"] - last_stats["timestamp"]),
                ),
                4,
            ),
            "memory": self.stats["memory"],
            "read_bytes": max(0, self.stats["read_bytes"] - last_stats["read_bytes"]),
            "write_bytes": max(
                0, self.stats["write_bytes"] - last_stats["write_bytes"]
            ),
            "gpu_usage": self.stats["gpu_usage"],
            "gpu_vram": self.stats["gpu_vram"],
            "gpu_utilized": self.stats["gpu_utilized"],
        }

    def start_tracking(
        self, output_file: Optional[str] = None, print_header: bool = True
    ):
        """Start an infinite loop tracking resource usage of the process until it exits.

        A CSV line is written every `interval` seconds.

        Args:
            output_file: File to write the output to. Defaults to None, printing to stdout.
            print_header: Whether to print the header of the CSV. Defaults to True.
        """
        file_handle = open(output_file, "wb") if output_file else stdout.buffer
        try:
            while True:
                current_stats = self.diff_stats()
                if current_stats["memory"] == 0:
                    # the process has exited
                    self.status = "exited"
                    break
                # don't print values yet, we collect data for the 1st baseline
                if self.cycle == 1:
                    if print_header:
                        file_handle.write(
                            render_csv_row(
                                current_stats.keys(), quoting=QUOTE_NONNUMERIC
                            )
                        )
                else:
                    file_handle.write(
                        render_csv_row(current_stats.values(), quoting=QUOTE_NONNUMERIC)
                    )
                if output_file:
                    file_handle.flush()
                # sleep until the next interval
                sleep(max(0, self.start_time + self.interval * self.cycle - time()))
        finally:
            if output_file and not file_handle.closed:
                file_handle.close()

call #

__call__()

Dummy method to make this class callable.

Source code in resource_tracker/tracker.py

def __call__(self):
    """Dummy method to make this class callable."""
    pass

diff_stats #

diff_stats()

Calculate stats since last call.

Source code in resource_tracker/tracker.py

def diff_stats(self):
    """Calculate stats since last call."""
    last_stats = self.stats
    self.stats = self.get_process_stats(self.pid, self.children)
    self.cycle += 1

    return {
        "timestamp": round(self.stats["timestamp"], 3),
        "pid": self.pid,
        "children": self.stats["children"],
        "utime": max(0, round(self.stats["utime"] - last_stats["utime"], 6)),
        "stime": max(0, round(self.stats["stime"] - last_stats["stime"], 6)),
        "cpu_usage": round(
            max(
                0,
                (
                    (self.stats["utime"] + self.stats["stime"])
                    - (last_stats["utime"] + last_stats["stime"])
                )
                / (self.stats["timestamp"] - last_stats["timestamp"]),
            ),
            4,
        ),
        "memory": self.stats["memory"],
        "read_bytes": max(0, self.stats["read_bytes"] - last_stats["read_bytes"]),
        "write_bytes": max(
            0, self.stats["write_bytes"] - last_stats["write_bytes"]
        ),
        "gpu_usage": self.stats["gpu_usage"],
        "gpu_vram": self.stats["gpu_vram"],
        "gpu_utilized": self.stats["gpu_utilized"],
    }

start_tracking #

start_tracking(output_file=None, print_header=True)

Start an infinite loop tracking resource usage of the process until it exits.

A CSV line is written every interval seconds.

Parameters:

Name	Type	Description	Default
`output_file`	`Optional[str]`	File to write the output to. Defaults to None, printing to stdout.	`None`
`print_header`	`bool`	Whether to print the header of the CSV. Defaults to True.	`True`

Source code in resource_tracker/tracker.py

def start_tracking(
    self, output_file: Optional[str] = None, print_header: bool = True
):
    """Start an infinite loop tracking resource usage of the process until it exits.

    A CSV line is written every `interval` seconds.

    Args:
        output_file: File to write the output to. Defaults to None, printing to stdout.
        print_header: Whether to print the header of the CSV. Defaults to True.
    """
    file_handle = open(output_file, "wb") if output_file else stdout.buffer
    try:
        while True:
            current_stats = self.diff_stats()
            if current_stats["memory"] == 0:
                # the process has exited
                self.status = "exited"
                break
            # don't print values yet, we collect data for the 1st baseline
            if self.cycle == 1:
                if print_header:
                    file_handle.write(
                        render_csv_row(
                            current_stats.keys(), quoting=QUOTE_NONNUMERIC
                        )
                    )
            else:
                file_handle.write(
                    render_csv_row(current_stats.values(), quoting=QUOTE_NONNUMERIC)
                )
            if output_file:
                file_handle.flush()
            # sleep until the next interval
            sleep(max(0, self.start_time + self.interval * self.cycle - time()))
    finally:
        if output_file and not file_handle.closed:
            file_handle.close()

ResourceTracker #

Track resource usage of processes and the system in a non-blocking way.

Start a resource_tracker.ProcessTracker and/or a resource_tracker.SystemTracker in the background as spawned or forked process(es), and make the collected data available easily in the main process via the process_metrics and system_metrics properties.

Parameters:

Name	Type	Description	Default
`pid`	`int`	Process ID to track. Defaults to current process ID.	`getpid()`
`children`	`bool`	Whether to track child processes. Defaults to True.	`True`
`interval`	`float`	Sampling interval in seconds. Defaults to 1.	`1`
`method`	`Optional[str]`	Multiprocessing method. Defaults to None, which tries to fork on Linux and macOS, and spawn on Windows.	`None`
`autostart`	`bool`	Whether to start tracking immediately. Defaults to True.	`True`
`track_processes`	`bool`	Whether to track resource usage at the process level. Defaults to True.	`True`
`track_system`	`bool`	Whether to track system-wide resource usage. Defaults to True.	`True`
`discover_server`	`bool`	Whether to discover the server specs in the background at startup. Defaults to True.	`True`
`discover_cloud`	`bool`	Whether to discover the cloud environment in the background at startup. Defaults to True.	`True`

Example:

>>> from resource_tracker.dummy_workloads import cpu_single, cpu_multi
>>> tracker = ResourceTracker()
>>> cpu_single()
>>> tracker.recommend_resources()  # doctest: +SKIP
{'cpu': 1, 'memory': 128, 'gpu': 0, 'vram': 0}
>>> tracker = ResourceTracker()
>>> while tracker.n_samples == 0:
...     cpu_multi(duration=0.25, ncores=2)
>>> tracker.recommend_resources()  # multiprocessing is not enough efficient on Windows/macOS  # doctest: +SKIP
{'cpu': 2, 'memory': 128, 'gpu': 0, 'vram': 0}

Methods:

Name	Description
`start`	Start the selected resource trackers in the background as subprocess(es).
`stop`	Stop the previously started resource trackers' background processes.
`snapshot`	Collect the current state of the resource tracker.
`from_snapshot`	Create a ResourceTracker from a snapshot.
`dumps`	Serialize the resource tracker to a JSON string.
`loads`	Deserialize the resource tracker from a JSON string.
`dump`	Serialize the resource tracker to a gzipped JSON file.
`load`	Deserialize the resource tracker from a gzipped JSON file.
`get_combined_metrics`	Collected data both from the resource_tracker.ProcessTracker and resource_tracker.SystemTracker.
`stats`	Collect statistics from the resource tracker.
`wait_for_samples`	Wait for at least one sample to be collected.
`recommend_resources`	Recommend optimal resource allocation based on the measured resource tracker data.
`recommend_server`	Recommend the cheapest cloud server matching the recommended resources.

Attributes:

Name	Type	Description
`n_samples`	`int`	Number of samples collected by the resource tracker.
`server_info`	`dict`	High-level server info.
`cloud_info`	`dict`	High-level cloud info.
`process_metrics`	`TinyDataFrame`	Collected data from resource_tracker.ProcessTracker.
`system_metrics`	`TinyDataFrame`	Collected data from resource_tracker.SystemTracker.
`running`	`bool`	Check if the resource tracker is running.

Source code in resource_tracker/tracker.py

class ResourceTracker:
    """Track resource usage of processes and the system in a non-blocking way.

    Start a [resource_tracker.ProcessTracker][] and/or a [resource_tracker.SystemTracker][] in the background as spawned
    or forked process(es), and make the collected data available easily in the
    main process via the `process_metrics` and `system_metrics` properties.

    Args:
        pid: Process ID to track. Defaults to current process ID.
        children: Whether to track child processes. Defaults to True.
        interval: Sampling interval in seconds. Defaults to 1.
        method: Multiprocessing method. Defaults to None, which tries to fork on
            Linux and macOS, and spawn on Windows.
        autostart: Whether to start tracking immediately. Defaults to True.
        track_processes: Whether to track resource usage at the process level.
            Defaults to True.
        track_system: Whether to track system-wide resource usage. Defaults to True.
        discover_server: Whether to discover the server specs in the background at
            startup. Defaults to True.
        discover_cloud: Whether to discover the cloud environment in the background
            at startup. Defaults to True.

    Example:

        >>> from resource_tracker.dummy_workloads import cpu_single, cpu_multi
        >>> tracker = ResourceTracker()
        >>> cpu_single()
        >>> tracker.recommend_resources()  # doctest: +SKIP
        {'cpu': 1, 'memory': 128, 'gpu': 0, 'vram': 0}
        >>> tracker = ResourceTracker()
        >>> while tracker.n_samples == 0:
        ...     cpu_multi(duration=0.25, ncores=2)
        >>> tracker.recommend_resources()  # multiprocessing is not enough efficient on Windows/macOS  # doctest: +SKIP
        {'cpu': 2, 'memory': 128, 'gpu': 0, 'vram': 0}
    """

    _server_info: Optional[dict] = None
    _cloud_info: Optional[dict] = None

    def __init__(
        self,
        pid: int = getpid(),
        children: bool = True,
        interval: float = 1,
        method: Optional[str] = None,
        autostart: bool = True,
        track_processes: bool = True,
        track_system: bool = True,
        discover_server: bool = True,
        discover_cloud: bool = True,
    ):
        self.pid = pid
        self.children = children
        self.interval = interval
        self.method = method
        self.autostart = autostart
        self.trackers = []
        if track_processes:
            self.trackers.append("process_tracker")
        if track_system:
            self.trackers.append("system_tracker")
        self.discover_server = discover_server
        self.discover_cloud = discover_cloud

        if platform != "linux" and not is_psutil_available():
            raise ImportError(
                "psutil is required for resource tracking on non-Linux platforms"
            )

        if method is None:
            # try to fork when possible due to leaked semaphores on older Python versions
            # see e.g. https://github.com/python/cpython/issues/90549
            if platform in ["linux", "darwin"]:
                self.mpc = get_context("fork")
            else:
                self.mpc = get_context("spawn")
        else:
            self.mpc = get_context(method)

        # error details from subprocesses
        self.error_queue = SimpleQueue()

        # create temporary CSV file(s) for the tracker(s), and record only the file path(s)
        # to be passed later to subprocess(es) avoiding pickling the file object(s)
        for tracker_name in self.trackers:
            temp_file = NamedTemporaryFile(delete=False)
            setattr(self, f"{tracker_name}_filepath", temp_file.name)
            temp_file.close()
        # make sure to cleanup the temp file(s)
        finalize(
            self,
            cleanup_files,
            [
                getattr(self, f"{tracker_name}_filepath")
                for tracker_name in self.trackers
            ],
        )

        if autostart:
            self.start()

    def start(self):
        """Start the selected resource trackers in the background as subprocess(es)."""
        if self.running:
            raise RuntimeError("Resource tracker already running, cannot start again.")
        if hasattr(self, "stop_time"):
            raise RuntimeError(
                "Resource tracker already stopped. Create a new instance instead of trying to restart it."
            )

        self.start_time = time()
        self.stop_time = None
        # round to the nearest interval in the future
        self.start_time = ceil(self.start_time / self.interval) * self.interval
        # leave at least 50 ms for trackers to start
        if self.start_time - time() < 0.05:
            self.start_time += self.interval

        if "process_tracker" in self.trackers:
            self.process_tracker_process = self.mpc.Process(
                target=_run_tracker,
                args=("process", self.error_queue),
                kwargs={
                    "pid": self.pid,
                    "start_time": self.start_time,
                    "interval": self.interval,
                    "children": self.children,
                    "output_file": self.process_tracker_filepath,
                },
                daemon=True,
            )
            self.process_tracker_process.start()

        if "system_tracker" in self.trackers:
            self.system_tracker_process = self.mpc.Process(
                target=_run_tracker,
                args=("system", self.error_queue),
                kwargs={
                    "start_time": self.start_time,
                    "interval": self.interval,
                    "output_file": self.system_tracker_filepath,
                },
                daemon=True,
            )
            self.system_tracker_process.start()

        def collect_server_info():
            """Collect server info to be run in a background thread."""
            try:
                self._server_info = get_server_info()
            except Exception as e:
                logger.warning(f"Error fetching server info: {e}")

        def collect_cloud_info():
            """Collect cloud info to be run in a background thread."""
            try:
                self._cloud_info = get_cloud_info()
            except Exception as e:
                logger.warning(f"Error fetching cloud info: {e}")

        if self.discover_server:
            server_thread = Thread(target=collect_server_info, daemon=True)
            server_thread.start()
        if self.discover_cloud:
            cloud_thread = Thread(target=collect_cloud_info, daemon=True)
            cloud_thread.start()

        # make sure to cleanup the started subprocess(es)
        finalize(
            self,
            cleanup_processes,
            [
                getattr(self, f"{tracker_name}_process")
                for tracker_name in self.trackers
            ],
        )

    def stop(self):
        """Stop the previously started resource trackers' background processes."""
        self.stop_time = time()
        # check for errors in the subprocesses
        if not self.error_queue.empty():
            error_data = self.error_queue.get()
            logger.warning(
                "Resource tracker subprocess failed!\n"
                f"Error type: {error_data['name']} (from module {error_data['module']})\n"
                f"Error message: {error_data['message']}\n"
                f"Original traceback:\n{error_data['traceback']}"
            )
        # terminate tracker processes
        for tracker_name in self.trackers:
            process_attr = f"{tracker_name}_process"
            if hasattr(self, process_attr):
                cleanup_processes([getattr(self, process_attr)])
        self.error_queue.close()
        logger.debug(
            "Resource tracker stopped after %s seconds, logging %d process-level and %d system-wide records",
            self.stop_time - self.start_time,
            len(self.process_metrics),
            len(self.system_metrics),
        )

    @property
    def n_samples(self) -> int:
        """Number of samples collected by the resource tracker."""
        return min(len(self.process_metrics), len(self.system_metrics))

    @property
    def server_info(self) -> dict:
        """High-level server info.

        Collected data from [resource_tracker.get_server_info][] plus a guess
        for the allocation type of the server: if it's dedicated to the tracked
        process(es) or shared with other processes. The guess is based on the
        [resource_tracker.column_maps.SERVER_ALLOCATION_CHECKS][] checks.
        """
        server_info = self._server_info
        if server_info:
            server_info["allocation"] = None
        if self.n_samples > 0:
            for check in SERVER_ALLOCATION_CHECKS:
                try:
                    system_val = mean(self.system_metrics[check["system_column"]])
                    task_val = mean(self.process_metrics[check["process_column"]])
                    if (system_val > task_val * check["percent"]) or (
                        system_val > task_val + check["absolute"]
                    ):
                        server_info["allocation"] = "shared"
                        break
                except Exception as e:
                    logger.warning(
                        f"Error calculating server allocation based on {check['system_column']} and {check['process_column']}: {e}"
                    )
                    server_info["allocation"] = "unknown"
                    break
            server_info["allocation"] = server_info.get("allocation", "dedicated")
        return server_info

    @property
    def cloud_info(self) -> dict:
        """High-level cloud info.

        Collected data from [resource_tracker.get_cloud_info][].
        """
        return self._cloud_info

    @property
    def process_metrics(self) -> TinyDataFrame:
        """Collected data from [resource_tracker.ProcessTracker][].

        Returns:
            A [resource_tracker.tiny_data_frame.TinyDataFrame][] object containing the collected data or an empty list if the [resource_tracker.ProcessTracker][] is not running.
        """
        try:
            return TinyDataFrame(
                csv_file_path=self.process_tracker_filepath,
                retries=2,
                retry_delay=min(0.05, self.interval / 10),
            )
        except Exception as e:
            logger.warning(f"Failed to read process metrics: {e}")
            return TinyDataFrame(data=[])

    @property
    def system_metrics(self) -> TinyDataFrame:
        """Collected data from [resource_tracker.SystemTracker][].

        Returns:
            A [resource_tracker.tiny_data_frame.TinyDataFrame][] object containing the collected data or an empty list if the [resource_tracker.SystemTracker][] is not running.
        """
        try:
            return TinyDataFrame(
                csv_file_path=self.system_tracker_filepath,
                retries=2,
                retry_delay=min(0.05, self.interval / 10),
            )
        except Exception as e:
            logger.warning(f"Failed to read system metrics: {e}")
            return TinyDataFrame(data=[])

    def snapshot(self) -> dict:
        """Collect the current state of the resource tracker.

        Returns:
            A dictionary containing the current state of the resource tracker.
        """
        return {
            "metadata": {
                "version": 1,
                "resource_tracker": {
                    "version": __version__,
                    "implementation": "psutil" if is_psutil_available() else "procfs",
                },
                "pid": self.pid,
                "children": self.children,
                "interval": self.interval,
                "method": self.method,
                "autostart": self.autostart,
                "track_processes": "process_tracker" in self.trackers,
                "track_system": "system_tracker" in self.trackers,
                "discover_server": self.discover_server,
                "discover_cloud": self.discover_cloud,
                "start_time": self.start_time,
                "stop_time": self.stop_time or time(),
                "duration": round(
                    (self.stop_time or time()) - self.start_time + self.interval, 2
                ),
            },
            "server_info": self.server_info,
            "cloud_info": self.cloud_info,
            "process_metrics": self.process_metrics.to_dict(),
            "system_metrics": self.system_metrics.to_dict(),
        }

    @classmethod
    def from_snapshot(cls, snapshot: dict):
        """Create a ResourceTracker from a snapshot.

        Args:
            snapshot: A dictionary containing the current state of the resource tracker, created by [resource_tracker.ResourceTracker.snapshot][].
        """
        tracker = cls(
            pid=snapshot["metadata"]["pid"],
            children=snapshot["metadata"]["children"],
            interval=snapshot["metadata"]["interval"],
            method=snapshot["metadata"]["method"],
            autostart=False,
            track_processes=snapshot["metadata"]["track_processes"],
            track_system=snapshot["metadata"]["track_system"],
            discover_server=snapshot["metadata"]["discover_server"],
            discover_cloud=snapshot["metadata"]["discover_cloud"],
        )
        tracker.start_time = snapshot["metadata"]["start_time"]
        tracker.stop_time = snapshot["metadata"]["stop_time"]
        tracker._server_info = snapshot["server_info"]
        tracker._cloud_info = snapshot["cloud_info"]
        TinyDataFrame(data=snapshot["process_metrics"]).to_csv(
            tracker.process_tracker_filepath
        )
        TinyDataFrame(data=snapshot["system_metrics"]).to_csv(
            tracker.system_tracker_filepath
        )
        return tracker

    def dumps(self) -> str:
        """Serialize the resource tracker to a JSON string.

        Returns:
            A JSON string containing the current state of the resource tracker.
        """
        return json_dumps(self.snapshot())

    @classmethod
    def loads(cls, s: str):
        """Deserialize the resource tracker from a JSON string.

        Args:
            s: The JSON string to deserialize the resource tracker from.
        """
        return cls.from_snapshot(json_loads(s))

    def dump(self, file: str):
        """Serialize the resource tracker to a gzipped JSON file.

        Args:
            file: The path to the file to write the serialized resource tracker to.
        """
        with gzip_open(file, "wb") as f:
            f.write(self.dumps().encode())

    @classmethod
    def load(cls, file: str):
        """Deserialize the resource tracker from a gzipped JSON file.

        Args:
            file: The path to the file to read the serialized resource tracker from.
        """
        with gzip_open(file, "rb") as f:
            return cls.loads(f.read().decode())

    def get_combined_metrics(
        self,
        bytes: bool = False,
        human_names: bool = False,
        system_prefix: Optional[str] = None,
        process_prefix: Optional[str] = None,
    ) -> TinyDataFrame:
        """Collected data both from the [resource_tracker.ProcessTracker][] and [resource_tracker.SystemTracker][].

        This is effectively binding the two dataframes together by timestamp,
        and adding a prefix to the column names to distinguish between the system and process metrics.

        Args:
            bytes: Whether to convert all metrics (e.g. memory, VRAM, disk usage) to bytes. Defaults to False, reporting as documented at [resource_tracker.ProcessTracker][] and [resource_tracker.SystemTracker][] (kB, MiB, or GiB).
            human_names: Whether to rename the columns to use human-friendly names. Defaults to False, reporting as documented at [resource_tracker.ProcessTracker][] and [resource_tracker.SystemTracker][] with prefixes.
            system_prefix: Prefix to add to the system-level column names. Defaults to "system_" or "System " based on the value of `human_names`.
            process_prefix: Prefix to add to the process-level column names. Defaults to "process_" or "Process " based on the value of `human_names`.

        Returns:
            A [resource_tracker.tiny_data_frame.TinyDataFrame][] object containing the combined data or an empty list if tracker(s) not running.
        """
        try:
            process_metrics = self.process_metrics
            system_metrics = self.system_metrics

            # ensure both have the same length
            if len(process_metrics) > len(system_metrics):
                process_metrics = process_metrics[: len(system_metrics)]
            elif len(system_metrics) > len(process_metrics):
                system_metrics = system_metrics[: len(process_metrics)]

            # nothing to report on
            if len(process_metrics) == 0:
                return TinyDataFrame(data=[])

            if bytes:
                for col, factor in BYTE_MAPPING.items():
                    for metrics in (system_metrics, process_metrics):
                        if col in metrics.columns:
                            metrics[col] = [v * factor for v in metrics[col]]

            if system_prefix is None:
                system_prefix = "system_" if not human_names else "System "
            if process_prefix is None:
                process_prefix = "process_" if not human_names else "Process "

            # cbind the two dataframes with column name prefixes and optional human-friendly names
            combined = system_metrics.rename(
                columns={
                    n: (
                        (system_prefix if n != "timestamp" else "")
                        + (n if not human_names else HUMAN_NAMES_MAPPING.get(n, n))
                    )
                    for n in system_metrics.columns
                }
            )
            for col in process_metrics.columns[1:]:
                combined[
                    process_prefix
                    + (col if not human_names else HUMAN_NAMES_MAPPING.get(col, col))
                ] = process_metrics[col]

            return combined
        except Exception as e:
            with suppress(Exception):
                logger.warning(
                    f"Kept {len(process_metrics) if 'process_metrics' in locals() else 'unknown'} records of process metrics out of {len(self.process_metrics)} collected records ({self.process_metrics.columns}), "
                    f"and {len(system_metrics) if 'system_metrics' in locals() else 'unknown'} records of system metrics out of {len(self.system_metrics)} collected records ({self.system_metrics.columns}), "
                    f"but creating the combined metrics dataframe failed with error: {e}"
                )
                logger.warning(f"Process metrics: {self.process_metrics.to_dict()}")
                logger.warning(f"System metrics: {self.system_metrics.to_dict()}")
            raise

    def stats(
        self,
        specs: List[StatSpec] = [
            StatSpec(column="process_cpu_usage", agg=mean, round=2),
            StatSpec(column="process_cpu_usage", agg=max, round=2),
            StatSpec(column="process_memory", agg=mean, round=2),
            StatSpec(column="process_memory", agg=max, round=2),
            StatSpec(column="process_gpu_usage", agg=mean, round=2),
            StatSpec(column="process_gpu_usage", agg=max, round=2),
            StatSpec(column="process_gpu_vram", agg=mean, round=2),
            StatSpec(column="process_gpu_vram", agg=max, round=2),
            StatSpec(column="process_gpu_utilized", agg=mean, round=2),
            StatSpec(column="process_gpu_utilized", agg=max, round=2),
            StatSpec(column="system_disk_space_used_gb", agg=max, round=2),
            StatSpec(column="system_net_recv_bytes", agg=sum),
            StatSpec(column="system_net_sent_bytes", agg=sum),
            StatSpec(
                column="timestamp", agg=lambda x: max(x) - min(x), agg_name="duration"
            ),
        ],
    ) -> dict:
        """Collect statistics from the resource tracker.

        Args:
            specs: A list of [resource_tracker.tiny_data_frame.StatSpec][] objects specifying the statistics to collect.

        Returns:
            A dictionary containing the collected statistics.
        """
        if self.n_samples > 0:
            stats = self.get_combined_metrics().stats(specs)
            stats["timestamp"]["duration"] += self.interval
            return stats
        else:
            raise RuntimeError("No metrics collected (yet)")

    @property
    def running(self) -> bool:
        """Check if the resource tracker is running.

        Returns:
            True if the resource tracker is running, False if already stopped.
        """
        return hasattr(self, "stop_time") and self.stop_time is None

    def wait_for_samples(self, n: int = 1, timeout: float = 5):
        """Wait for at least one sample to be collected.

        Args:
            n: The minimum number of samples to collect. Defaults to 1.
            timeout: The maximum time to wait for a sample. Defaults to 5 seconds.
        """
        if self.running:
            while self.n_samples < n:
                sleep(self.interval / 10)
                if time() - self.start_time > timeout:
                    raise RuntimeError(
                        f"Timed out waiting for resource tracker to collect {n} samples"
                    )
        else:
            if self.n_samples < n:
                raise RuntimeError(
                    f"Resource tracker has been already stopped with {self.n_samples} sample(s), "
                    f"cannot wait to collect the requested {n} sample(s)."
                )

    def recommend_resources(self, historical_stats: List[dict] = []) -> dict:
        """Recommend optimal resource allocation based on the measured resource tracker data.

        The recommended resources are based on the following rules:

        - target average CPU usage of the process(es)
        - target maximum memory usage of the process(es) with a 20% buffer
        - target maximum number of GPUs used by the process(es)
        - target maximum VRAM usage of the process(es) with a 20% buffer

        Args:
            historical_stats: Optional list of historical statistics (as returned by [resource_tracker.ResourceTracker.stats][])
                              to consider when making recommendations. These will be combined with the current stats.

        Returns:
            A dictionary containing the recommended resources (cpu, memory, gpu, vram).
        """
        self.wait_for_samples(n=1, timeout=self.interval * 5)

        current_stats = self.stats()
        if historical_stats:
            stats = aggregate_stats([current_stats] + historical_stats)
        else:
            stats = current_stats

        rec = {}
        # target average CPU usage
        rec["cpu"] = max(1, round(stats["process_cpu_usage"]["mean"]))
        # target maximum memory usage (kB->MB) with a 20% buffer
        rec["memory"] = round_memory(mb=stats["process_memory"]["max"] * 1.2 / 1024)
        # target maximum GPU number of GPUs used
        rec["gpu"] = (
            max(1, round(stats["process_gpu_usage"]["max"]))
            if stats["process_gpu_usage"]["mean"] > 0
            else 0
        )
        # target maximum VRAM usage (MiB) with a 20% buffer
        rec["vram"] = (
            round_memory(mb=stats["process_gpu_vram"]["max"] * 1.2)
            if stats["process_gpu_vram"]["max"] > 0
            else 0
        )
        return rec

    def recommend_server(self, **kwargs) -> dict:
        """Recommend the cheapest cloud server matching the recommended resources.

        Args:
            **kwargs: Additional filtering arguments (e.g. vendor_id or compliance_framework_id) to pass to the Spare Cores Keeper API.

        Returns:
            A dictionary containing the recommended cloud server. Response format is described at <https://keeper.sparecores.net/redoc#tag/Query-Resources/operation/search_servers_servers_get>.
        """
        historical_stats = kwargs.pop("historical_stats", [])
        rec = self.recommend_resources(historical_stats=historical_stats)
        return get_recommended_cloud_servers(**rec, **kwargs, n=1)[0]

    def report(
        self,
        integration: Literal["standalone", "metaflow"] = "standalone",
        historical_stats: List[dict] = [],
        status_failed: bool = False,
    ) -> Report:
        self.wait_for_samples(n=1, timeout=self.interval * 5)
        duration = (self.stop_time or time()) - self.start_time + self.interval

        current_stats = self.stats()
        if historical_stats:
            combined_stats = aggregate_stats([current_stats] + historical_stats)
        else:
            combined_stats = current_stats

        ctx = {
            "files": _read_report_template_files(),
            "server_info": self.server_info,
            "cloud_info": self.cloud_info,
            "process_metrics": self.process_metrics,
            "system_metrics": self.system_metrics,
            "stats": current_stats,
            "historical_stats": historical_stats,
            "combined_stats": combined_stats,
            "recommended_resources": self.recommend_resources(
                historical_stats=historical_stats
            ),
            "recommended_server": self.recommend_server(
                historical_stats=historical_stats
            ),
            "resource_tracker": {
                "version": __version__,
                "implementation": "psutil" if is_psutil_available() else "procfs",
                "integration": integration,
                "integration_is": {
                    "metaflow": integration == "metaflow",
                    "standalone": integration == "standalone",
                },
                "duration": duration,
                "start_time": self.start_time,
                "stop_time": self.stop_time,
                "stopped": self.stop_time is not None,
                "report_time": time(),
            },
            "status_failed": status_failed,
            "csv": {},
        }

        # comma-separated values
        joined = self.get_combined_metrics(bytes=True, human_names=True)
        for name, columns in REPORT_CSV_MAPPING.items():
            csv_data = joined[columns]
            # convert to JS milliseconds
            csv_data["Timestamp"] = [t * 1000 for t in csv_data["Timestamp"]]
            ctx["csv"][name] = csv_data.to_csv(quote_strings=False)

        # lookup instance prices
        rec_server_cost = ctx["recommended_server"]["min_price_ondemand"]
        rec_run_cost = rec_server_cost / 60 / 60 * duration
        ctx["recommended_server"]["best_ondemand_price_duration"] = rec_run_cost
        if ctx["cloud_info"] and ctx["cloud_info"]["instance_type"] != "unknown":
            current_server_cost = get_instance_price(
                ctx["cloud_info"]["vendor"],
                ctx["cloud_info"]["region"],
                ctx["cloud_info"]["instance_type"],
            )
            if current_server_cost:
                current_run_cost = round(current_server_cost / 60 / 60 * duration, 6)
                ctx["cloud_info"]["run_costs"] = current_run_cost
                ctx["recommended_server"]["cost_savings"] = {
                    "percent": round(
                        (current_run_cost - rec_run_cost) / current_run_cost * 100, 2
                    ),
                    "amount": round(current_run_cost - rec_run_cost, 6),
                }

        html_template_path = path.join(
            path.dirname(__file__), "report_template", "report.html"
        )
        with open(html_template_path) as f:
            html_template = f.read()
        html = render_template(html_template, ctx)
        return Report(html)

start #

start()

Start the selected resource trackers in the background as subprocess(es).

Source code in resource_tracker/tracker.py

def start(self):
    """Start the selected resource trackers in the background as subprocess(es)."""
    if self.running:
        raise RuntimeError("Resource tracker already running, cannot start again.")
    if hasattr(self, "stop_time"):
        raise RuntimeError(
            "Resource tracker already stopped. Create a new instance instead of trying to restart it."
        )

    self.start_time = time()
    self.stop_time = None
    # round to the nearest interval in the future
    self.start_time = ceil(self.start_time / self.interval) * self.interval
    # leave at least 50 ms for trackers to start
    if self.start_time - time() < 0.05:
        self.start_time += self.interval

    if "process_tracker" in self.trackers:
        self.process_tracker_process = self.mpc.Process(
            target=_run_tracker,
            args=("process", self.error_queue),
            kwargs={
                "pid": self.pid,
                "start_time": self.start_time,
                "interval": self.interval,
                "children": self.children,
                "output_file": self.process_tracker_filepath,
            },
            daemon=True,
        )
        self.process_tracker_process.start()

    if "system_tracker" in self.trackers:
        self.system_tracker_process = self.mpc.Process(
            target=_run_tracker,
            args=("system", self.error_queue),
            kwargs={
                "start_time": self.start_time,
                "interval": self.interval,
                "output_file": self.system_tracker_filepath,
            },
            daemon=True,
        )
        self.system_tracker_process.start()

    def collect_server_info():
        """Collect server info to be run in a background thread."""
        try:
            self._server_info = get_server_info()
        except Exception as e:
            logger.warning(f"Error fetching server info: {e}")

    def collect_cloud_info():
        """Collect cloud info to be run in a background thread."""
        try:
            self._cloud_info = get_cloud_info()
        except Exception as e:
            logger.warning(f"Error fetching cloud info: {e}")

    if self.discover_server:
        server_thread = Thread(target=collect_server_info, daemon=True)
        server_thread.start()
    if self.discover_cloud:
        cloud_thread = Thread(target=collect_cloud_info, daemon=True)
        cloud_thread.start()

    # make sure to cleanup the started subprocess(es)
    finalize(
        self,
        cleanup_processes,
        [
            getattr(self, f"{tracker_name}_process")
            for tracker_name in self.trackers
        ],
    )

stop #

stop()

Stop the previously started resource trackers' background processes.

Source code in resource_tracker/tracker.py

def stop(self):
    """Stop the previously started resource trackers' background processes."""
    self.stop_time = time()
    # check for errors in the subprocesses
    if not self.error_queue.empty():
        error_data = self.error_queue.get()
        logger.warning(
            "Resource tracker subprocess failed!\n"
            f"Error type: {error_data['name']} (from module {error_data['module']})\n"
            f"Error message: {error_data['message']}\n"
            f"Original traceback:\n{error_data['traceback']}"
        )
    # terminate tracker processes
    for tracker_name in self.trackers:
        process_attr = f"{tracker_name}_process"
        if hasattr(self, process_attr):
            cleanup_processes([getattr(self, process_attr)])
    self.error_queue.close()
    logger.debug(
        "Resource tracker stopped after %s seconds, logging %d process-level and %d system-wide records",
        self.stop_time - self.start_time,
        len(self.process_metrics),
        len(self.system_metrics),
    )

n_samples `property` #

n_samples

Number of samples collected by the resource tracker.

server_info `property` #

server_info

High-level server info.

Collected data from resource_tracker.get_server_info plus a guess for the allocation type of the server: if it's dedicated to the tracked process(es) or shared with other processes. The guess is based on the [resource_tracker.column_maps.SERVER_ALLOCATION_CHECKS][] checks.

cloud_info `property` #

cloud_info

High-level cloud info.

Collected data from resource_tracker.get_cloud_info.

process_metrics `property` #

process_metrics

Collected data from resource_tracker.ProcessTracker.

Returns:

Type	Description
`TinyDataFrame`	A resource_tracker.tiny_data_frame.TinyDataFrame object containing the collected data or an empty list if the resource_tracker.ProcessTracker is not running.

system_metrics `property` #

system_metrics

Collected data from resource_tracker.SystemTracker.

Returns:

Type	Description
`TinyDataFrame`	A resource_tracker.tiny_data_frame.TinyDataFrame object containing the collected data or an empty list if the resource_tracker.SystemTracker is not running.

snapshot #

snapshot()

Collect the current state of the resource tracker.

Returns:

Type	Description
`dict`	A dictionary containing the current state of the resource tracker.

Source code in resource_tracker/tracker.py

def snapshot(self) -> dict:
    """Collect the current state of the resource tracker.

    Returns:
        A dictionary containing the current state of the resource tracker.
    """
    return {
        "metadata": {
            "version": 1,
            "resource_tracker": {
                "version": __version__,
                "implementation": "psutil" if is_psutil_available() else "procfs",
            },
            "pid": self.pid,
            "children": self.children,
            "interval": self.interval,
            "method": self.method,
            "autostart": self.autostart,
            "track_processes": "process_tracker" in self.trackers,
            "track_system": "system_tracker" in self.trackers,
            "discover_server": self.discover_server,
            "discover_cloud": self.discover_cloud,
            "start_time": self.start_time,
            "stop_time": self.stop_time or time(),
            "duration": round(
                (self.stop_time or time()) - self.start_time + self.interval, 2
            ),
        },
        "server_info": self.server_info,
        "cloud_info": self.cloud_info,
        "process_metrics": self.process_metrics.to_dict(),
        "system_metrics": self.system_metrics.to_dict(),
    }

from_snapshot `classmethod` #

from_snapshot(snapshot)

Create a ResourceTracker from a snapshot.

Parameters:

Name	Type	Description	Default
`snapshot`	`dict`	A dictionary containing the current state of the resource tracker, created by resource_tracker.ResourceTracker.snapshot.	required

Source code in resource_tracker/tracker.py

@classmethod
def from_snapshot(cls, snapshot: dict):
    """Create a ResourceTracker from a snapshot.

    Args:
        snapshot: A dictionary containing the current state of the resource tracker, created by [resource_tracker.ResourceTracker.snapshot][].
    """
    tracker = cls(
        pid=snapshot["metadata"]["pid"],
        children=snapshot["metadata"]["children"],
        interval=snapshot["metadata"]["interval"],
        method=snapshot["metadata"]["method"],
        autostart=False,
        track_processes=snapshot["metadata"]["track_processes"],
        track_system=snapshot["metadata"]["track_system"],
        discover_server=snapshot["metadata"]["discover_server"],
        discover_cloud=snapshot["metadata"]["discover_cloud"],
    )
    tracker.start_time = snapshot["metadata"]["start_time"]
    tracker.stop_time = snapshot["metadata"]["stop_time"]
    tracker._server_info = snapshot["server_info"]
    tracker._cloud_info = snapshot["cloud_info"]
    TinyDataFrame(data=snapshot["process_metrics"]).to_csv(
        tracker.process_tracker_filepath
    )
    TinyDataFrame(data=snapshot["system_metrics"]).to_csv(
        tracker.system_tracker_filepath
    )
    return tracker

dumps #

dumps()

Serialize the resource tracker to a JSON string.

Returns:

Type	Description
`str`	A JSON string containing the current state of the resource tracker.

Source code in resource_tracker/tracker.py

def dumps(self) -> str:
    """Serialize the resource tracker to a JSON string.

    Returns:
        A JSON string containing the current state of the resource tracker.
    """
    return json_dumps(self.snapshot())

loads `classmethod` #

loads(s)

Deserialize the resource tracker from a JSON string.

Parameters:

Name	Type	Description	Default
`s`	`str`	The JSON string to deserialize the resource tracker from.	required

Source code in resource_tracker/tracker.py

@classmethod
def loads(cls, s: str):
    """Deserialize the resource tracker from a JSON string.

    Args:
        s: The JSON string to deserialize the resource tracker from.
    """
    return cls.from_snapshot(json_loads(s))

dump #

dump(file)

Serialize the resource tracker to a gzipped JSON file.

Parameters:

Name	Type	Description	Default
`file`	`str`	The path to the file to write the serialized resource tracker to.	required

Source code in resource_tracker/tracker.py

def dump(self, file: str):
    """Serialize the resource tracker to a gzipped JSON file.

    Args:
        file: The path to the file to write the serialized resource tracker to.
    """
    with gzip_open(file, "wb") as f:
        f.write(self.dumps().encode())

load `classmethod` #

load(file)

Deserialize the resource tracker from a gzipped JSON file.

Parameters:

Name	Type	Description	Default
`file`	`str`	The path to the file to read the serialized resource tracker from.	required

Source code in resource_tracker/tracker.py

@classmethod
def load(cls, file: str):
    """Deserialize the resource tracker from a gzipped JSON file.

    Args:
        file: The path to the file to read the serialized resource tracker from.
    """
    with gzip_open(file, "rb") as f:
        return cls.loads(f.read().decode())

get_combined_metrics #

get_combined_metrics(bytes=False, human_names=False, system_prefix=None, process_prefix=None)

Collected data both from the resource_tracker.ProcessTracker and resource_tracker.SystemTracker.

This is effectively binding the two dataframes together by timestamp, and adding a prefix to the column names to distinguish between the system and process metrics.

Parameters:

Name	Type	Description	Default
`bytes`	`bool`	Whether to convert all metrics (e.g. memory, VRAM, disk usage) to bytes. Defaults to False, reporting as documented at resource_tracker.ProcessTracker and resource_tracker.SystemTracker (kB, MiB, or GiB).	`False`
`human_names`	`bool`	Whether to rename the columns to use human-friendly names. Defaults to False, reporting as documented at resource_tracker.ProcessTracker and resource_tracker.SystemTracker with prefixes.	`False`
`system_prefix`	`Optional[str]`	Prefix to add to the system-level column names. Defaults to "system_" or "System " based on the value of `human_names`.	`None`
`process_prefix`	`Optional[str]`	Prefix to add to the process-level column names. Defaults to "process_" or "Process " based on the value of `human_names`.	`None`

Returns:

Type	Description
`TinyDataFrame`	A resource_tracker.tiny_data_frame.TinyDataFrame object containing the combined data or an empty list if tracker(s) not running.

Source code in resource_tracker/tracker.py

def get_combined_metrics(
    self,
    bytes: bool = False,
    human_names: bool = False,
    system_prefix: Optional[str] = None,
    process_prefix: Optional[str] = None,
) -> TinyDataFrame:
    """Collected data both from the [resource_tracker.ProcessTracker][] and [resource_tracker.SystemTracker][].

    This is effectively binding the two dataframes together by timestamp,
    and adding a prefix to the column names to distinguish between the system and process metrics.

    Args:
        bytes: Whether to convert all metrics (e.g. memory, VRAM, disk usage) to bytes. Defaults to False, reporting as documented at [resource_tracker.ProcessTracker][] and [resource_tracker.SystemTracker][] (kB, MiB, or GiB).
        human_names: Whether to rename the columns to use human-friendly names. Defaults to False, reporting as documented at [resource_tracker.ProcessTracker][] and [resource_tracker.SystemTracker][] with prefixes.
        system_prefix: Prefix to add to the system-level column names. Defaults to "system_" or "System " based on the value of `human_names`.
        process_prefix: Prefix to add to the process-level column names. Defaults to "process_" or "Process " based on the value of `human_names`.

    Returns:
        A [resource_tracker.tiny_data_frame.TinyDataFrame][] object containing the combined data or an empty list if tracker(s) not running.
    """
    try:
        process_metrics = self.process_metrics
        system_metrics = self.system_metrics

        # ensure both have the same length
        if len(process_metrics) > len(system_metrics):
            process_metrics = process_metrics[: len(system_metrics)]
        elif len(system_metrics) > len(process_metrics):
            system_metrics = system_metrics[: len(process_metrics)]

        # nothing to report on
        if len(process_metrics) == 0:
            return TinyDataFrame(data=[])

        if bytes:
            for col, factor in BYTE_MAPPING.items():
                for metrics in (system_metrics, process_metrics):
                    if col in metrics.columns:
                        metrics[col] = [v * factor for v in metrics[col]]

        if system_prefix is None:
            system_prefix = "system_" if not human_names else "System "
        if process_prefix is None:
            process_prefix = "process_" if not human_names else "Process "

        # cbind the two dataframes with column name prefixes and optional human-friendly names
        combined = system_metrics.rename(
            columns={
                n: (
                    (system_prefix if n != "timestamp" else "")
                    + (n if not human_names else HUMAN_NAMES_MAPPING.get(n, n))
                )
                for n in system_metrics.columns
            }
        )
        for col in process_metrics.columns[1:]:
            combined[
                process_prefix
                + (col if not human_names else HUMAN_NAMES_MAPPING.get(col, col))
            ] = process_metrics[col]

        return combined
    except Exception as e:
        with suppress(Exception):
            logger.warning(
                f"Kept {len(process_metrics) if 'process_metrics' in locals() else 'unknown'} records of process metrics out of {len(self.process_metrics)} collected records ({self.process_metrics.columns}), "
                f"and {len(system_metrics) if 'system_metrics' in locals() else 'unknown'} records of system metrics out of {len(self.system_metrics)} collected records ({self.system_metrics.columns}), "
                f"but creating the combined metrics dataframe failed with error: {e}"
            )
            logger.warning(f"Process metrics: {self.process_metrics.to_dict()}")
            logger.warning(f"System metrics: {self.system_metrics.to_dict()}")
        raise

stats #

stats(specs=[StatSpec(column='process_cpu_usage', agg=mean, round=2), StatSpec(column='process_cpu_usage', agg=max, round=2), StatSpec(column='process_memory', agg=mean, round=2), StatSpec(column='process_memory', agg=max, round=2), StatSpec(column='process_gpu_usage', agg=mean, round=2), StatSpec(column='process_gpu_usage', agg=max, round=2), StatSpec(column='process_gpu_vram', agg=mean, round=2), StatSpec(column='process_gpu_vram', agg=max, round=2), StatSpec(column='process_gpu_utilized', agg=mean, round=2), StatSpec(column='process_gpu_utilized', agg=max, round=2), StatSpec(column='system_disk_space_used_gb', agg=max, round=2), StatSpec(column='system_net_recv_bytes', agg=sum), StatSpec(column='system_net_sent_bytes', agg=sum), StatSpec(column='timestamp', agg=lambda x: max(x) - min(x), agg_name='duration')])

Collect statistics from the resource tracker.

Parameters:

Name	Type	Description	Default
`specs`	`List[StatSpec]`	A list of resource_tracker.tiny_data_frame.StatSpec objects specifying the statistics to collect.	[StatSpec(column='process_cpu_usage', agg=mean, round=2), StatSpec(column='process_cpu_usage', agg=max, round=2), StatSpec(column='process_memory', agg=mean, round=2), StatSpec(column='process_memory', agg=max, round=2), StatSpec(column='process_gpu_usage', agg=mean, round=2), StatSpec(column='process_gpu_usage', agg=max, round=2), StatSpec(column='process_gpu_vram', agg=mean, round=2), StatSpec(column='process_gpu_vram', agg=max, round=2), StatSpec(column='process_gpu_utilized', agg=mean, round=2), StatSpec(column='process_gpu_utilized', agg=max, round=2), StatSpec(column='system_disk_space_used_gb', agg=max, round=2), StatSpec(column='system_net_recv_bytes', agg=sum), StatSpec(column='system_net_sent_bytes', agg=sum), StatSpec(column='timestamp', agg=lambda x: max(x) - min(x), agg_name='duration')]

Returns:

Type	Description
`dict`	A dictionary containing the collected statistics.

Source code in resource_tracker/tracker.py

def stats(
    self,
    specs: List[StatSpec] = [
        StatSpec(column="process_cpu_usage", agg=mean, round=2),
        StatSpec(column="process_cpu_usage", agg=max, round=2),
        StatSpec(column="process_memory", agg=mean, round=2),
        StatSpec(column="process_memory", agg=max, round=2),
        StatSpec(column="process_gpu_usage", agg=mean, round=2),
        StatSpec(column="process_gpu_usage", agg=max, round=2),
        StatSpec(column="process_gpu_vram", agg=mean, round=2),
        StatSpec(column="process_gpu_vram", agg=max, round=2),
        StatSpec(column="process_gpu_utilized", agg=mean, round=2),
        StatSpec(column="process_gpu_utilized", agg=max, round=2),
        StatSpec(column="system_disk_space_used_gb", agg=max, round=2),
        StatSpec(column="system_net_recv_bytes", agg=sum),
        StatSpec(column="system_net_sent_bytes", agg=sum),
        StatSpec(
            column="timestamp", agg=lambda x: max(x) - min(x), agg_name="duration"
        ),
    ],
) -> dict:
    """Collect statistics from the resource tracker.

    Args:
        specs: A list of [resource_tracker.tiny_data_frame.StatSpec][] objects specifying the statistics to collect.

    Returns:
        A dictionary containing the collected statistics.
    """
    if self.n_samples > 0:
        stats = self.get_combined_metrics().stats(specs)
        stats["timestamp"]["duration"] += self.interval
        return stats
    else:
        raise RuntimeError("No metrics collected (yet)")

running `property` #

running

Check if the resource tracker is running.

Returns:

Type	Description
`bool`	True if the resource tracker is running, False if already stopped.

wait_for_samples #

wait_for_samples(n=1, timeout=5)

Wait for at least one sample to be collected.

Parameters:

Name	Type	Description	Default
`n`	`int`	The minimum number of samples to collect. Defaults to 1.	`1`
`timeout`	`float`	The maximum time to wait for a sample. Defaults to 5 seconds.	`5`

Source code in resource_tracker/tracker.py

def wait_for_samples(self, n: int = 1, timeout: float = 5):
    """Wait for at least one sample to be collected.

    Args:
        n: The minimum number of samples to collect. Defaults to 1.
        timeout: The maximum time to wait for a sample. Defaults to 5 seconds.
    """
    if self.running:
        while self.n_samples < n:
            sleep(self.interval / 10)
            if time() - self.start_time > timeout:
                raise RuntimeError(
                    f"Timed out waiting for resource tracker to collect {n} samples"
                )
    else:
        if self.n_samples < n:
            raise RuntimeError(
                f"Resource tracker has been already stopped with {self.n_samples} sample(s), "
                f"cannot wait to collect the requested {n} sample(s)."
            )

recommend_resources #

recommend_resources(historical_stats=[])

Recommend optimal resource allocation based on the measured resource tracker data.

The recommended resources are based on the following rules:

target average CPU usage of the process(es)
target maximum memory usage of the process(es) with a 20% buffer
target maximum number of GPUs used by the process(es)
target maximum VRAM usage of the process(es) with a 20% buffer

Parameters:

Name	Type	Description	Default
`historical_stats`	`List[dict]`	Optional list of historical statistics (as returned by resource_tracker.ResourceTracker.stats) to consider when making recommendations. These will be combined with the current stats.	`[]`

Returns:

Type	Description
`dict`	A dictionary containing the recommended resources (cpu, memory, gpu, vram).

Source code in resource_tracker/tracker.py

def recommend_resources(self, historical_stats: List[dict] = []) -> dict:
    """Recommend optimal resource allocation based on the measured resource tracker data.

    The recommended resources are based on the following rules:

    - target average CPU usage of the process(es)
    - target maximum memory usage of the process(es) with a 20% buffer
    - target maximum number of GPUs used by the process(es)
    - target maximum VRAM usage of the process(es) with a 20% buffer

    Args:
        historical_stats: Optional list of historical statistics (as returned by [resource_tracker.ResourceTracker.stats][])
                          to consider when making recommendations. These will be combined with the current stats.

    Returns:
        A dictionary containing the recommended resources (cpu, memory, gpu, vram).
    """
    self.wait_for_samples(n=1, timeout=self.interval * 5)

    current_stats = self.stats()
    if historical_stats:
        stats = aggregate_stats([current_stats] + historical_stats)
    else:
        stats = current_stats

    rec = {}
    # target average CPU usage
    rec["cpu"] = max(1, round(stats["process_cpu_usage"]["mean"]))
    # target maximum memory usage (kB->MB) with a 20% buffer
    rec["memory"] = round_memory(mb=stats["process_memory"]["max"] * 1.2 / 1024)
    # target maximum GPU number of GPUs used
    rec["gpu"] = (
        max(1, round(stats["process_gpu_usage"]["max"]))
        if stats["process_gpu_usage"]["mean"] > 0
        else 0
    )
    # target maximum VRAM usage (MiB) with a 20% buffer
    rec["vram"] = (
        round_memory(mb=stats["process_gpu_vram"]["max"] * 1.2)
        if stats["process_gpu_vram"]["max"] > 0
        else 0
    )
    return rec

recommend_server #

recommend_server(**kwargs)

Recommend the cheapest cloud server matching the recommended resources.

Parameters:

Name	Type	Description	Default
`**kwargs`		Additional filtering arguments (e.g. vendor_id or compliance_framework_id) to pass to the Spare Cores Keeper API.	`{}`

Returns:

Type	Description
`dict`	A dictionary containing the recommended cloud server. Response format is described at https://keeper.sparecores.net/redoc#tag/Query-Resources/operation/search_servers_servers_get.

Source code in resource_tracker/tracker.py

def recommend_server(self, **kwargs) -> dict:
    """Recommend the cheapest cloud server matching the recommended resources.

    Args:
        **kwargs: Additional filtering arguments (e.g. vendor_id or compliance_framework_id) to pass to the Spare Cores Keeper API.

    Returns:
        A dictionary containing the recommended cloud server. Response format is described at <https://keeper.sparecores.net/redoc#tag/Query-Resources/operation/search_servers_servers_get>.
    """
    historical_stats = kwargs.pop("historical_stats", [])
    rec = self.recommend_resources(historical_stats=historical_stats)
    return get_recommended_cloud_servers(**rec, **kwargs, n=1)[0]

SystemTracker #

Track system-wide resource usage.

This class monitors system resources like CPU times and usage, memory usage, GPU and VRAM utilization, disk I/O, and network traffic for the entire system.

Data is collected every interval seconds and written to the stdout or output_file (if provided) as CSV. Currently, the following columns are tracked:

timestamp (float): The current timestamp.
processes (int): The number of running processes.
utime (int): The total user+nice mode CPU time in seconds.
stime (int): The total system mode CPU time in seconds.
cpu_usage (float): The current CPU usage between 0 and number of CPUs.
memory_free (int): The amount of free memory in kB.
memory_used (int): The amount of used memory in kB.
memory_buffers (int): The amount of memory used for buffers in kB.
memory_cached (int): The amount of memory used for caching in kB.
memory_active (int): The amount of memory used for active pages in kB.
memory_inactive (int): The amount of memory used for inactive pages in kB.
disk_read_bytes (int): The total number of bytes read from disk.
disk_write_bytes (int): The total number of bytes written to disk.
disk_space_total_gb (float): The total disk space in GB.
disk_space_used_gb (float): The used disk space in GB.
disk_space_free_gb (float): The free disk space in GB.
net_recv_bytes (int): The total number of bytes received over network.
net_sent_bytes (int): The total number of bytes sent over network.
gpu_usage (float): The current GPU utilization between 0 and GPU count.
gpu_vram (float): The current GPU memory used in MiB.
gpu_utilized (int): The number of GPUs with utilization > 0.

Parameters:

Name	Type	Description	Default
`start_time`	`float`	Time when to start tracking. Defaults to current time.	`time()`
`interval`	`float`	Sampling interval in seconds. Defaults to 1.	`1`
`autostart`	`bool`	Whether to start tracking immediately. Defaults to True.	`True`
`output_file`	`str`	File to write the output to. Defaults to None, print to stdout.	`None`

Methods:

Name	Description
`__call__`	Dummy method to make this class callable.
`diff_stats`	Calculate stats since last call.
`start_tracking`	Start an infinite loop tracking system resource usage.

Source code in resource_tracker/tracker.py

class SystemTracker:
    """Track system-wide resource usage.

    This class monitors system resources like CPU times and usage, memory usage,
    GPU and VRAM utilization, disk I/O, and network traffic for the entire system.

    Data is collected every `interval` seconds and written to the stdout or
    `output_file` (if provided) as CSV. Currently, the following columns are
    tracked:

    - timestamp (float): The current timestamp.
    - processes (int): The number of running processes.
    - utime (int): The total user+nice mode CPU time in seconds.
    - stime (int): The total system mode CPU time in seconds.
    - cpu_usage (float): The current CPU usage between 0 and number of CPUs.
    - memory_free (int): The amount of free memory in kB.
    - memory_used (int): The amount of used memory in kB.
    - memory_buffers (int): The amount of memory used for buffers in kB.
    - memory_cached (int): The amount of memory used for caching in kB.
    - memory_active (int): The amount of memory used for active pages in kB.
    - memory_inactive (int): The amount of memory used for inactive pages in kB.
    - disk_read_bytes (int): The total number of bytes read from disk.
    - disk_write_bytes (int): The total number of bytes written to disk.
    - disk_space_total_gb (float): The total disk space in GB.
    - disk_space_used_gb (float): The used disk space in GB.
    - disk_space_free_gb (float): The free disk space in GB.
    - net_recv_bytes (int): The total number of bytes received over network.
    - net_sent_bytes (int): The total number of bytes sent over network.
    - gpu_usage (float): The current GPU utilization between 0 and GPU count.
    - gpu_vram (float): The current GPU memory used in MiB.
    - gpu_utilized (int): The number of GPUs with utilization > 0.

    Args:
        start_time: Time when to start tracking. Defaults to current time.
        interval: Sampling interval in seconds. Defaults to 1.
        autostart: Whether to start tracking immediately. Defaults to True.
        output_file: File to write the output to. Defaults to None, print to stdout.
    """

    def __init__(
        self,
        start_time: float = time(),
        interval: float = 1,
        autostart: bool = True,
        output_file: str = None,
    ):
        _, self.get_system_stats = get_tracker_implementation()

        self.status = "running"
        self.interval = interval
        self.cycle = 0
        self.start_time = start_time

        # dummy data collection so that diffing on the first time does not fail
        self.stats = self.get_system_stats()

        if autostart:
            # wait for the start time to be reached
            if start_time > time():
                sleep(start_time - time())
            # we can now start. 1st interval used to collect baseline
            self.start_tracking(output_file)

    def __call__(self):
        """Dummy method to make this class callable."""
        pass

    def diff_stats(self):
        """Calculate stats since last call."""
        last_stats = self.stats
        self.stats = self.get_system_stats()
        self.cycle += 1

        time_diff = self.stats["timestamp"] - last_stats["timestamp"]

        total_read_bytes = 0
        total_write_bytes = 0
        for disk_name in set(self.stats["disk_stats"]) & set(last_stats["disk_stats"]):
            read_bytes = max(
                0,
                self.stats["disk_stats"][disk_name]["read_bytes"]
                - last_stats["disk_stats"][disk_name]["read_bytes"],
            )
            write_bytes = max(
                0,
                self.stats["disk_stats"][disk_name]["write_bytes"]
                - last_stats["disk_stats"][disk_name]["write_bytes"],
            )
            total_read_bytes += read_bytes
            total_write_bytes += write_bytes

        disk_space_total = 0
        disk_space_used = 0
        disk_space_free = 0
        for disk_space in self.stats["disk_spaces"].values():
            disk_space_total += disk_space["total"]
            disk_space_used += disk_space["used"]
            disk_space_free += disk_space["free"]

        return {
            "timestamp": round(self.stats["timestamp"], 3),
            "processes": self.stats["processes"],
            "utime": max(0, round(self.stats["utime"] - last_stats["utime"], 6)),
            "stime": max(0, round(self.stats["stime"] - last_stats["stime"], 6)),
            "cpu_usage": round(
                max(
                    0,
                    (
                        (self.stats["utime"] + self.stats["stime"])
                        - (last_stats["utime"] + last_stats["stime"])
                    )
                    / time_diff,
                ),
                4,
            ),
            "memory_free": self.stats["memory_free"],
            "memory_used": self.stats["memory_used"],
            "memory_buffers": self.stats["memory_buffers"],
            "memory_cached": self.stats["memory_cached"],
            "memory_active": self.stats["memory_active"],
            "memory_inactive": self.stats["memory_inactive"],
            "disk_read_bytes": total_read_bytes,
            "disk_write_bytes": total_write_bytes,
            "disk_space_total_gb": round(disk_space_total / (1024**3), 2),
            "disk_space_used_gb": round(disk_space_used / (1024**3), 2),
            "disk_space_free_gb": round(disk_space_free / (1024**3), 2),
            "net_recv_bytes": max(
                0, self.stats["net_recv_bytes"] - last_stats["net_recv_bytes"]
            ),
            "net_sent_bytes": max(
                0, self.stats["net_sent_bytes"] - last_stats["net_sent_bytes"]
            ),
            "gpu_usage": self.stats["gpu_usage"],
            "gpu_vram": self.stats["gpu_vram"],
            "gpu_utilized": self.stats["gpu_utilized"],
        }

    def start_tracking(
        self, output_file: Optional[str] = None, print_header: bool = True
    ):
        """Start an infinite loop tracking system resource usage.

        A CSV line is written every `interval` seconds.

        Args:
            output_file: File to write the output to. Defaults to None, printing to stdout.
            print_header: Whether to print the header of the CSV. Defaults to True.
        """
        file_handle = open(output_file, "wb") if output_file else stdout.buffer
        try:
            while True:
                current_stats = self.diff_stats()
                # don't print values yet, we collect data for the 1st baseline
                if self.cycle == 1:
                    if print_header:
                        file_handle.write(
                            render_csv_row(
                                current_stats.keys(), quoting=QUOTE_NONNUMERIC
                            )
                        )
                else:
                    file_handle.write(
                        render_csv_row(current_stats.values(), quoting=QUOTE_NONNUMERIC)
                    )
                if output_file:
                    file_handle.flush()
                # sleep until the next interval
                sleep(max(0, self.start_time + self.interval * self.cycle - time()))
        finally:
            if output_file and not file_handle.closed:
                file_handle.close()

call #

__call__()

Dummy method to make this class callable.

Source code in resource_tracker/tracker.py

def __call__(self):
    """Dummy method to make this class callable."""
    pass

diff_stats #

diff_stats()

Calculate stats since last call.

Source code in resource_tracker/tracker.py

def diff_stats(self):
    """Calculate stats since last call."""
    last_stats = self.stats
    self.stats = self.get_system_stats()
    self.cycle += 1

    time_diff = self.stats["timestamp"] - last_stats["timestamp"]

    total_read_bytes = 0
    total_write_bytes = 0
    for disk_name in set(self.stats["disk_stats"]) & set(last_stats["disk_stats"]):
        read_bytes = max(
            0,
            self.stats["disk_stats"][disk_name]["read_bytes"]
            - last_stats["disk_stats"][disk_name]["read_bytes"],
        )
        write_bytes = max(
            0,
            self.stats["disk_stats"][disk_name]["write_bytes"]
            - last_stats["disk_stats"][disk_name]["write_bytes"],
        )
        total_read_bytes += read_bytes
        total_write_bytes += write_bytes

    disk_space_total = 0
    disk_space_used = 0
    disk_space_free = 0
    for disk_space in self.stats["disk_spaces"].values():
        disk_space_total += disk_space["total"]
        disk_space_used += disk_space["used"]
        disk_space_free += disk_space["free"]

    return {
        "timestamp": round(self.stats["timestamp"], 3),
        "processes": self.stats["processes"],
        "utime": max(0, round(self.stats["utime"] - last_stats["utime"], 6)),
        "stime": max(0, round(self.stats["stime"] - last_stats["stime"], 6)),
        "cpu_usage": round(
            max(
                0,
                (
                    (self.stats["utime"] + self.stats["stime"])
                    - (last_stats["utime"] + last_stats["stime"])
                )
                / time_diff,
            ),
            4,
        ),
        "memory_free": self.stats["memory_free"],
        "memory_used": self.stats["memory_used"],
        "memory_buffers": self.stats["memory_buffers"],
        "memory_cached": self.stats["memory_cached"],
        "memory_active": self.stats["memory_active"],
        "memory_inactive": self.stats["memory_inactive"],
        "disk_read_bytes": total_read_bytes,
        "disk_write_bytes": total_write_bytes,
        "disk_space_total_gb": round(disk_space_total / (1024**3), 2),
        "disk_space_used_gb": round(disk_space_used / (1024**3), 2),
        "disk_space_free_gb": round(disk_space_free / (1024**3), 2),
        "net_recv_bytes": max(
            0, self.stats["net_recv_bytes"] - last_stats["net_recv_bytes"]
        ),
        "net_sent_bytes": max(
            0, self.stats["net_sent_bytes"] - last_stats["net_sent_bytes"]
        ),
        "gpu_usage": self.stats["gpu_usage"],
        "gpu_vram": self.stats["gpu_vram"],
        "gpu_utilized": self.stats["gpu_utilized"],
    }

start_tracking #

start_tracking(output_file=None, print_header=True)

Start an infinite loop tracking system resource usage.

A CSV line is written every interval seconds.

Parameters:

Name	Type	Description	Default
`output_file`	`Optional[str]`	File to write the output to. Defaults to None, printing to stdout.	`None`
`print_header`	`bool`	Whether to print the header of the CSV. Defaults to True.	`True`

Source code in resource_tracker/tracker.py

def start_tracking(
    self, output_file: Optional[str] = None, print_header: bool = True
):
    """Start an infinite loop tracking system resource usage.

    A CSV line is written every `interval` seconds.

    Args:
        output_file: File to write the output to. Defaults to None, printing to stdout.
        print_header: Whether to print the header of the CSV. Defaults to True.
    """
    file_handle = open(output_file, "wb") if output_file else stdout.buffer
    try:
        while True:
            current_stats = self.diff_stats()
            # don't print values yet, we collect data for the 1st baseline
            if self.cycle == 1:
                if print_header:
                    file_handle.write(
                        render_csv_row(
                            current_stats.keys(), quoting=QUOTE_NONNUMERIC
                        )
                    )
            else:
                file_handle.write(
                    render_csv_row(current_stats.values(), quoting=QUOTE_NONNUMERIC)
                )
            if output_file:
                file_handle.flush()
            # sleep until the next interval
            sleep(max(0, self.start_time + self.interval * self.cycle - time()))
    finally:
        if output_file and not file_handle.closed:
            file_handle.close()

resource_tracker

resource_tracker #

get_cloud_info cached #

get_server_info #

ProcessTracker #

__call__ #

diff_stats #

start_tracking #

ResourceTracker #

start #

stop #

n_samples property #

server_info property #

cloud_info property #

process_metrics property #

system_metrics property #

snapshot #

from_snapshot classmethod #

dumps #

loads classmethod #

dump #

load classmethod #

get_combined_metrics #

stats #

running property #

wait_for_samples #

recommend_resources #

recommend_server #

SystemTracker #

__call__ #

diff_stats #

start_tracking #

get_cloud_info `cached` #

call #

n_samples `property` #

server_info `property` #

cloud_info `property` #

process_metrics `property` #

system_metrics `property` #

from_snapshot `classmethod` #

loads `classmethod` #

load `classmethod` #

running `property` #

call #