Changelog
v0.4.0 (dev)
- Show card for failed step
- Note failed step status in card
- Standardize timestamp format and timezone
v0.3.0 (March 27, 2025)
- Extract background process management and related complexities from the track_resources decorator into the ResourceTracker class to track resource usage of a process and/or the system in a non-blocking way
- Add unit tests for the ResourceTracker class, including checks for deadlocks and partially started trackers
- Keep test HTML card as GHA artifacts for manual inspection
- Improve documentation
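As a rough illustration of the non-blocking tracking idea behind this release, the sketch below samples a metric from a background thread while the main code keeps running. It is not the package's implementation: `BackgroundSampler` and its methods are hypothetical names built only on the Python standard library, and the actual ResourceTracker may work differently.

```python
import threading
import time


class BackgroundSampler:
    """Hypothetical helper: collect samples in a daemon thread."""

    def __init__(self, sample_fn, interval=0.05):
        self.sample_fn = sample_fn
        self.interval = interval
        self.samples = []
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while True:
            self.samples.append((time.time(), self.sample_fn()))
            # wait() returns True as soon as stop() has been called
            if self._stop_event.wait(self.interval):
                break

    def start(self):
        self._thread.start()
        return self

    def stop(self, timeout=1.0):
        """Signal the thread to finish; return True if it exited in time."""
        self._stop_event.set()
        self._thread.join(timeout)
        return not self._thread.is_alive()


# Sample the CPU time of this process while some work runs in the foreground.
sampler = BackgroundSampler(time.process_time).start()
sum(i * i for i in range(200_000))  # stand-in for real work
stopped_cleanly = sampler.stop()
```

Joining with a timeout and reporting whether the thread actually exited is one simple way to surface the kind of deadlock that the unit tests mentioned above are meant to catch.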
v0.2.1 (March 21, 2025)
- Fix: don't always round up CPU/GPU recommendations
- Improve error message on missing historical data
- Improve documentation
v0.2.0 (March 21, 2025)
Relatively major package rewrite to support alternative tracker implementations (other than directly reading from /proc). No breaking changes in the public API on Linux.
- Add tracker implementation using psutil to support macOS and Windows
- Fix data issues with the /proc implementation after validating with the psutil version (e.g. number of processes reported)
- Refactor code for better maintainability
- Add additional unit tests:
  - Tracker implementation using procfs
  - Tracker implementation using psutil
  - Consistency between tracker implementations
  - Metaflow decorators
- Extend CI/CD pipeline:
  - Test on Linux, macOS, and Windows
  - Test multiple Python versions (3.9, 3.10, 3.11, 3.12, 3.13)
- Improve documentation
v0.1.2 (March 18, 2025)
- Add experimental psutil support
- Add server info card for operating system
v0.1.1 (March 17, 2025)
- Fix rounding down recommended vCPUs with <0.5 load
- Add info popups with more details and disclaimers for recommendations
- Add detection for shared server environments
- Add potential cost savings card
- Improve documentation
v0.1.0 (March 12, 2025)
Initial PyPI release of resource-tracker with the following features:
- Detect if the system is running on a cloud provider, and if so, detect the provider, region, and instance type
- Detect main server hardware (CPU count, memory amount, disk space, GPU count and VRAM amount)
- Track system-wide resource usage:
  - Process count
  - CPU usage (user + system time, relative vCPU percentage)
  - Memory usage (total, free, used, buffers, cached, active anon, inactive anon pages)
  - Disk I/O (read and write bytes)
  - Disk space usage (total, used, free)
  - Network I/O (receive and transmit bytes)
  - GPU and VRAM usage (using nvidia-smi)
- Track resource usage of a process and its descendant processes:
  - Descendant process count
  - CPU usage (user + system time, relative vCPU percentage)
  - Memory usage (based on proportional set sizes)
  - Disk I/O (read and write bytes)
  - GPU and VRAM usage (using nvidia-smi pmon)
- Add Metaflow plugin for tracking resource usage of a step:
  - Track process and system resource usage for the duration of the step
  - Generate a card with the resource usage data
  - Suggest @resources decorator for future runs
  - Find cheapest cloud instance type for a step
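To make the "CPU usage (user + system time, relative vCPU percentage)" item more concrete, here is a minimal sketch of deriving such a figure from two snapshots of the aggregate `cpu` line in /proc/stat on Linux. The function names and the sample snapshots are illustrative assumptions, not part of resource-tracker, and the package may compute its metrics differently.

```python
def parse_cpu_times(stat_text):
    """Return (user, system) ticks from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Field order after the label: user, nice, system, idle, ...
            return int(fields[1]), int(fields[3])
    raise ValueError("no aggregate 'cpu' line found")


def cpu_usage_ratio(before, after, elapsed_ticks):
    """User + system ticks used between two snapshots, relative to one vCPU."""
    used = (after[0] - before[0]) + (after[1] - before[1])
    return used / elapsed_ticks


# Made-up snapshots in the /proc/stat layout (values are clock ticks).
sample_before = "cpu  100 0 50 1000 0 0 0 0 0 0\ncpu0 100 0 50 1000 0 0 0 0 0 0"
sample_after = "cpu  160 0 90 1100 0 0 0 0 0 0\ncpu0 160 0 90 1100 0 0 0 0 0 0"

ratio = cpu_usage_ratio(
    parse_cpu_times(sample_before),
    parse_cpu_times(sample_after),
    elapsed_ticks=200,  # assumed wall-clock time between snapshots, in ticks
)
```

On a real system the text would come from reading /proc/stat twice, with the elapsed wall-clock time converted to ticks using the value of `os.sysconf("SC_CLK_TCK")`.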