tprofiler.utils.device

Device utilities for PyTorch operations.

This module provides utilities for device detection, management, and configuration across hardware backends, including CPU and CUDA. It handles device selection, torch device namespace retrieval, device ID management, and NCCL backend configuration for distributed training.

The module is inspired by torchtune’s device utilities and provides a unified interface for working with different device types in PyTorch applications.
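For orientation, the snippet below sketches how these helpers compose in practice. It is illustrative only and assumes PyTorch 2.x, where torch.cpu mirrors the device-namespace helpers used here.

>>> import torch
>>> from tprofiler.utils.device import get_device_name, get_device_id, get_torch_device
>>> device = torch.device(get_device_name(), get_device_id())
>>> x = torch.ones(2, 2, device=device)
>>> get_torch_device().synchronize()  # torch.cuda.synchronize() on a GPU machine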

is_cuda_available

tprofiler.utils.device.is_cuda_available() → bool

Check if CUDA is available on the current system.

This function wraps PyTorch’s built-in CUDA availability check to determine if CUDA-enabled GPUs are accessible for computation.

Returns:

True if CUDA is available, False otherwise.

Return type:

bool

Example:
>>> if is_cuda_available():
...     print("CUDA is available for GPU computation")
... else:
...     print("CUDA is not available, will use CPU")

get_device_name

tprofiler.utils.device.get_device_name() → str

Get the torch.device name based on the current machine’s available hardware.

This function determines the appropriate device type for PyTorch operations by checking hardware availability. Currently supports CPU and CUDA devices.

Returns:

The device name string (‘cuda’ if CUDA is available, otherwise ‘cpu’).

Return type:

str

Example:
>>> device_name = get_device_name()
>>> print(device_name)  # 'cuda' or 'cpu'
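Given the CPU/CUDA-only support stated above, the selection logic can be assumed to reduce to a one-line check (a sketch, not the actual source):

import torch

def get_device_name() -> str:
    # Prefer CUDA when a GPU is visible; otherwise fall back to CPU.
    return "cuda" if torch.cuda.is_available() else "cpu"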

get_torch_device

tprofiler.utils.device.get_torch_device() → any

Return the corresponding torch device namespace based on the current device type.

This function retrieves the appropriate torch device module (e.g., torch.cuda) based on the detected device type. It provides a unified way to access device-specific PyTorch functionality.

Returns:

The corresponding torch device namespace module.

Return type:

any

Example:
>>> device_module = get_torch_device()
>>> # Returns torch.cuda if CUDA is available, torch.cpu otherwise
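One plausible implementation, assuming the namespace is resolved by attribute lookup on the torch module (torch.cpu exists as a device namespace in PyTorch 2.x):

import torch

def get_torch_device():
    # Map the detected device name to its namespace, e.g. 'cuda' -> torch.cuda.
    device_name = "cuda" if torch.cuda.is_available() else "cpu"
    return getattr(torch, device_name)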

get_device_id

tprofiler.utils.device.get_device_id() → int

Return the current device ID based on the detected device type.

This function retrieves the current device index for the active device type. For CUDA devices, this returns the current CUDA device ID. For CPU, this typically returns 0.

Returns:

The current device index.

Return type:

int

Example:
>>> device_id = get_device_id()
>>> print(f"Current device ID: {device_id}")

get_nccl_backend

tprofiler.utils.device.get_nccl_backend() → str

Return the appropriate NCCL backend type based on the current device type.

This function determines the correct NCCL (NVIDIA Collective Communications Library) backend for distributed training operations. NCCL is primarily used for multi-GPU communication in distributed PyTorch training.

Returns:

The NCCL backend type string.

Return type:

str

Raises:

RuntimeError – If no available NCCL backend is found for the current device type.

Example:
>>> try:
...     backend = get_nccl_backend()
...     print(f"NCCL backend: {backend}")
... except RuntimeError as e:
...     print(f"Error: {e}")

set_expandable_segments

tprofiler.utils.device.set_expandable_segments(enable: bool) → None

Enable or disable expandable segments for CUDA memory allocation.

This function configures CUDA memory allocator settings to use expandable segments, which can help avoid out-of-memory (OOM) errors by allowing the memory pool to grow dynamically. This is particularly useful for training large models or handling variable batch sizes.

Parameters:

enable (bool) – Whether to enable expandable segments for memory allocation.

Example:
>>> # Enable expandable segments to help avoid OOM
>>> set_expandable_segments(True)
>>>
>>> # Disable expandable segments for more predictable memory usage
>>> set_expandable_segments(False)
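A minimal sketch of how this could be implemented, assuming it uses PyTorch's private allocator-settings hook (torch.cuda.memory._set_allocator_settings); the public, documented alternative is exporting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True before CUDA is initialized:

import torch

def set_expandable_segments(enable: bool) -> None:
    if torch.cuda.is_available():
        # The f-string yields 'True'/'False', matching the allocator's
        # expected "expandable_segments:True" syntax.
        torch.cuda.memory._set_allocator_settings(f"expandable_segments:{enable}")

Because this toggles a process-wide allocator setting, it is best called once, early, before large allocations are made.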