Monitor Utilization

Use the mcli util command to monitor cluster utilization and to view all active and queued runs.

> mcli util
NAME  GPU_INSTANCE_TYPE  GPUS_AVAILABLE  GPUS_USED  GPUS_TOTAL
rXzX  8xa100_80gb        0               16         16

Active Runs:
NAME      USER  AGE     NODE_NAME     GPUS
my-run-1  alice 2hr     a100-40gb-01  8
my-run-2  bob   5min    a100-40gb-02  8

Queued Runs:
POS  RUN_NAME      USER          AGE   GPUS  PRIORITY
1    my-run-3      alice         4min  8     HIGH
2    my-run-4      bob           5min  8     DEFAULT

The first table shows GPU availability for each cluster:

  • NAME: Unique name for the cluster

  • GPU_INSTANCE_TYPE: The instance type name

  • GPUS_AVAILABLE: Number of GPUs currently available

  • GPUS_USED: Number of GPUs currently in use across all runs

  • GPUS_TOTAL: The sum of available and used GPUs

Highlighted instances still have remaining capacity.
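The relationship between the three GPU columns can be checked with a short script. The following is a minimal sketch, not part of MCLI: it assumes the output of mcli util has been captured as text, that the columns are separated by whitespace, and that no value contains a space. The helper names parse_cluster_table and utilization are hypothetical, and the sample text is copied from the example output above.

```python
# Sample cluster table copied from the example output above.
SAMPLE = """\
NAME  GPU_INSTANCE_TYPE  GPUS_AVAILABLE  GPUS_USED  GPUS_TOTAL
rXzX  8xa100_80gb        0               16         16
"""


def parse_cluster_table(text):
    """Parse the whitespace-separated cluster table into a list of dicts.

    Hypothetical helper: assumes one header line followed by one row per
    cluster, with no spaces inside any value.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        # Convert the three GPU count columns to integers.
        for key in ("GPUS_AVAILABLE", "GPUS_USED", "GPUS_TOTAL"):
            row[key] = int(row[key])
        rows.append(row)
    return rows


def utilization(row):
    """Return the fraction of GPUs in use, or 0.0 for an empty cluster."""
    total = row["GPUS_TOTAL"]
    return row["GPUS_USED"] / total if total else 0.0


clusters = parse_cluster_table(SAMPLE)
for c in clusters:
    # GPUS_TOTAL is described as the sum of available and used GPUs.
    assert c["GPUS_AVAILABLE"] + c["GPUS_USED"] == c["GPUS_TOTAL"]
    print(f'{c["NAME"]}: {utilization(c):.0%} of {c["GPUS_TOTAL"]} GPUs in use')
```

A cluster with GPUS_AVAILABLE greater than zero corresponds to an instance that would be highlighted in the terminal.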

The second and third tables list the runs currently active in the cluster and the runs queued for execution:

  • NAME (RUN_NAME in the queued runs table): Unique name for the run

  • USER: Name of the user who started the run

  • AGE: Length of time the run has been running

  • GPUS: Number of GPUs the run has been, or will be, allocated

  • PRIORITY: The run’s priority within the scheduling queue
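To illustrate how the queued runs table can be read, here is a small sketch that totals the GPUs requested by queued runs and picks out the run at the front of the queue. It is an illustration under assumptions rather than an MCLI feature: it treats POS as the position in the queue, assumes whitespace-separated columns, and uses a hypothetical helper named parse_run_table on the sample text from the example output above.

```python
# Sample queued runs table copied from the example output above.
QUEUED_SAMPLE = """\
POS  RUN_NAME      USER          AGE   GPUS  PRIORITY
1    my-run-3      alice         4min  8     HIGH
2    my-run-4      bob           5min  8     DEFAULT
"""


def parse_run_table(text):
    """Parse a whitespace-separated run table into a list of dicts.

    Hypothetical helper: assumes one header line and no spaces inside values.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


queued = parse_run_table(QUEUED_SAMPLE)
# Order by queue position (assumed meaning of the POS column).
queued.sort(key=lambda run: int(run["POS"]))

# Total GPUs requested by all queued runs.
requested = sum(int(run["GPUS"]) for run in queued)
next_run = queued[0]

print(f"{len(queued)} queued runs requesting {requested} GPUs")
print(f'next in queue: {next_run["RUN_NAME"]} ({next_run["PRIORITY"]})')
```

Comparing the requested total with GPUS_AVAILABLE from the first table gives a rough idea of how long the queue may take to drain.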

Selecting a Cluster

The name of a registered cluster is an optional positional argument of the util command. To view the utilization of a specific cluster:

> mcli util my_cluster
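When the command is called from a script, the optional cluster argument can be appended to the argument list only when it is given. The sketch below assumes the mcli executable is installed and on the PATH; build_util_command and run_util are hypothetical wrapper names, not part of MCLI, and my_cluster is the placeholder cluster name used above.

```python
import shutil
import subprocess


def build_util_command(cluster=None):
    """Build the argument list for mcli util, with an optional cluster name."""
    cmd = ["mcli", "util"]
    if cluster:
        # The cluster name is passed as a positional argument.
        cmd.append(cluster)
    return cmd


def run_util(cluster=None):
    """Run mcli util and return its text output.

    Assumes mcli is installed; raises RuntimeError if it is not on the PATH.
    """
    if shutil.which("mcli") is None:
        raise RuntimeError("mcli was not found on PATH")
    result = subprocess.run(
        build_util_command(cluster),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Only build the commands here; call run_util() where mcli is available.
print(build_util_command())
print(build_util_command("my_cluster"))
```

Passing the arguments as a list avoids shell quoting issues if a cluster name contains unusual characters.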