Monitor Utilization
The mcli util command shows cluster utilization and lists all active and queued runs.
> mcli util
NAME  GPU_INSTANCE_TYPE  GPUS_AVAILABLE  GPUS_USED  GPUS_TOTAL
rXzX  8xa100_80gb        0               16         16

Active Runs:
NAME      USER   AGE   NODE_NAME     GPUS
my-run-1  alice  2hr   a100-40gb-01  8
my-run-2  bob    5min  a100-40gb-02  8

Queued Runs:
POS  RUN_NAME  USER   AGE   GPUS  PRIORITY
1    my-run-3  alice  4min  8     HIGH
2    my-run-4  bob    5min  8     DEFAULT
The first table shows GPU availability for each cluster:
NAME
: Unique name for the cluster

GPU_INSTANCE_TYPE
: The instance type name

GPUS_AVAILABLE
: Number of GPUs currently available

GPUS_USED
: Number of GPUs currently in use across all runs

GPUS_TOTAL
: The sum of available and used GPUs
Highlighted instances are those with remaining capacity.
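As a rough illustration of how the first table's columns relate, the sketch below parses that table from captured text and checks that GPUS_TOTAL equals GPUS_AVAILABLE plus GPUS_USED. It assumes the whitespace-separated layout shown in the sample output above. The helper names (`parse_cluster_table`, `has_capacity`) are hypothetical and not part of MCLI.

```python
# Sample text copied from the first table of the example output above.
SAMPLE = """\
NAME GPU_INSTANCE_TYPE GPUS_AVAILABLE GPUS_USED GPUS_TOTAL
rXzX 8xa100_80gb 0 16 16
"""

NUMERIC_COLUMNS = ("GPUS_AVAILABLE", "GPUS_USED", "GPUS_TOTAL")


def parse_cluster_table(text):
    """Parse the cluster table into a list of dicts keyed by column name.

    Assumes one header line followed by one whitespace-separated row per
    cluster/instance type (an assumption based on the sample output).
    """
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    clusters = []
    for row in rows:
        record = dict(zip(header, row))
        for key in NUMERIC_COLUMNS:
            record[key] = int(record[key])
        clusters.append(record)
    return clusters


def has_capacity(record):
    """True when the instance still has unallocated GPUs."""
    return record["GPUS_AVAILABLE"] > 0


clusters = parse_cluster_table(SAMPLE)
for c in clusters:
    # GPUS_TOTAL is described as the sum of available and used GPUs.
    assert c["GPUS_AVAILABLE"] + c["GPUS_USED"] == c["GPUS_TOTAL"]

free = [c["NAME"] for c in clusters if has_capacity(c)]
print(free)  # [] for the sample, since all 16 GPUs are in use
```

In the sample, no instance has remaining capacity, which is consistent with two runs waiting in the queue.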
The second and third tables list the runs currently active in the cluster and the runs queued for execution:
NAME
: Unique name for the run

USER
: Name of the user who started the run

AGE
: Length of time the run has been running

GPUS
: GPUs the run has been or will be allocated

PRIORITY
: The run’s priority within the scheduling queue
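The run tables can be read programmatically in a similar way. The sketch below splits captured output into its "Active Runs" and "Queued Runs" sections and answers two simple questions: how many GPUs active runs hold, and which queued runs belong to a given user. It assumes the section titles and column layout of the sample output above. `parse_run_tables` is a hypothetical helper, not an MCLI function.

```python
# Sample text copied from the example output above.
SAMPLE = """\
NAME GPU_INSTANCE_TYPE GPUS_AVAILABLE GPUS_USED GPUS_TOTAL
rXzX 8xa100_80gb 0 16 16
Active Runs:
NAME USER AGE NODE_NAME GPUS
my-run-1 alice 2hr a100-40gb-01 8
my-run-2 bob 5min a100-40gb-02 8
Queued Runs:
POS RUN_NAME USER AGE GPUS PRIORITY
1 my-run-3 alice 4min 8 HIGH
2 my-run-4 bob 5min 8 DEFAULT
"""


def parse_run_tables(text):
    """Return a dict mapping section title to a list of row dicts.

    Assumes each section starts with a line ending in ':' followed by a
    header line and whitespace-separated rows (based on the sample output).
    Lines before the first section title are ignored.
    """
    sections = {}
    current = None  # list of rows for the section being filled
    header = None   # column names for the current section
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            current = sections.setdefault(line[:-1], [])
            header = None
        elif current is not None:
            fields = line.split()
            if header is None:
                header = fields
            else:
                current.append(dict(zip(header, fields)))
    return sections


tables = parse_run_tables(SAMPLE)
gpus_in_use = sum(int(run["GPUS"]) for run in tables["Active Runs"])
queued_by_alice = [
    run["RUN_NAME"] for run in tables["Queued Runs"] if run["USER"] == "alice"
]
print(gpus_in_use)      # 16
print(queued_by_alice)  # ['my-run-3']
```

The 16 GPUs held by the two active runs match the GPUS_USED value in the cluster table of the sample.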
Selecting a Cluster
The name of a registered cluster is an optional positional argument of the util command. To view the utilization of a specific cluster:
mcli util my_cluster
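For scripted monitoring, the same command can be invoked from Python. The sketch below is a minimal wrapper that appends the cluster name only when one is given, mirroring the optional positional argument described above. It assumes the mcli executable is installed and on PATH. `build_util_command` and `run_util` are hypothetical helper names, and my_cluster is the placeholder name from the example.

```python
import shutil
import subprocess


def build_util_command(cluster=None):
    """Build the argument list for `mcli util`, optionally for one cluster."""
    command = ["mcli", "util"]
    if cluster is not None:
        command.append(cluster)
    return command


def run_util(cluster=None):
    """Run `mcli util` and return its standard output as text.

    Raises RuntimeError if mcli is not found on PATH, and
    subprocess.CalledProcessError if the command exits with an error.
    """
    if shutil.which("mcli") is None:
        raise RuntimeError("mcli was not found on PATH")
    result = subprocess.run(
        build_util_command(cluster),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


print(build_util_command())              # ['mcli', 'util']
print(build_util_command("my_cluster"))  # ['mcli', 'util', 'my_cluster']
```

Calling `run_util("my_cluster")` would then return the text that the command prints for that cluster.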