Introduction
The Python API lets you submit, monitor, and delete jobs from Python instead of using the MosaicML CLI. You can programmatically design complex sweeps and workflows without ever resorting to shell scripting.
To get started, either follow the Quick Start or set the environment variable MOSAICML_API_KEY to automatically configure access to the MosaicML platform.
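As a minimal sketch, a script can check that MOSAICML_API_KEY is present before making any API calls, so a missing key fails early with a clear message. The helper name `require_api_key` is hypothetical and is not part of the MosaicML SDK; it only reads the environment.

```python
import os


def require_api_key(env=os.environ):
    """Return the API key from the given environment mapping, or raise.

    Hypothetical helper: it only reads MOSAICML_API_KEY and does not
    contact the MosaicML platform or validate the key.
    """
    key = env.get("MOSAICML_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "MOSAICML_API_KEY is not set; export it or follow the Quick Start."
        )
    return key
```

Calling `require_api_key()` at the top of a script is one way to surface a configuration problem before any run is submitted.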
The Python API currently supports managing your runs, from creation to monitoring logs to deleting runs. As a quick reference, the following run-related methods are supported:
Launch a run
Get a filtered list of runs
Stop a list of runs
Delete a list of runs
Get the current logs for an active or completed run
Follow the logs for an active or completed run in the MosaicML platform
Wait for a launched run to reach a specific status
For more details on these, please see Working with runs.
Configuring more advanced runs
If you are submitting runs inside of other runs, we recommend using an environment secret to set MOSAICML_API_KEY.
“Hello World”
To submit a simple run and print its logs while it is running, we first create a RunConfig object, which follows the same schema as the YAML files (see Run schema).
from mcli.sdk import RunConfig

cluster = "<your-cluster>"
config = RunConfig(name='hello-world',
                   image='bash',
                   command='echo "Hello World!" && sleep 60',
                   gpu_type='none',
                   cluster=cluster)
YAML
If your config is already in a YAML file, the RunConfig can also be created with config = RunConfig.from_file('your_yaml.yaml').
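For illustration, here is a sketch of what such a YAML file might contain and how it could be loaded. The field names simply mirror the RunConfig keyword arguments used in the example above, on the assumption that the YAML schema matches them; `write_config` and `load_config` are hypothetical helpers, and `load_config` needs `mcli` installed and configured.

```python
import tempfile
from pathlib import Path

# Assumed YAML equivalent of the hello-world RunConfig shown above.
YAML_TEXT = """\
name: hello-world
image: bash
command: 'echo "Hello World!" && sleep 60'
gpu_type: 'none'
cluster: '<your-cluster>'
"""


def write_config(directory, filename="your_yaml.yaml"):
    """Write the example YAML into `directory` and return its path."""
    path = Path(directory) / filename
    path.write_text(YAML_TEXT)
    return path


def load_config(path):
    """Load a RunConfig from a YAML file (requires mcli to be installed)."""
    from mcli.sdk import RunConfig  # imported lazily so the sketch runs without mcli
    return RunConfig.from_file(str(path))


with tempfile.TemporaryDirectory() as tmp:
    yaml_path = write_config(tmp)
    written = yaml_path.read_text()
    # With mcli installed and configured: config = load_config(yaml_path)
```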
Now, let’s create a simple script that submits the run and, once the run starts, prints the first line of the logs. To finish, the script stops the run.
from mcli.sdk import create_run, wait_for_run_status, follow_run_logs, stop_run
# Create the run from a config
run = create_run(config)
print(f'Launching run {run.name}')
# Wait for the run to start "running"
run = wait_for_run_status(run, status='running')
print(f'Run named {run.name} has status {run.status}')
# Print the first line of logs
for line in follow_run_logs(run):
    print(f'First log line was: {line}')
    break
# Stop the run
run = stop_run(run)
print(f'Run named {run.name} has status {run.status.value}')
A few additional details about the above script:
wait_for_run_status() waits for the run to reach the given status (or a later one). status can be either a str or a RunStatus enum.
We use follow_run_logs() instead of get_run_logs() because the “Hello World!” line may not have been printed yet when the call is made, so we want to wait until it is.
stop_run() stops a run and leaves its logs intact (in contrast to delete_run(), which also deletes the logs).
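The loop-and-break pattern used to grab the first log line can also be written with `next()`. The sketch below assumes only that the log source is an iterable of strings, as the script above suggests for follow_run_logs(); `first_log_line` and `fake_logs` are hypothetical names, with `fake_logs` standing in for a real log stream.

```python
def first_log_line(lines, default=None):
    """Return the first item from an iterable of log lines, or `default` if it is empty."""
    return next(iter(lines), default)


def fake_logs():
    """Stand-in generator used here in place of a real log stream."""
    yield "Hello World!"
    yield "a later line that is never consumed"


first = first_log_line(fake_logs())
print(f"First log line was: {first}")
```

With a real run, the call would look like `first_log_line(follow_run_logs(run))`.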
To clean up, let’s delete the run:
from mcli.sdk import delete_run
delete_run(run)
Next steps
For a deeper dive into run management, see Working with runs.
For a simple hyperparameter sweep example, see Sweep hyperparameters.
For a complex sweep with Optuna, see Optuna.
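To give a flavor of what a programmatic sweep might look like, the sketch below builds one set of RunConfig keyword arguments per hyperparameter combination. It is not the approach from the linked sweep pages. The helper names `sweep_kwargs` and `launch_sweep`, the run naming scheme, and the placeholder `echo` command are all assumptions made for illustration. Only `launch_sweep` requires `mcli`.

```python
from itertools import product


def sweep_kwargs(cluster, learning_rates, seeds):
    """Build one dict of RunConfig keyword arguments per (lr, seed) pair."""
    configs = []
    for index, (lr, seed) in enumerate(product(learning_rates, seeds)):
        configs.append({
            "name": f"sweep-{index}",  # hypothetical naming scheme
            "image": "bash",
            # Placeholder command; a real sweep would launch training here.
            "command": f'echo "lr={lr} seed={seed}"',
            "gpu_type": "none",
            "cluster": cluster,
        })
    return configs


def launch_sweep(configs):
    """Submit every config as a run (requires mcli installed and configured)."""
    from mcli.sdk import RunConfig, create_run  # lazy import keeps the sketch runnable
    return [create_run(RunConfig(**kwargs)) for kwargs in configs]


configs = sweep_kwargs("<your-cluster>", learning_rates=[0.1, 0.01], seeds=[0, 1])
print(f"Prepared {len(configs)} run configs")
```

Keeping the config generation separate from submission makes it easy to inspect the planned runs before anything is launched.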