Introduction

The Python API lets you submit, monitor, and delete jobs from Python instead of using the MosaicML CLI. You can design complex sweeps and workflows programmatically, without ever resorting to shell scripting.
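As a sketch of what a programmatic sweep might look like, the snippet below builds one run config per learning rate as plain dictionaries. It uses the same fields as the RunConfig in the Hello World example below. The `make_sweep_configs` helper, the `train.py` command, and the `--lr` flag are hypothetical placeholders. Assuming the keys match RunConfig's keyword arguments, each dictionary could then be passed as `RunConfig(**cfg)` to `create_run`.

```python
from typing import Dict, List


def make_sweep_configs(cluster: str, image: str,
                       learning_rates: List[float]) -> List[Dict[str, str]]:
    """Build one run config (as a plain dict) per learning rate.

    Hypothetical helper: field names mirror the RunConfig keyword arguments
    used in the Hello World example; train.py and --lr are placeholders.
    """
    configs = []
    for lr in learning_rates:
        configs.append({
            # Encode the swept value in the name (dots replaced by hyphens)
            # so the runs are easy to tell apart.
            'name': f'sweep-lr-{lr:g}'.replace('.', '-'),
            'image': image,
            'command': f'python train.py --lr {lr}',
            'gpu_type': 'none',
            'cluster': cluster,
        })
    return configs


configs = make_sweep_configs('<your-cluster>', '<your-image>', [0.1, 0.01])
# With mcli installed, each config could be launched with something like:
# for cfg in configs:
#     create_run(RunConfig(**cfg))
```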

To get started, either follow the Quick Start or set the environment variable MOSAICML_API_KEY, which automatically configures access to the MosaicML platform.
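If you rely on the environment variable, you can quickly confirm that your Python process can see it before calling the API. A minimal sketch follows; the `has_mosaicml_api_key` helper is hypothetical, and it only checks that the variable is set and non-empty, not that the key is valid.

```python
import os


def has_mosaicml_api_key() -> bool:
    """Return True if MOSAICML_API_KEY is set to a non-empty value."""
    return bool(os.environ.get('MOSAICML_API_KEY', '').strip())


if not has_mosaicml_api_key():
    print('MOSAICML_API_KEY is not set; follow the Quick Start or export it first.')
```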

The Python API currently supports managing your runs, from creating them and monitoring their logs to deleting them. As a quick reference, it provides the following run-related methods:

  • create_run: Launch a run
  • get_runs: Get a filtered list of runs
  • stop_runs: Stop a list of runs
  • delete_runs: Delete a list of runs
  • get_run_logs: Get the current logs for an active or completed run
  • follow_run_logs: Follow the logs for an active or completed run in the MosaicML platform
  • wait_for_run_status: Wait for a launched run to reach a specific status

For more details on these, please see Working with runs.

Configuring more advanced runs

If you submit runs from inside other runs, we recommend setting MOSAICML_API_KEY with an environment secret.

“Hello World”

To submit a simple run and print its logs while it runs, first create a RunConfig object. RunConfig follows the same schema as the YAML files; see Run schema.

from mcli.sdk import RunConfig

cluster = "<your-cluster>"

config = RunConfig(name='hello-world',
                   image='bash',
                   command='echo "Hello World!" && sleep 60',
                   gpu_type='none',
                   cluster=cluster)

YAML

If your config is already in a YAML file, you can create the RunConfig with config = RunConfig.from_file('your_yaml.yaml').
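If you prefer to start from YAML, the sketch below writes the same fields as the RunConfig above to a file. `hello_world.yaml` is a placeholder name, and `<your-cluster>` must be replaced with your cluster. The `from_file` call is left commented out so the snippet runs without mcli installed.

```python
from pathlib import Path

# Same fields as the RunConfig above, expressed in YAML.
# The command is single-quoted so YAML keeps the inner double quotes as-is.
HELLO_WORLD_YAML = """\
name: hello-world
image: bash
command: 'echo "Hello World!" && sleep 60'
gpu_type: none
cluster: <your-cluster>
"""

Path('hello_world.yaml').write_text(HELLO_WORLD_YAML)

# With mcli installed, load the file back into a RunConfig:
# from mcli.sdk import RunConfig
# config = RunConfig.from_file('hello_world.yaml')
```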

Now, let’s write a simple script that submits the run, waits for it to start, and prints the first line of its logs. To clean up, the script then stops the run.

from mcli.sdk import create_run, wait_for_run_status, follow_run_logs, stop_run

# Create the run from a config
run = create_run(config)
print(f'Launching run {run.name}')

# Wait for the run to start "running"
run = wait_for_run_status(run, status='running')
print(f'Run named {run.name} has status {run.status.value}')

# Print the first line of logs
for line in follow_run_logs(run):
    print(f'First log line was: {line}')
    break

# Stop the run
run = stop_run(run)
print(f'Run named {run.name} has status {run.status.value}')
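The for/break loop in the script above reads exactly one line from the log stream. Assuming follow_run_logs() returns an iterable of lines, as the for loop implies, you can write the same step with Python's built-in next(). It also lets you supply a default if the stream ends without producing any output. The `fake_logs` generator is a hypothetical stand-in so the snippet runs without the MosaicML platform.

```python
from typing import Iterable, Iterator, Optional


def fake_logs() -> Iterator[str]:
    """Hypothetical stand-in for follow_run_logs(run)."""
    yield 'Hello World!'
    yield 'second line'


def first_line(lines: Iterable[str]) -> Optional[str]:
    """Return the first item of a (possibly lazy) stream, or None if it is empty."""
    return next(iter(lines), None)


print(f'First log line was: {first_line(fake_logs())}')
# With mcli, the equivalent call would be: first_line(follow_run_logs(run))
```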

A few additional details about the script above:

  • wait_for_run_status() waits until the run reaches the given status (or a later one). status can be either a str or a RunStatus enum.

  • We use follow_run_logs() instead of get_run_logs() because the “Hello World!” line may not have been printed yet when the call is made. follow_run_logs() keeps waiting for new lines, so the loop receives the line once it appears.

  • stop_run() stops a run but leaves its logs intact, whereas delete_run() also deletes the logs.
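To illustrate what "reach the status (or a later one)" means, here is a minimal polling sketch over a toy, ordered list of statuses. This is not how mcli implements wait_for_run_status(). The status names, the `get_status` callable, and the timeout handling are all hypothetical.

```python
import time
from typing import Callable, List

# Hypothetical, simplified status order for illustration only.
TOY_STATUS_ORDER: List[str] = ['pending', 'starting', 'running', 'completed']


def wait_until_at_least(get_status: Callable[[], str], target: str,
                        timeout: float = 60.0, poll_interval: float = 1.0) -> str:
    """Poll get_status() until it reaches target or any later status.

    Raises TimeoutError if that does not happen within `timeout` seconds.
    """
    target_rank = TOY_STATUS_ORDER.index(target)
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        # "Or later": a run that has already completed also satisfies 'running'.
        if TOY_STATUS_ORDER.index(status) >= target_rank:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f'status is still {status!r} after {timeout}s')
        time.sleep(poll_interval)


statuses = iter(['pending', 'starting', 'completed'])
print(wait_until_at_least(lambda: next(statuses), 'running', poll_interval=0))
# prints: completed
```

Comparing positions in an ordered list is what makes a run that skipped straight past the target status still count as having reached it.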

To clean up, let’s delete the run:

from mcli.sdk import delete_run

delete_run(run)
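In a longer script, you may want the cleanup to happen even if something fails between submitting and deleting the run. A minimal sketch wraps the work in try/finally. It assumes you pass in the create_run and delete_run functions used above. The `run_then_cleanup` helper is hypothetical, and the stand-in lambdas in the tests replace the real mcli calls.

```python
from typing import Any, Callable, TypeVar

T = TypeVar('T')


def run_then_cleanup(create: Callable[[Any], Any],
                     cleanup: Callable[[Any], Any],
                     config: Any,
                     work: Callable[[Any], T]) -> T:
    """Create a run, do some work with it, and always clean it up afterwards."""
    run = create(config)
    try:
        return work(run)
    finally:
        # Executes even if work() raises, so the run is never left behind.
        cleanup(run)


# With mcli installed, usage might look like:
# from mcli.sdk import create_run, delete_run, wait_for_run_status
# run_then_cleanup(create_run, delete_run, config,
#                  lambda run: wait_for_run_status(run, status='running'))
```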

Next steps