Environment Setup

The MosaicML platform makes it easy to configure the environment your code runs in.

MosaicML Platform Environment Variables

We automatically set the following environment variables in your run container.

  • RUN_NAME: The name of your run as seen in the output of mcli get runs.

  • WORLD_SIZE: The total number of GPUs being used for the training run.

  • NODE_RANK: The rank of the node the container is running on. In a multi-node training job involving n nodes, nodes will have ranks 0, 1, ..., n - 1.

  • MASTER_ADDR: The network address of the node with rank 0 in the training job.

  • MASTER_PORT: The network port of the node with rank 0 in the training job.
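As a minimal sketch, a training script can read these variables to assemble its distributed settings. The helper `dist_settings`, the `DistSettings` class, and the choice of a TCP `init_method` URL are our own illustration, not part of MCLI. The code only reads the variables listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DistSettings:
    run_name: str
    world_size: int   # total number of GPUs in the run
    node_rank: int    # 0 .. n - 1 in an n-node job
    init_method: str  # TCP rendezvous URL pointing at the rank-0 node


def dist_settings(env: Optional[Mapping[str, str]] = None) -> DistSettings:
    """Collect the platform-provided variables (hypothetical helper).

    Defaults to the process environment; pass a dict to test locally.
    """
    env = os.environ if env is None else env
    return DistSettings(
        run_name=env["RUN_NAME"],
        world_size=int(env["WORLD_SIZE"]),
        node_rank=int(env["NODE_RANK"]),
        init_method=f"tcp://{env['MASTER_ADDR']}:{int(env['MASTER_PORT'])}",
    )
```

With PyTorch, `init_method` and `world_size` could be passed to `torch.distributed.init_process_group` together with each process's global rank. Assuming every node has the same number of GPUs, that rank would be `NODE_RANK * gpus_per_node + local_gpu_index`.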

Docker

Build a Docker image that contains all the system packages your code requires. Baking large dependencies into the image, rather than installing them when the run starts, shortens run start time. For more information, see the Docker documentation.

We maintain a set of public Docker images for PyTorch, PyTorch Vision, and Composer on Docker Hub.

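To run with an existing Docker image, set the image field in your run YAML:

image: mosaicml/composer:latest

The equivalent with the Python SDK:

from mcli.sdk import RunConfig

config = RunConfig(name='example',
                   image='mosaicml/composer:latest',  # prefer a fixed tag; see Docker Tags below
                   command='echo "Hello World!" && sleep 60',
                   gpu_type='none',  # this example needs no GPUs
                   cluster='my-cluster')  # replace with one of your clusters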

Docker Tags

We strongly recommend pinning Docker images to a fixed tag instead of latest so that your runs are reproducible. Give your own images meaningful tags (e.g. v1.7.0).

To use a private image, first set up a Docker secret:
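If you want to enforce this in CI, a check like the following sketch flags image references that would float. `is_pinned` is a hypothetical helper, not part of MCLI. It relies on Docker's convention that a reference without a tag resolves to latest, and it treats digest references (`repo@sha256:...`) as pinned.

```python
def is_pinned(image: str) -> bool:
    """Return True if an image reference names a fixed tag or a digest."""
    if "@" in image:  # digest reference, e.g. repo@sha256:...
        return True
    # Only the last path component can carry the tag; an earlier ':' may be
    # a registry port, as in localhost:5000/my-image.
    last = image.rsplit("/", 1)[-1]
    if ":" not in last:
        return False  # no tag at all, so Docker would use latest
    tag = last.rsplit(":", 1)[1]
    return tag != "latest"
```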

mcli create secrets docker

Environment Variables

To add environment variables, use the env_variables field:

env_variables:
  - name: <unique_name>
    key: KEY
    value: VALUE
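Inside the run, each entry is available to your code as an ordinary environment variable named by its key. The following minimal sketch reads one in Python. `require_env` is a hypothetical helper, and its error-on-missing behavior is our own choice. It makes a forgotten entry fail with a clear message rather than a bare KeyError.

```python
import os


def require_env(key: str) -> str:
    """Return the value of an environment variable set via env_variables."""
    value = os.environ.get(key)
    if value is None:
        raise RuntimeError(
            f"environment variable {key!r} is not set; "
            "add it under env_variables in your run configuration"
        )
    return value


# e.g. value = require_env("KEY")  # "KEY" is the placeholder key from the YAML above
```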

Secrets

Secrets are credentials or other sensitive information that only you can access. MCLI supports several secret types, which are added to your run environment either as environment variables or as mounted files. To see the available secret types, run:

mcli create secrets -h

All secrets are stored securely in a vault, maintained across your clusters, and added to every run. Your secrets are never shared with other users.

For more information, see the Secrets page.
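Because a secret can arrive either as an environment variable or as a mounted file, code that consumes it may want to accept both. Here is a minimal sketch. `read_secret` is hypothetical, and the variable name and mount path in the usage line are placeholders for whatever you chose when creating the secret, not MCLI defaults.

```python
import os
from pathlib import Path
from typing import Optional


def read_secret(env_var: Optional[str] = None,
                mount_path: Optional[str] = None) -> str:
    """Return a secret from an environment variable or a mounted file.

    The environment variable takes precedence. Surrounding whitespace,
    such as a trailing newline in the mounted file, is stripped.
    """
    if env_var is not None and env_var in os.environ:
        return os.environ[env_var].strip()
    if mount_path is not None and Path(mount_path).is_file():
        return Path(mount_path).read_text().strip()
    raise LookupError("secret not found in the environment or at the mount path")


# Hypothetical names, for illustration only:
# token = read_secret(env_var="MY_API_TOKEN", mount_path="/secrets/my_api_token")
```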

Integrations

Integrations set up execution environments quickly. A single integration can span mounted files, environment variables, commands, secrets, and clusters.

For example, the Weights & Biases integration sets up all the necessary environment variables for the W&B client:

integrations:
  - integration_type: wandb
    project: my_project
    entity: my_entity

For all supported integrations, see the Integrations page.

Integrations for Live Updates

Integrations are resolved at runtime, so they are ideal for environment configuration that changes often.

For example, adding a git repo as an integration sets up your code base from the repo's state at the time the run starts.