Inference Quickstart#
You can deploy your model on the MosaicML platform in just a few simple steps. Before starting, make sure you’ve configured MosaicML access.
Creating Your First Deployment#
For this tutorial, we’re going to deploy the MPT-7B Instruct model.
To submit your first deployment, copy the yaml below into a file called mpt_instruct_deploy.yaml:
name: mpt-7b-instruct
compute:
  gpus: 1
  instance: oci.vm.gpu.a10.1
default_model:
  model_type: mpt-7b-instruct
This yaml tells the MosaicML platform that you are requesting a single A10 GPU and would like to download the mpt-7b-instruct model from the Hugging Face Hub.
Then, run:
mcli deploy -f mpt_instruct_deploy.yaml
After you’ve run the deploy command, you’ll see the following output in your terminal (note that the hash after mpt-7b-instruct- is a unique identifier that we append to the deployment name you provided in your yaml):
✔ Deployment mpt-7b-instruct-0t30xo submitted.
To see the deployment's status, use:
mcli get deployments
If you run mcli get deployments, you’ll see the following output:
NAME                    USER            CLUSTER  GPU_TYPE  GPU_NUM  CREATED_TIME       STATUS
mpt-7b-instruct-0t30xo  user@email.com  r7z13    a10       1        2023-05-17 07:24 PM  PENDING
The mcli get deployments command shows you all the deployments in your organization, so you may see deployments that were not created by you. You can also get more details about a specific deployment by running mcli describe deployment mpt-7b-instruct-0t30xo.
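Since mcli get deployments lists every deployment in your organization, you may want to narrow the output to just your own. A minimal sketch in Python, assuming you have the deployment records as plain dicts — the field names and example data here are illustrative, not the SDK’s actual schema:

```python
# Sketch: filter an organization-wide deployment list down to one user's.
# The record fields (name, user, status) are illustrative assumptions;
# check the objects your SDK version returns for the actual attributes.
def my_deployments(deployments, user):
    """Return only the deployments created by `user`."""
    return [d for d in deployments if d["user"] == user]

# Hypothetical records mirroring the `mcli get deployments` output above.
records = [
    {"name": "mpt-7b-instruct-0t30xo", "user": "user@email.com", "status": "PENDING"},
    {"name": "other-model-9xk2ab", "user": "teammate@email.com", "status": "RUNNING"},
]

mine = my_deployments(records, "user@email.com")
print([d["name"] for d in mine])  # ['mpt-7b-instruct-0t30xo']
```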
Interacting With Your Deployment#
You’ve created your first deployment, congrats! From here, MCLI has a few convenience commands that make it easier for you to interact with your deployment.
First, you may want to check your deployment’s status to see if it’s ready to start serving traffic. You can do that by running the following command:
mcli ping mpt-7b-instruct-0t30xo
If your deployment is ready, you should see the output:
mpt-7b-instruct-0t30xo's status:
{'status': 200}
where the status is an HTTP status code.
If your status code is 200, your deployment is ready to serve traffic!
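While the deployment is still starting up, the ping will not return 200 right away, so you may want to poll until it does. A small sketch, assuming a zero-argument callable that returns a dict like {'status': 200} (the shape shown above); the ping function is injected so the loop is easy to test, and wiring it to mcli is left as a comment:

```python
import time

def wait_until_ready(ping, timeout=600, interval=10):
    """Poll `ping` until it reports HTTP 200 or `timeout` seconds elapse.

    `ping` is any zero-argument callable returning {'status': <int>},
    e.g. lambda: mcli.ping("mpt-7b-instruct-0t30xo")  (hypothetical wiring).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if ping().get("status") == 200:
            return True
        time.sleep(interval)
    return False

# Demo with a stub that becomes ready on the third call.
responses = iter([{"status": 503}, {"status": 503}, {"status": 200}])
print(wait_until_ready(lambda: next(responses), timeout=5, interval=0))  # True
```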
Let’s try sending a request to your deployment using the Python SDK:
from mcli import predict, get_inference_deployment
deployment = get_inference_deployment("mpt-7b-instruct-0t30xo")
predict(deployment, {"inputs": ["hello world!"]})
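The predict call returns the deployment’s JSON response as a Python dict. A sketch of pulling out the generated text, assuming the response carries an "outputs" list — the example response below is hypothetical, and the exact schema depends on your model handler:

```python
# Hypothetical response shaped like an {"outputs": [...]} payload;
# substitute the actual return value of predict(deployment, ...).
response = {"outputs": ["hello world! How can I help you today?"]}

# Pull out each generated string; .get() avoids a KeyError if the
# handler returns a differently shaped payload.
for text in response.get("outputs", []):
    print(text)
```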
You can also make the same request via the command line:
mcli predict mpt-7b-instruct-0t30xo --input '{"inputs": ["hello world!"]}'
You can also do the same with a basic curl command:
curl https://mpt-7b-instruct-0t30xo.inf.hosted-on.mosaicml.hosting/predict \
-H "Authorization: <your_api_key>" \
-d '{"inputs": ["hello world!"]}'
The address above is just an example; you can look up the address for your own deployment using mcli describe deployment <name>.
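The same HTTP request can be built in plain Python with the standard library. A sketch using urllib.request — the deployment address and API key are placeholders for your own, and the actual network call is left commented out:

```python
import json
import urllib.request

# Placeholders: substitute your deployment's address (from
# `mcli describe deployment <name>`) and your real API key.
url = "https://mpt-7b-instruct-0t30xo.inf.hosted-on.mosaicml.hosting/predict"
req = urllib.request.Request(
    url,
    data=json.dumps({"inputs": ["hello world!"]}).encode(),
    headers={"Authorization": "<your_api_key>", "Content-Type": "application/json"},
)

# Uncomment to send the request to a live deployment:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```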
Once you’re done with your deployment, you can delete it with the following command:
mcli delete deployments --name mpt-7b-instruct-0t30xo
Next Steps#
There are many more ways you can customize your deployments.
We support downloading checkpoint files from any remote storage, such as S3, and you can customize your model’s forward logic by implementing a custom model handler.
You can even write your own webserver and replace the mosaicml/inference image with your own.
Take a look at the Inference Schema Page for more information.