Machine Learning Operations (MLOps): Microsoft Azure
MLOps (or ML Ops) is a set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently. The term is a compound of “machine learning” and DevOps, the continuous development practice from software engineering.
This project was built by following the excellent YouTube playlist by MG.
YouTube link: https://www.youtube.com/playlist?list=PLiQS6N-W1p3m9squzZ2cPgGdH5SBhjY6f
GitHub link (MG): https://github.com/MG-Microsoft/MLOps_Workshop.git
GitHub link (mine): https://github.com/Sabyasachi6215/MLOPs.git
Agenda :
- Creation of an Azure DevOps project.
- Creation of a resource group, variable group, and service connection in Azure.
- Infrastructure as Code.
- MLOps CI pipeline.
- Creation of the CD pipeline.
Create an Azure DevOps Project :
Create a resource group:
Create a Service Connection:
It can be found under Project Settings → Service connections.
Create a variable group:
It can be found in Azure DevOps under Pipelines → Library.
Create : Infrastructure as Code
Step 1: Create the pipeline:
Step 2: Configure the pipeline:
The Infrastructure as Code pipeline definition can be found at:
environment_setup/iac-create-environment-pipeline-arm.yml
In the file “iac-create-environment-pipeline-arm.yml”, only the “group” variable has to be changed:
- e.g. → group: mlops-wsh-vg (put the name of your variable group here)
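For reference, this is roughly how the variables section of that YAML file looks (the group name below is the example used in this project; put your own variable group name there):
variables:
- group: mlops-wsh-vg # name of the variable group created under Pipelines → Library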
# Output after running the infrastructure pipeline.
After running the pipeline, check the resource group in the Azure portal.
The following resources will be created:
Open the Azure Machine Learning workspace; in this case it is named MLOPSws1aml8.
Create and Configure a Compute Target :
Create a CI pipeline
Creating the CI pipeline requires two steps:
a) Create the agent job :
We will use the classic editor to create a pipeline without YAML.
# installing requirements.txt
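The classic-editor step simply installs the Python dependencies. A rough YAML-equivalent sketch (the requirements file path is an assumption, based on the package_requirement folder that is copied later in this pipeline):
steps:
- task: UsePythonVersion@0
  inputs:
    versionSpec: '3.x' # use a Python 3 interpreter on the agent
- script: pip install -r package_requirement/requirements.txt # path is an assumption
  displayName: 'Install requirements'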
# testing the files :
Script :
pytest training/train_test.py --doctest-modules --junitxml=junit/test-results.xml --cov=data_test --cov-report=xml --cov-report=html
# publishing the test results :
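In the classic editor this is the “Publish Test Results” task. A YAML sketch of the same step, assuming the JUnit file produced by the pytest command above (the run title is just an example):
- task: PublishTestResults@2
  condition: succeededOrFailed() # publish results even if the tests fail
  inputs:
    testResultsFormat: 'JUnit'
    testResultsFiles: '**/test-results.xml'
    testRunTitle: 'Unit tests' # title is an example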
# Installing Azure CLI
Script :
az extension add -n azure-cli-ml
# Creating the Azure ML workspace
Script :
az ml workspace create -g $(azureml.resourceGroup) -w $(azureml.workspaceName) -l $(azureml.location) --exist-ok --yes
# Azure ML compute cluster
Script :
az ml computetarget create amlcompute -g $(azureml.resourceGroup) -w $(azureml.workspaceName) -n $(amlcompute.clusterName) -s $(amlcompute.vmSize) --min-nodes $(amlcompute.minNodes) --max-nodes $(amlcompute.maxNodes) --idle-seconds-before-scaledown $(amlcompute.idleSecondsBeforeScaledown)
# upload data to data store
Script :
az ml datastore upload -w $(azureml.workspaceName) -g $(azureml.resourceGroup) -n $(az ml datastore show-default -w $(azureml.workspaceName) -g $(azureml.resourceGroup) --query name -o tsv) -p data -u insurance --overwrite true
# Make metadata and models directories
Script :
mkdir metadata && mkdir models
# Training Model
Script :
az ml run submit-script -g $(azureml.resourceGroup) -w $(azureml.workspaceName) -e $(experiment.name) --ct $(amlcompute.clusterName) -d conda_dependencies.yml -c train_insurance -t ../metadata/run.json train_aml.py
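The -d flag points to a conda environment definition. A minimal sketch of what conda_dependencies.yml might contain, assuming a scikit-learn based training script (the package list is illustrative, not the workshop's exact file):
name: train_insurance_env
channels:
  - conda-forge
dependencies:
  - python=3.8
  - pip
  - pip:
      - scikit-learn     # assumed, since the model is registered with --model-framework ScikitLearn below
      - pandas
      - azureml-defaults # Azure ML helpers needed for runs and deployment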
# Azure Model registry
Script :
az ml model register -g $(azureml.resourceGroup) -w $(azureml.workspaceName) -n $(model.name) -f metadata/run.json --asset-path outputs/models/insurance_model.pkl -d "Classification model for claim-filing prediction" --tag "data"="insurance" --tag "model"="classification" --model-framework ScikitLearn -t metadata/model.json
# Downloading the Model:
Script :
az ml model download -g $(azureml.resourceGroup) -w $(azureml.workspaceName) -i $(jq -r .modelId metadata/model.json) -t ./models --overwrite
# Copy the files :
Contents:
**/metadata/*
**/models/*
**/deployment/*
**/tests/integration/*
**/package_requirement/*
# Publish Pipeline Artifact :
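In the classic editor these are the “Copy Files” and “Publish Pipeline Artifact” tasks. A rough YAML equivalent using the contents listed above (the artifact name is just an example):
- task: CopyFiles@2
  inputs:
    SourceFolder: '$(Build.SourcesDirectory)'
    Contents: |
      **/metadata/*
      **/models/*
      **/deployment/*
      **/tests/integration/*
      **/package_requirement/*
    TargetFolder: '$(Build.ArtifactStagingDirectory)'
- task: PublishPipelineArtifact@1
  inputs:
    targetPath: '$(Build.ArtifactStagingDirectory)'
    artifact: 'landing' # artifact name is an example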
b) Configure variable names in the pipeline:
# Below is an example :
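A sketch of the variable names the scripts above expect in the variable group (entered as name/value pairs in the Library); the values are placeholders, not prescriptions:
azureml.resourceGroup: mlops-wsh-rg # resource group created earlier (example value)
azureml.workspaceName: mlops-wsh-aml # Azure ML workspace name (example value)
azureml.location: westeurope # example region
amlcompute.clusterName: cpu-cluster # example compute cluster name
amlcompute.vmSize: STANDARD_DS2_V2 # example VM size
amlcompute.minNodes: 0
amlcompute.maxNodes: 2
amlcompute.idleSecondsBeforeScaledown: 300
experiment.name: insurance_classification # example experiment name
model.name: insurance_model # example model name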
# unit tests passed :
# After running the CI pipeline :
Create a Deployment Pipeline
There are three parts:
a) Add artifacts: add the artifact from the CI pipeline.
b) Deploy to staging.
c) Deploy to production.
# Create a release pipeline :
Deploy to Staging Area
# Adding Python into the agent :
# Add ML Extension:
Script :
az extension add -n azure-cli-ml
# Deploy to Azure Container instance :
Script :
az ml model deploy -g $(azureml.resourceGroup) -w $(azureml.workspaceName) -n $(service.name.staging) -f ../metadata/model.json --dc aciDeploymentConfigStaging.yml --ic inferenceConfig.yml --overwrite
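A minimal sketch of what aciDeploymentConfigStaging.yml could contain, assuming the Azure ML CLI (v1) deployment-config schema; the resource values are examples:
computeType: ACI
containerResourceRequirements:
  cpu: 1        # example CPU allocation for the container instance
  memoryInGB: 1 # example memory allocation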
# bash : install requirements :
# Staging Test :
Script :
pytest staging_test.py --doctest-modules --junitxml=junit/test-results.xml --cov-report=xml --scoreurl $(az ml service show -g $(azureml.resourceGroup) -w $(azureml.workspaceName) -n $(service.name.staging) --query scoringUri -o tsv)
# Publish Staging test results :
b) Configure variable names in the pipeline:
# Below is an example :
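Besides the CI variables, the staging stage needs the name of the staging web service; the value is only an example:
service.name.staging: insurance-service-aci # example name for the ACI web service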
# After deployment to Staging area :
Deploy to Production Area :
# use Python installation :
# Install CLI ML extension :
Script :
az extension add -n azure-cli-ml
# Create an AKS (Kubernetes) cluster
Script :
az ml computetarget create aks -g $(azureml.resourceGroup) -w $(azureml.workspaceName) -n $(aks.clusterName) -s $(aks.vmSize) -a $(aks.agentCount)
# Deploy to AKS
Script :
az ml model deploy -g $(azureml.resourceGroup) -w $(azureml.workspaceName) -n $(service.name.prod) -f ../metadata/model.json --dc aksDeploymentConfigProd.yml --ic inferenceConfig.yml --ct $(aks.clusterName) --overwrite
# install python requirements
# Production test
Script :
pytest prod_test.py --doctest-modules --junitxml=junit/test-results.xml --cov=integration_test --cov-report=xml --cov-report=html --scoreurl $(az ml service show -g $(azureml.resourceGroup) -w $(azureml.workspaceName) -n $(service.name.prod) --query scoringUri -o tsv) --scorekey $(az ml service get-keys -g $(azureml.resourceGroup) -w $(azureml.workspaceName) -n $(service.name.prod) --query primaryKey -o tsv)
# Publish test results :
b) Configure variable names in the pipeline:
# Below is an example :
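The production stage additionally needs the AKS and production service variables; the values are examples only:
aks.clusterName: aks-prod # example AKS cluster name
aks.vmSize: Standard_D3_v2 # example VM size
aks.agentCount: 3 # example node count
service.name.prod: insurance-service-aks # example name for the AKS web service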
After deployment to Prod :