Definition of MLOps:

Definition: Machine learning operations (machine learning operations management) is a dicipline concerned with the design, construction and management of reproducible, testable, and evolvable ML-powered software.

Goals of MLOps:

  1. Unify the release cycle for machine learning and software applications.
  2. Enable automated testing of machine learning artifacts (e.g. data validation, ML model testing, and ML model integration testing)
  3. Apply agile principles to machine learning projects.
  4. Integrate machine learning models and datasets into CI/CD systems
  5. Reduce technical debt across machine learning models.
  6. Do so in a principled manner that is language, framework, platform, and infrastructure agnostic.

Motivation for ML Ops:

  1. Deployment Gap
  2. Scenarios Necessitating Management

the History of MLOps:

A High Level View of Machine Learning Workflows:

High Level Models of AI / Machine Learning Applications - “The AI and ML Canvases”

The Structure of Machine Learning Workflows:

  1. Data Engineering
  2. Model Engineering
  3. Model Deployment

A Closer look:

Data: Data Engineering Pipelines

Model: Machine Learning Engineering Pipelines

Code: Deployment Pipelines

The (All Important) Principles of MLOps

Overview

Automation

Versioning

Experiments Tracking

Testing

Monitoring

The “ML Test Score” System

Reproducibility

Modularity

Metrics / ML Software Delivery Metrics (4 metrics from “accelerate”)

Summary of MLOps Principles and Best Practices

MLOps Infrastructure Stack

Key Sources:

  • Papers
    1. The Technical Debt of Machine Learning Systems
    2. The Machine Learning Model Checklist
  • Other Publications 1. 2.