The MLOps Interview Guide
Definition of MLOps:
Machine learning operations (MLOps) is a discipline concerned with the design, construction, and management of reproducible, testable, and evolvable ML-powered software.
Goals of MLOps:
- Unify the release cycle for machine learning and software applications.
- Enable automated testing of machine learning artifacts (e.g., data validation, ML model testing, and ML model integration testing).
- Apply agile principles to machine learning projects.
- Integrate machine learning models and datasets into CI/CD systems.
- Reduce technical debt across machine learning models.
- Do so in a principled manner that is language, framework, platform, and infrastructure agnostic.
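As a minimal sketch of the automated data validation named in the goals, the Python below checks incoming records against a declared schema (required columns, types, and value ranges) and collects every violation, so that a CI job could fail the build whenever the result is non-empty. The ColumnSpec and validate_rows names, the example schema, and the sample rows are hypothetical illustrations, not the API of any particular validation library.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical per-column expectation: type, optional bounds, nullability."""
    dtype: type
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    nullable: bool = False


def validate_rows(rows: List[Dict[str, Any]], schema: Dict[str, ColumnSpec]) -> List[str]:
    """Return human-readable violations; an empty list means the batch passed."""
    errors: List[str] = []
    for i, row in enumerate(rows):
        missing = schema.keys() - row.keys()
        if missing:
            errors.append(f"row {i}: missing columns {sorted(missing)}")
        for name, spec in schema.items():
            if name not in row:
                continue  # already reported as missing
            value = row[name]
            if value is None:
                if not spec.nullable:
                    errors.append(f"row {i}: {name} is null")
                continue
            if not isinstance(value, spec.dtype):
                errors.append(f"row {i}: {name}={value!r} is not {spec.dtype.__name__}")
                continue
            if spec.min_value is not None and value < spec.min_value:
                errors.append(f"row {i}: {name}={value!r} below minimum {spec.min_value}")
            if spec.max_value is not None and value > spec.max_value:
                errors.append(f"row {i}: {name}={value!r} above maximum {spec.max_value}")
    return errors


# Illustrative schema and batch (hypothetical data).
schema = {"age": ColumnSpec(int, 0, 120), "income": ColumnSpec(float, 0.0)}
batch = [
    {"age": 34, "income": 52000.0},
    {"age": -3, "income": 41000.0},
    {"income": None},
]
problems = validate_rows(batch, schema)
for p in problems:
    print(p)
```

In a pipeline, a non-empty problems list would typically stop the job before training starts, for example by exiting with a non-zero status.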
Motivation for MLOps:
- Deployment Gap
- Scenarios Necessitating Management
The History of MLOps:
A High-Level View of Machine Learning Workflows:
High-Level Models of AI / Machine Learning Applications - “The AI and ML Canvases”
The Structure of Machine Learning Workflows:
- Data Engineering
- Model Engineering
- Model Deployment
A Closer Look:
Data: Data Engineering Pipelines
Model: Machine Learning Engineering Pipelines
Code: Deployment Pipelines
The (All Important) Principles of MLOps
Overview
Automation
Versioning
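One common way to version data, sketched here as an assumption rather than a prescribed method, is content addressing: hash every file in a dataset directory, then hash the resulting manifest into a single identifier that can be recorded next to the model trained on it. Any change to the data yields a new identifier. The function names (file_digest, snapshot, dataset_version) are hypothetical; dedicated data-versioning tools implement richer variants of the same idea.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Dict


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's bytes, read in chunks so large files are never loaded whole."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(data_dir: Path) -> Dict[str, str]:
    """Map each file's path (relative to data_dir) to its content digest."""
    return {
        p.relative_to(data_dir).as_posix(): file_digest(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def dataset_version(manifest: Dict[str, str]) -> str:
    """Derive one version id from the whole manifest; key order is normalized."""
    canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Demo: editing a single value in a file produces a new dataset version.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "train.csv").write_text("x,y\n1,2\n")
    v1 = dataset_version(snapshot(root))
    (root / "train.csv").write_text("x,y\n1,3\n")
    v2 = dataset_version(snapshot(root))

print(v1[:12], v2[:12], v1 != v2)
```

Storing the version id alongside each trained model makes it possible to answer "which data produced this model?" without copying the data itself.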
Experiment Tracking
Testing
Monitoring
The “ML Test Score” System
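The ML Test Score rubric (Breck et al.) turns production readiness into a number. As we understand its scoring scheme, each test earns 0 points if not done, 0.5 points if it is run manually with results documented and shared, and 1 point if it is automated and run repeatedly. Points are summed within each of the four sections (data, model, ML infrastructure, monitoring), and the final score is the minimum of the four section sums, so the weakest area determines the result. The sketch below encodes only that arithmetic; the names and sample results are illustrative.

```python
from enum import Enum
from typing import Dict, List


class CheckStatus(Enum):
    NOT_DONE = 0.0
    MANUAL = 0.5     # executed by hand, results documented and distributed
    AUTOMATED = 1.0  # automated and run repeatedly


SECTIONS = ("data", "model", "infrastructure", "monitoring")


def section_scores(results: Dict[str, List[CheckStatus]]) -> Dict[str, float]:
    """Sum the points of every test in each section (a missing section scores 0)."""
    return {s: sum(t.value for t in results.get(s, [])) for s in SECTIONS}


def ml_test_score(results: Dict[str, List[CheckStatus]]) -> float:
    """Final score is the minimum section score: the weakest area dominates."""
    return min(section_scores(results).values())


# Illustrative results for a hypothetical system.
A, M, N = CheckStatus.AUTOMATED, CheckStatus.MANUAL, CheckStatus.NOT_DONE
results = {
    "data": [A, A, M],
    "model": [A, M],
    "infrastructure": [M, M],
    "monitoring": [A, N],
}
print(section_scores(results))
print(ml_test_score(results))
```

Taking the minimum rather than the sum means a team cannot compensate for missing monitoring with extra data tests.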
Reproducibility
Modularity
Metrics / ML Software Delivery Metrics (the 4 Key Metrics from “Accelerate”)
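The four key metrics popularized by Accelerate (Forsgren, Humble, and Kim) are deployment frequency, lead time for changes, time to restore service, and change failure rate. A minimal sketch of computing them from a deployment log follows. The Deployment record, its field names, and the sample log are hypothetical, and time to restore is simplified to the interval from a failing deployment to its recorded fix, whereas real tooling usually measures from incident detection.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Dict, List, Optional


@dataclass
class Deployment:
    committed_at: datetime                 # when the change was committed
    deployed_at: datetime                  # when it reached production
    caused_failure: bool = False           # did it degrade service?
    restored_at: Optional[datetime] = None  # simplified: when service was restored


def delivery_metrics(deploys: List[Deployment], period_days: int) -> Dict[str, float]:
    hour = timedelta(hours=1)
    lead_hours = [(d.deployed_at - d.committed_at) / hour for d in deploys]
    failures = [d for d in deploys if d.caused_failure]
    restore_hours = [
        (d.restored_at - d.deployed_at) / hour
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deployment_frequency_per_day": len(deploys) / period_days,
        "median_lead_time_hours": median(lead_hours) if lead_hours else 0.0,
        "change_failure_rate": len(failures) / len(deploys) if deploys else 0.0,
        "median_time_to_restore_hours": median(restore_hours) if restore_hours else 0.0,
    }


# Illustrative one-week log (hypothetical data).
log = [
    Deployment(datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 13)),
    Deployment(datetime(2024, 1, 2, 10), datetime(2024, 1, 3, 10),
               caused_failure=True, restored_at=datetime(2024, 1, 3, 12)),
    Deployment(datetime(2024, 1, 4, 8), datetime(2024, 1, 4, 10)),
    Deployment(datetime(2024, 1, 5, 9), datetime(2024, 1, 5, 15)),
]
metrics = delivery_metrics(log, period_days=7)
print(metrics)
```

Medians are used for the time-based metrics because a single slow change or long outage would otherwise dominate a mean.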
Summary of MLOps Principles and Best Practices
MLOps Infrastructure Stack
Key Sources:
- Papers
- The Technical Debt of Machine Learning Systems
- The Machine Learning Model Checklist
- Other Publications 1. 2.