MLOps
Also known as: ML operations, machine learning operations, LLMOps
MLOps is the practice of operating machine learning systems in production reliably and repeatably. It covers training, deployment, monitoring, retraining, and governance, and is analogous to DevOps for software.
Detailed explanation
MLOps brings software engineering and operations rigor to machine learning. It covers reproducible training pipelines, model registries, deployment patterns (online, batch, edge), feature stores, monitoring for accuracy and drift, automated retraining, and governance for model changes.
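As an illustration of monitoring for drift, the sketch below computes a Population Stability Index (PSI) between a reference sample of one numeric feature (for example, training data) and a live sample. PSI is one common drift statistic among several and is chosen here only as an example. The function name, the bin count, and the `DRIFT_THRESHOLD` value are assumptions for illustration, not part of any particular tool.

```python
import math
from typing import List, Sequence

# Illustrative alert threshold (an assumption); tune it per feature in practice.
DRIFT_THRESHOLD = 0.2


def population_stability_index(
    reference: Sequence[float],
    live: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference sample and a live sample of one numeric feature."""
    if len(reference) == 0 or len(live) == 0:
        raise ValueError("both samples must be non-empty")

    # Equal-width bins are derived from the reference sample only.
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values into edge bins
            counts[idx] += 1
        # Floor each fraction at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    ref_frac = bin_fractions(reference)
    live_frac = bin_fractions(live)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref_frac, live_frac))


# Usage: compare a training-time sample with a shifted production sample.
reference_sample = [i / 100 for i in range(100)]
live_sample = [0.5 + i / 200 for i in range(100)]
psi = population_stability_index(reference_sample, live_sample)
drift_detected = psi > DRIFT_THRESHOLD
```

A check like this would typically run on a schedule per feature, with a detected drift raising an alert or triggering the retraining pipeline.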
Mature MLOps practices treat models like any other production artifact: versioned, tested, observable, and reversible. Key tooling categories include experiment tracking, pipeline orchestration, model serving, feature platforms, evaluation frameworks, and ML observability.
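The "versioned, tested, and reversible" idea can be sketched with a minimal in-memory model registry. This is a toy under stated assumptions, not the API of any real registry product: the class names, the `accuracy` metric key, and the artifact paths are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ModelVersion:
    version: int
    artifact_uri: str          # hypothetical location of the serialized model
    metrics: Dict[str, float]  # evaluation results recorded at registration time


@dataclass
class ModelRegistry:
    versions: List[ModelVersion] = field(default_factory=list)
    history: List[int] = field(default_factory=list)  # promoted versions, oldest first

    def register(self, artifact_uri: str, metrics: Dict[str, float]) -> ModelVersion:
        """Versioned: every registered model gets the next integer version."""
        mv = ModelVersion(len(self.versions) + 1, artifact_uri, dict(metrics))
        self.versions.append(mv)
        return mv

    def promote(self, version: int, min_accuracy: float) -> bool:
        """Tested: promotion is refused unless the recorded metric clears the gate."""
        mv = self.versions[version - 1]
        if mv.metrics.get("accuracy", 0.0) < min_accuracy:
            return False
        self.history.append(version)
        return True

    @property
    def production(self) -> Optional[ModelVersion]:
        """The most recently promoted version, or None if nothing is promoted."""
        return self.versions[self.history[-1] - 1] if self.history else None

    def rollback(self) -> Optional[ModelVersion]:
        """Reversible: restore the previously promoted version."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier production version to roll back to")
        self.history.pop()
        return self.production


# Usage with hypothetical artifact paths and metrics.
registry = ModelRegistry()
registry.register("models/churn/1", {"accuracy": 0.91})
registry.register("models/churn/2", {"accuracy": 0.93})
registry.promote(1, min_accuracy=0.90)
registry.promote(2, min_accuracy=0.90)
restored = registry.rollback()  # production is version 1 again
```

Keeping the promotion history separate from the list of versions is what makes rollback a cheap pointer change: the older artifact is never rebuilt, only re-selected.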
For LLM systems, MLOps extends to prompt and evaluation (eval) management, retrieval-augmented generation (RAG) pipeline operations, and tool/agent observability. This extension is sometimes called LLMOps. The core principles are the same: ship safely, monitor honestly, roll back fast.
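For prompt and eval management, "ship safely" often takes the form of a regression gate: a candidate prompt is scored on a fixed set of cases and released only if it does not fall below the current baseline. The sketch below assumes a simple substring check as the scoring rule and takes the model as a plain callable. The names `pass_rate`, `should_ship`, and `fake_generate` are hypothetical, and `fake_generate` stands in for a real model call.

```python
from typing import Callable, Sequence, Tuple

# (input text, substring expected somewhere in the model output)
EvalCase = Tuple[str, str]


def pass_rate(
    generate: Callable[[str], str],
    prompt_template: str,
    cases: Sequence[EvalCase],
) -> float:
    """Fraction of cases whose output contains the expected substring."""
    if not cases:
        raise ValueError("eval set must not be empty")
    passed = 0
    for user_input, expected in cases:
        output = generate(prompt_template.format(input=user_input))
        if expected.lower() in output.lower():
            passed += 1
    return passed / len(cases)


def should_ship(candidate_rate: float, baseline_rate: float, tolerance: float = 0.0) -> bool:
    """Gate: ship only if the candidate is no worse than baseline minus tolerance."""
    return candidate_rate >= baseline_rate - tolerance


def fake_generate(prompt: str) -> str:
    """Stand-in for a real model call (an assumption for this sketch)."""
    return "Paris" if "France" in prompt else "unknown"


# Usage: score a candidate prompt template on a tiny fixed eval set.
eval_cases = [("What is the capital of France?", "paris"),
              ("What is the capital of Peru?", "lima")]
candidate_rate = pass_rate(fake_generate, "Answer briefly: {input}", eval_cases)
ship = should_ship(candidate_rate, baseline_rate=0.5)
```

Storing each prompt template and its eval results together under a version number would also support the "roll back fast" principle, since a regression can be undone by re-selecting the previous template.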