MLMMI - Machine Learning Model Management and Inference (SS 2026)

This advanced course explores the systems and data management challenges of managing and serving machine learning (ML) models. The lifecycle of an ML model extends far beyond training.

The course begins by recapping standard ML pipelines up to the training stage and then shifts focus to post-training workloads. Students will be introduced to foundational frameworks for model management, where trained models are treated as core data artifacts. Topics include data management techniques and optimizations such as model selection, versioning, lineage tracking, metadata management, and model monitoring.
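To give a flavor of what treating trained models as data artifacts can look like, here is a minimal sketch of an in-memory registry with versioning, metadata, and lineage tracking. It is not taken from the course material or from any of the systems listed in the readings; `ModelVersion`, `ModelRegistry`, and all method names are hypothetical and chosen only for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    """One immutable-by-convention entry in the registry (hypothetical type)."""
    name: str
    version: int
    artifact_hash: str                      # content hash of the serialized model
    parent: "ModelVersion | None" = None    # lineage pointer to the model it was derived from
    metadata: dict = field(default_factory=dict)


class ModelRegistry:
    """Toy registry: versions are numbered per model name, starting at 1."""

    def __init__(self):
        self._versions = {}  # model name -> list of ModelVersion, oldest first

    def register(self, name, artifact, parent=None, **metadata):
        versions = self._versions.setdefault(name, [])
        entry = ModelVersion(
            name=name,
            version=len(versions) + 1,
            artifact_hash=hashlib.sha256(artifact).hexdigest(),
            parent=parent,
            metadata=metadata,
        )
        versions.append(entry)
        return entry

    def latest(self, name):
        return self._versions[name][-1]

    def lineage(self, entry):
        """Walk parent pointers from a version back to its root."""
        chain = []
        while entry is not None:
            chain.append(f"{entry.name}:v{entry.version}")
            entry = entry.parent
        return chain


# Usage: a base model, a retrained version, and a fine-tuned descendant.
registry = ModelRegistry()
base_v1 = registry.register("base", b"weights-0", dataset="train-2025")
base_v2 = registry.register("base", b"weights-1", parent=base_v1, dataset="train-2026")
tuned = registry.register("tuned", b"weights-2", parent=base_v2, task="classification")
print(registry.lineage(tuned))
```

A production system would persist entries and artifacts durably and record far richer provenance (training data, code, hyperparameters); the sketch only shows the basic shape of the bookkeeping.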

Building on these foundations, students will explore the architectures of modern model serving systems for both classical models and large language models (LLMs), along with performance optimizations such as dynamic batching, model compilation, and resource-efficient inference execution. Finally, the course discusses ML agents as an emerging workload.
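As a small illustration of dynamic batching, the following sketch simulates one common batching policy offline: requests are grouped until the batch is full or until a request arrives more than a maximum wait after the first request in the batch. The function name `form_batches` and its parameters are assumptions made for this example, and the policy is a simplification rather than the behavior of any specific serving system covered in the course.

```python
def form_batches(arrivals, max_batch_size, max_wait):
    """Group requests into batches under a size cap and a wait-time cap.

    arrivals: list of (arrival_time_seconds, request_id), sorted by time.
    A batch is closed when it already holds max_batch_size requests, or when
    the next request arrives more than max_wait after the batch's first request.
    """
    batches = []
    current = []  # (arrival_time, request_id) pairs of the open batch
    for arrival_time, request_id in arrivals:
        batch_full = len(current) >= max_batch_size
        waited_too_long = bool(current) and arrival_time - current[0][0] > max_wait
        if current and (batch_full or waited_too_long):
            batches.append([rid for _, rid in current])
            current = []
        current.append((arrival_time, request_id))
    if current:
        batches.append([rid for _, rid in current])
    return batches


# Usage: three requests arrive in a burst, two more after a pause.
arrivals = [(0.000, "a"), (0.001, "b"), (0.002, "c"), (0.020, "d"), (0.021, "e")]
batches = form_batches(arrivals, max_batch_size=2, max_wait=0.005)
print(batches)
```

The two caps express the underlying trade-off: a larger batch size improves hardware utilization and throughput, while a shorter maximum wait bounds the queuing latency added to each request.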

Throughout the course, students will engage with state-of-the-art research on inference and model management. By combining seminal research papers with hands-on projects, students will gain a comprehensive understanding of the ML lifecycle beyond training, preparing them for both academic research and real-world system design.

Where: E-N 189 (lectures) and MAR 0.001 (exercises)
When: Wednesday, 16:15 - 17:45 (lectures) and Monday, 16:15 (consultation with TA)
First lecture: April 15, 16:15

Note

The outline below is tentative. It will evolve as we teach.

Lecture 1: Intro and Logistics [April 15]

Recap of ML pipeline stages
Stratum system overview
Slides

Part 1: ML Deployment

Lecture 2: Model Management Systems [April 22]

Model and data versioning
Lineage tracing
Feature stores
Readings: ModelDB, ModelHub, DataHub, LIMA, FeathrPO
Slides

Lecture 3: ML Platforms and Deployment [April 29]

MLOps
End-to-end ML platforms and lifecycle management
Continuous retraining
Data validation
Readings: TFX, MODYN, AWS Deequ
Slides

Lecture 4: Model Selection Systems [May 06]

Model selection systems
Transfer learning
Readings: Hyperband, Cerebro, VISTA, SHiFT
Slides

Guest Lecture: Nils Strassenburg [May 13]

Slides

Part 2: ML Serving

Lecture 5: ML Serving Systems [May 20]

Traditional model serving
Serving optimizations
In-database ML serving
Readings: Clipper, Pretzel, TVM
Slides

Lecture 6: AI Agents and Compound AI Systems [May 27]

Compound AI systems and use cases
AI agents and use cases
Context memory management, tooling, and RAG
Readings: ReAct, MemGPT
Slides

Lecture 7: LLM Serving

Lecture 8: LLM Serving Optimizations

Programming Assignments

Lightweight Model Management System (task description | deadline: 04.05.2026)
Transfer Learning (task description | deadline: 18.05.2026)
Model Serving System (task description | deadline: 01.06.2026)

Projects

The majority of the course grade will come from a group project. The projects are designed to be challenging and research-oriented and will be implemented in Stratum, a system infrastructure for large-scale agent-centric ML workloads. Specific project topics will be announced in the coming weeks.

Paper: https://arxiv.org/pdf/2603.03589
Code: https://github.com/deem-data/stratum

Organization

Lecturer: Dr.-Ing. Arnab Phani
Teaching Assistant: Ellias Strauss