A graduate-level course on modern generative modeling for visual domains — images, video, and 3D. The main focus is on diffusion models: their theoretical interpretations and the advanced training and sampling methods behind today's high-quality, fast generators. Special attention is given to few-step diffusion-based models used in production image/video services, and to recent advances in autoregressive visual generation and its integration with diffusion.
- Develop a deep understanding of leading visual generative paradigms.
- Learn novel, effective diffusion-based generative frameworks.
- Master the most recent practical techniques behind state-of-the-art generative models.
Each topic comes with English slides (.pdf in this repo) and a recorded lecture (RU); some topics also include a recorded seminar (RU).
| # | Topic | Materials |
|---|---|---|
| 1 | Introduction to Diffusion Models — Denoising Diffusion Probabilistic Models (DDPMs) & Denoising Score Matching (DSM) | Slides · Lecture (RU) · Seminar (RU) |
| 2 | Continuous Diffusion Models — Probability Flow ODE and SDE formulations | Slides · Lecture (RU) |
| 3 | Flow Matching and its connection to diffusion models | Slides · Lecture (RU) |
| 4 | Efficient PF-ODE/SDE Solvers — Euler methods, DDIM, and DPM-Solver | Slides · Lecture (RU) |
| 5 | Diffusion Models in Practice — diffusion spaces, recent architectures, design choices, training & sampling techniques | Slides · Lecture (RU) · Seminar (RU) |
| 6 | Flow Map Models — learnable PF-ODE integrators for faster sampling (Consistency Models, MeanFlow) | Slides · Lecture (RU) · Supplementary |
| 7 | Distribution Matching for few-step generators (DMD, ADD, SwD, Drifting Models) | Slides · Lecture (RU) |
| 8 | Autoregressive Visual Generation — discrete tokenizers VQ-VAE/VQ-GAN, scale-wise models (VAR, Switti), continuous AR (MAR), diffusion as AR | Slides · Lecture (RU) |
| 9 | Video Generation — architectures, challenges, and AR video diffusion models | Slides · Lecture (RU) · Seminar (RU) |
| 10 | Efficient Diffusion Models — model-level optimizations (caching, sparse attention, quantization, …) | Slides · Lecture (RU) |
| 11 | Multimodal Generative Models — architectures, training setups, and conditioning in diffusion (ControlNet, IP-Adapter) | Slides · Lecture |
| 12 | 3D Generative Models — intro to 3D modeling and multi-view diffusion models | Slides · Lecture |
Homework assignments live in assignments/. Each folder contains a starter notebook
(and a task.pdf or theory.pdf where applicable); reference solutions are released separately.
| # | Topic | Starter |
|---|---|---|
| 1 | Intro to diffusion: DDPM & DSM | hw1/ — practice_template.ipynb, task.pdf |
| 2 | Efficient solvers: DPM-Solver | hw2/ — practice_dpm_solver.ipynb, theory.pdf |
| 3 | Flow Matching training | hw3/ — fm_training.ipynb |
| 4 | Flow Map Models | hw4/ — flow_map_models.ipynb |
| 5 | Distribution matching: ADD / MMD distillation | hw5/ — add_mmd_distillation.ipynb |
| 6 | Autoregressive generation: MAR with a flow-matching head | hw6/ — mar_fm_head.ipynb |
| 7 | AR video diffusion sampling | hw7/ — ar_video_diffusion_sampling.ipynb |
The full list of examinable topics is in
exam/exam_topics_visual_genai.md. The
exam/slot-booking-service/ directory contains a small web app that students
use to book exam slots (see its own README).
- Dmitry Baranchuk — Telegram @vernold
- Nikita Starodubcev — Telegram @nikitastariy
- Dmitry Baranchuk
- Nikita Starodubcev
- Denis Rakitin
- Denis Kuznedelev
- Ilya Drobyshevsky
- Ilya Sudakov
- Sergey Kastrulin
- Kirill Struminsky
- The introduction to diffusion models follows CS236 by Stefano Ermon.
- Some explanations are inspired by The Principles of Diffusion Models.
- Numerous papers and blog posts that led us to this course.
