Multi-Objective Learning for Diffusion Models: A Statistical Theory under Semi-Supervised Learning
Abstract Overview
This paper studies multi-objective learning for conditional diffusion models when paired data are limited but condition-only data are abundant. The authors formalize the problem through Pareto optimality across multiple target distributions and propose a two-stage semi-supervised procedure: train lightweight specialist models on the scarce paired data, then use those specialists to generate pseudo-pairs on the condition-only data and train a larger generalist model on them. The theory provides nonasymptotic generalization bounds for score matching and corresponding distribution-estimation guarantees, showing that paired-sample requirements depend on the complexity of the specialists rather than that of the larger generalist class. The analysis is also extended to diffusion policies in sequential decision making, where on-policy rollouts create distribution shift. Experiments on robotic manipulation and image restoration test whether the predicted sample-efficiency advantages appear in practice.
Novelty
The distinctive contribution is a statistical theory for semi-supervised multi-objective learning in diffusion models, centered on a specialist-to-generalist training framework. The paper also extends this analysis to diffusion policies under distribution shift and presents what it describes as the first theoretical guarantee on the sub-optimality gap of diffusion policies in imitation learning.
Results
The main theoretical result is that, under the proposed two-stage procedure, the number of required paired samples scales with the complexity of the specialist classes, while the abundant unlabeled conditions carry the cost of training the larger generalist. For linear scalarizations, the paper derives sharper rates, and for diffusion policies it provides sub-optimality guarantees under on-policy distribution shift. Empirically, the semi-supervised method outperforms a labeled-only multi-task baseline in both the robotics and image restoration settings, with larger gains on some out-of-distribution robotics evaluations.
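For readers unfamiliar with the term, a linear scalarization collapses several objectives into one weighted sum. The generic textbook form is sketched below; the symbols $K$, $L_k$, $\lambda_k$, and $\theta$ are notation assumed here for illustration and are not taken from the paper.

```latex
% Generic linear scalarization of K objectives (assumed notation, not the paper's).
\[
  \min_{\theta} \; \sum_{k=1}^{K} \lambda_k \, L_k(\theta),
  \qquad \lambda_k \ge 0, \quad \sum_{k=1}^{K} \lambda_k = 1 .
\]
% If every weight is strictly positive, any minimizer of the weighted sum is
% Pareto optimal: no other \theta lowers one L_k without raising another.
```

Varying the weight vector over the simplex traces out different trade-offs among the objectives, which is how a scalarized problem connects to the Pareto formulation mentioned in the overview.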
Key Points
- A two-stage semi-supervised pipeline trains per-objective specialist diffusion models from limited paired data and distills them into a generalist using pseudo-samples on abundant unlabeled conditions.
- The analysis gives generalization and total-variation guarantees showing that paired-sample complexity is tied to specialist model complexity rather than the larger generalist class, with improved rates for linear scalarizations.
- Experiments in robotic manipulation and CelebA-HQ inpainting show consistent improvements over a labeled-only multi-objective learning baseline.
- The theory is further extended to diffusion policies facing rollout-induced distribution shift.
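The data flow of the two-stage pipeline described in the key points can be sketched with a toy example. This is only an illustration under loud assumptions: scalar least-squares slopes stand in for diffusion models, the "generalist" is a task-indexed table of slopes, and the names `fit_slope` and `train_two_stage` are hypothetical, not from the paper.

```python
import random

def fit_slope(xs, ys):
    """Least-squares slope through the origin; a stand-in for model training."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def train_two_stage(paired_sets, unlabeled_xs):
    """paired_sets: one (xs, ys) tuple per objective; unlabeled_xs: conditions only."""
    # Stage 1: fit one small specialist per objective on the scarce paired data.
    specialists = [fit_slope(xs, ys) for xs, ys in paired_sets]
    # Stage 2: label the abundant conditions with each specialist (pseudo-pairs),
    # then fit the task-indexed generalist on those pseudo-pairs only.
    generalist = {}
    for task, slope in enumerate(specialists):
        pseudo_ys = [slope * x for x in unlabeled_xs]
        generalist[task] = fit_slope(unlabeled_xs, pseudo_ys)
    return specialists, generalist

# Toy data (assumed for illustration): two objectives, 8 paired samples each,
# and 1000 condition-only samples.
random.seed(0)
true_slopes = [2.0, -1.0]
paired_sets = []
for a in true_slopes:
    xs = [random.gauss(0, 1) for _ in range(8)]
    ys = [a * x + random.gauss(0, 0.05) for x in xs]
    paired_sets.append((xs, ys))
unlabeled_xs = [random.gauss(0, 1) for _ in range(1000)]

specialists, generalist = train_two_stage(paired_sets, unlabeled_xs)
```

In this toy the generalist never sees a true label, so its error relative to the ground truth is inherited entirely from the specialists. That mirrors, in a very simplified way, the summary's point that paired-sample requirements are tied to the specialists rather than to the generalist.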