FuguReport

Multi-Objective Learning for Diffusion Models: A Statistical Theory under Semi-Supervised Learning

Authors: Ziheng Cheng, Yixiao Huang, Hanlin Zhu, Haoran Geng, Somayeh Sojoudi, Jitendra Malik, Pieter Abbeel, Xin Guo
Affiliations: University of California, Berkeley
Categories: Method / Multi-Objective Learning / Principled framework for diffusion models; Application / Semi-Supervised Learning / Training with limited paired data; Theory / Statistical Learning Theory / Generalization analysis for specialist models
License: CC BY 4.0

Abstract Overview

This paper studies multi-objective learning for conditional diffusion models when paired data are limited but condition-only data are abundant. The authors formalize the problem through Pareto optimality across multiple target distributions and propose a two-stage semi-supervised procedure: first train lightweight specialist models on the scarce paired data, then use them to generate pseudo-pairs on which a larger generalist model is trained. The theory provides non-asymptotic generalization bounds for score matching and corresponding distribution-estimation guarantees, showing that the paired-sample requirement depends on the complexity of the specialist classes rather than on that of the larger generalist class. The analysis is also extended to diffusion policies in sequential decision making, where on-policy rollouts create distribution shift. Experiments on robotic manipulation and image restoration test whether the predicted sample-efficiency advantages appear in practice.
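For readers unfamiliar with the score-matching objective that the bounds refer to, the standard conditional denoising score-matching loss can be written as below. The notation (x_0, y, alpha_t, sigma_t, s_theta) is an assumption introduced here for illustration and may differ from the paper's own formulation.

```latex
% Assumed notation (not taken from the source):
%   x_0 is a clean sample, y is its condition,
%   x_t = \alpha_t x_0 + \sigma_t \varepsilon with \varepsilon \sim \mathcal{N}(0, I),
%   s_\theta(x_t, y, t) is the learned conditional score network.
% For this Gaussian perturbation, \nabla_{x_t} \log p(x_t \mid x_0) = -\varepsilon / \sigma_t,
% so the regression target of the score network is -\varepsilon / \sigma_t.
\mathcal{L}(\theta)
  = \mathbb{E}_{t,\,(x_0, y),\,\varepsilon}
    \left[ \left\| s_\theta(x_t, y, t) + \frac{\varepsilon}{\sigma_t} \right\|_2^2 \right]
```

Under this reading, a generalization bound for score matching controls how far the empirical minimizer of this loss is from the population minimizer, which in turn is what drives the distribution-estimation guarantees.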

Novelty

The distinctive contribution is a statistical theory for semi-supervised multi-objective learning in diffusion models, centered on a specialist-to-generalist training framework. The paper also extends this analysis to diffusion policies under distribution shift and presents what it describes as the first theoretical guarantee on the sub-optimality gap of diffusion policies in imitation learning.

Results

The main theoretical result is that, under the proposed two-stage procedure, the number of required paired samples scales with the complexity of the specialist classes, while abundant unlabeled conditions support training the larger generalist. For linear scalarizations, the paper derives sharper rates, and for diffusion policies it provides sub-optimality guarantees under on-policy distribution shift. Empirically, the semi-supervised method outperforms a labeled-only multi-task baseline across both robotics and image restoration settings, including stronger gains on some out-of-distribution robotics evaluations.
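To make the terms "Pareto optimality" and "linear scalarization" concrete, the generic definitions are sketched below. The symbols (K objectives, per-objective losses L_k, weights lambda_k) are assumed notation for illustration, not the paper's.

```latex
% Assumed notation (not taken from the source):
%   K objectives, each with a loss \mathcal{L}_k(\theta), e.g. the score-matching
%   risk of the generalist on the k-th target distribution.

% Pareto optimality: \theta^\star is Pareto optimal if there is no \theta such that
\mathcal{L}_k(\theta) \le \mathcal{L}_k(\theta^\star) \ \text{for all } k
\quad \text{and} \quad
\mathcal{L}_j(\theta) < \mathcal{L}_j(\theta^\star) \ \text{for some } j.

% Linear scalarization: fix weights \lambda on the probability simplex and solve
\min_{\theta} \ \sum_{k=1}^{K} \lambda_k \, \mathcal{L}_k(\theta),
\qquad \lambda_k \ge 0, \quad \sum_{k=1}^{K} \lambda_k = 1.
```

A minimizer of the weighted sum with all weights strictly positive is always Pareto optimal, which is one reason linear scalarizations are a natural special case to analyze.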

Key Points

  1. A two-stage semi-supervised pipeline trains per-objective specialist diffusion models from limited paired data and distills them into a generalist using pseudo-samples on abundant unlabeled conditions.
  2. The analysis gives generalization and total-variation guarantees showing that paired-sample complexity is tied to specialist model complexity rather than the larger generalist class, with improved rates for linear scalarizations.
  3. Experiments in robotic manipulation and CelebA-HQ inpainting show consistent improvements over a labeled-only multi-objective learning baseline, and the theory is further extended to diffusion policies facing rollout-induced distribution shift.
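The two-stage pipeline in the key points above can be illustrated with a deliberately simplified toy. This is a minimal sketch, not the authors' implementation: linear-Gaussian conditional samplers stand in for the specialist diffusion models, a single least-squares model with task one-hot features stands in for the generalist, and all names, sizes, and ground-truth slopes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear_gaussian(y, x):
    """Fit x ~ w*y + b by least squares and estimate the residual std.
    Toy stand-in for training a specialist conditional generative model."""
    A = np.column_stack([y, np.ones_like(y)])
    coef, *_ = np.linalg.lstsq(A, x, rcond=None)
    sigma = (x - A @ coef).std()
    return coef, sigma

def sample_linear_gaussian(model, y, rng):
    """Draw pseudo-samples x given conditions y from a fitted toy specialist."""
    coef, sigma = model
    return coef[0] * y + coef[1] + sigma * rng.standard_normal(y.shape)

# Hypothetical ground truth: objective k has x | y ~ N(slope_k * y, 0.1^2).
true_slopes = [2.0, -1.0]
n_paired, n_unlabeled = 50, 5000  # scarce paired data, abundant conditions

# Stage 1: one specialist per objective, trained on scarce paired data.
specialists = []
for slope in true_slopes:
    y = rng.uniform(-1.0, 1.0, n_paired)
    x = slope * y + 0.1 * rng.standard_normal(n_paired)
    specialists.append(fit_linear_gaussian(y, x))

# Stage 2: label abundant unlabeled conditions with pseudo-samples from the
# specialists, then fit one generalist over all objectives at once.
y_unl = rng.uniform(-1.0, 1.0, n_unlabeled)
K = len(specialists)
features, targets = [], []
for k, spec in enumerate(specialists):
    onehot = np.zeros((n_unlabeled, K))
    onehot[:, k] = 1.0
    # Columns: per-task slope features, then per-task intercept features.
    features.append(np.column_stack([y_unl[:, None] * onehot, onehot]))
    targets.append(sample_linear_gaussian(spec, y_unl, rng))

generalist_coef, *_ = np.linalg.lstsq(
    np.vstack(features), np.concatenate(targets), rcond=None
)
generalist_slopes = generalist_coef[:K]  # one recovered slope per objective
print("generalist slopes:", generalist_slopes)
```

In this toy, the generalist never sees a real paired sample; its accuracy is limited by how well the specialists were fit on the 50 paired points, which loosely mirrors the claim that paired-sample complexity is governed by the specialist classes.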
