Diffusion Alignment as Variational Expectation-Maximization
- URL: http://arxiv.org/abs/2510.00502v1
- Date: Wed, 01 Oct 2025 04:34:07 GMT
- Title: Diffusion Alignment as Variational Expectation-Maximization
- Authors: Jaewoo Lee, Minsu Kim, Sanghyeok Choi, Inhyuck Song, Sujin Yun, Hyeongyu Kang, Woocheol Shin, Taeyoung Yun, Kiyoung Om, Jinkyoo Park,
- Abstract summary: We introduce Diffusion Alignment as Variational Expectation-Maximization (DAV)<n>DAV formulates diffusion alignment as an iterative process alternating between two complementary phases: the E-step and the M-step.<n>We demonstrate that DAV can optimize reward while preserving diversity for both continuous and discrete tasks.
- Score: 32.10775492481573
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion alignment aims to optimize diffusion models for the downstream objective. While existing methods based on reinforcement learning or direct backpropagation achieve considerable success in maximizing rewards, they often suffer from reward over-optimization and mode collapse. We introduce Diffusion Alignment as Variational Expectation-Maximization (DAV), a framework that formulates diffusion alignment as an iterative process alternating between two complementary phases: the E-step and the M-step. In the E-step, we employ test-time search to generate diverse and reward-aligned samples. In the M-step, we refine the diffusion model using samples discovered by the E-step. We demonstrate that DAV can optimize reward while preserving diversity for both continuous and discrete tasks: text-to-image synthesis and DNA sequence design.
Related papers
- Discrete Diffusion Trajectory Alignment via Stepwise Decomposition [80.66701883088935]
We propose an offline preference optimization method to approach trajectory alignment for discrete diffusion models.<n>We decompose the problem into a set of stepwise alignment objectives by matching the per-step posterior.<n> Experiments across multiple domains including DNA sequence design, protein inverse folding, and language modeling consistently demonstrate the superiority of our approach.
arXiv Detail & Related papers (2025-07-07T09:52:56Z) - Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design [87.58981407469977]
We propose a novel framework for inference-time reward optimization with diffusion models inspired by evolutionary algorithms.<n>Our approach employs an iterative refinement process consisting of two steps in each iteration: noising and reward-guided denoising.
arXiv Detail & Related papers (2025-02-20T17:48:45Z) - Test-time Alignment of Diffusion Models without Reward Over-optimization [8.981605934618349]
Diffusion models excel in generative tasks, but aligning them with specific objectives remains challenging.<n>We propose a training-free, test-time method based on Sequential Monte Carlo (SMC) to sample from the reward-aligned target distribution.<n>We demonstrate its effectiveness in single-reward optimization, multi-objective scenarios, and online black-box optimization.
arXiv Detail & Related papers (2025-01-10T09:10:30Z) - Diverse Score Distillation [27.790458964072823]
We propose a score formulation that guides the optimization to follow generation paths defined by random initial seeds.<n>We showcase the applications of our Diverse Score Distillation' (DSD) formulation across tasks such as 2D optimization, text-based 3D inference, and single-view reconstruction.
arXiv Detail & Related papers (2024-12-09T18:59:02Z) - Aligning Few-Step Diffusion Models with Dense Reward Difference Learning [81.85515625591884]
Stepwise Diffusion Policy Optimization (SDPO) is an alignment method tailored for few-step diffusion models.
SDPO incorporates dense reward feedback at every intermediate step to ensure consistent alignment across all denoising steps.
SDPO consistently outperforms prior methods in reward-based alignment across diverse step configurations.
arXiv Detail & Related papers (2024-11-18T16:57:41Z) - Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization [97.35427957922714]
We present an algorithm named pairwise sample optimization (PSO), which enables the direct fine-tuning of an arbitrary timestep-distilled diffusion model.<n>PSO introduces additional reference images sampled from the current time-step distilled model, and increases the relative likelihood margin between the training images and reference images.<n>We show that PSO can directly adapt distilled models to human-preferred generation with both offline and online-generated pairwise preference image data.
arXiv Detail & Related papers (2024-10-04T07:05:16Z) - Variational Schrödinger Diffusion Models [14.480273869571468]
Schr"odinger bridge (SB) has emerged as the go-to method for optimizing transportation plans in diffusion models.<n>We leverage variational inference to linearize the forward score functions (variational scores) of SB.<n>We propose the variational Schr"odinger diffusion model (VSDM), where the forward process is a multivariate diffusion and the variational scores are adaptively optimized for efficient transport.
arXiv Detail & Related papers (2024-05-08T04:01:40Z) - Gaussian Mixture Solvers for Diffusion Models [84.83349474361204]
We introduce a novel class of SDE-based solvers called GMS for diffusion models.
Our solver outperforms numerous SDE-based solvers in terms of sample quality in image generation and stroke-based synthesis.
arXiv Detail & Related papers (2023-11-02T02:05:38Z) - Fast Diffusion Model [122.36693015093041]
Diffusion models (DMs) have been adopted across diverse fields with their abilities in capturing intricate data distributions.
In this paper, we propose a Fast Diffusion Model (FDM) to significantly speed up DMs from a DM optimization perspective.
arXiv Detail & Related papers (2023-06-12T09:38:04Z) - Protein Design with Guided Discrete Diffusion [67.06148688398677]
A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling.
We propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models.
NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods.
arXiv Detail & Related papers (2023-05-31T16:31:24Z) - Where to Diffuse, How to Diffuse, and How to Get Back: Automated
Learning for Multivariate Diffusions [22.04182099405728]
Diffusion-based generative models (DBGMs) perturb data to a target noise distribution and reverse this inference diffusion process to generate samples.
We show how to maximize a lower-bound on the likelihood for any number of auxiliary variables.
We then demonstrate how to parameterize the diffusion for a specified target noise distribution.
arXiv Detail & Related papers (2023-02-14T18:57:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.