Rethinking Preference Alignment for Diffusion Models with Classifier-Free Guidance
- URL: http://arxiv.org/abs/2602.18799v1
- Date: Sat, 21 Feb 2026 11:18:52 GMT
- Title: Rethinking Preference Alignment for Diffusion Models with Classifier-Free Guidance
- Authors: Zhou Jiang, Yandong Wen, Zhen Liu
- Abstract summary: We propose a simple method that improves alignment without retraining the base model. To further enhance generalization, we decouple preference learning into two modules trained on positive and negative data. We evaluate on Stable Diffusion 1.5 and Stable Diffusion XL with Pick-a-Pic v2 and HPDv3, showing consistent quantitative and qualitative gains.
- Score: 8.038055165320195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Aligning large-scale text-to-image diffusion models with nuanced human preferences remains challenging. While direct preference optimization (DPO) is simple and effective, large-scale finetuning often shows a generalization gap. We take inspiration from test-time guidance and cast preference alignment as classifier-free guidance (CFG): a finetuned preference model acts as an external control signal during sampling. Building on this view, we propose a simple method that improves alignment without retraining the base model. To further enhance generalization, we decouple preference learning into two modules trained on positive and negative data, respectively, and form a \emph{contrastive guidance} vector at inference by subtracting their predictions (positive minus negative), scaled by a user-chosen strength and added to the base prediction at each step. This yields a sharper and controllable alignment signal. We evaluate on Stable Diffusion 1.5 and Stable Diffusion XL with Pick-a-Pic v2 and HPDv3, showing consistent quantitative and qualitative gains.
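The inference-time rule the abstract describes (positive prediction minus negative prediction, scaled and added to the base prediction at each denoising step) can be sketched as follows. This is a minimal illustration, not the paper's code; the function name, signature, and scalar toy inputs are assumptions.

```python
import numpy as np

def contrastive_guided_step(eps_base, eps_pos, eps_neg, strength=1.0):
    """One denoising step's noise prediction with contrastive guidance.

    eps_base: prediction of the frozen base model
    eps_pos:  prediction of the module trained on positive (preferred) data
    eps_neg:  prediction of the module trained on negative (dispreferred) data
    strength: user-chosen guidance strength
    All names are illustrative, not from the paper's implementation.
    """
    guidance = eps_pos - eps_neg           # contrastive guidance vector
    return eps_base + strength * guidance  # steer the base prediction

# Toy example with scalar "predictions":
eps = contrastive_guided_step(np.array([0.5]), np.array([0.7]),
                              np.array([0.3]), strength=2.0)
# 0.5 + 2.0 * (0.7 - 0.3) = 1.3
```

With `strength=0` this reduces to the base model, mirroring how classifier-free guidance interpolates away from an unconditional prediction.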
Related papers
- Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning [27.33241821967005]
We propose a novel framework that mitigates Preference Mode Collapse (PMC). D$^2$-Align achieves superior alignment with human preference.
arXiv Detail & Related papers (2025-12-30T11:17:52Z) - PC-Diffusion: Aligning Diffusion Models with Human Preferences via Preference Classifier [36.21450058652141]
We propose PC-Diffusion, a novel framework for human preference alignment in diffusion models. PC-Diffusion uses a lightweight, trainable preference classifier that directly models the relative preference between samples. We show that PC-Diffusion achieves comparable preference consistency to DPO while significantly reducing training costs and enabling efficient preference-guided generation.
arXiv Detail & Related papers (2025-11-11T03:53:06Z) - Learning Dynamics of VLM Finetuning [12.966077380225856]
Preference-based finetuning of vision-language models (VLMs) is brittle. We introduce Cooling-Weighted DPO (CW-DPO), a two-stage recipe that explicitly models and exploits the training trajectory. CW-DPO yields more stable optimization, better calibration, and higher pairwise win-rates than SFT-only and vanilla DPO.
arXiv Detail & Related papers (2025-10-13T22:22:49Z) - GrAInS: Gradient-based Attribution for Inference-Time Steering of LLMs and VLMs [56.93583799109029]
GrAInS is an inference-time steering approach that operates across both language-only and vision-language models and tasks. During inference, GrAInS adjusts hidden activations at transformer layers guided by token-level attribution signals, and normalizes activations to preserve representational scale. It consistently outperforms both fine-tuning and existing steering baselines.
arXiv Detail & Related papers (2025-07-24T02:34:13Z) - Multi-Preference Lambda-weighted Listwise DPO for Small-Scale Model Alignment [5.276657230880984]
Large language models (LLMs) demonstrate strong generalization across a wide range of language tasks, but often generate outputs that misalign with human preferences. Direct Preference Optimization (DPO) simplifies the process by treating alignment as a classification task over binary preference pairs. We propose Multi-Preference Lambda-weighted Listwise DPO, which allows the model to learn from more detailed human feedback. Our method consistently outperforms standard DPO on alignment while enabling efficient, controllable, and fine-grained adaptation suitable for real-world deployment.
arXiv Detail & Related papers (2025-06-24T16:47:17Z) - Normalized Attention Guidance: Universal Negative Guidance for Diffusion Models [57.20761595019967]
We present Normalized Attention Guidance (NAG), an efficient, training-free mechanism that applies extrapolation in attention space with L1-based normalization and refinement. NAG restores effective negative guidance where CFG collapses while maintaining fidelity. NAG generalizes across architectures (UNet, DiT), sampling regimes (few-step, multi-step), and modalities (image, video).
arXiv Detail & Related papers (2025-05-27T13:30:46Z) - Self-NPO: Negative Preference Optimization of Diffusion Models by Simply Learning from Itself without Explicit Preference Annotations [60.143658714894336]
Diffusion models have demonstrated remarkable success in various visual generation tasks, including image, video, and 3D content generation. Preference optimization (PO) is a prominent and growing area of research that aims to align these models with human preferences. We introduce Self-NPO, a Negative Preference Optimization approach that learns exclusively from the model itself.
arXiv Detail & Related papers (2025-05-17T01:03:46Z) - Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models [32.586647934400105]
We argue that existing preference alignment methods neglect the critical role of handling unconditional/negative-conditional outputs. We propose a straightforward but versatile and effective approach that involves training a model specifically attuned to negative preferences. Our approach integrates seamlessly with models such as SD1.5, SDXL, video diffusion models, and models that have already undergone preference optimization.
arXiv Detail & Related papers (2025-05-16T13:38:23Z) - ADT: Tuning Diffusion Models with Adversarial Supervision [16.974169058917443]
Diffusion models have achieved outstanding image generation by reversing a forward noising process to approximate true data distributions. We propose Adversarial Diffusion Tuning (ADT) to simulate the inference process during optimization and align the final outputs with training data. ADT features a siamese-network discriminator with a fixed pre-trained backbone and lightweight trainable parameters.
arXiv Detail & Related papers (2025-04-15T17:37:50Z) - Calibrated Multi-Preference Optimization for Aligning Diffusion Models [90.15024547673785]
Calibrated Preference Optimization (CaPO) is a novel method to align text-to-image (T2I) diffusion models. CaPO incorporates the general preference from multiple reward models without human-annotated data. Experimental results show that CaPO consistently outperforms prior methods.
arXiv Detail & Related papers (2025-02-04T18:59:23Z) - Refining Alignment Framework for Diffusion Models with Intermediate-Step Preference Ranking [50.325021634589596]
We propose a Tailored Preference Optimization (TailorPO) framework for aligning diffusion models with human preference. Our approach directly ranks intermediate noisy samples based on their step-wise reward, and effectively resolves the gradient direction issues. Experimental results demonstrate that our method significantly improves the model's ability to generate aesthetically pleasing and human-preferred images.
arXiv Detail & Related papers (2025-02-01T16:08:43Z) - Diffusion Model Alignment Using Direct Preference Optimization [103.2238655827797]
Diffusion-DPO is a method to align diffusion models to human preferences by directly optimizing on human comparison data.
We fine-tune the base model of the state-of-the-art Stable Diffusion XL (SDXL)-1.0 model with Diffusion-DPO.
We also develop a variant that uses AI feedback and has comparable performance to training on human preferences.
arXiv Detail & Related papers (2023-11-21T15:24:05Z)
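Several of the entries above build on Diffusion-DPO's idea of optimizing directly on human comparison data. A common form of such a pairwise objective can be sketched as below; this is an illustrative simplification (scalar per-sample denoising errors rather than full noise-prediction tensors), and the function and variable names are assumptions, not the papers' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def diffusion_dpo_loss(err_theta_w, err_ref_w, err_theta_l, err_ref_l, beta=0.1):
    """Sketch of a Diffusion-DPO-style pairwise preference loss.

    Inputs are per-sample denoising errors (e.g. squared noise-prediction
    errors) of the trained model (theta) and a frozen reference model (ref)
    on the human-preferred (w) and dispreferred (l) images.
    """
    # How much the trained model's error changes relative to the reference
    delta_w = err_theta_w - err_ref_w
    delta_l = err_theta_l - err_ref_l
    # Loss shrinks when the model improves on the winner more than the loser
    return -np.log(sigmoid(-beta * (delta_w - delta_l)))

# Fitting the preferred sample better than the dispreferred one lowers the loss:
loss_good = diffusion_dpo_loss(0.2, 0.5, 0.6, 0.5)  # improves on winner
loss_bad = diffusion_dpo_loss(0.6, 0.5, 0.2, 0.5)   # improves on loser
```

The `beta` temperature plays the same role as in text DPO: it controls how strongly the model is pushed away from the frozen reference.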
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.