Elucidating Optimal Reward-Diversity Tradeoffs in Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2409.06493v1
- Date: Mon, 9 Sep 2024 16:27:26 GMT
- Title: Elucidating Optimal Reward-Diversity Tradeoffs in Text-to-Image Diffusion Models
- Authors: Rohit Jena, Ali Taghibakhshi, Sahil Jain, Gerald Shen, Nima Tajbakhsh, Arash Vahdat,
- Abstract summary: We introduce Annealed Importance Guidance (AIG), an inference-time regularization inspired by Annealed Importance Sampling.
Our experiments demonstrate the benefits of AIG for Stable Diffusion models, striking the optimal balance between reward optimization and image diversity.
- Score: 20.70550870149442
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-image (T2I) diffusion models have become prominent tools for generating high-fidelity images from text prompts. However, when trained on unfiltered internet data, these models can produce unsafe, incorrect, or stylistically undesirable images that are not aligned with human preferences. To address this, recent approaches have incorporated human preference datasets to fine-tune T2I models or to optimize reward functions that capture these preferences. Although effective, these methods are vulnerable to reward hacking, where the model overfits to the reward function, leading to a loss of diversity in the generated images. In this paper, we prove the inevitability of reward hacking and study natural regularization techniques like KL divergence and LoRA scaling, and their limitations for diffusion models. We also introduce Annealed Importance Guidance (AIG), an inference-time regularization inspired by Annealed Importance Sampling, which retains the diversity of the base model while achieving Pareto-Optimal reward-diversity tradeoffs. Our experiments demonstrate the benefits of AIG for Stable Diffusion models, striking the optimal balance between reward optimization and image diversity. Furthermore, a user study confirms that AIG improves diversity and quality of generated images across different model architectures and reward functions.
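For intuition only, the sketch below shows one way an inference-time interpolation between the base and reward-finetuned denoisers could be wired into a standard sampling loop, with a mixing weight annealed across timesteps. The function names, the linear schedule, and the direction of annealing are illustrative assumptions, not AIG's actual guidance rule as defined in the paper.

```python
import torch

def annealed_guidance_eps(
    eps_base_fn,        # callable: (x_t, t, cond) -> base-model noise prediction
    eps_tuned_fn,       # callable: (x_t, t, cond) -> reward-finetuned noise prediction
    x_t: torch.Tensor,  # current noisy latent at step t
    t: int,             # current step index, 0 .. num_steps - 1
    num_steps: int,     # total number of denoising steps
    cond,               # conditioning, e.g. prompt embeddings
) -> torch.Tensor:
    """Mix two noise predictions with a weight annealed over the trajectory.

    The linear schedule and its direction (base model early, reward-finetuned
    model late) are illustrative assumptions, not the schedule used by AIG.
    """
    eps_base = eps_base_fn(x_t, t, cond)
    eps_tuned = eps_tuned_fn(x_t, t, cond)
    lam = t / max(num_steps - 1, 1)  # anneals from 0 to 1 as denoising progresses
    return (1.0 - lam) * eps_base + lam * eps_tuned
```

In such a setup, the combined prediction would simply replace the single-model noise prediction at each step of the usual sampler, leaving the rest of the sampling loop unchanged.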
Related papers
- Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective [52.778766190479374]
Latent-based image generative models have achieved notable success in image generation tasks.
Despite sharing the same latent space, autoregressive models significantly lag behind LDMs and MIMs in image generation.
We propose a simple but effective discrete image tokenizer to stabilize the latent space for image generative modeling.
arXiv Detail & Related papers (2024-10-16T12:13:17Z) - DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception [66.88792390480343]
We propose DEEM, a simple but effective approach that utilizes the generative feedback of diffusion models to align the semantic distributions of the image encoder.
DEEM exhibits enhanced robustness and a superior capacity to alleviate model hallucinations while utilizing fewer trainable parameters, less pre-training data, and a smaller base model size.
arXiv Detail & Related papers (2024-05-24T05:46:04Z) - RL for Consistency Models: Faster Reward Guided Text-to-Image Generation [15.238373471473645]
We propose a framework for fine-tuning consistency models via Reinforcement Learning (RL).
Our framework, called Reinforcement Learning for Consistency Model (RLCM), frames the iterative inference process of a consistency model as an RL procedure.
Compared to RL-finetuned diffusion models, RLCM trains significantly faster, improves generation quality as measured by the reward objectives, and speeds up inference by generating high-quality images in as few as two inference steps.
arXiv Detail & Related papers (2024-03-25T15:40:22Z) - Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized Control [54.132297393662654]
Diffusion models excel at capturing complex data distributions, such as those of natural images and proteins.
While diffusion models are trained to represent the distribution of the training dataset, we are often more concerned with other properties, such as the aesthetic quality of the generated images.
We present theoretical and empirical evidence that demonstrates our framework is capable of efficiently generating diverse samples with high genuine rewards.
arXiv Detail & Related papers (2024-02-23T08:54:42Z) - Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z) - Large-scale Reinforcement Learning for Diffusion Models [30.164571425479824]
Text-to-image diffusion models are susceptible to implicit biases that arise from web-scale text-image training pairs.
We present an effective, scalable algorithm to improve diffusion models using Reinforcement Learning (RL).
We show how our approach substantially outperforms existing methods for aligning diffusion models with human preferences.
arXiv Detail & Related papers (2024-01-20T08:10:43Z) - Secrets of RLHF in Large Language Models Part II: Reward Modeling [134.97964938009588]
We introduce a series of novel methods to mitigate the influence of incorrect and ambiguous preferences in the dataset.
We also introduce contrastive learning to enhance the ability of reward models to distinguish between chosen and rejected responses.
arXiv Detail & Related papers (2024-01-11T17:56:59Z) - Aligning Text-to-Image Diffusion Models with Reward Backpropagation [62.45086888512723]
We propose AlignProp, a method that aligns diffusion models to downstream reward functions using end-to-end backpropagation of the reward gradient.
We show AlignProp achieves higher rewards in fewer training steps than alternatives, while being conceptually simpler.
arXiv Detail & Related papers (2023-10-05T17:59:18Z) - A Bayesian Non-parametric Approach to Generative Models: Integrating Variational Autoencoder and Generative Adversarial Networks using Wasserstein and Maximum Mean Discrepancy [2.966338139852619]
Generative adversarial networks (GANs) and variational autoencoders (VAEs) are two of the most prominent and widely studied generative models.
We employ a Bayesian non-parametric (BNP) approach to merge GANs and VAEs.
By fusing the discriminative power of GANs with the reconstruction capabilities of VAEs, our novel model achieves superior performance in various generative tasks.
arXiv Detail & Related papers (2023-08-27T08:58:31Z) - Incorporating Reinforced Adversarial Learning in Autoregressive Image Generation [39.55651747758391]
We propose to use Reinforced Adversarial Learning (RAL) based on policy gradient optimization for autoregressive models.
RAL also empowers the collaboration between different modules of the VQ-VAE framework.
The proposed method achieves state-of-the-art results on CelebA at $64 \times 64$ image resolution.
arXiv Detail & Related papers (2020-07-20T08:10:07Z)