ImageReFL: Balancing Quality and Diversity in Human-Aligned Diffusion Models
- URL: http://arxiv.org/abs/2505.22569v1
- Date: Wed, 28 May 2025 16:45:07 GMT
- Title: ImageReFL: Balancing Quality and Diversity in Human-Aligned Diffusion Models
- Authors: Dmitrii Sorokin, Maksim Nakhodnov, Andrey Kuznetsov, Aibek Alanov,
- Abstract summary: Reward-based fine-tuning using models trained on human feedback improves alignment but often harms diversity, producing less varied outputs.<n>We introduce textitcombined generation, a novel sampling strategy that applies a reward-tuned diffusion model only in the later stages of the generation process.<n>Second, we propose textitImageReFL, a fine-tuning method that improves image diversity with minimal loss in quality by training on real images.
- Score: 2.712399554918533
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in diffusion models have led to impressive image generation capabilities, but aligning these models with human preferences remains challenging. Reward-based fine-tuning using models trained on human feedback improves alignment but often harms diversity, producing less varied outputs. In this work, we address this trade-off with two contributions. First, we introduce \textit{combined generation}, a novel sampling strategy that applies a reward-tuned diffusion model only in the later stages of the generation process, while preserving the base model for earlier steps. This approach mitigates early-stage overfitting and helps retain global structure and diversity. Second, we propose \textit{ImageReFL}, a fine-tuning method that improves image diversity with minimal loss in quality by training on real images and incorporating multiple regularizers, including diffusion and ReFL losses. Our approach outperforms conventional reward tuning methods on standard quality and diversity metrics. A user study further confirms that our method better balances human preference alignment and visual diversity. The source code can be found at https://github.com/ControlGenAI/ImageReFL .
Related papers
- Harnessing Diffusion-Yielded Score Priors for Image Restoration [29.788482710572307]
Deep image restoration models aim to learn a mapping from degraded image space to natural image space.<n>Three major classes of methods have emerged, including MSE-based, GAN-based, and diffusion-based methods.<n>We propose a novel method, HYPIR, to address these challenges.
arXiv Detail & Related papers (2025-07-28T07:55:34Z) - One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step.<n>To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration.<n>Our method achieves strong performance on both full and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z) - Boosting Alignment for Post-Unlearning Text-to-Image Generative Models [55.82190434534429]
Large-scale generative models have shown impressive image-generation capabilities, propelled by massive data.<n>This often inadvertently leads to the generation of harmful or inappropriate content and raises copyright concerns.<n>We propose a framework that seeks an optimal model update at each unlearning iteration, ensuring monotonic improvement on both objectives.
arXiv Detail & Related papers (2024-12-09T21:36:10Z) - MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling [64.09238330331195]
We propose a novel Multi-Modal Auto-Regressive (MMAR) probabilistic modeling framework.
Unlike discretization line of method, MMAR takes in continuous-valued image tokens to avoid information loss.
We show that MMAR demonstrates much more superior performance than other joint multi-modal models.
arXiv Detail & Related papers (2024-10-14T17:57:18Z) - Shielded Diffusion: Generating Novel and Diverse Images using Sparse Repellency [29.083402085790016]
We propose a method that coaxes the sampled trajectories of pretrained diffusion models to land on images that fall outside of a reference set.<n>We achieve this by adding repellency terms to the diffusion SDE throughout the generation trajectory.<n>We show that adding SPELL to popular diffusion models improves their diversity while impacting their FID only marginally, and performs comparatively better than other recent training-free diversity methods.
arXiv Detail & Related papers (2024-10-08T13:26:32Z) - Elucidating Optimal Reward-Diversity Tradeoffs in Text-to-Image Diffusion Models [20.70550870149442]
We introduce Annealed Importance Guidance (AIG), an inference-time regularization inspired by Annealed Importance Sampling.
Our experiments demonstrate the benefits of AIG for Stable Diffusion models, striking the optimal balance between reward optimization and image diversity.
arXiv Detail & Related papers (2024-09-09T16:27:26Z) - Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention
Regulation in Diffusion Models [23.786473791344395]
Cross-attention layers in diffusion models tend to disproportionately focus on certain tokens during the generation process.
We introduce attention regulation, an on-the-fly optimization approach at inference time to align attention maps with the input text prompt.
Experiment results show that our method consistently outperforms other baselines.
arXiv Detail & Related papers (2024-03-11T02:18:27Z) - Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI)
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion)
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z) - Large-scale Reinforcement Learning for Diffusion Models [30.164571425479824]
Text-to-image diffusion models are susceptible to implicit biases that arise from web-scale text-image training pairs.
We present an effective scalable algorithm to improve diffusion models using Reinforcement Learning (RL)
We show how our approach substantially outperforms existing methods for aligning diffusion models with human preferences.
arXiv Detail & Related papers (2024-01-20T08:10:43Z) - Conditional Image Generation with Pretrained Generative Model [1.4685355149711303]
diffusion models have gained popularity for their ability to generate higher-quality images in comparison to GAN models.
These models require a huge amount of data, computational resources, and meticulous tuning for successful training.
We propose methods to leverage pre-trained unconditional diffusion models with additional guidance for the purpose of conditional image generative.
arXiv Detail & Related papers (2023-12-20T18:27:53Z) - PGDiff: Guiding Diffusion Models for Versatile Face Restoration via
Partial Guidance [65.5618804029422]
Previous works have achieved noteworthy success by limiting the solution space using explicit degradation models.
We propose PGDiff by introducing partial guidance, a fresh perspective that is more adaptable to real-world degradations.
Our method not only outperforms existing diffusion-prior-based approaches but also competes favorably with task-specific models.
arXiv Detail & Related papers (2023-09-19T17:51:33Z) - SinDiffusion: Learning a Diffusion Model from a Single Natural Image [159.4285444680301]
We present SinDiffusion, leveraging denoising diffusion models to capture internal distribution of patches from a single natural image.
It is based on two core designs. First, SinDiffusion is trained with a single model at a single scale instead of multiple models with progressive growing of scales.
Second, we identify that a patch-level receptive field of the diffusion network is crucial and effective for capturing the image's patch statistics.
arXiv Detail & Related papers (2022-11-22T18:00:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.