Reward-Instruct: A Reward-Centric Approach to Fast Photo-Realistic Image Generation
- URL: http://arxiv.org/abs/2503.13070v2
- Date: Mon, 09 Jun 2025 03:23:04 GMT
- Title: Reward-Instruct: A Reward-Centric Approach to Fast Photo-Realistic Image Generation
- Authors: Yihong Luo, Tianyang Hu, Weijian Luo, Kenji Kawaguchi, Jing Tang
- Abstract summary: This paper addresses the challenge of achieving high-quality and fast image generation that aligns with complex human preferences. We introduce Reward-Instruct, a novel and surprisingly simple reward-centric approach for converting pre-trained base diffusion models into reward-enhanced few-step generators. Our experiments on text-to-image generation have demonstrated that Reward-Instruct achieves state-of-the-art results in visual quality and quantitative metrics.
- Score: 25.29877217341663
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the challenge of achieving high-quality and fast image generation that aligns with complex human preferences. While recent advances in diffusion models and distillation have enabled rapid generation, the effective integration of reward feedback for improved abilities such as controllability and preference alignment remains a key open problem. Existing reward-guided post-training approaches targeting accelerated few-step generation often deem diffusion distillation losses indispensable. However, in this paper, we identify an interesting yet fundamental paradigm shift: as conditions become more specific, well-designed reward functions emerge as the primary driving force in training strong, few-step image generative models. Motivated by this insight, we introduce Reward-Instruct, a novel and surprisingly simple reward-centric approach for converting pre-trained base diffusion models into reward-enhanced few-step generators. Unlike existing methods, Reward-Instruct does not rely on expensive and brittle diffusion distillation losses. Instead, it iteratively updates the few-step generator's parameters by directly sampling from a reward-tilted parameter distribution. This training approach entirely bypasses the need for diffusion distillation losses, making it favorable for scaling to high image resolutions. Despite its simplicity, Reward-Instruct yields surprisingly strong performance. Our extensive experiments on text-to-image generation demonstrate that Reward-Instruct achieves state-of-the-art results in visual quality and quantitative metrics compared to distillation-reliant methods, while also exhibiting greater robustness to the choice of reward function.
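The core update described in the abstract — sampling generator parameters from a reward-tilted distribution rather than optimizing a distillation loss — can be sketched with Langevin-style dynamics. Everything below (the linear "generator", the quadratic reward, all hyperparameters) is a toy stand-in chosen for illustration, not the authors' implementation.

```python
import numpy as np

# Minimal sketch (not the paper's code): treat the generator's parameters
# theta as a sample from the tilted density p(theta) ∝ exp(R(theta) / tau)
# and draw from it with Langevin-style updates, so no diffusion
# distillation loss appears anywhere in the training loop.

rng = np.random.default_rng(0)
D = 4
W = 0.1 * rng.standard_normal((D, D))     # toy few-step generator: x = z @ W.T + b
b = np.zeros(D)

def generate(z):
    return z @ W.T + b

def reward(x):
    return -np.mean((x - 1.0) ** 2)       # toy reward: prefer outputs near 1

tau, eps, steps = 0.01, 1e-3, 300         # temperature, step size, iterations
z_eval = rng.standard_normal((256, D))
initial_r = reward(generate(z_eval))

for _ in range(steps):
    z = rng.standard_normal((8, D))
    x = generate(z)
    g = -2.0 * (x - 1.0) / x.size         # dR/dx for this toy reward
    gW, gb = g.T @ z, g.sum(axis=0)       # chain rule through x = z @ W.T + b
    # Langevin step: drift up the tilted log-density R/tau, plus injected noise
    W += eps * gW / tau + np.sqrt(2 * eps) * rng.standard_normal(W.shape)
    b += eps * gb / tau + np.sqrt(2 * eps) * rng.standard_normal(b.shape)

final_r = reward(generate(z_eval))
```

The temperature tau controls how sharply the sampled parameters concentrate around high-reward regions; in this toy run the reward on held-out noise improves markedly from its initial value.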
Related papers
- Flash-DMD: Towards High-Fidelity Few-Step Image Generation with Efficient Distillation and Joint Reinforcement Learning [32.32567390728913]
Diffusion Models have emerged as a leading class of generative models. Timestep distillation is a promising technique to accelerate generation, but it often requires extensive training and leads to image quality degradation. We introduce Flash-DMD, a novel framework that enables fast convergence with distillation and joint RL-based refinement.
arXiv Detail & Related papers (2025-11-25T17:47:11Z) - Harnessing Diffusion-Yielded Score Priors for Image Restoration [29.788482710572307]
Deep image restoration models aim to learn a mapping from degraded image space to natural image space. Three major classes of methods have emerged, including MSE-based, GAN-based, and diffusion-based methods. We propose a novel method, HYPIR, to address these challenges.
arXiv Detail & Related papers (2025-07-28T07:55:34Z) - Quick Bypass Mechanism of Zero-Shot Diffusion-Based Image Restoration [0.8192907805418583]
We propose a strategy that accelerates the denoising process by initializing from an intermediate approximation, effectively bypassing early denoising steps. We validate the proposed method on ImageNet-1K and CelebA-HQ across multiple image restoration tasks, e.g., super-resolution, deblurring, and compressed sensing.
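The bypass idea above can be sketched in a few lines: instead of starting from pure noise at t = T, noise a cheap approximation x0_hat to an intermediate timestep t0 and denoise only from there. The linear schedule and contractive "denoiser" below are toy stand-ins, not the paper's components.

```python
import numpy as np

# Hedged sketch of the quick-bypass mechanism: initialize sampling at an
# intermediate timestep t0 from a noised cheap estimate x0_hat, skipping
# the early denoising steps T-1 ... t0+1 entirely.

T = 50
alpha_bar = np.linspace(0.999, 0.01, T)        # toy cumulative noise schedule

x0_hat = np.ones(8)                            # cheap approximation of the clean signal

def denoise_step(x):
    return x + 0.2 * (x0_hat - x)              # toy step: contract toward the estimate

t0 = 20                                        # start here instead of at T - 1
noise = np.random.default_rng(0).standard_normal(8)
x = np.sqrt(alpha_bar[t0]) * x0_hat + np.sqrt(1 - alpha_bar[t0]) * noise

steps_run = 0
for _ in range(t0, -1, -1):
    x = denoise_step(x)
    steps_run += 1
```

Only t0 + 1 of the T denoising steps are executed, which is where the speedup comes from; the quality of x0_hat determines how large t0 can safely be.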
arXiv Detail & Related papers (2025-07-06T01:36:27Z) - InstaRevive: One-Step Image Enhancement via Dynamic Score Matching [66.97989469865828]
InstaRevive is an image enhancement framework that employs score-based diffusion distillation to harness potent generative capability. Our framework delivers high-quality and visually appealing results across a diverse array of challenging tasks and datasets.
arXiv Detail & Related papers (2025-04-22T01:19:53Z) - Learning to Sample Effective and Diverse Prompts for Text-to-Image Generation [34.08660401151558]
We focus on prompt adaptation, which refines the original prompt into model-preferred prompts to generate desired images. We introduce Prompt Adaptation with GFlowNets (PAG), a novel approach that frames prompt adaptation as a probabilistic inference problem.
arXiv Detail & Related papers (2025-02-17T06:28:53Z) - Visual Autoregressive Modeling for Image Super-Resolution [14.935662351654601]
We propose a novel visual autoregressive modeling framework for ISR in the form of next-scale prediction. We collect large-scale data and design a training process to obtain robust generative priors.
arXiv Detail & Related papers (2025-01-31T09:53:47Z) - Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step [77.86514804787622]
Chain-of-Thought (CoT) reasoning has been extensively explored in large models to tackle complex understanding tasks. We provide the first comprehensive investigation of the potential of CoT reasoning to enhance autoregressive image generation. We propose the Potential Assessment Reward Model (PARM) and PARM++, specialized for autoregressive image generation.
arXiv Detail & Related papers (2025-01-23T18:59:43Z) - TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution [25.994093587158808]
Pre-trained text-to-image diffusion models are increasingly applied to real-world image super-resolution (Real-ISR) tasks. Given the iterative refinement nature of diffusion models, most existing approaches are computationally expensive. We propose TSD-SR, a novel distillation framework specifically designed for real-world image super-resolution.
arXiv Detail & Related papers (2024-11-27T12:01:08Z) - Reward Incremental Learning in Text-to-Image Generation [26.64026346266299]
We present Reward Incremental Distillation (RID), a method that mitigates forgetting with minimal computational overhead.
The experimental results demonstrate the efficacy of RID in achieving consistent, high-quality image generation in RIL scenarios.
arXiv Detail & Related papers (2024-11-26T10:54:33Z) - InstantIR: Blind Image Restoration with Instant Generative Reference [10.703499573064537]
We introduce Instant-reference Image Restoration (InstantIR), a novel diffusion-based BIR method.
We first extract a compact representation of the input via a pre-trained vision encoder.
At each generation step, this representation is used to decode current diffusion latent and instantiate it in the generative prior.
The degraded image is then encoded with this reference, providing a robust generation condition.
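Read as a loop, the InstantIR description above amounts to: encode once, then re-derive a generative reference at every step. Below is a minimal data-flow sketch; the encoder, reference decoder, and denoiser are toy stand-ins, not InstantIR's actual networks.

```python
import numpy as np

# Hedged sketch of the InstantIR-style data flow: a compact
# representation is extracted once, then re-instantiated as a fresh
# generative reference at every step and combined with the degraded
# input to condition the denoiser.

rng = np.random.default_rng(0)
degraded = rng.standard_normal(16)              # toy degraded input image

def vision_encode(img):                         # compact representation (stand-in)
    return img.reshape(4, 4).mean(axis=0)

def decode_reference(rep, latent):              # instantiate rep in the prior (stand-in)
    return np.tile(rep, 4) + 0.1 * latent

def denoise(latent, cond):                      # toy conditioned denoising step
    return latent + 0.3 * (cond - latent)

rep = vision_encode(degraded)                   # extracted once, before sampling
latent = rng.standard_normal(16)
for _ in range(10):
    reference = decode_reference(rep, latent)   # per-step generative reference
    cond = 0.5 * (reference + degraded)         # degraded input encoded with reference
    latent = denoise(latent, cond)

restored = latent
```

Because the reference is decoded from the current latent at each step, the conditioning signal tracks the evolving sample rather than staying frozen at its initial estimate.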
arXiv Detail & Related papers (2024-10-09T05:15:29Z) - Elucidating Optimal Reward-Diversity Tradeoffs in Text-to-Image Diffusion Models [20.70550870149442]
We introduce Annealed Importance Guidance (AIG), an inference-time regularization inspired by Annealed Importance Sampling.
Our experiments demonstrate the benefits of AIG for Stable Diffusion models, striking the optimal balance between reward optimization and image diversity.
arXiv Detail & Related papers (2024-09-09T16:27:26Z) - One Step Diffusion-based Super-Resolution with Time-Aware Distillation [60.262651082672235]
Diffusion-based image super-resolution (SR) methods have shown promise in reconstructing high-resolution images with fine details from low-resolution counterparts.
Recent techniques have been devised to enhance the sampling efficiency of diffusion-based SR models via knowledge distillation.
We propose a time-aware diffusion distillation method, named TAD-SR, to accomplish effective and efficient image super-resolution.
arXiv Detail & Related papers (2024-08-14T11:47:22Z) - RIGID: A Training-free and Model-Agnostic Framework for Robust AI-Generated Image Detection [60.960988614701414]
RIGID is a training-free and model-agnostic method for robust AI-generated image detection.
RIGID significantly outperforms existing training-based and training-free detectors.
arXiv Detail & Related papers (2024-05-30T14:49:54Z) - AddSR: Accelerating Diffusion-based Blind Super-Resolution with Adversarial Diffusion Distillation [42.34219615630592]
Blind super-resolution methods based on stable diffusion showcase formidable generative capabilities in reconstructing clear high-resolution images with intricate details from low-resolution inputs. Their practical applicability is often hampered by poor efficiency, stemming from the requirement of thousands or hundreds of sampling steps. Inspired by efficient adversarial diffusion distillation (ADD), we design AddSR to address this issue by incorporating the ideas of both distillation and ControlNet.
arXiv Detail & Related papers (2024-04-02T08:07:38Z) - Efficient Diffusion Model for Image Restoration by Residual Shifting [63.02725947015132]
This study proposes a novel and efficient diffusion model for image restoration.
Our method avoids the need for post-acceleration during inference, thereby avoiding the associated performance deterioration.
Our method achieves superior or comparable performance to current state-of-the-art methods on three classical IR tasks.
arXiv Detail & Related papers (2024-03-12T05:06:07Z) - JoReS-Diff: Joint Retinex and Semantic Priors in Diffusion Model for Low-light Image Enhancement [69.6035373784027]
Low-light image enhancement (LLIE) has achieved promising performance by employing conditional diffusion models.
Previous methods may neglect the importance of a sufficient formulation of task-specific condition strategy.
We propose JoReS-Diff, a novel approach that incorporates Retinex- and semantic-based priors as the additional pre-processing condition.
arXiv Detail & Related papers (2023-12-20T08:05:57Z) - One-Step Diffusion Distillation via Deep Equilibrium Models [64.11782639697883]
We introduce a simple yet effective means of distilling diffusion models directly from initial noise to the resulting image.
Our method enables fully offline training with just noise/image pairs from the diffusion model.
We demonstrate that the DEQ architecture is crucial to this capability, as GET matches a 5× larger ViT in terms of FID scores.
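The offline recipe described above — fit a one-step map on noise/image pairs collected from the teacher, with no teacher access at training time — can be illustrated with a linear least-squares "student" standing in for the DEQ (GET) architecture. The teacher below is a toy linear map, not a diffusion model.

```python
import numpy as np

# Hedged sketch of fully offline one-step distillation: precompute
# (noise, image) pairs from a teacher once, then fit a one-step student
# on the pairs alone. Both "teacher" and "student" are toy stand-ins.

rng = np.random.default_rng(0)
D = 8
W_teacher = rng.standard_normal((D, D))        # toy teacher: noise -> image

noise = rng.standard_normal((512, D))          # offline dataset of pairs,
images = noise @ W_teacher.T                   # collected once from the teacher

# fit the one-step student in closed form: min_X ||noise @ X - images||^2
W_student, *_ = np.linalg.lstsq(noise, images, rcond=None)

# the student now maps fresh noise to images in a single step
test_noise = rng.standard_normal((4, D))
err = np.max(np.abs(test_noise @ W_student - test_noise @ W_teacher.T))
```

With enough pairs the student recovers the toy teacher exactly; for a real diffusion teacher the mapping is highly nonlinear, which is why the paper argues an expressive DEQ architecture is needed in place of this linear fit.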
arXiv Detail & Related papers (2023-12-12T07:28:40Z) - Iterative Token Evaluation and Refinement for Real-World Super-Resolution [77.74289677520508]
Real-world image super-resolution (RWSR) is a long-standing problem as low-quality (LQ) images often have complex and unidentified degradations.
We propose an Iterative Token Evaluation and Refinement framework for RWSR.
We show that ITER is easier to train than Generative Adversarial Networks (GANs) and more efficient than continuous diffusion models.
arXiv Detail & Related papers (2023-12-09T17:07:32Z) - CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation [49.3016007471979]
Large generative diffusion models have revolutionized text-to-image generation and offer immense potential for conditional generation tasks.
However, their widespread adoption is hindered by the high computational cost, which limits their real-time application.
We introduce a novel method dubbed CoDi, that adapts a pre-trained latent diffusion model to accept additional image conditioning inputs.
arXiv Detail & Related papers (2023-10-02T17:59:18Z) - Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z) - PGDiff: Guiding Diffusion Models for Versatile Face Restoration via
Partial Guidance [65.5618804029422]
Previous works have achieved noteworthy success by limiting the solution space using explicit degradation models.
We propose PGDiff by introducing partial guidance, a fresh perspective that is more adaptable to real-world degradations.
Our method not only outperforms existing diffusion-prior-based approaches but also competes favorably with task-specific models.
arXiv Detail & Related papers (2023-09-19T17:51:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.