Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation
- URL: http://arxiv.org/abs/2503.13070v1
- Date: Mon, 17 Mar 2025 11:21:43 GMT
- Title: Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation
- Authors: Yihong Luo, Tianyang Hu, Weijian Luo, Kenji Kawaguchi, Jing Tang
- Abstract summary: We introduce R0, a novel conditional generation approach via regularized reward. We train state-of-the-art few-step text-to-image generative models with R0 at scales. Our results challenge the conventional wisdom of diffusion post-training and conditional generation.
- Score: 25.29877217341663
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Aligning generated images to complicated text prompts and human preferences is a central challenge in Artificial Intelligence-Generated Content (AIGC). With reward-enhanced diffusion distillation emerging as a promising approach that boosts controllability and fidelity of text-to-image models, we identify a fundamental paradigm shift: as conditions become more specific and reward signals stronger, the rewards themselves become the dominant force in generation. In contrast, the diffusion losses serve as an overly expensive form of regularization. To thoroughly validate our hypothesis, we introduce R0, a novel conditional generation approach via regularized reward maximization. Instead of relying on tricky diffusion distillation losses, R0 proposes a new perspective that treats image generations as an optimization problem in data space which aims to search for valid images that have high compositional rewards. By innovative designs of the generator parameterization and proper regularization techniques, we train state-of-the-art few-step text-to-image generative models with R0 at scales. Our results challenge the conventional wisdom of diffusion post-training and conditional generation by demonstrating that rewards play a dominant role in scenarios with complex conditions. We hope our findings can contribute to further research into human-centric and reward-centric generation paradigms across the broader field of AIGC. Code is available at https://github.com/Luo-Yihong/R0.
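The abstract's framing of generation as a search in data space for points with high compositional reward, kept valid by regularization, can be illustrated with a toy sketch. This is not the R0 method or the authors' code: the quadratic "rewards", the squared-distance regularizer, and the names `regularized_reward_ascent` and `closed_form_optimum` are assumptions chosen only so the optimum can be checked by hand. The real approach trains a generator against learned reward models rather than optimizing a single point.

```python
from typing import List, Sequence

Vector = List[float]


def regularized_reward_ascent(
    x0: Sequence[float],
    targets: Sequence[Sequence[float]],
    weights: Sequence[float],
    reg_strength: float,
    step_size: float = 0.05,
    num_steps: int = 500,
) -> Vector:
    """Gradient ascent on a weighted sum of toy rewards minus a regularizer.

    Objective (hypothetical, for illustration only):
        J(x) = sum_i w_i * (-||x - t_i||^2) - reg_strength * ||x - x0||^2
    Each toy reward prefers x near its target t_i; the regularizer keeps x
    close to the starting point x0, standing in for "staying a valid image".
    """
    x = list(x0)
    for _ in range(num_steps):
        grad = [0.0] * len(x)
        # Gradient of the composite (weighted) reward terms.
        for target, w in zip(targets, weights):
            for i in range(len(x)):
                grad[i] += -2.0 * w * (x[i] - target[i])
        # Gradient of the regularization term.
        for i in range(len(x)):
            grad[i] += -2.0 * reg_strength * (x[i] - x0[i])
        # Ascent step: move in the direction that increases J.
        x = [xi + step_size * gi for xi, gi in zip(x, grad)]
    return x


def closed_form_optimum(
    x0: Sequence[float],
    targets: Sequence[Sequence[float]],
    weights: Sequence[float],
    reg_strength: float,
) -> Vector:
    """Maximizer of the quadratic objective above: a weighted average."""
    total = sum(weights) + reg_strength
    return [
        (sum(w * t[i] for t, w in zip(targets, weights)) + reg_strength * x0[i]) / total
        for i in range(len(x0))
    ]


if __name__ == "__main__":
    start = [0.0, 0.0]
    toy_targets = [[1.0, 1.0], [3.0, -1.0]]
    toy_weights = [1.0, 0.5]
    found = regularized_reward_ascent(start, toy_targets, toy_weights, reg_strength=0.5)
    print(found, closed_form_optimum(start, toy_targets, toy_weights, 0.5))
```

In this toy, raising `reg_strength` pulls the result back toward the starting point, while raising a reward weight pulls it toward that reward's target. That loosely mirrors the trade-off the abstract describes between reward signals and regularization.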
Related papers
- Harnessing Diffusion-Yielded Score Priors for Image Restoration [29.788482710572307]
Deep image restoration models aim to learn a mapping from degraded image space to natural image space.
Three major classes of methods have emerged, including MSE-based, GAN-based, and diffusion-based methods.
We propose a novel method, HYPIR, to address these challenges.
arXiv Detail & Related papers (2025-07-28T07:55:34Z)
- Quick Bypass Mechanism of Zero-Shot Diffusion-Based Image Restoration [0.8192907805418583]
We propose a strategy that accelerates the denoising process by initializing from an intermediate approximation, effectively bypassing early denoising steps.
We validate proposed methods on ImageNet-1K and CelebAHQ across multiple image restoration tasks, e.g., super-resolution, deblurring, and compressed sensing.
arXiv Detail & Related papers (2025-07-06T01:36:27Z)
- InstaRevive: One-Step Image Enhancement via Dynamic Score Matching [66.97989469865828]
InstaRevive is an image enhancement framework that employs score-based diffusion distillation to harness potent generative capability.
Our framework delivers high-quality and visually appealing results across a diverse array of challenging tasks and datasets.
arXiv Detail & Related papers (2025-04-22T01:19:53Z)
- Learning to Sample Effective and Diverse Prompts for Text-to-Image Generation [34.08660401151558]
We focus on prompt adaptation, which refines the original prompt into model-preferred prompts to generate desired images.
We introduce Prompt Adaptation with GFlowNets (PAG), a novel approach that frames prompt adaptation as a probabilistic inference problem.
arXiv Detail & Related papers (2025-02-17T06:28:53Z)
- Visual Autoregressive Modeling for Image Super-Resolution [14.935662351654601]
We propose a novel visual autoregressive modeling for ISR framework with the form of next-scale prediction.
We collect large-scale data and design a training process to obtain robust generative priors.
arXiv Detail & Related papers (2025-01-31T09:53:47Z)
- Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step [77.86514804787622]
Chain-of-Thought (CoT) reasoning has been extensively explored in large models to tackle complex understanding tasks.
We provide the first comprehensive investigation of the potential of CoT reasoning to enhance autoregressive image generation.
We propose the Potential Assessment Reward Model (PARM) and PARM++, specialized for autoregressive image generation.
arXiv Detail & Related papers (2025-01-23T18:59:43Z)
- TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution [25.994093587158808]
Pre-trained text-to-image diffusion models are increasingly applied to real-world image super-resolution (Real-ISR) tasks.
Given the iterative refinement nature of diffusion models, most existing approaches are computationally expensive.
We propose TSD-SR, a novel distillation framework specifically designed for real-world image super-resolution.
arXiv Detail & Related papers (2024-11-27T12:01:08Z)
- Reward Incremental Learning in Text-to-Image Generation [26.64026346266299]
We present Reward Incremental Distillation (RID), a method that mitigates forgetting with minimal computational overhead.
The experimental results demonstrate the efficacy of RID in achieving consistent, high-quality gradient generation in RIL scenarios.
arXiv Detail & Related papers (2024-11-26T10:54:33Z)
- InstantIR: Blind Image Restoration with Instant Generative Reference [10.703499573064537]
We introduce Instant-reference Image Restoration (InstantIR), a novel diffusion-based BIR method.
We first extract a compact representation of the input via a pre-trained vision encoder.
At each generation step, this representation is used to decode current diffusion latent and instantiate it in the generative prior.
The degraded image is then encoded with this reference, providing robust generation condition.
arXiv Detail & Related papers (2024-10-09T05:15:29Z)
- Elucidating Optimal Reward-Diversity Tradeoffs in Text-to-Image Diffusion Models [20.70550870149442]
We introduce Annealed Importance Guidance (AIG), an inference-time regularization inspired by Annealed Importance Sampling.
Our experiments demonstrate the benefits of AIG for Stable Diffusion models, striking the optimal balance between reward optimization and image diversity.
arXiv Detail & Related papers (2024-09-09T16:27:26Z)
- One Step Diffusion-based Super-Resolution with Time-Aware Distillation [60.262651082672235]
Diffusion-based image super-resolution (SR) methods have shown promise in reconstructing high-resolution images with fine details from low-resolution counterparts.
Recent techniques have been devised to enhance the sampling efficiency of diffusion-based SR models via knowledge distillation.
We propose a time-aware diffusion distillation method, named TAD-SR, to accomplish effective and efficient image super-resolution.
arXiv Detail & Related papers (2024-08-14T11:47:22Z)
- RIGID: A Training-free and Model-Agnostic Framework for Robust AI-Generated Image Detection [60.960988614701414]
RIGID is a training-free and model-agnostic method for robust AI-generated image detection.
RIGID significantly outperforms existing training-based and training-free detectors.
arXiv Detail & Related papers (2024-05-30T14:49:54Z)
- AddSR: Accelerating Diffusion-based Blind Super-Resolution with Adversarial Diffusion Distillation [42.34219615630592]
Blind super-resolution methods based on stable diffusion showcase formidable generative capabilities in reconstructing clear high-resolution images with intricate details from low-resolution inputs.
Their practical applicability is often hampered by poor efficiency, stemming from the requirement of thousands or hundreds of sampling steps.
Inspired by the efficient adversarial diffusion distillation (ADD), we design AddSR to address this issue by incorporating the ideas of both distillation and ControlNet.
arXiv Detail & Related papers (2024-04-02T08:07:38Z)
- Efficient Diffusion Model for Image Restoration by Residual Shifting [63.02725947015132]
This study proposes a novel and efficient diffusion model for image restoration.
Our method avoids the need for post-acceleration during inference, thereby avoiding the associated performance deterioration.
Our method achieves superior or comparable performance to current state-of-the-art methods on three classical IR tasks.
arXiv Detail & Related papers (2024-03-12T05:06:07Z)
- JoReS-Diff: Joint Retinex and Semantic Priors in Diffusion Model for Low-light Image Enhancement [69.6035373784027]
Low-light image enhancement (LLIE) has achieved promising performance by employing conditional diffusion models.
Previous methods may neglect the importance of a sufficient formulation of task-specific condition strategy.
We propose JoReS-Diff, a novel approach that incorporates Retinex- and semantic-based priors as the additional pre-processing condition.
arXiv Detail & Related papers (2023-12-20T08:05:57Z)
- One-Step Diffusion Distillation via Deep Equilibrium Models [64.11782639697883]
We introduce a simple yet effective means of distilling diffusion models directly from initial noise to the resulting image.
Our method enables fully offline training with just noise/image pairs from the diffusion model.
We demonstrate that the DEQ architecture is crucial to this capability, as GET matches a $5\times$ larger ViT in terms of FID scores.
arXiv Detail & Related papers (2023-12-12T07:28:40Z)
- Iterative Token Evaluation and Refinement for Real-World Super-Resolution [77.74289677520508]
Real-world image super-resolution (RWSR) is a long-standing problem as low-quality (LQ) images often have complex and unidentified degradations.
We propose an Iterative Token Evaluation and Refinement framework for RWSR.
We show that ITER is easier to train than Generative Adversarial Networks (GANs) and more efficient than continuous diffusion models.
arXiv Detail & Related papers (2023-12-09T17:07:32Z)
- CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation [49.3016007471979]
Large generative diffusion models have revolutionized text-to-image generation and offer immense potential for conditional generation tasks.
However, their widespread adoption is hindered by the high computational cost, which limits their real-time application.
We introduce a novel method dubbed CoDi, that adapts a pre-trained latent diffusion model to accept additional image conditioning inputs.
arXiv Detail & Related papers (2023-10-02T17:59:18Z)
- Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z)
- PGDiff: Guiding Diffusion Models for Versatile Face Restoration via Partial Guidance [65.5618804029422]
Previous works have achieved noteworthy success by limiting the solution space using explicit degradation models.
We propose PGDiff by introducing partial guidance, a fresh perspective that is more adaptable to real-world degradations.
Our method not only outperforms existing diffusion-prior-based approaches but also competes favorably with task-specific models.
arXiv Detail & Related papers (2023-09-19T17:51:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.