Enhancing Diffusion-based Restoration Models via Difficulty-Adaptive Reinforcement Learning with IQA Reward
- URL: http://arxiv.org/abs/2511.01645v1
- Date: Mon, 03 Nov 2025 14:57:57 GMT
- Title: Enhancing Diffusion-based Restoration Models via Difficulty-Adaptive Reinforcement Learning with IQA Reward
- Authors: Xiaogang Xu, Ruihang Chu, Jian Wang, Kun Zhou, Wenjie Shu, Harry Yang, Ser-Nam Lim, Hao Chen, Liang Lin
- Abstract summary: Reinforcement Learning (RL) has recently been incorporated into diffusion models. In this paper, we investigate how to effectively integrate RL into diffusion-based restoration models.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement Learning (RL) has recently been incorporated into diffusion models, e.g., for tasks such as text-to-image generation. However, directly applying existing RL methods to diffusion-based image restoration models is suboptimal, as the objective of restoration fundamentally differs from that of pure generation: it places greater emphasis on fidelity. In this paper, we investigate how to effectively integrate RL into diffusion-based restoration models. First, through extensive experiments with various reward functions, we find that an effective reward can be derived from an Image Quality Assessment (IQA) model, rather than from intuitive ground-truth-based supervision, which has already been optimized during the Supervised Fine-Tuning (SFT) stage prior to RL. Moreover, our strategy applies RL to challenging samples that are significantly distant from the ground truth; for these, the RL stage initially uses MLLM-based IQA models to align the output distribution with that of high-quality images. As samples approach the ground-truth distribution, RL is adaptively combined with SFT for more fine-grained alignment. This dynamic process is driven by an automatic weighting strategy that adjusts to the relative difficulty of the training samples. Our strategy is plug-and-play and can be seamlessly applied to diffusion-based restoration models, boosting their performance across various restoration tasks. Extensive experiments on multiple benchmarks demonstrate the effectiveness of the proposed RL framework.
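A minimal PyTorch sketch of the difficulty-adaptive weighting the abstract describes: per-sample distance from the ground truth acts as the difficulty measure, hard samples are driven mainly by the IQA reward, and easy samples fall back to SFT supervision. The names (`iqa_reward_fn`, `log_prob`) and the sigmoid schedule are our assumptions, not the authors' exact formulation.

```python
import torch

def difficulty_adaptive_loss(restored, ground_truth, log_prob, iqa_reward_fn, tau=0.1):
    # restored, ground_truth: (B, C, H, W); log_prob: (B,) model log-likelihoods.
    # Per-sample reconstruction error doubles as the SFT loss and difficulty proxy.
    sft_loss = torch.mean((restored - ground_truth) ** 2, dim=(1, 2, 3))
    # Harder (more distant) samples receive a larger RL weight in [0, 1].
    w_rl = torch.sigmoid((sft_loss.detach() - tau) / tau)
    # REINFORCE-style term: raise the log-probability of high-IQA outputs.
    with torch.no_grad():
        reward = iqa_reward_fn(restored)       # stand-in for an MLLM-based IQA scorer
    rl_loss = -reward * log_prob
    # Automatic per-sample mixing of the RL and SFT objectives.
    return (w_rl * rl_loss + (1.0 - w_rl) * sft_loss).mean()
```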
Related papers
- Towards Stable and Effective Reinforcement Learning for Mixture-of-Experts [113.0656076371565]
We propose a novel router-aware approach to optimize importance sampling weights in off-policy reinforcement learning (RL). Specifically, we design a rescaling strategy guided by router logits, which effectively reduces gradient variance and mitigates training divergence. Experimental results demonstrate that our method significantly improves both the convergence stability and the final performance of MoE models.
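The summary does not specify the rescaling rule, so the following is only a hedged sketch: we assume router confidence (the max softmax over expert logits) shrinks the importance-sampling ratio toward 1 to damp gradient variance. All names are hypothetical.

```python
import torch

def router_rescaled_is_weights(logp_new, logp_old, router_logits, clip=5.0):
    ratio = torch.exp(logp_new - logp_old)                    # standard IS ratio
    conf = torch.softmax(router_logits, dim=-1).max(dim=-1).values
    # Assumed rule: low router confidence pulls the ratio toward 1.
    rescaled = 1.0 + conf * (ratio - 1.0)
    return rescaled.clamp(max=clip)                           # cap extreme weights
```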
arXiv Detail & Related papers (2025-10-27T05:47:48Z)
- Plug-and-Play Prompt Refinement via Latent Feedback for Diffusion Model Alignment [54.17386822940477]
We introduce PromptLoop, a plug-and-play reinforcement learning framework that incorporates latent feedback into step-wise prompt refinement. This design achieves a structural analogy to the Diffusion RL approach, while retaining the flexibility and generality of prompt-based alignment.
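A speculative sketch of the loop this summary implies: alternate one denoising step with a prompt update computed from the current latents. The `refine_prompt` interface is entirely an assumption.

```python
def promptloop_step(denoiser, refine_prompt, latents, prompt, t):
    latents = denoiser(latents, prompt, t)      # one diffusion denoising step
    prompt = refine_prompt(prompt, latents)     # RL-trained refiner reads latent feedback
    return latents, prompt
```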
arXiv Detail & Related papers (2025-10-01T02:18:58Z)
- Adversarial Diffusion for Robust Reinforcement Learning [46.44328012099217]
We introduce Adversarial Diffusion for Robust Reinforcement Learning (AD-RRL). AD-RRL guides the diffusion process to generate worst-case trajectories during training, effectively optimizing the Conditional Value at Risk (CVaR) of the cumulative return. Empirical results across standard benchmarks show that AD-RRL achieves superior robustness and performance compared to existing robust RL methods.
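For reference, CVaR at level alpha is the mean of the worst alpha-fraction of returns; a minimal PyTorch computation follows (the adversarial trajectory generation around it is omitted).

```python
import torch

def cvar_of_returns(returns, alpha=0.1):
    # returns: 1-D tensor of episode returns.
    k = max(1, int(alpha * returns.numel()))
    worst, _ = torch.topk(returns, k, largest=False)  # worst alpha-fraction
    return worst.mean()
```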
arXiv Detail & Related papers (2025-09-28T12:34:35Z)
- Improving the Reasoning of Multi-Image Grounding in MLLMs via Reinforcement Learning [28.111812077758845]
Multimodal Large Language Models (MLLMs) excel at visual grounding in single-image scenarios with textual references. However, their performance degrades when handling real-world applications that involve complex multi-image compositions and multi-modal instructions. We adopt a Reinforcement Learning based post-training strategy to improve the reasoning of MLLMs in multi-image grounding tasks.
arXiv Detail & Related papers (2025-07-01T13:48:57Z)
- Normalizing Flows are Capable Models for RL [24.876149287707847]
We propose a single Normalizing Flow architecture which integrates seamlessly into reinforcement learning algorithms. Our approach leads to much simpler algorithms and achieves higher performance in imitation learning, offline RL, goal-conditioned RL, and unsupervised RL.
arXiv Detail & Related papers (2025-05-29T15:06:22Z)
- Self-Reflective Reinforcement Learning for Diffusion-based Image Reasoning Generation [24.247140501653547]
Diffusion models have recently demonstrated exceptional performance in image generation tasks. We propose SRRL, a self-reflective RL algorithm for diffusion models, to achieve reasoning generation of logical images.
arXiv Detail & Related papers (2025-05-28T14:37:21Z)
- Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors [75.24313405671433]
Diffusion-based image super-resolution (SR) methods have achieved remarkable success by leveraging large pre-trained text-to-image diffusion models as priors.
We introduce a novel one-step SR model, which addresses the efficiency issue of diffusion-based SR methods.
Unlike existing fine-tuning strategies, we design a degradation-guided Low-Rank Adaptation (LoRA) module specifically for SR.
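The summary only states that the LoRA module is degradation-guided; one plausible reading, sketched below, gates the low-rank update with a learned projection of a degradation embedding. This is an illustrative assumption, not the paper's verified design.

```python
import torch
import torch.nn as nn

class DegradationGuidedLoRA(nn.Module):
    def __init__(self, base: nn.Linear, rank=8, deg_dim=64):
        super().__init__()
        self.base = base                               # frozen pretrained projection
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)                 # update starts at zero
        self.gate = nn.Linear(deg_dim, rank)           # degradation embedding -> rank scales

    def forward(self, x, deg_emb):
        scale = torch.sigmoid(self.gate(deg_emb))      # (B, rank), conditions the update
        return self.base(x) + self.up(self.down(x) * scale)
```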
arXiv Detail & Related papers (2024-09-25T16:15:21Z)
- Deep Equilibrium Diffusion Restoration with Parallel Sampling [120.15039525209106]
Diffusion model-based image restoration (IR) aims to use diffusion models to recover high-quality (HQ) images from degraded images, achieving promising performance.
Most existing methods need long serial sampling chains to restore HQ images step-by-step, resulting in expensive sampling time and high computation costs.
In this work, we rethink diffusion model-based IR from a different perspective, i.e., as a deep equilibrium (DEQ) fixed-point system, called DeqIR.
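In the DEQ view, the restored image is the fixed point x* = f(x*) of a learned update f, which permits parallel solving instead of a long serial chain. A toy fixed-point iteration (the paper's actual solver and implicit differentiation are omitted; names are ours):

```python
import torch

def deq_restore(f, x_init, max_iter=50, tol=1e-4):
    x = x_init
    for _ in range(max_iter):
        x_next = f(x)                                  # one application of the DEQ layer
        if torch.norm(x_next - x) / (torch.norm(x) + 1e-8) < tol:
            return x_next                              # converged to the fixed point
        x = x_next
    return x
```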
arXiv Detail & Related papers (2023-11-20T08:27:56Z)
- Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective [142.36200080384145]
We propose a single objective that jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves the sample efficiency of the best prior model-based and model-free RL methods.
arXiv Detail & Related papers (2022-09-18T03:51:58Z)
- Sample-efficient Reinforcement Learning Representation Learning with Curiosity Contrastive Forward Dynamics Model [17.41484483119774]
This paper presents a learning framework, the Curiosity Contrastive Forward Dynamics Model (CCFDM), for achieving more sample-efficient reinforcement learning (RL).
CCFDM incorporates a forward dynamics model (FDM) and performs contrastive learning to train its deep convolutional neural network-based image encoder (IE).
During training, CCFDM provides intrinsic rewards based on the FDM prediction error, encouraging the curiosity of the RL agent and improving exploration.
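A minimal sketch of the intrinsic reward described above: the FDM's prediction error in the image encoder's latent space serves as a curiosity bonus (function names and the squared-error norm are our assumptions).

```python
import torch

def curiosity_reward(fdm, encoder, obs, action, next_obs, scale=0.1):
    with torch.no_grad():
        z, z_next = encoder(obs), encoder(next_obs)    # image encoder (IE) features
        z_pred = fdm(z, action)                        # forward dynamics model (FDM)
        # Larger prediction error -> more novel transition -> larger bonus.
        return scale * torch.mean((z_pred - z_next) ** 2, dim=-1)
```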
arXiv Detail & Related papers (2021-03-15T10:08:52Z)