LRPO: Enhancing Blind Face Restoration through Online Reinforcement Learning
- URL: http://arxiv.org/abs/2509.23339v1
- Date: Sat, 27 Sep 2025 14:42:29 GMT
- Title: LRPO: Enhancing Blind Face Restoration through Online Reinforcement Learning
- Authors: Bin Wu, Yahui Liu, Chi Zhang, Yao Zhao, Wei Wang,
- Abstract summary: Blind Face Restoration (BFR) encounters inherent challenges in exploring its large solution space.<n>We propose a Likelihood-Regularized Policy Optimization (LRPO) framework, the first to apply online reinforcement learning (RL) to the BFR task.<n>Our proposed LRPO significantly improves the face restoration quality over baseline methods and achieves state-of-the-art performance.
- Score: 54.51101908523586
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Blind Face Restoration (BFR) encounters inherent challenges in exploring its large solution space, leading to common artifacts like missing details and identity ambiguity in the restored images. To tackle these challenges, we propose a Likelihood-Regularized Policy Optimization (LRPO) framework, the first to apply online reinforcement learning (RL) to the BFR task. LRPO leverages rewards from sampled candidates to refine the policy network, increasing the likelihood of high-quality outputs while improving restoration performance on low-quality inputs. However, directly applying RL to BFR creates incompatibility issues, producing restoration results that deviate significantly from the ground truth. To balance perceptual quality and fidelity, we propose three key strategies: 1) a composite reward function tailored for face restoration assessment, 2) ground-truth guided likelihood regularization, and 3) noise-level advantage assignment. Extensive experiments demonstrate that our proposed LRPO significantly improves the face restoration quality over baseline methods and achieves state-of-the-art performance.
Related papers
- IRPO: Boosting Image Restoration via Post-training GRPO [59.588079259093035]
We propose IRPO, a low-level GRPO-based post-training paradigm.<n>We first explore a data formulation principle for low-level post-training paradigm.<n>We then model a reward-level criteria system that balances objective accuracy and human perceptual preference.
arXiv Detail & Related papers (2025-11-30T09:42:24Z) - Enhancing Diffusion-based Restoration Models via Difficulty-Adaptive Reinforcement Learning with IQA Reward [93.04811239892852]
Reinforcement Learning (RL) has recently been incorporated into diffusion models.<n>In this paper, we investigate how to effectively integrate RL into diffusion-based restoration models.
arXiv Detail & Related papers (2025-11-03T14:57:57Z) - Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback [59.078756231841574]
Critique-GRPO is an online RL framework that integrates both natural language and numerical feedback for effective policy optimization.<n>We show Critique-GRPO consistently outperforms supervised learning and RL-based fine-tuning methods across eight challenging mathematical, STEM, and general reasoning tasks.
arXiv Detail & Related papers (2025-06-03T17:39:02Z) - LAFR: Efficient Diffusion-based Blind Face Restoration via Latent Codebook Alignment Adapter [52.93785843453579]
Blind face restoration from low-quality (LQ) images is a challenging task that requires high-fidelity image reconstruction and the preservation of facial identity.<n>We propose LAFR, a novel codebook-based latent space adapter that aligns the latent distribution of LQ images with that of HQ counterparts.<n>We show that lightweight finetuning of diffusion prior on just 0.9% of FFHQ dataset is sufficient to achieve results comparable to state-of-the-art methods.
arXiv Detail & Related papers (2025-05-29T14:11:16Z) - IQPFR: An Image Quality Prior for Blind Face Restoration and Beyond [56.99331967165238]
Blind Face Restoration (BFR) addresses the challenge of reconstructing degraded low-quality (LQ) facial images into high-quality (HQ) outputs.<n>We propose a novel framework that incorporates an Image Quality Prior (IQP) derived from No-Reference Image Quality Assessment (NR-IQA) models.<n>Our method outperforms state-of-the-art techniques across multiple benchmarks.
arXiv Detail & Related papers (2025-03-12T11:39:51Z) - Beyond Human Preferences: Exploring Reinforcement Learning Trajectory Evaluation and Improvement through LLMs [12.572869123617783]
Reinforcement learning (RL) faces challenges in evaluating policy trajectories within intricate game tasks.
PbRL presents a pioneering framework that capitalizes on human preferences as pivotal reward signals.
We propose a LLM-enabled automatic preference generation framework named LLM4PG.
arXiv Detail & Related papers (2024-06-28T04:21:24Z) - Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards [26.40009657912622]
Reinforcement learning from human feedback (RLHF) is the mainstream paradigm used to align large language models (LLMs) with human preferences.
Yet existing RLHF heavily relies on accurate and informative reward models, which are vulnerable and sensitive to noise from various sources.
In this work, we improve the effectiveness of the reward model by introducing a penalty term on the reward, named as textitcontrastive rewards
arXiv Detail & Related papers (2024-03-12T14:51:57Z) - Implicit Subspace Prior Learning for Dual-Blind Face Restoration [66.67059961379923]
A novel implicit subspace prior learning (ISPL) framework is proposed as a generic solution to dual-blind face restoration.
Experimental results demonstrate significant perception-distortion improvement of ISPL against existing state-of-the-art methods.
arXiv Detail & Related papers (2020-10-12T08:04:24Z) - HiFaceGAN: Face Renovation via Collaborative Suppression and
Replenishment [63.333407973913374]
"Face Renovation"(FR) is a semantic-guided generation problem.
"HiFaceGAN" is a multi-stage framework containing several nested CSR units.
experiments on both synthetic and real face images have verified the superior performance of HiFaceGAN.
arXiv Detail & Related papers (2020-05-11T11:33:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.