Refine-IQA: Multi-Stage Reinforcement Finetuning for Perceptual Image Quality Assessment
- URL: http://arxiv.org/abs/2508.03763v1
- Date: Mon, 04 Aug 2025 22:46:10 GMT
- Title: Refine-IQA: Multi-Stage Reinforcement Finetuning for Perceptual Image Quality Assessment
- Authors: Ziheng Jia, Jiaying Qian, Zicheng Zhang, Zijian Chen, Xiongkuo Min,
- Abstract summary: Reinforcement fine-tuning (RFT) is a proliferating paradigm for LMM training. We propose a multi-stage RFT IQA framework (Refine-IQA). The resulting Refine-IQA series models achieve outstanding performance on both perception and scoring tasks.
- Score: 22.184690568393126
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement fine-tuning (RFT) is a proliferating paradigm for LMM training. Analogous to high-level reasoning tasks, RFT is similarly applicable to low-level vision domains, including image quality assessment (IQA). Existing RFT-based IQA methods typically use rule-based output rewards to verify the model's rollouts but provide no reward supervision for the "think" process, leaving its correctness and efficacy uncontrolled. Furthermore, these methods typically fine-tune directly on downstream IQA tasks without explicitly enhancing the model's native low-level visual quality perception, which may constrain its performance upper bound. In response to these gaps, we propose the multi-stage RFT IQA framework (Refine-IQA). In Stage-1, we build the Refine-Perception-20K dataset (with 12 main distortions, 20,907 locally distorted images, and over 55K RFT samples) and design multi-task reward functions to strengthen the model's visual quality perception. In Stage-2, targeting the quality scoring task, we introduce a strategy that involves a probability-difference reward for supervising the "think" process. The resulting Refine-IQA series models achieve outstanding performance on both perception and scoring tasks, and, notably, our paradigm activates a robust "think" (quality interpreting) capability that also attains exceptional results on the corresponding quality interpreting benchmark.
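The abstract does not spell out the probability-difference reward; a minimal sketch of one plausible reading follows, assuming the reward measures how much the generated "think" segment lifts the probability the policy assigns to the ground-truth quality score (all names below are hypothetical, not the paper's code).

```python
import torch

def probability_difference_reward(
    logp_gt_with_think: torch.Tensor,     # log P(gt score | image, prompt, think)
    logp_gt_without_think: torch.Tensor,  # log P(gt score | image, prompt)
) -> torch.Tensor:
    """Hypothetical 'think'-process reward: positive only when the rationale
    raises the model's probability of the ground-truth score."""
    lift = logp_gt_with_think.exp() - logp_gt_without_think.exp()
    # Clamp at zero so an unhelpful or misleading rationale earns no reward.
    return lift.clamp(min=0.0)
```

A term like this can be added to the rule-based output reward during rollout scoring, giving the "think" tokens a learning signal of their own.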
Related papers
- VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank [23.613534906344753]
We introduce VisualQuality-R1, a reasoning-induced no-reference IQA (NR-IQA) model. We train it with reinforcement learning to rank, a learning algorithm tailored to the intrinsically relative nature of visual quality. In experiments, VisualQuality-R1 consistently outperforms discriminative deep learning-based NR-IQA models.
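VisualQuality-R1's actual reward is defined in that paper; as a rough illustration of reinforcement learning to rank for NR-IQA, a rollout can be rewarded when the relative order of its predicted scores on an image pair matches human preference (the function below is a toy stand-in):

```python
def pairwise_rank_reward(pred_a: float, pred_b: float, human_prefers_a: bool) -> float:
    """Toy ranking reward: 1.0 if the model's predicted ordering of the pair
    agrees with the human preference, 0.0 otherwise."""
    model_prefers_a = pred_a > pred_b
    return 1.0 if model_prefers_a == human_prefers_a else 0.0
```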
arXiv Detail & Related papers (2025-05-20T14:56:50Z)
- Q-Insight: Understanding Image Quality via Visual Reinforcement Learning [27.26829134776367]
Image quality assessment (IQA) focuses on the perceptual visual quality of images, playing a crucial role in downstream tasks such as image reconstruction, compression, and generation. We propose Q-Insight, a reinforcement learning-based model built upon group relative policy optimization (GRPO). We show that Q-Insight substantially outperforms existing state-of-the-art methods in both score regression and degradation perception tasks.
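Q-Insight's reward design is specific to that paper, but the GRPO core is standard: rewards of a group of rollouts for the same prompt are standardized into advantages, avoiding a learned value critic. A minimal sketch:

```python
import torch

def grpo_advantages(group_rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """GRPO-style advantages: standardize rewards within one group of rollouts
    sampled for the same prompt, so no value network is required."""
    return (group_rewards - group_rewards.mean()) / (group_rewards.std() + eps)
```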
arXiv Detail & Related papers (2025-03-28T17:59:54Z)
- IQPFR: An Image Quality Prior for Blind Face Restoration and Beyond [56.99331967165238]
Blind Face Restoration (BFR) addresses the challenge of reconstructing degraded low-quality (LQ) facial images into high-quality (HQ) outputs. We propose a novel framework that incorporates an Image Quality Prior (IQP) derived from No-Reference Image Quality Assessment (NR-IQA) models. Our method outperforms state-of-the-art techniques across multiple benchmarks.
arXiv Detail & Related papers (2025-03-12T11:39:51Z)
- Teaching LMMs for Image Quality Scoring and Interpreting [71.1335005098584]
We propose Q-SiT (Quality Scoring and Interpreting joint Teaching), a unified framework that enables image quality scoring and interpreting simultaneously. Q-SiT is the first model capable of jointly performing both tasks, and comes with a lightweight variant, Q-SiT-mini. Experimental results demonstrate that Q-SiT achieves strong performance on both tasks with superior generalization ability in IQA.
arXiv Detail & Related papers (2025-03-12T09:39:33Z)
- Boosting CLIP Adaptation for Image Quality Assessment via Meta-Prompt Learning and Gradient Regularization [55.09893295671917]
This paper introduces a novel Gradient-Regulated Meta-Prompt IQA Framework (GRMP-IQA).
GRMP-IQA comprises two key modules: a Meta-Prompt Pre-training Module and Quality-Aware Gradient Regularization.
Experiments on five standard BIQA datasets demonstrate performance superior to state-of-the-art BIQA methods under the limited-data setting.
arXiv Detail & Related papers (2024-09-09T07:26:21Z)
- DP-IQA: Utilizing Diffusion Prior for Blind Image Quality Assessment in the Wild [73.6767681305851]
Blind image quality assessment (IQA) in the wild presents significant challenges. Given the difficulty of collecting large-scale training data, leveraging limited data to develop a model with strong generalization remains an open problem. Motivated by the robust image perception capabilities of pre-trained text-to-image (T2I) diffusion models, we propose DP-IQA, a novel IQA method based on diffusion priors.
arXiv Detail & Related papers (2024-05-30T12:32:35Z)
- When No-Reference Image Quality Models Meet MAP Estimation in Diffusion Latents [92.45867913876691]
No-reference image quality assessment (NR-IQA) models can effectively quantify perceived image quality. We show that NR-IQA models can be plugged into the maximum a posteriori (MAP) estimation framework for image enhancement.
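Schematically, MAP estimation here means ascending a differentiable NR-IQA score plus a log-prior over the diffusion latent. The sketch below is a generic gradient-ascent rendering of that objective under assumed stand-in callables (quality_model, decode, log_prior), not the paper's implementation:

```python
import torch

def map_enhance(z0, quality_model, decode, log_prior, steps=100, lr=0.05, lam=0.1):
    """Schematic MAP image enhancement: maximize the NR-IQA quality of the
    decoded image plus a weighted log-prior over the latent z."""
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        objective = quality_model(decode(z)) + lam * log_prior(z)
        opt.zero_grad()
        (-objective).backward()  # minimize the negative = ascend the MAP objective
        opt.step()
    return decode(z).detach()
```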
arXiv Detail & Related papers (2024-03-11T03:35:41Z)
- A Lightweight Parallel Framework for Blind Image Quality Assessment [7.9562077122537875]
We propose a lightweight parallel framework (LPF) for blind image quality assessment (BIQA).
First, we extract visual features using a pre-trained feature extraction network; we then construct a simple yet effective feature embedding network (FEN) to transform them.
We present two novel self-supervised subtasks: a sample-level category prediction task and a batch-level quality comparison task.
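The batch-level quality comparison subtask is not detailed in this summary; one simple reading (hypothetical names, not the paper's formulation) is a loss that asks predictions to preserve the quality ordering of every pair in a batch:

```python
import torch
import torch.nn.functional as F

def batch_quality_comparison_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Toy batch-level comparison loss: penalize every pair (i, j) whose
    predicted ordering contradicts the target quality ordering."""
    diff_pred = pred.unsqueeze(0) - pred.unsqueeze(1)     # pairwise prediction gaps
    diff_tgt = target.unsqueeze(0) - target.unsqueeze(1)  # pairwise target gaps
    # Hinge on pairs with a defined target order (sign != 0).
    return F.relu(-torch.sign(diff_tgt) * diff_pred).mean()
```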
arXiv Detail & Related papers (2024-02-19T10:56:58Z)
- Learning Transformer Features for Image Quality Assessment [53.51379676690971]
We propose a unified IQA framework that utilizes CNN backbone and transformer encoder to extract features.
The proposed framework is compatible with both FR and NR modes and allows for a joint training scheme.
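As a structural illustration only (layer sizes and module names are assumptions, not this paper's configuration), a CNN backbone feeding a transformer encoder for quality scoring can look like this:

```python
import torch.nn as nn
import torchvision.models as models

class CnnTransformerIQA(nn.Module):
    """Sketch: CNN feature map -> token sequence -> transformer encoder -> score."""
    def __init__(self, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])  # drop pool/fc
        self.proj = nn.Conv2d(512, d_model, kernel_size=1)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):                      # x: (B, 3, H, W)
        f = self.proj(self.cnn(x))             # (B, d_model, H/32, W/32)
        tokens = f.flatten(2).transpose(1, 2)  # (B, num_patches, d_model)
        return self.head(self.encoder(tokens).mean(dim=1)).squeeze(-1)
```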
arXiv Detail & Related papers (2021-12-01T13:23:00Z)
- Norm-in-Norm Loss with Faster Convergence and Better Performance for Image Quality Assessment [20.288424566444224]
We explore normalization in the design of loss functions for image quality assessment (IQA) models.
The resulting "Norm-in-Norm'' loss encourages the IQA model to make linear predictions with respect to subjective quality scores.
Experiments on two relevant datasets show that, compared to MAE or MSE loss, the new loss enables the IQA model to converge about 10 times faster.
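The exact formulation is in the cited paper; the gist is to center and norm-normalize both predictions and subjective scores before taking a norm of their difference, which makes the loss invariant to linear rescaling of predictions. A sketch of that idea (not the authors' code):

```python
import torch

def norm_in_norm_loss(pred: torch.Tensor, mos: torch.Tensor,
                      p: float = 2.0, q: float = 2.0) -> torch.Tensor:
    """Norm-in-Norm style loss sketch: normalize inside (center + p-norm),
    then take the q-norm of the difference."""
    def normalize(x):
        x = x - x.mean()
        return x / (x.norm(p) + 1e-8)
    return (normalize(pred) - normalize(mos)).norm(q)
```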
arXiv Detail & Related papers (2020-08-10T04:01:21Z)