Related papers: PreResQ-R1: Towards Fine-Grained Rank-and-Score Reinforcement Learning for Visual Quality Assessment via Preference-Response Disentangled Policy Optimization

URL: http://arxiv.org/abs/2511.05393v1
Date: Fri, 07 Nov 2025 16:19:50 GMT
Title: PreResQ-R1: Towards Fine-Grained Rank-and-Score Reinforcement Learning for Visual Quality Assessment via Preference-Response Disentangled Policy Optimization
Authors: Zehui Feng, Tian Qiu, Tong Wu, Junxuan Li, Huayuan Xu, Ting Han,
Abstract summary: PreResQ-R1 is a Preference-Response Disentangled Reinforcement Learning framework. It unifies absolute score regression and relative ranking consistency within a single reasoning-driven optimization scheme. It achieves state-of-the-art results across 10 IQA and 5 VQA benchmarks under both SRCC and PLCC metrics.
Score: 12.993619998545633
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Visual Quality Assessment (QA) seeks to predict human perceptual judgments of visual fidelity. While recent multimodal large language models (MLLMs) show promise in reasoning about image and video quality, existing approaches mainly rely on supervised fine-tuning or rank-only objectives, resulting in shallow reasoning, poor score calibration, and limited cross-domain generalization. We propose PreResQ-R1, a Preference-Response Disentangled Reinforcement Learning framework that unifies absolute score regression and relative ranking consistency within a single reasoning-driven optimization scheme. Unlike prior QA methods, PreResQ-R1 introduces a dual-branch reward formulation that separately models intra-sample response coherence and inter-sample preference alignment, optimized via Group Relative Policy Optimization (GRPO). This design encourages fine-grained, stable, and interpretable chain-of-thought reasoning about perceptual quality. To extend beyond static imagery, we further design a global-temporal and local-spatial data flow strategy for Video Quality Assessment. Remarkably, with reinforcement fine-tuning on only 6K images and 28K videos, PreResQ-R1 achieves state-of-the-art results across 10 IQA and 5 VQA benchmarks under both SRCC and PLCC metrics, surpassing prior methods by margins of 5.30% (SRCC) and 2.15% (PLCC) on the IQA task. Beyond quantitative gains, it produces human-aligned reasoning traces that reveal the perceptual cues underlying quality judgments. Code and model are available.
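GRPO, as introduced in the DeepSeekMath work, drops the learned value critic and instead scores each sampled response against the other responses drawn for the same prompt. The sketch below shows only that group-normalization step. The reward values are placeholders: PreResQ-R1's dual-branch reward terms are not reproduced here, and `group_relative_advantages` and its `eps` stabilizer are illustrative names of our own.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: each rollout's reward normalized by its group.

    `rewards` holds one scalar per sampled response for the SAME prompt
    (e.g. one image). The values are placeholders; how they would be
    computed (PreResQ-R1's dual-branch reward) is not modeled here.
    """
    if len(rewards) < 2:
        raise ValueError("GRPO needs at least two rollouts per group")
    mean = statistics.fmean(rewards)
    # Population std over the group (an assumption; implementations differ
    # on sample vs. population standard deviation).
    std = statistics.pstdev(rewards)
    # eps keeps the division defined when every rollout got the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Four hypothetical rollouts for one image, with made-up rewards.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print([round(a, 3) for a in advantages])  # [1.414, -1.414, 0.0, 0.0]
```

Responses that beat their group average receive positive advantages and worse ones receive negative advantages, so the policy update needs no separate value network. When all rollouts tie, every advantage is zero and that group contributes no learning signal.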

Related papers

Q-Hawkeye: Reliable Visual Policy Optimization for Image Quality Assessment [25.916354359994624]
We propose Q-Hawkeye, an RL-based reliable visual policy optimization framework. Q-Hawkeye estimates predictive uncertainty using the variance of predicted scores across multiple rollouts. We introduce an Implicit Perception Loss that constrains the model to ground its quality judgments in genuine visual evidence.
(arXiv, 2026-01-30T12:42:32Z)
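Q-Hawkeye's uncertainty signal, as summarized above, is the spread of the scores the model predicts for the same image over several rollouts. A minimal sketch of that estimate, assuming the numeric scores have already been parsed from each rollout; `rollout_uncertainty` is a hypothetical name, and how Q-Hawkeye then uses the variance during optimization is not modeled here.

```python
import statistics


def rollout_uncertainty(scores):
    """Mean and variance of the quality scores predicted across K rollouts.

    `scores` is assumed to hold one already-parsed numeric score per rollout
    for a single image. High variance means the rollouts disagree, i.e. the
    prediction for this image is uncertain.
    """
    if not scores:
        raise ValueError("need at least one rollout score")
    mean = statistics.fmean(scores)
    variance = statistics.pvariance(scores)  # population variance over rollouts
    return mean, variance


# Hypothetical rollouts: consistent on one image, scattered on another.
print(rollout_uncertainty([3.0, 3.0, 3.0]))  # (3.0, 0.0)
print(rollout_uncertainty([1.0, 5.0]))       # (3.0, 4.0)
```

Both hypothetical images receive the same mean score, but only the variance reveals that the second prediction is unreliable.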
Q-Save: Towards Scoring and Attribution for Generated Video Evaluation [65.83319736145869]
We present Q-Save, a new benchmark dataset and model for holistic evaluation of AI-generated video (AIGV) quality. The dataset contains nearly 10,000 videos, each annotated with a scalar mean opinion score (MOS) and fine-grained attribution labels. We propose a unified evaluation model that jointly performs quality scoring and attribution-based explanation.
(arXiv, 2025-11-24T07:00:21Z)
OmniQuality-R: Advancing Reward Models Through All-Encompassing Quality Assessment [55.59322229889159]
We propose OmniQuality-R, a unified reward modeling framework that transforms multi-task quality reasoning into continuous and interpretable reward signals. We use a reasoning-enhanced reward modeling dataset to form a reliable chain-of-thought dataset for supervised fine-tuning. We evaluate OmniQuality-R on three key IQA tasks: aesthetic quality assessment, technical quality evaluation, and text-image alignment.
(arXiv, 2025-10-12T13:46:28Z)
Unlocking the Essence of Beauty: Advanced Aesthetic Reasoning with Relative-Absolute Policy Optimization [63.169050703903515]
We propose Aes-R1, a comprehensive aesthetic reasoning framework with reinforcement learning (RL). Aes-R1 integrates a pipeline, AesCoT, to construct and filter high-quality chain-of-thought aesthetic reasoning data. Experiments demonstrate that Aes-R1 improves the backbone's average PLCC/SRCC by 47.9%/34.8%.
(arXiv, 2025-09-26T04:55:00Z)
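SRCC and PLCC, the metrics behind the PreResQ-R1 and Aes-R1 figures above, measure rank agreement and linear agreement between predicted scores and mean opinion scores (MOS). Below is a dependency-free sketch: SRCC is computed as the Pearson correlation of average ranks (so ties are handled), and PLCC directly on the raw scores. Many IQA evaluations first fit a nonlinear logistic mapping from predictions to MOS before computing PLCC; this sketch omits that step, so its PLCC can differ from published numbers. In practice, `scipy.stats.spearmanr` and `scipy.stats.pearsonr` compute the same two correlations.

```python
import math


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with >= 2 items")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    return ranks


def srcc(pred, mos):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(pred), average_ranks(mos))


def plcc(pred, mos):
    """Pearson linear correlation on raw scores (no logistic fitting)."""
    return pearson(pred, mos)


# Hypothetical predictions that are monotone in MOS but not linear.
pred, mos = [1.0, 2.0, 3.0, 4.0], [1.0, 4.0, 9.0, 16.0]
print(round(srcc(pred, mos), 3), round(plcc(pred, mos), 3))
```

Here SRCC is 1 because the ordering matches perfectly, while PLCC falls below 1 because the relationship is not linear. This gap between ranking quality and score calibration is why both metrics are reported.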
Perception-Consistency Multimodal Large Language Models Reasoning via Caption-Regularized Policy Optimization [72.30168853571216]
Multimodal large language models excel at tasks that integrate visual perception with symbolic reasoning. CapPO integrates two key mechanisms: (1) a caption-based consistency regularization, which minimizes the divergence between responses conditioned on raw images and those conditioned on captions, and (2) a KL-weighted advantage estimation scheme, which adaptively scales reinforcement signals to strengthen perceptually consistent trajectories.
(arXiv, 2025-09-26T04:32:26Z)
HiRQA: Hierarchical Ranking and Quality Alignment for Opinion-Unaware Image Quality Assessment [10.761579471650771]
HiRQA is a self-supervised, opinion-unaware framework that offers a hierarchical, quality-aware embedding through a combination of ranking and contrastive learning. For real-time deployment, we introduce HiRQA-S, a lightweight variant with an inference time of only 3.5 ms per image.
(arXiv, 2025-08-20T23:48:21Z)
VQAThinker: Exploring Generalizable and Explainable Video Quality Assessment via Reinforcement Learning [50.34205095371895]
Video quality assessment aims to objectively quantify perceptual quality degradation. Existing VQA models suffer from two critical limitations. We propose VQAThinker, a reasoning-based VQA framework.
(arXiv, 2025-08-08T06:16:23Z)
Refine-IQA: Multi-Stage Reinforcement Finetuning for Perceptual Image Quality Assessment [22.184690568393126]
Reinforcement fine-tuning (RFT) is a proliferating paradigm for LMM training. We propose a multi-stage RFT IQA framework (Refine-IQA). The resulting Refine-IQA Series Models achieve outstanding performance on both perception and scoring tasks.
(arXiv, 2025-08-04T22:46:10Z)
Q-Ponder: A Unified Training Pipeline for Reasoning-based Visual Quality Assessment [10.701522670464463]
Multimodal large language models (MLLMs) can proficiently evaluate visual quality through interpretable assessments. We propose a unified two-stage training framework comprising a cold-start stage and a reinforcement learning-based fine-tuning stage. We designate the models derived from these two stages as Q-Ponder-CI and Q-Ponder.
(arXiv, 2025-06-03T10:11:51Z)
VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank [30.316630325648834]
We introduce VisualQuality-R1, a reasoning-induced no-reference IQA (NR-IQA) model, and we train it with reinforcement learning to rank. We show that VisualQuality-R1 consistently outperforms discriminative deep learning-based NR-IQA models. VisualQuality-R1 is capable of generating contextually rich, human-aligned quality descriptions.
(arXiv, 2025-05-20T14:56:50Z)
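One standard way to turn sampled quality scores into a pairwise "learning to rank" signal is the Thurstone model: treat each image's score as Gaussian, with mean and variance estimated from several rollouts, so that P(i preferred over j) = Φ((μ_i − μ_j) / √(σ_i² + σ_j²)). The sketch below computes that probability. It illustrates the general idea and is not claimed to be VisualQuality-R1's exact reward; the function names and the `eps` stabilizer are our own assumptions.

```python
import math
import statistics


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def thurstone_preference(scores_i, scores_j, eps=1e-6):
    """P(image i is judged better than image j) under a Thurstone-style model.

    Each image's quality is modeled as a Gaussian whose mean and variance are
    estimated from that image's rollout scores. eps (an assumption) keeps the
    denominator positive when both sets of rollouts agree exactly.
    """
    mu_i, mu_j = statistics.fmean(scores_i), statistics.fmean(scores_j)
    var_i, var_j = statistics.pvariance(scores_i), statistics.pvariance(scores_j)
    return normal_cdf((mu_i - mu_j) / math.sqrt(var_i + var_j + eps))


# Hypothetical rollout scores for two images.
p = thurstone_preference([3.0, 5.0], [1.0, 3.0])
print(round(p, 3))
```

The probability is symmetric (P(i over j) + P(j over i) = 1), and larger rollout variance pulls it toward 0.5, so confident, well-separated predictions yield a sharper ranking signal than noisy ones.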
IQPFR: An Image Quality Prior for Blind Face Restoration and Beyond [56.99331967165238]
Blind Face Restoration (BFR) addresses the challenge of reconstructing degraded low-quality (LQ) facial images into high-quality (HQ) outputs. We propose a novel framework that incorporates an Image Quality Prior (IQP) derived from No-Reference Image Quality Assessment (NR-IQA) models. Our method outperforms state-of-the-art techniques across multiple benchmarks.
(arXiv, 2025-03-12T11:39:51Z)
When No-Reference Image Quality Models Meet MAP Estimation in Diffusion Latents [92.45867913876691]
No-reference image quality assessment (NR-IQA) models can effectively quantify perceived image quality. We show that NR-IQA models can be plugged into the maximum a posteriori (MAP) estimation framework for image enhancement.
(arXiv, 2024-03-11T03:35:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.