Related papers: Guiding Perception-Reasoning Closer to Human in Blind Image Quality Assessment

URL: http://arxiv.org/abs/2512.16484v1
Date: Thu, 18 Dec 2025 12:52:37 GMT
Title: Guiding Perception-Reasoning Closer to Human in Blind Image Quality Assessment
Authors: Yuan Li, Yahan Yu, Youyuan Lin, Yong-Hao Yang, Chenhui Chu, Shin'ya Nishida,
Abstract summary: We investigate how a model can acquire both human-like and self-consistent reasoning capability for blind image quality assessment (BIQA). We first collect human evaluation data that capture several aspects of the human perception-reasoning pipeline. We adopt reinforcement learning, using human annotations as reward signals to guide the model toward human-like perception and reasoning.
Score: 24.713568842749222
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Humans assess image quality through a perception-reasoning cascade, integrating sensory cues with implicit reasoning to form self-consistent judgments. In this work, we investigate how a model can acquire both human-like and self-consistent reasoning capability for blind image quality assessment (BIQA). We first collect human evaluation data that capture several aspects of the human perception-reasoning pipeline. Then, we adopt reinforcement learning, using human annotations as reward signals to guide the model toward human-like perception and reasoning. To enable the model to internalize self-consistent reasoning capability, we design a reward that drives the model to infer the image quality purely from self-generated descriptions. Empirically, our approach achieves score prediction performance comparable to state-of-the-art BIQA systems under general metrics, including Pearson and Spearman correlation coefficients. In addition to the rating score, we assess human-model alignment using ROUGE-1 to measure the similarity between model-generated and human perception-reasoning chains. On over 1,000 human-annotated samples, our model reaches a ROUGE-1 score of 0.512 (cf. 0.443 for baseline), indicating substantial coverage of human explanations and marking a step toward human-like interpretable reasoning in BIQA.
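The metrics named in the abstract (Pearson and Spearman correlation for score prediction, ROUGE-1 for similarity between reasoning chains) can be illustrated with a minimal pure-Python sketch. This is not the authors' evaluation code: the function names, the lowercase whitespace tokenization, and the example strings are assumptions made for illustration, and the abstract does not say which ROUGE-1 variant (precision, recall, or F1) is reported, so all three are returned.

```python
import math
from collections import Counter


def rouge1(candidate: str, reference: str) -> dict:
    """Unigram-overlap ROUGE-1 (assumed tokenization: lowercase, whitespace split)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical example data, not taken from the paper.
model_chain = "the image is blurry"
human_chain = "the image looks blurry and dark"
print(rouge1(model_chain, human_chain))
print(pearson([1, 2, 3], [2, 4, 6]), spearman([1, 2, 3, 4], [1, 4, 9, 16]))
```

In practice an established implementation (with stemming and a fixed tokenizer) would likely be preferable, since ROUGE scores are sensitive to tokenization choices.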

Related papers

Correcting Human Labels for Rater Effects in AI Evaluation: An Item Response Theory Approach [0.0]
This paper integrates psychometric rater models into the AI pipeline to improve the reliability and validity of conclusions drawn from human judgments. We show how adjusting for rater severity produces corrected estimates of summary quality. This perspective highlights a path toward more robust, interpretable, and construct-aligned practices for AI development and evaluation.
arXiv Detail & Related papers (2026-02-26T03:35:36Z)
The Catastrophic Paradox of Human Cognitive Frameworks in Large Language Model Evaluation: A Comprehensive Empirical Analysis of the CHC-LLM Incompatibility [0.0]
Models achieving above-average human IQ scores simultaneously exhibit binary accuracy rates approaching zero on crystallized knowledge tasks. This disconnect appears most strongly in the crystallized intelligence domain. We propose a framework for developing native machine cognition assessments that recognize the non-human nature of artificial intelligence.
arXiv Detail & Related papers (2025-11-23T05:49:57Z)
HumanPCR: Probing MLLM Capabilities in Diverse Human-Centric Scenes [72.26829188852139]
HumanPCR is an evaluation suite for probing MLLMs' capacity about human-related visual contexts. Human-P, HumanThought-C, and Human-R feature over 6,000 human-verified multiple choice questions. Human-R offers a challenging manually curated video reasoning test.
arXiv Detail & Related papers (2025-08-19T09:52:04Z)
Image Quality Assessment for Embodied AI [103.66095742463195]
Embodied AI has developed rapidly in recent years, but it is still mainly deployed in laboratories. There is no IQA method to assess the usability of an image in embodied tasks, namely, the perceptual quality for robots.
arXiv Detail & Related papers (2025-05-22T15:51:07Z)
A Distributional Evaluation of Generative Image Models [2.520143908749992]
We focus on evaluating image generative models, where studies often treat human evaluation as the gold standard. We propose the Embedded Characteristic Score (ECS), a comprehensive metric for evaluating the distributional match between the learned and target sample distributions.
arXiv Detail & Related papers (2025-01-01T06:23:18Z)
Beyond correlation: The Impact of Human Uncertainty in Measuring the Effectiveness of Automatic Evaluation and LLM-as-a-Judge [51.93909886542317]
We show how *relying on a single aggregate correlation score* can obscure fundamental differences between human labels and those from automatic evaluation. We propose stratifying data by human label uncertainty to provide a more robust analysis of automatic evaluation performance.
arXiv Detail & Related papers (2024-10-03T03:08:29Z)
Mind the Uncertainty in Human Disagreement: Evaluating Discrepancies between Model Predictions and Human Responses in VQA [26.968874222330978]
This study focuses on the Visual Question Answering (VQA) task. We evaluate how well vision-language models correlate with the distribution of human responses.
arXiv Detail & Related papers (2024-09-17T13:44:25Z)
Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics [54.08757792080732]
We propose integrating deep features from pre-trained visual models with a statistical analysis model to achieve opinion-unaware BIQA (OU-BIQA). Our proposed model exhibits superior consistency with human visual perception compared to state-of-the-art BIQA models.
arXiv Detail & Related papers (2024-05-29T06:09:34Z)
Quality Assessment for AI Generated Images with Instruction Tuning [58.41087653543607]
We first establish a novel Image Quality Assessment (IQA) database for AIGIs, termed AIGCIQA2023+. This paper presents a MINT-IQA model to evaluate and explain human preferences for AIGIs from Multi-perspectives with INstruction Tuning.
arXiv Detail & Related papers (2024-05-12T17:45:11Z)
It HAS to be Subjective: Human Annotator Simulation via Zero-shot Density Estimation [15.8765167340819]
Human annotator simulation (HAS) serves as a cost-effective substitute for human evaluation such as data annotation and system assessment. Human perception and behaviour during human evaluation exhibit inherent variability due to diverse cognitive processes and subjective interpretations. This paper introduces a novel meta-learning framework that treats HAS as a zero-shot density estimation problem.
arXiv Detail & Related papers (2023-09-30T20:54:59Z)
Perceptual Attacks of No-Reference Image Quality Models with Human-in-the-Loop [113.75573175709573]
We make one of the first attempts to examine the perceptual robustness of NR-IQA models. We test one knowledge-driven and three data-driven NR-IQA methods under four full-reference IQA models. We find that all four NR-IQA models are vulnerable to the proposed perceptual attack.
arXiv Detail & Related papers (2022-10-03T13:47:16Z)
Exploring Alignment of Representations with Human Perception [47.53970721813083]
We show that inputs that are mapped to similar representations by the model should be perceived similarly by humans. Our approach yields a measure of the extent to which a model is aligned with human perception. We find that various properties of a model like its architecture, training paradigm, training loss, and data augmentation play a significant role in learning representations that are aligned with human perception.
arXiv Detail & Related papers (2021-11-29T17:26:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.