Q-Bench-Portrait: Benchmarking Multimodal Large Language Models on Portrait Image Quality Perception
- URL: http://arxiv.org/abs/2601.18346v1
- Date: Mon, 26 Jan 2026 10:37:20 GMT
- Title: Q-Bench-Portrait: Benchmarking Multimodal Large Language Models on Portrait Image Quality Perception
- Authors: Sijing Wu, Yunhao Li, Zicheng Zhang, Qi Jia, Xinyue Li, Huiyu Duan, Xiongkuo Min, Guangtao Zhai
- Abstract summary: Multimodal large language models (MLLMs) have demonstrated impressive performance on existing low-level vision benchmarks. We introduce Q-Bench-Portrait, the first holistic benchmark specifically designed for portrait image quality perception.
- Score: 101.76154325436544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in multimodal large language models (MLLMs) have demonstrated impressive performance on existing low-level vision benchmarks, which primarily focus on generic images. However, their capabilities to perceive and assess portrait images, a domain characterized by distinct structural and perceptual properties, remain largely underexplored. To this end, we introduce Q-Bench-Portrait, the first holistic benchmark specifically designed for portrait image quality perception, comprising 2,765 image-question-answer triplets and featuring (1) diverse portrait image sources, including natural, synthetic distortion, AI-generated, artistic, and computer graphics images; (2) comprehensive quality dimensions, covering technical distortions, AIGC-specific distortions, and aesthetics; and (3) a range of question formats, including single-choice, multiple-choice, true/false, and open-ended questions, at both global and local levels. Based on Q-Bench-Portrait, we evaluate 20 open-source and 5 closed-source MLLMs, revealing that although current models demonstrate some competence in portrait image perception, their performance remains limited and imprecise, with a clear gap relative to human judgments. We hope that the proposed benchmark will foster further research into enhancing the portrait image perception capabilities of both general-purpose and domain-specific MLLMs.
Related papers
- UR-Bench: A Benchmark for Multi-Hop Reasoning over Ultra-High-Resolution Images [32.910783646241754]
We introduce Ultra-high-resolution Reasoning Benchmark (UR-Bench), a benchmark designed to evaluate the reasoning capabilities of MLLMs under extreme visual information. UR-Bench comprises two major categories, Humanistic Scenes and Natural Scenes, covering four subsets of ultra-high-resolution images. We propose an agent-based framework in which a language model performs reasoning by invoking external visual tools.
arXiv Detail & Related papers (2025-12-31T02:22:50Z)
- M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment [65.3860007085689]
M3-AGIQA is a comprehensive framework that enables more human-aligned, holistic evaluation of AI-generated images. By aligning model outputs more closely with human judgment, M3-AGIQA delivers robust and interpretable quality scores.
arXiv Detail & Related papers (2025-02-21T03:05:45Z)
- IC-Portrait: In-Context Matching for View-Consistent Personalized Portrait [51.18967854258571]
IC-Portrait is a novel framework designed to accurately encode individual identities for personalized portrait generation. Our key insight is that pre-trained diffusion models are fast learners for in-context dense correspondence matching. We show that IC-Portrait consistently outperforms existing state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2025-01-28T18:59:03Z)
- Dual-Branch Network for Portrait Image Quality Assessment [76.27716058987251]
We introduce a dual-branch network for portrait image quality assessment (PIQA).
We utilize two backbone networks (i.e., Swin Transformer-B) to extract the quality-aware features from the entire portrait image and the facial image cropped from it.
We leverage LIQE, an image scene classification and quality assessment model, to capture the quality-aware and scene-specific features as the auxiliary features.
arXiv Detail & Related papers (2024-05-14T12:43:43Z)
- VisualCritic: Making LMMs Perceive Visual Quality Like Humans [65.59779450136399]
We present VisualCritic, the first LMM for broad-spectrum image subjective quality assessment.
VisualCritic can be used across diverse data right out of the box, without requiring any dataset-specific adaptation.
arXiv Detail & Related papers (2024-03-19T15:07:08Z)
- A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment [46.55045595936298]
Multimodal Large Language Models (MLLMs) have experienced significant advancement in visual understanding and reasoning.
However, their potential to serve as powerful, flexible, interpretable, and text-driven models for Image Quality Assessment (IQA) remains largely unexplored.
arXiv Detail & Related papers (2024-03-16T08:30:45Z)
- Blind Multimodal Quality Assessment: A Brief Survey and A Case Study of Low-light Images [73.27643795557778]
Blind image quality assessment (BIQA) aims at automatically and accurately forecasting objective scores for visual signals.
Recent developments in this field are dominated by unimodal solutions that are inconsistent with human subjective rating patterns. We present a unique blind multimodal quality assessment (BMQA) of low-light images, spanning subjective evaluation to objective scoring.
arXiv Detail & Related papers (2023-03-18T09:04:55Z)