VQualA 2025 Challenge on Visual Quality Comparison for Large Multimodal Models: Methods and Results
- URL: http://arxiv.org/abs/2509.09190v1
- Date: Thu, 11 Sep 2025 07:00:50 GMT
- Title: VQualA 2025 Challenge on Visual Quality Comparison for Large Multimodal Models: Methods and Results
- Authors: Hanwei Zhu, Haoning Wu, Zicheng Zhang, Lingyu Zhu, Yixuan Li, Peilin Chen, Shiqi Wang, Chris Wei Zhou, Linhan Cao, Wei Sun, Xiangyang Zhu, Weixia Zhang, Yucheng Zhu, Jing Liu, Dandan Zhu, Guangtao Zhai, Xiongkuo Min, Zhichao Zhang, Xinyue Li, Shubo Xu, Anh Dao, Yifan Li, Hongyuan Yu, Jiaojiao Yi, Yiding Tian, Yupeng Wu, Feiran Sun, Lijuan Liao, Song Jiang,
- Abstract summary: The VQualA 2025 Challenge on Visual Quality Comparison for Large Multimodal Models was hosted as part of the ICCV 2025 Workshop on Visual Quality Assessment. The challenge aims to evaluate and enhance the ability of state-of-the-art LMMs to perform open-ended and detailed reasoning about visual quality differences across multiple images. Around 100 participants submitted entries, with five models demonstrating the emerging capabilities of instruction-tuned LMMs on quality assessment.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a summary of the VQualA 2025 Challenge on Visual Quality Comparison for Large Multimodal Models (LMMs), hosted as part of the ICCV 2025 Workshop on Visual Quality Assessment. The challenge aims to evaluate and enhance the ability of state-of-the-art LMMs to perform open-ended and detailed reasoning about visual quality differences across multiple images. To this end, the competition introduces a novel benchmark comprising thousands of coarse-to-fine grained visual quality comparison tasks, spanning single images, pairs, and multi-image groups. Each task requires models to provide accurate quality judgments. The competition emphasizes holistic evaluation protocols, including 2AFC-based binary preference and multi-choice questions (MCQs). Around 100 participants submitted entries, with five models demonstrating the emerging capabilities of instruction-tuned LMMs on quality assessment. This challenge marks a significant step toward open-domain visual quality reasoning and comparison and serves as a catalyst for future research on interpretable and human-aligned quality evaluation systems.
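The abstract names two evaluation protocols: 2AFC-based binary preference (the model picks which of two images has better quality, scored against human preference) and multi-choice questions. A minimal sketch of how such protocol scores could be computed is below; the function names and the simple list-based format are illustrative assumptions, not the challenge's actual submission interface.

```python
# Sketch of the two evaluation protocols mentioned in the abstract:
# 2AFC binary-preference accuracy and multiple-choice (MCQ) accuracy.
# Data format is a simplifying assumption for illustration only.

def two_afc_accuracy(preds, labels):
    """Fraction of image pairs where the model's preferred image
    ('A' or 'B') matches the human-annotated preference."""
    assert len(preds) == len(labels)
    return sum(p == l for p, l in zip(preds, labels)) / len(preds)

def mcq_accuracy(preds, answers):
    """Fraction of multiple-choice questions answered correctly."""
    assert len(preds) == len(answers)
    return sum(p == a for p, a in zip(preds, answers)) / len(preds)

# Toy example: 4 pairwise comparisons and 3 MCQs
pair_preds  = ["A", "B", "A", "A"]   # model's choice per pair
pair_labels = ["A", "B", "B", "A"]   # human-preferred image per pair
print(two_afc_accuracy(pair_preds, pair_labels))  # 0.75

mcq_preds   = ["C", "A", "D"]
mcq_answers = ["C", "B", "D"]
print(mcq_accuracy(mcq_preds, mcq_answers))
```

Both metrics reduce to simple agreement rates; the challenge's "holistic" evaluation combines them across single-image, pairwise, and multi-image tasks.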
Related papers
- VQualA 2025 Challenge on Image Super-Resolution Generated Content Quality Assessment: Methods and Results [65.82676254264837]
This paper presents the ISRGC-Q Challenge, built upon the Image Super-Resolution Generated Content Quality Assessment dataset. The primary goal of this challenge is to analyze the unique artifacts introduced by modern super-resolution techniques and to evaluate their perceptual quality effectively.
arXiv Detail & Related papers (2025-09-08T08:07:50Z)
- VQualA 2025 Challenge on Face Image Quality Assessment: Methods and Results [96.54702713309052]
This report summarizes the methodologies and findings of the VQualA 2025 Challenge on Face Image Quality Assessment (FIQA), advancing the development of practical FIQA approaches.
arXiv Detail & Related papers (2025-08-25T19:48:52Z)
- VQA$^2$: Visual Question Answering for Video Quality Assessment [76.81110038738699]
Video Quality Assessment (VQA) is a classic field in low-level visual perception. Recent studies in the image domain have demonstrated that Visual Question Answering (VQA) can markedly enhance low-level visual quality evaluation. We introduce the VQA2 Instruction dataset, the first visual question answering instruction dataset that focuses on video quality assessment. The VQA2 series models interleave visual and motion tokens to enhance the perception of spatial-temporal quality details in videos.
arXiv Detail & Related papers (2024-11-06T09:39:52Z)
- Q-Ground: Image Quality Grounding with Large Multi-modality Models [61.72022069880346]
We introduce Q-Ground, the first framework aimed at tackling fine-scale visual quality grounding.
Q-Ground combines large multi-modality models with detailed visual quality analysis.
Central to our contribution is the introduction of the QGround-100K dataset.
arXiv Detail & Related papers (2024-07-24T06:42:46Z)
- QPT V2: Masked Image Modeling Advances Visual Scoring [14.494394623916714]
Masked image modeling (MIM) has achieved noteworthy advancements across various high-level tasks.
In this work, we take on a novel perspective to investigate its capabilities in terms of quality- and aesthetics-awareness.
We propose Quality- and aesthetics-aware pretraining (QPT V2), the first pretraining framework based on MIM that offers a unified solution to quality and aesthetics assessment.
arXiv Detail & Related papers (2024-07-23T14:53:47Z)
- VisualCritic: Making LMMs Perceive Visual Quality Like Humans [65.59779450136399]
We present VisualCritic, the first LMM for broad-spectrum image subjective quality assessment.
VisualCritic can be used across diverse data right out of the box, without any dataset-specific adaptation.
arXiv Detail & Related papers (2024-03-19T15:07:08Z)