ViDA-UGC: Detailed Image Quality Analysis via Visual Distortion Assessment for UGC Images
- URL: http://arxiv.org/abs/2508.12605v1
- Date: Mon, 18 Aug 2025 04:02:58 GMT
- Title: ViDA-UGC: Detailed Image Quality Analysis via Visual Distortion Assessment for UGC Images
- Authors: Wenjie Liao, Jieyu Yuan, Yifang Xu, Chunle Guo, Zilong Zhang, Jihong Li, Jiachen Fu, Haotian Fan, Tao Li, Junhui Cui, Chongyi Li
- Abstract summary: In this study, we establish the first large-scale Visual Distortion Assessment Instruction Tuning Dataset for UGC images, termed ViDA-UGC. This dataset is constructed through a distortion-oriented pipeline, which involves human subject annotation and a Chain-of-Thought (CoT) framework. We select 476 images with 6,149 corresponding question-answer pairs from ViDA-UGC and invite a professional team to ensure the accuracy and quality of GPT-generated information. Experimental results demonstrate that ViDA-UGC and the CoT framework consistently enhance various image quality analysis abilities.
- Score: 27.448161376085658
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in Multimodal Large Language Models (MLLMs) have introduced a paradigm shift in Image Quality Assessment (IQA) from unexplainable image quality scoring to explainable IQA, enabling practical applications such as quality control and optimization guidance. However, current explainable IQA methods not only evaluate both User-Generated Content (UGC) and AI-Generated Content (AIGC) images with the same, inadequate distortion criteria, but also lack the detailed quality analysis needed for monitoring image quality and guiding image restoration. In this study, we establish the first large-scale Visual Distortion Assessment Instruction Tuning Dataset for UGC images, termed ViDA-UGC, which comprises 11K images with fine-grained quality grounding, detailed quality perception, and reasoning quality description data. This dataset is constructed through a distortion-oriented pipeline, which involves human subject annotation and a Chain-of-Thought (CoT) assessment framework. This framework guides GPT-4o to generate quality descriptions by identifying and analyzing UGC distortions, which helps capture rich low-level visual features that inherently correlate with distortion patterns. Moreover, we carefully select 476 images with 6,149 corresponding question-answer pairs from ViDA-UGC and invite a professional team to ensure the accuracy and quality of GPT-generated information. The selected and revised data further contribute to the first UGC distortion assessment benchmark, termed ViDA-UGC-Bench. Experimental results demonstrate that ViDA-UGC and the CoT framework consistently enhance various image quality analysis abilities across multiple base MLLMs on ViDA-UGC-Bench and Q-Bench, even surpassing GPT-4o.
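The abstract lists "fine-grained quality grounding" data but does not say how grounding outputs are scored. One common choice for boxes is intersection-over-union (IoU) between predicted and annotated regions. The sketch below is a minimal illustration under that assumption. The `Box` type, the `(label, box)` pair format, the distortion labels, and the `grounding_hit_rate` metric with its 0.5 threshold are all hypothetical and are not taken from ViDA-UGC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixels: (x1, y1) top-left, (x2, y2) bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        # Clamp at zero so an empty (non-overlapping) intersection has no area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes, in [0, 1]."""
    inter = Box(max(a.x1, b.x1), max(a.y1, b.y1), min(a.x2, b.x2), min(a.y2, b.y2)).area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def grounding_hit_rate(preds, gts, threshold=0.5):
    """Hypothetical metric: fraction of annotated (label, box) distortions that are
    matched by a predicted box with the same label and IoU >= threshold."""
    if not gts:
        return 0.0
    hits = sum(
        1
        for label, gt_box in gts
        if any(p_label == label and iou(p_box, gt_box) >= threshold for p_label, p_box in preds)
    )
    return hits / len(gts)


# Illustrative annotations (hypothetical labels and coordinates).
gt = [("blur", Box(0, 0, 100, 100)), ("noise", Box(200, 200, 300, 300))]
pred = [("blur", Box(10, 10, 110, 110)), ("noise", Box(0, 0, 50, 50))]
print(grounding_hit_rate(pred, gt))  # 0.5: the blur box matches (IoU ~0.68), the noise box misses
```

Matching on label as well as overlap means a correctly placed box with the wrong distortion type still counts as a miss. That is a design choice of this sketch, not something stated in the abstract.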
Related papers
- A Unified Agentic Framework for Evaluating Conditional Image Generation [66.25099219134441]
Conditional image generation has gained significant attention for its ability to personalize content. This paper introduces CIGEval, a unified agentic framework for comprehensive evaluation of conditional image generation tasks.
(2025-04-09T17:04:14Z)
- Subjective Visual Quality Assessment for High-Fidelity Learning-Based Image Compression [2.296138318128071]
We present a comprehensive subjective visual quality assessment of JPEG AI-compressed images using the JPEG AIC-3 methodology. We reconstructed JND-based quality scales using a unified model based on boosted and plain triplet comparisons. The CVVDP metric achieved the highest overall performance; however, most metrics, including CVVDP, were overly optimistic in predicting the quality of JPEG AI-compressed images.
(2025-04-07T15:16:58Z)
- Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model [56.03592388332793]
We investigate the AIGC-VQA problem, considering both subjective and objective quality assessment perspectives. For the subjective perspective, we construct the Large-scale Generated Video Quality assessment (LGVQ) dataset, consisting of 2,808 AIGC videos. We evaluate the perceptual quality of AIGC videos from three critical dimensions: spatial quality, temporal quality, and text-video alignment. We propose the Unify Generated Video Quality assessment (UGVQ) model, designed to accurately evaluate the multi-dimensional quality of AIGC videos.
(2024-07-31T07:54:26Z)
- Q-Ground: Image Quality Grounding with Large Multi-modality Models [61.72022069880346]
We introduce Q-Ground, the first framework aimed at tackling fine-scale visual quality grounding.
Q-Ground combines large multi-modality models with detailed visual quality analysis.
Central to our contribution is the introduction of the QGround-100K dataset.
(2024-07-24T06:42:46Z)
- DP-IQA: Utilizing Diffusion Prior for Blind Image Quality Assessment in the Wild [73.6767681305851]
Blind image quality assessment (IQA) in the wild presents significant challenges. Given the difficulty in collecting large-scale training data, leveraging limited data to develop a model with strong generalization remains an open problem. Motivated by the robust image perception capabilities of pre-trained text-to-image (T2I) diffusion models, we propose a novel IQA method, diffusion priors-based IQA.
(2024-05-30T12:32:35Z)
- Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics [54.08757792080732]
We propose integrating deep features from pre-trained visual models with a statistical analysis model to achieve opinion-unaware BIQA (OU-BIQA).
Our proposed model exhibits superior consistency with human visual perception compared to state-of-the-art BIQA models.
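In BIQA, "consistency with human visual perception" is usually reported as the Spearman rank-order correlation coefficient (SRCC) between predicted scores and mean opinion scores (MOS). The summary does not say which metrics this paper uses, so the following is only a minimal, dependency-free sketch of SRCC. It computes ranks with tie averaging and then takes the Pearson correlation of those ranks. The function names and sample values are illustrative.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def srcc(pred, mos):
    """Spearman rank-order correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(pred), average_ranks(mos))


# Illustrative data: the model swaps the order of the two best images.
mos = [1.2, 3.4, 2.2, 4.8, 4.1]
pred_scores = [0.10, 0.55, 0.30, 0.70, 0.90]
print(round(srcc(pred_scores, mos), 6))  # 0.9
```

SRCC only checks monotonic agreement, so a model whose raw scores are on a different scale from the MOS (as here) can still score highly. PLCC is typically reported alongside it to measure linear agreement.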
(2024-05-29T06:09:34Z)
- Pairwise Comparisons Are All You Need [22.798716660911833]
Blind image quality assessment (BIQA) approaches often fall short in real-world scenarios due to their reliance on a generic quality standard applied uniformly across diverse images.
This paper introduces PICNIQ, a pairwise comparison framework designed to bypass the limitations of conventional BIQA.
By employing psychometric scaling algorithms, PICNIQ transforms pairwise comparisons into just-objectionable-difference (JOD) quality scores, offering a granular and interpretable measure of image quality.
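The summary says PICNIQ uses psychometric scaling to turn pairwise comparisons into JOD scores but gives no details of its algorithm. As a minimal illustration of the general idea, and not PICNIQ's method, the sketch below applies Thurstone Case V scaling with Mosteller's least-squares solution. It then rescales the result so that 1 JOD corresponds to 75% of observers preferring one image over the other, which is the usual JOD convention. The `wins` matrix layout, the `eps` smoothing, and anchoring the first condition at 0 JOD are assumptions of this sketch.

```python
from statistics import NormalDist


def jod_scores(wins, eps=0.5):
    """Scale a pairwise win-count matrix to JOD units (Thurstone Case V, Mosteller LS).

    wins[i][j] = number of times condition i was preferred over condition j.
    Assumes a complete design (every pair compared at least once).
    With eps=0, a unanimous pair gives p = 0 or 1 and inv_cdf raises StatisticsError.
    """
    n = len(wins)
    std = NormalDist()
    z_075 = std.inv_cdf(0.75)  # 1 JOD <=> 75% of observers prefer one condition
    scores = []
    for i in range(n):
        total = 0.0  # the diagonal term z_ii = 0 is implicit
        for j in range(n):
            if i == j:
                continue
            trials = wins[i][j] + wins[j][i]
            # Smooth the proportion so unanimous votes do not map to +/- infinity.
            p = (wins[i][j] + eps) / (trials + 2 * eps)
            total += std.inv_cdf(p)
        # Mosteller's solution: mean of z_ij over all n columns, then JOD rescaling.
        scores.append(total / n / z_075)
    # The absolute offset is arbitrary; anchor condition 0 at 0 JOD.
    return [s - scores[0] for s in scores]


# Image 0 beat image 1 in 15 of 20 comparisons (75%), so image 1 lies 1 JOD lower.
print(jod_scores([[0, 15], [5, 0]], eps=0))  # [0.0, -1.0]
```

A least-squares solution has a closed form when every pair is compared. Sparse or incomplete designs are usually handled with maximum-likelihood scaling instead.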
(2024-03-13T23:43:36Z)
- PSCR: Patches Sampling-based Contrastive Regression for AIGC Image Quality Assessment [1.1744028458220428]
We propose a contrastive regression framework to leverage differences among various generated images for learning a better representation space.
We conduct extensive experiments on three mainstream AIGCIQA databases including AGIQA-1K, AGIQA-3K and AIGCIQA2023.
Results show significant improvements in model performance with the introduction of our proposed PSCR framework.
(2023-12-10T14:18:53Z)
- Helping Visually Impaired People Take Better Quality Pictures [52.03016269364854]
We develop tools to help visually impaired users minimize occurrences of common technical distortions.
We also create a prototype feedback system that helps to guide users to mitigate quality issues.
(2023-05-14T04:37:53Z)
- Image Quality Assessment using Contrastive Learning [50.265638572116984]
We train a deep Convolutional Neural Network (CNN) using a contrastive pairwise objective to solve the auxiliary problem.
We show through extensive experiments that CONTRIQUE achieves competitive performance when compared to state-of-the-art NR image quality models.
Our results suggest that powerful quality representations with perceptual relevance can be obtained without requiring large labeled subjective image quality datasets.
(2021-10-25T21:01:00Z)
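The CONTRIQUE summary mentions training with "a contrastive pairwise objective" but does not give its exact form. For illustration only, the sketch below implements a generic NT-Xent (normalized temperature-scaled cross-entropy, SimCLR-style) loss in plain Python. The convention that embeddings `2k` and `2k+1` form a positive pair and the temperature of 0.1 are assumptions, and this should not be read as CONTRIQUE's exact loss or its definition of positives.

```python
import math


def _normalize(v):
    """Scale a vector to unit L2 norm (assumes a nonzero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def nt_xent(embeddings, temperature=0.1):
    """NT-Xent loss where embeddings[2k] and embeddings[2k + 1] are a positive pair.

    Every other embedding in the batch acts as a negative for a given anchor.
    """
    if len(embeddings) % 2:
        raise ValueError("expected an even number of embeddings (pairs)")
    z = [_normalize(v) for v in embeddings]
    n = len(z)
    total = 0.0
    for i in range(n):
        pos = i ^ 1  # partner index: 0<->1, 2<->3, ...
        # Cosine similarity (dot product of unit vectors) divided by the temperature.
        logits = {k: sum(a * b for a, b in zip(z[i], z[k])) / temperature for k in range(n) if k != i}
        m = max(logits.values())  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        total += log_denom - logits[pos]  # -log softmax probability of the positive
    return total / n


# Pairs that agree (good) should give a much lower loss than pairs that disagree (bad).
good = [[1.0, 0.0], [1.0, 0.05], [0.0, 1.0], [0.05, 1.0]]
bad = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
print(nt_xent(good) < nt_xent(bad))  # True
```

A lower temperature sharpens the softmax, so near-duplicate negatives are penalized more heavily. This is one reason the temperature is treated as a tuned hyperparameter in contrastive training.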
This list is automatically generated from the titles and abstracts of the papers in this site.