BPCLIP: A Bottom-up Image Quality Assessment from Distortion to Semantics Based on CLIP
- URL: http://arxiv.org/abs/2506.17969v1
- Date: Sun, 22 Jun 2025 09:56:57 GMT
- Title: BPCLIP: A Bottom-up Image Quality Assessment from Distortion to Semantics Based on CLIP
- Authors: Chenyue Song, Chen Hui, Wei Zhang, Haiqi Zhu, Shaohui Liu, Hong Huang, Feng Jiang
- Abstract summary: We propose a bottom-up image quality assessment approach based on Contrastive Language-Image Pre-training (CLIP). Specifically, we utilize an encoder to extract multiscale features from the input image and introduce a bottom-up multiscale cross-attention module. By incorporating 40 image quality adjectives across six distinct dimensions, we enable the pre-trained CLIP text encoder to generate representations of the intrinsic quality of the image.
- Score: 18.25854559825818
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image Quality Assessment (IQA) aims to evaluate the perceptual quality of images based on human subjective perception. Existing methods generally combine multiscale features to achieve high performance, but most rely on straightforward linear fusion of these features, which may not adequately capture the impact of distortions on semantic content. To address this, we propose BPCLIP, a bottom-up image quality assessment approach based on Contrastive Language-Image Pre-training (CLIP), a model that aligns images and text in a shared feature space. BPCLIP progressively extracts the impact of low-level distortions on high-level semantics. Specifically, we utilize an encoder to extract multiscale features from the input image and introduce a bottom-up multiscale cross-attention module designed to capture the relationships between shallow and deep features. In addition, by incorporating 40 image quality adjectives across six distinct dimensions, we enable the pre-trained CLIP text encoder to generate representations of the intrinsic quality of the image, thereby strengthening the connection between image quality perception and human language. Our method achieves superior results on most public Full-Reference (FR) and No-Reference (NR) IQA benchmarks while demonstrating greater robustness.
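As a rough illustration of the bottom-up fusion the abstract describes, the sketch below wires multiscale encoder features together with cross-attention so that low-level tokens query progressively deeper stages. All module and parameter names are hypothetical; the actual BPCLIP implementation may differ substantially.

```python
import torch
import torch.nn as nn

class BottomUpCrossAttention(nn.Module):
    """Hypothetical sketch of bottom-up multiscale fusion: the running
    low-level representation queries each deeper feature map, so
    distortion cues propagate upward into semantic features."""

    def __init__(self, dim: int, num_heads: int = 8, num_scales: int = 4):
        super().__init__()
        self.attn = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads, batch_first=True)
            for _ in range(num_scales - 1)
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # feats[i]: (B, N_i, dim), ordered shallow -> deep, each stage
        # already projected to a common channel width `dim`.
        x = feats[0]
        for attn, deeper in zip(self.attn, feats[1:]):
            fused, _ = attn(query=x, key=deeper, value=deeper)
            x = self.norm(x + fused)  # residual keeps the low-level signal
        return x.mean(dim=1)  # pooled quality-aware image embedding
```

The text branch would analogously encode the 40 quality adjectives with the frozen CLIP text encoder and compare the pooled image embedding against them, e.g., via cosine similarity in CLIP's shared space.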
Related papers
- Text-Visual Semantic Constrained AI-Generated Image Quality Assessment [47.575342788480505]
We propose SC-AGIQA, a unified framework to enhance the comprehensive evaluation of both text-image consistency and perceptual distortion in AI-generated images. Our approach integrates key capabilities from multiple models and tackles these challenges by introducing two core modules. Tests conducted on multiple benchmark datasets demonstrate that SC-AGIQA outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2025-07-14T16:21:05Z)
- Q-Insight: Understanding Image Quality via Visual Reinforcement Learning [27.26829134776367]
Image quality assessment (IQA) focuses on the perceptual visual quality of images, playing a crucial role in downstream tasks such as image reconstruction, compression, and generation. We propose Q-Insight, a reinforcement learning-based model built upon group relative policy optimization (GRPO). We show that Q-Insight substantially outperforms existing state-of-the-art methods in both score regression and degradation perception tasks.
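Q-Insight's training signal comes from group relative policy optimization; the core "group relative" idea, sketched here generically rather than as Q-Insight's actual code, is to normalize each sampled response's reward against the other responses in its group, avoiding a learned value critic:

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """GRPO-style advantages. rewards: (num_prompts, group_size), one row
    per prompt and one column per sampled response; each reward is
    standardized within its own row (group)."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)
```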
arXiv Detail & Related papers (2025-03-28T17:59:54Z)
- Language-Guided Visual Perception Disentanglement for Image Quality Assessment and Conditional Image Generation [48.642826318384294]
Contrastive vision-language models, such as CLIP, have demonstrated excellent zero-shot capability across semantic recognition tasks. This paper presents a new multimodal disentangled representation learning framework, which leverages disentangled text to guide image disentanglement.
arXiv Detail & Related papers (2025-03-04T02:36:48Z)
- Extremely low-bitrate Image Compression Semantically Disentangled by LMMs from a Human Perception Perspective [2.542077227403488]
Inspired by the human progressive perception mechanism, we propose a Semantically Disentangled Image Compression framework. We leverage LMMs to extract essential semantic components, including overall descriptions, detailed object descriptions, and semantic segmentation masks. We propose a training-free Object Restoration model with Attention Guidance (ORAG), built on pre-trained ControlNet, to restore object details conditioned on object-level text descriptions and semantic masks.
arXiv Detail & Related papers (2025-03-01T08:27:11Z)
- DP-IQA: Utilizing Diffusion Prior for Blind Image Quality Assessment in the Wild [73.6767681305851]
Blind image quality assessment (IQA) in the wild presents significant challenges. Given the difficulty of collecting large-scale training data, leveraging limited data to develop a model with strong generalization remains an open problem. Motivated by the robust image perception capabilities of pre-trained text-to-image (T2I) diffusion models, we propose DP-IQA, a novel IQA method based on diffusion priors.
arXiv Detail & Related papers (2024-05-30T12:32:35Z)
- Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address the mismatch between such backbones and the IQA task using prompt learning, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
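The prompt learning referred to above typically means optimizing continuous prompt vectors rather than hand-writing prompt text; below is a minimal CoOp-style sketch of that general technique, with all dimensions and names hypothetical rather than taken from this paper:

```python
import torch
import torch.nn as nn

class LearnableQualityPrompt(nn.Module):
    """CoOp-style learnable prompt: instead of hand-written prompt text,
    n_ctx context vectors per quality level are optimized directly in
    CLIP's token-embedding space. Purely illustrative of prompt learning;
    not this paper's actual design."""

    def __init__(self, embed_dim: int = 512, n_ctx: int = 8, n_levels: int = 5):
        super().__init__()
        # One learnable context block per quality level ("bad" .. "perfect").
        self.ctx = nn.Parameter(0.02 * torch.randn(n_levels, n_ctx, embed_dim))

    def forward(self, level_word_embeds: torch.Tensor) -> torch.Tensor:
        # level_word_embeds: (n_levels, 1, embed_dim) token embeddings of
        # the quality-level words; prepend the learned context to each,
        # then feed the result through the frozen CLIP text transformer.
        return torch.cat([self.ctx, level_word_embeds], dim=1)
```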
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
- Quality-Aware Image-Text Alignment for Opinion-Unaware Image Quality Assessment [8.431867616409958]
No-Reference Image Quality Assessment (NR-IQA) focuses on designing methods to measure image quality in alignment with human perception when a high-quality reference image is unavailable. Most state-of-the-art NR-IQA approaches are opinion-aware, i.e., they require human annotations for training. We propose QualiCLIP, a CLIP-based, self-supervised, opinion-unaware approach that does not require human opinions.
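One way to make a CLIP encoder quality-aware without opinion scores, sketched below under the assumption of a margin ranking loss over synthetically degraded image versions (the paper's actual objective may differ):

```python
import torch
import torch.nn.functional as F

def degradation_ranking_loss(sim_to_good: torch.Tensor, margin: float = 0.05) -> torch.Tensor:
    """sim_to_good: (B, L) cosine similarities between a quality-positive
    prompt (e.g., "Good photo.") and L versions of each image, ordered
    least to most degraded. The hinge enforces that similarity drops
    monotonically with degradation; no human labels are involved."""
    diffs = sim_to_good[:, :-1] - sim_to_good[:, 1:]  # should be positive
    return F.relu(margin - diffs).mean()
```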
arXiv Detail & Related papers (2024-03-17T11:32:18Z)
- Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment [49.36799270585947]
No-reference point cloud quality assessment (NR-PCQA) aims to automatically evaluate the perceptual quality of distorted point clouds without available reference.
We propose CoPA, a novel contrastive pre-training framework tailored for PCQA.
Our method outperforms the state-of-the-art PCQA methods on popular benchmarks.
arXiv Detail & Related papers (2024-03-15T07:16:07Z)
- Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild [38.197794061203055]
We propose a Mixture of Experts approach to train two separate encoders to learn high-level content and low-level image quality features in an unsupervised setting.
We deploy the complementary low- and high-level image representations obtained from the Re-IQA framework to train a linear regression model.
Our method achieves state-of-the-art performance on multiple large-scale image quality assessment databases.
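The final regression step on frozen representations is only a few lines; below is a hedged sketch using ridge regression as a stand-in for the paper's linear model, with random arrays standing in for real encoder features and mean opinion scores:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
# Stand-ins for the two frozen encoders' outputs and for subjective
# scores; in practice these come from the content/quality encoders
# and an annotated IQA database.
content_feats = rng.normal(size=(1000, 2048))   # high-level content
quality_feats = rng.normal(size=(1000, 2048))   # low-level quality
mos = rng.uniform(0.0, 100.0, size=1000)        # mean opinion scores

X = np.concatenate([content_feats, quality_feats], axis=1)  # complementary cues
regressor = Ridge(alpha=1.0).fit(X, mos)
scores = regressor.predict(X)
```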
arXiv Detail & Related papers (2023-04-02T05:06:51Z)
- Exploring CLIP for Assessing the Look and Feel of Images [87.97623543523858]
We introduce Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner.
Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments.
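The zero-shot recipe popularized by this line of work scores an image by softmaxing its CLIP similarity to an antonym prompt pair; a minimal sketch with the openai/CLIP package (the paper's exact prompts and architectural tweaks may differ):

```python
import torch
import clip  # https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)  # placeholder path
prompts = clip.tokenize(["Good photo.", "Bad photo."]).to(device)

with torch.no_grad():
    img = model.encode_image(image)
    txt = model.encode_text(prompts)
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    # Softmax over the antonym pair; the "Good" probability is the score.
    quality = (100.0 * img @ txt.T).softmax(dim=-1)[0, 0].item()

print(f"zero-shot quality score: {quality:.3f}")
```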
arXiv Detail & Related papers (2022-07-25T17:58:16Z)
- Image Quality Assessment using Contrastive Learning [50.265638572116984]
We train a deep Convolutional Neural Network (CNN) using a contrastive pairwise objective to solve an auxiliary problem.
We show through extensive experiments that CONTRIQUE achieves competitive performance when compared to state-of-the-art NR image quality models.
Our results suggest that powerful quality representations with perceptual relevance can be obtained without requiring large labeled subjective image quality datasets.
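The contrastive pairwise objective is in the NT-Xent family; below is a generic sketch of that loss (CONTRIQUE's actual formulation, which groups images by distortion class, differs in detail):

```python
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Generic NT-Xent loss over two batches of projected features
    z1, z2: (B, d), where z1[i] and z2[i] are two views of image i.
    Views of the same image attract; all other samples repel."""
    B = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2]), dim=1)        # (2B, d)
    sim = z @ z.T / tau                                # pairwise similarities
    eye = torch.eye(2 * B, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, float("-inf"))          # mask self-similarity
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(0, B)]).to(z.device)
    return F.cross_entropy(sim, targets)
```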
arXiv Detail & Related papers (2021-10-25T21:01:00Z)