Related papers: Quality-Aware Image-Text Alignment for Real-World Image Quality Assessment

URL: http://arxiv.org/abs/2403.11176v1
Date: Sun, 17 Mar 2024 11:32:18 GMT
Title: Quality-Aware Image-Text Alignment for Real-World Image Quality Assessment
Authors: Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini
Abstract summary: No-Reference Image Quality Assessment (NR-IQA) focuses on designing methods to measure image quality in alignment with human perception when a high-quality reference image is unavailable. The reliance on annotated Mean Opinion Scores (MOS) in the majority of state-of-the-art NR-IQA approaches limits their scalability and broader applicability to real-world scenarios. We propose QualiCLIP, a CLIP-based self-supervised opinion-unaware method that does not require labeled MOS.
Score: 8.431867616409958
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: No-Reference Image Quality Assessment (NR-IQA) focuses on designing methods to measure image quality in alignment with human perception when a high-quality reference image is unavailable. The reliance on annotated Mean Opinion Scores (MOS) in the majority of state-of-the-art NR-IQA approaches limits their scalability and broader applicability to real-world scenarios. To overcome this limitation, we propose QualiCLIP (Quality-aware CLIP), a CLIP-based self-supervised opinion-unaware method that does not require labeled MOS. In particular, we introduce a quality-aware image-text alignment strategy to make CLIP generate representations that correlate with the inherent quality of the images. Starting from pristine images, we synthetically degrade them with increasing levels of intensity. Then, we train CLIP to rank these degraded images based on their similarity to quality-related antonym text prompts, while guaranteeing consistent representations for images with comparable quality. Our method achieves state-of-the-art performance on several datasets with authentic distortions. Moreover, despite not requiring MOS, QualiCLIP outperforms supervised methods when their training dataset differs from the testing one, thus proving to be more suitable for real-world scenarios. Furthermore, our approach demonstrates greater robustness and improved explainability than competing methods. The code and the model are publicly available at https://github.com/miccunifi/QualiCLIP.
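The ranking idea in the abstract can be illustrated with a small NumPy sketch. It makes several assumptions that are not in the source. It uses precomputed, hypothetical CLIP embeddings, a single antonym prompt pair (for example "Good photo." / "Bad photo."), a CLIP-style temperature of 100, and a plain hinge margin ranking loss over images ordered by increasing degradation. The consistency term is a hypothetical simplification that compares scores rather than representations. This is not the QualiCLIP implementation from the linked repository, whose prompts, losses, and weights may differ.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors along the last axis to unit length (cosine-similarity space)."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def quality_score(image_emb, pos_text_emb, neg_text_emb,
                  temperature: float = 100.0) -> np.ndarray:
    """Softmax over cosine similarities between image embedding(s) and an
    antonym prompt pair (hypothetical wording, e.g. "Good photo." / "Bad photo.").
    Returns the probability assigned to the positive prompt, in [0, 1]."""
    img = l2_normalize(np.asarray(image_emb, dtype=float))
    texts = l2_normalize(np.stack([pos_text_emb, neg_text_emb]).astype(float))
    logits = temperature * (img @ texts.T)           # shape (..., 2)
    logits -= logits.max(axis=-1, keepdims=True)     # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    return probs[..., 0]


def margin_ranking_loss(scores, margin: float = 0.05) -> float:
    """Hinge loss asking scores[i] >= scores[j] + margin whenever i < j.
    Assumes scores are ordered by increasing degradation level (at least 2)."""
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    losses = [max(0.0, margin - (scores[i] - scores[j]))
              for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(losses))


def consistency_loss(scores_a, scores_b) -> float:
    """Hypothetical simplification of the consistency idea: two views (e.g. crops)
    with the same degradation level should receive similar scores."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    return float(np.mean(np.abs(a - b)))


if __name__ == "__main__":
    # Scores that already decrease by more than the margin incur no ranking loss.
    print(margin_ranking_loss([0.9, 0.6, 0.3]))  # 0.0
```

During training, these scores would come from a trainable image encoder, and the losses would be backpropagated through it. NumPy is used here only to keep the sketch dependency-free.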

Related papers

Enhancing Reward Models for High-quality Image Generation: Beyond Text-Image Alignment [63.823383517957986]
We propose a novel evaluation score, ICT (Image-Contained-Text) score, that achieves and surpasses the objectives of text-image alignment. We further train an HP (High-Preference) score model using solely the image modality to enhance image aesthetics and detail quality.
arXiv Detail & Related papers (2025-07-25T07:01:50Z)
BPCLIP: A Bottom-up Image Quality Assessment from Distortion to Semantics Based on CLIP [18.25854559825818]
We propose a bottom-up image quality assessment approach based on Contrastive Language-Image Pre-training (CLIP). Specifically, we utilize an encoder to extract multiscale features from the input image and introduce a bottom-up multiscale cross-attention module. By incorporating 40 image quality adjectives across six distinct dimensions, we enable the pre-trained CLIP text encoder to generate representations of the intrinsic quality of the image.
arXiv Detail & Related papers (2025-06-22T09:56:57Z)
CLIP-DQA: Blindly Evaluating Dehazed Images from Global and Local Perspectives Using CLIP [19.80268944768578]
Blind dehazed image quality assessment (BDQA) aims to accurately predict the visual quality of dehazed images without any reference information. We propose to adapt Contrastive Language-Image Pre-Training (CLIP), pre-trained on large-scale image-text pairs, to the BDQA task. We show that our proposed approach, named CLIP-DQA, achieves more accurate quality predictions than existing BDQA methods.
arXiv Detail & Related papers (2025-02-03T14:12:25Z)
Ranking-aware adapter for text-driven image ordering with CLIP [76.80965830448781]
We propose an effective yet efficient approach that reframes text-driven image ordering with CLIP as a learning-to-rank task. Our approach incorporates learnable prompts to adapt to new instructions for ranking purposes. Our ranking-aware adapter consistently outperforms fine-tuned CLIP models on various tasks.
arXiv Detail & Related papers (2024-12-09T18:51:05Z)
TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives [65.82577305915643]
Contrastive Language-Image Pretraining (CLIP) models maximize the mutual information between text and visual modalities to learn representations. We show that generating "hard" negative captions via in-context learning, and corresponding negative images with text-to-image generators, offers a solution. We demonstrate that our method, named TripletCLIP, enhances the compositional capabilities of CLIP, resulting in an absolute improvement of over 9% on the SugarCrepe benchmark.
arXiv Detail & Related papers (2024-11-04T19:24:59Z)
ExIQA: Explainable Image Quality Assessment Using Distortion Attributes [0.3683202928838613]
We propose an explainable approach for distortion identification based on attribute learning. We generate a dataset consisting of 100,000 images for efficient training. Our approach achieves state-of-the-art (SOTA) performance across multiple datasets in both PLCC and SRCC metrics.
arXiv Detail & Related papers (2024-09-10T20:28:14Z)
Vision-Language Consistency Guided Multi-modal Prompt Learning for Blind AI Generated Image Quality Assessment [57.07360640784803]
We propose vision-language consistency guided multi-modal prompt learning for blind AI-generated image quality assessment (AGIQA). Specifically, we introduce learnable textual and visual prompts in the language and vision branches of Contrastive Language-Image Pre-training (CLIP) models. We design a text-to-image alignment quality prediction task, whose learned vision-language consistency knowledge is used to guide the optimization of the above multi-modal prompts.
arXiv Detail & Related papers (2024-06-24T13:45:31Z)
Descriptive Image Quality Assessment in the Wild [25.503311093471076]
VLM-based Image Quality Assessment (IQA) seeks to describe image quality linguistically to align with human expression. We introduce Depicted image Quality Assessment in the Wild (DepictQA-Wild). Our method includes a multi-functional IQA task paradigm that encompasses both assessment and comparison tasks, brief and detailed responses, and full-reference and non-reference scenarios.
arXiv Detail & Related papers (2024-05-29T07:49:15Z)
Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly. Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness. Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings. This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
QGFace: Quality-Guided Joint Training For Mixed-Quality Face Recognition [2.8519768339207356]
We propose a novel quality-guided joint training approach for mixed-quality face recognition. Based on a quality partition, a classification-based method is employed to learn from high-quality (HQ) data. For low-quality (LQ) images, which lack identity information, we learn with self-supervised image-image contrastive learning.
arXiv Detail & Related papers (2023-12-29T06:56:22Z)
Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild [38.197794061203055]
We propose a Mixture of Experts approach to train two separate encoders to learn high-level content and low-level image quality features in an unsupervised setting. We deploy the complementary low and high-level image representations obtained from the Re-IQA framework to train a linear regression model. Our method achieves state-of-the-art performance on multiple large-scale image quality assessment databases.
arXiv Detail & Related papers (2023-04-02T05:06:51Z)
Exploring CLIP for Assessing the Look and Feel of Images [87.97623543523858]
We introduce Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner. Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments.
arXiv Detail & Related papers (2022-07-25T17:58:16Z)
Conformer and Blind Noisy Students for Improved Image Quality Assessment [80.57006406834466]
Learning-based approaches for perceptual image quality assessment (IQA) usually require both the distorted and reference image for measuring the perceptual quality accurately. In this work, we explore the performance of transformer-based full-reference IQA models. We also propose a method for IQA based on semi-supervised knowledge distillation from full-reference teacher models into blind student models.
arXiv Detail & Related papers (2022-04-27T10:21:08Z)
Image Quality Assessment using Contrastive Learning [50.265638572116984]
We train a deep Convolutional Neural Network (CNN) using a contrastive pairwise objective to solve an auxiliary problem (predicting distortion type and degree). We show through extensive experiments that the resulting model, CONTRIQUE, achieves competitive performance compared to state-of-the-art NR image quality models. Our results suggest that powerful quality representations with perceptual relevance can be obtained without requiring large labeled subjective image quality datasets.
arXiv Detail & Related papers (2021-10-25T21:01:00Z)
Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment [157.1292674649519]
We propose a practical solution named degraded-reference IQA (DR-IQA). DR-IQA exploits the inputs of image restoration (IR) models, i.e., the degraded images, as references. Our results can even be close to the performance of full-reference settings.
arXiv Detail & Related papers (2021-08-18T02:35:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.