Beyond Cosine Similarity: Magnitude-Aware CLIP for No-Reference Image Quality Assessment
- URL: http://arxiv.org/abs/2511.09948v1
- Date: Fri, 14 Nov 2025 01:20:58 GMT
- Title: Beyond Cosine Similarity: Magnitude-Aware CLIP for No-Reference Image Quality Assessment
- Authors: Zhicheng Liao, Dongxu Wu, Zhenshan Shi, Sijie Mai, Hanwei Zhu, Lingyu Zhu, Yuncheng Jiang, Baoliang Chen
- Abstract summary: We introduce a novel adaptive fusion framework that complements cosine similarity with a magnitude-aware quality cue. Our method consistently outperforms standard CLIP-based IQA and state-of-the-art baselines, without any task-specific training.
- Score: 25.104682483704
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent efforts have repurposed the Contrastive Language-Image Pre-training (CLIP) model for No-Reference Image Quality Assessment (NR-IQA) by measuring the cosine similarity between the image embedding and textual prompts such as "a good photo" or "a bad photo." However, this semantic similarity overlooks a critical yet underexplored cue: the magnitude of the CLIP image features, which we empirically find to exhibit a strong correlation with perceptual quality. In this work, we introduce a novel adaptive fusion framework that complements cosine similarity with a magnitude-aware quality cue. Specifically, we first extract the absolute CLIP image features and apply a Box-Cox transformation to statistically normalize the feature distribution and mitigate semantic sensitivity. The resulting scalar summary serves as a semantically-normalized auxiliary cue that complements cosine-based prompt matching. To integrate both cues effectively, we further design a confidence-guided fusion scheme that adaptively weighs each term according to its relative strength. Extensive experiments on multiple benchmark IQA datasets demonstrate that our method consistently outperforms standard CLIP-based IQA and state-of-the-art baselines, without any task-specific training.
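A minimal sketch of the pipeline described above, assuming the OpenAI clip package and scipy for the Box-Cox transform; the prompt pair, the per-image Box-Cox fit, and the confidence-guided weights are illustrative choices, not the authors' exact formulation:

```python
# Sketch of CLIP-based NR-IQA with an added magnitude-aware cue.
# Assumptions: OpenAI "clip" package, ViT-B/32 weights, scipy for Box-Cox.
import torch
import clip
from PIL import Image
from scipy.stats import boxcox

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
prompts = clip.tokenize(["a good photo", "a bad photo"]).to(device)

@torch.no_grad()
def quality_score(path: str) -> float:
    image = preprocess(Image.open(path)).unsqueeze(0).to(device)
    img_feat = model.encode_image(image).float().squeeze(0)   # raw, unnormalized embedding
    txt_feat = model.encode_text(prompts).float()

    # Cue 1: cosine-similarity prompt matching ("good" vs. "bad" photo).
    sims = torch.nn.functional.cosine_similarity(img_feat[None, :], txt_feat, dim=-1)
    p_good = torch.softmax(100.0 * sims, dim=0)[0].item()

    # Cue 2: magnitude-aware cue from the absolute image features,
    # normalized with a Box-Cox transform (lambda fitted per image here).
    abs_feat = img_feat.abs().cpu().numpy().astype("float64") + 1e-6
    transformed, _ = boxcox(abs_feat)
    mag_cue = float(torch.sigmoid(torch.tensor(transformed.mean())))  # squash to (0, 1)

    # Confidence-guided fusion (illustrative): trust whichever cue sits
    # further from the uninformative value 0.5.
    conf_cos, conf_mag = abs(p_good - 0.5), abs(mag_cue - 0.5)
    w = conf_cos / (conf_cos + conf_mag + 1e-6)
    return w * p_good + (1.0 - w) * mag_cue

print(quality_score("example.jpg"))
```

The magnitude cue is squashed to (0, 1) here only so that the two cues can be mixed on a common scale; the paper's normalization and fusion weights may differ.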
Related papers
- FoCLIP: A Feature-Space Misalignment Framework for CLIP-Based Image Manipulation and Detection [25.808813569367135]
We propose FoCLIP, a feature-space misalignment framework for fooling CLIP-based image quality metrics. FoCLIP integrates three key components to construct fooling examples. Experiments on ten artistic masterpiece prompts and ImageNet subsets demonstrate that the optimized images achieve significant improvements in CLIPScore.
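The summary does not spell out the three components; as a rough illustration of why such fooling is possible, the sketch below runs plain gradient ascent on CLIP image-text similarity under a small pixel budget (a generic attack, not FoCLIP's actual method):

```python
# Generic sketch: push an image's CLIP "a good photo" similarity up by gradient
# ascent on the pixels within an L-infinity budget. This only illustrates why a
# cosine-similarity CLIP score can be gamed; it is not FoCLIP's three-component
# feature-space misalignment pipeline.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
STD = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

with torch.no_grad():
    txt = model.encode_text(clip.tokenize(["a good photo"]).to(device)).float()
    txt = txt / txt.norm(dim=-1, keepdim=True)

def fool(image01: torch.Tensor, steps: int = 50, eps: float = 4 / 255, lr: float = 1 / 255):
    """image01: (1, 3, 224, 224) tensor with values in [0, 1]; returns a perturbed copy."""
    x, orig = image01.clone().to(device), image01.clone().to(device)
    for _ in range(steps):
        x.requires_grad_(True)
        feat = model.encode_image((x - MEAN) / STD).float()
        score = (feat / feat.norm(dim=-1, keepdim=True) * txt).sum()  # CLIPScore-style cosine
        grad = torch.autograd.grad(score, x)[0]
        with torch.no_grad():
            x = x + lr * grad.sign()                  # ascend the score
            x = orig + (x - orig).clamp(-eps, eps)    # respect the pixel budget
            x = x.clamp(0, 1)
    return x.detach()
```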
arXiv Detail & Related papers (2025-11-10T10:54:35Z) - BPCLIP: A Bottom-up Image Quality Assessment from Distortion to Semantics Based on CLIP [18.25854559825818]
We propose a bottom-up image quality assessment approach based on Contrastive Language-Image Pre-training (CLIP). Specifically, we utilize an encoder to extract multiscale features from the input image and introduce a bottom-up multiscale cross-attention module. By incorporating 40 image quality adjectives across six distinct dimensions, we enable the pre-trained CLIP text encoder to generate representations of the intrinsic quality of the image.
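A small sketch of the prompt side only, with placeholder dimensions and adjectives (the actual 40 adjectives and six dimensions are not listed in this summary):

```python
# Illustrative construction of per-dimension quality prompts encoded with the
# pre-trained CLIP text encoder. Dimension names and adjectives are placeholders.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

dimensions = {
    "sharpness": ["sharp", "blurry"],
    "noise":     ["clean", "noisy"],
    "exposure":  ["well-exposed", "overexposed"],
}

@torch.no_grad()
def dimension_text_embeddings() -> dict:
    embeds = {}
    for dim, adjectives in dimensions.items():
        tokens = clip.tokenize([f"a {adj} photo" for adj in adjectives]).to(device)
        feats = model.encode_text(tokens).float()
        embeds[dim] = feats / feats.norm(dim=-1, keepdim=True)   # one row per adjective
    return embeds
```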
arXiv Detail & Related papers (2025-06-22T09:56:57Z) - Hybrid Image Resolution Quality Metric (HIRQM):A Comprehensive Perceptual Image Quality Assessment Framework [0.0]
We propose the Hybrid Image Resolution Quality Metric (HIRQM), which integrates statistical, multi-scale, and deep learning methods for a comprehensive quality evaluation. A dynamic weighting mechanism adapts component contributions based on image characteristics such as brightness and variance, enhancing flexibility across distortion types. Evaluated on the TID2013 and LIVE datasets, HIRQM achieves Pearson and Spearman correlations of 0.92 and 0.90, outperforming traditional metrics.
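The summary names the statistics but not the weighting rule; the sketch below shows one plausible form of brightness/variance-driven weighting, with made-up coefficients:

```python
# Illustrative dynamic weighting of HIRQM-style component scores. The component
# names, heuristics, and coefficients are placeholders; the summary only says
# that brightness and variance drive the weights.
import numpy as np

def hirqm_like(gray: np.ndarray, scores: dict) -> float:
    """gray: 2-D luminance array in [0, 1];
    scores: {'statistical': s1, 'multiscale': s2, 'deep': s3} in comparable ranges."""
    brightness = float(gray.mean())
    variance = float(gray.var())
    weights = np.array([
        0.3 + 0.4 * brightness,                  # statistical term favored for bright images
        0.3,                                     # multi-scale term kept constant here
        0.3 + 2.0 * max(0.05 - variance, 0.0),   # deep term favored for low-contrast images
    ])
    weights /= weights.sum()
    return float(np.dot(weights, [scores["statistical"], scores["multiscale"], scores["deep"]]))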
arXiv Detail & Related papers (2025-05-04T06:14:10Z) - Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment [49.36799270585947]
No-reference point cloud quality assessment (NR-PCQA) aims to automatically evaluate the perceptual quality of distorted point clouds without available reference.
We propose a novel contrastive pre-training framework tailored for PCQA (CoPA).
Our method outperforms the state-of-the-art PCQA methods on popular benchmarks.
arXiv Detail & Related papers (2024-03-15T07:16:07Z) - Pairwise Comparisons Are All You Need [22.798716660911833]
Blind image quality assessment (BIQA) approaches often fall short in real-world scenarios due to their reliance on a generic quality standard applied uniformly across diverse images.
This paper introduces PICNIQ, a pairwise comparison framework designed to bypass the limitations of conventional BIQA.
By employing psychometric scaling algorithms, PICNIQ transforms pairwise comparisons into just-objectionable-difference (JOD) quality scores, offering a granular and interpretable measure of image quality.
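As an illustration of the scaling step, the sketch below converts a matrix of pairwise preference counts into relative scores by maximum likelihood under a Thurstone Case V model; PICNIQ's exact psychometric scaling and its JOD calibration may differ:

```python
# Minimal psychometric-scaling sketch: pairwise preference counts -> relative
# per-image scores via maximum likelihood (Thurstone Case V). Not PICNIQ's code.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def scale_from_pairwise(wins: np.ndarray) -> np.ndarray:
    """wins[i, j] = number of times image i was preferred over image j."""
    n = wins.shape[0]

    def neg_log_likelihood(free_scores):
        q = np.concatenate([[0.0], free_scores])       # fix the first score to 0
        nll = 0.0
        for i in range(n):
            for j in range(n):
                if i != j and wins[i, j] > 0:
                    p = norm.cdf(q[i] - q[j])           # P(i preferred over j)
                    nll -= wins[i, j] * np.log(np.clip(p, 1e-9, 1.0))
        return nll

    res = minimize(neg_log_likelihood, np.zeros(n - 1), method="L-BFGS-B")
    return np.concatenate([[0.0], res.x])               # scores on a relative scale
```

In JOD conventions a one-unit difference corresponds to 75% of observers preferring one image over the other, so these raw maximum-likelihood scores would still need rescaling to that unit.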
arXiv Detail & Related papers (2024-03-13T23:43:36Z) - Introspective Deep Metric Learning for Image Retrieval [80.29866561553483]
We argue that a good similarity model should consider the semantic discrepancies with caution to better deal with ambiguous images for more robust training.
We propose to represent an image using not only a semantic embedding but also an accompanying uncertainty embedding, which describe the semantic characteristics and the ambiguity of the image, respectively.
The proposed IDML framework improves the performance of deep metric learning through uncertainty modeling and attains state-of-the-art results on the widely used CUB-200-2011, Cars196, and Stanford Online Products datasets.
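The summary does not give the exact introspective similarity; one way to realize the idea is to let the uncertainty embeddings inflate the pairwise distance, as in this assumed form:

```python
# Assumed form of an uncertainty-aware ("introspective") distance: the squared
# semantic distance is inflated by the two uncertainty embeddings, so ambiguous
# images are never reported as perfectly close. IDML's exact formula may differ.
import torch

def introspective_distance(mu_a: torch.Tensor, sigma_a: torch.Tensor,
                           mu_b: torch.Tensor, sigma_b: torch.Tensor) -> torch.Tensor:
    """mu_*: (D,) semantic embeddings; sigma_*: (D,) non-negative uncertainty embeddings."""
    semantic = (mu_a - mu_b).pow(2).sum()
    uncertainty = sigma_a.pow(2).sum() + sigma_b.pow(2).sum()
    return (semantic + uncertainty).sqrt()
```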
arXiv Detail & Related papers (2022-05-09T17:51:44Z) - No Token Left Behind: Explainability-Aided Image Classification and Generation [79.4957965474334]
We present a novel explainability-based approach, which adds a loss term to ensure that CLIP focuses on all relevant semantic parts of the input.
Our method yields an improvement in the recognition rate, without additional training or fine-tuning.
arXiv Detail & Related papers (2022-04-11T07:16:39Z) - Contextual Similarity Aggregation with Self-attention for Visual Re-ranking [96.55393026011811]
We propose a visual re-ranking method by contextual similarity aggregation with self-attention.
We conduct comprehensive experiments on four benchmark datasets to demonstrate the generality and effectiveness of our proposed visual re-ranking method.
arXiv Detail & Related papers (2021-10-26T06:20:31Z) - Image Quality Assessment using Contrastive Learning [50.265638572116984]
We train a deep Convolutional Neural Network (CNN) using a contrastive pairwise objective to solve the auxiliary problem.
We show through extensive experiments that CONTRIQUE achieves competitive performance when compared to state-of-the-art NR image quality models.
Our results suggest that powerful quality representations with perceptual relevance can be obtained without requiring large labeled subjective image quality datasets.
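A minimal example of a contrastive pairwise objective of this kind (a standard NT-Xent loss over two views of each image); CONTRIQUE's auxiliary task, distortion classes, and augmentations are not reproduced here:

```python
# NT-Xent / InfoNCE loss over two augmented views of the same batch of images.
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z1, z2: (B, D) embeddings of two views of the same B images."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)        # (2B, D)
    sim = z @ z.t() / temperature                              # scaled cosine similarities
    b = z1.shape[0]
    mask = torch.eye(2 * b, dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float("-inf"))                      # ignore self-similarity
    targets = torch.cat([torch.arange(b, 2 * b), torch.arange(0, b)]).to(z.device)
    return F.cross_entropy(sim, targets)                       # pull positive pairs together
```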
arXiv Detail & Related papers (2021-10-25T21:01:00Z) - Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z) - No-Reference Image Quality Assessment by Hallucinating Pristine Features [24.35220427707458]
We propose a no-reference (NR) image quality assessment (IQA) method via feature-level pseudo-reference (PR) hallucination.
The effectiveness of our proposed method is demonstrated on four popular IQA databases.
arXiv Detail & Related papers (2021-08-09T16:48:34Z) - Self-Calibration Supported Robust Projective Structure-from-Motion [80.15392629310507]
We propose a unified Structure-from-Motion (SfM) method, in which the matching process is supported by self-calibration constraints.
We show experimental results demonstrating robust multiview matching and accurate camera calibration by exploiting these constraints.
arXiv Detail & Related papers (2020-07-04T08:47:10Z)