Foundation Models Boost Low-Level Perceptual Similarity Metrics
- URL: http://arxiv.org/abs/2409.07650v1
- Date: Wed, 11 Sep 2024 22:32:12 GMT
- Title: Foundation Models Boost Low-Level Perceptual Similarity Metrics
- Authors: Abhijay Ghildyal, Nabajeet Barman, Saman Zadtootaghaj,
- Abstract summary: For full-reference image quality assessment (FR-IQA) using deep-learning approaches, the perceptual similarity score between a distorted image and a reference image is typically computed as a distance measure between features extracted from a pretrained CNN or more recently, a Transformer network.
This work explores the potential of utilizing the intermediate features of these foundation models, which have largely been unexplored so far in the design of low-level perceptual similarity metrics.
- Score: 6.226609932118124
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: For full-reference image quality assessment (FR-IQA) using deep-learning approaches, the perceptual similarity score between a distorted image and a reference image is typically computed as a distance measure between features extracted from a pretrained CNN or more recently, a Transformer network. Often, these intermediate features require further fine-tuning or processing with additional neural network layers to align the final similarity scores with human judgments. So far, most IQA models based on foundation models have primarily relied on the final layer or the embedding for the quality score estimation. In contrast, this work explores the potential of utilizing the intermediate features of these foundation models, which have largely been unexplored so far in the design of low-level perceptual similarity metrics. We demonstrate that the intermediate features are comparatively more effective. Moreover, without requiring any training, these metrics can outperform both traditional and state-of-the-art learned metrics by utilizing distance measures between the features.
Related papers
- On Layer-wise Representation Similarity: Application for Multi-Exit Models with a Single Classifier [20.17288970927518]
We study the similarity of representations between the hidden layers of individual transformers.
We propose an aligned training approach to enhance the similarity between internal representations.
arXiv Detail & Related papers (2024-06-20T16:41:09Z) - TOPIQ: A Top-down Approach from Semantics to Distortions for Image
Quality Assessment [53.72721476803585]
Image Quality Assessment (IQA) is a fundamental task in computer vision that has witnessed remarkable progress with deep neural networks.
We propose a top-down approach that uses high-level semantics to guide the IQA network to focus on semantically important local distortion regions.
A key component of our approach is the proposed cross-scale attention mechanism, which calculates attention maps for lower level features.
arXiv Detail & Related papers (2023-08-06T09:08:37Z) - Layer-wise Linear Mode Connectivity [52.6945036534469]
Averaging neural network parameters is an intuitive method for the knowledge of two independent models.
It is most prominently used in federated learning.
We analyse the performance of the models that result from averaging single, or groups.
arXiv Detail & Related papers (2023-07-13T09:39:10Z) - DeepDC: Deep Distance Correlation as a Perceptual Image Quality
Evaluator [53.57431705309919]
ImageNet pre-trained deep neural networks (DNNs) show notable transferability for building effective image quality assessment (IQA) models.
We develop a novel full-reference IQA (FR-IQA) model based exclusively on pre-trained DNN features.
We conduct comprehensive experiments to demonstrate the superiority of the proposed quality model on five standard IQA datasets.
arXiv Detail & Related papers (2022-11-09T14:57:27Z) - CONVIQT: Contrastive Video Quality Estimator [63.749184706461826]
Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms.
Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner.
Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning.
arXiv Detail & Related papers (2022-06-29T15:22:01Z) - Learning Transformer Features for Image Quality Assessment [53.51379676690971]
We propose a unified IQA framework that utilizes CNN backbone and transformer encoder to extract features.
The proposed framework is compatible with both FR and NR modes and allows for a joint training scheme.
arXiv Detail & Related papers (2021-12-01T13:23:00Z) - Who Explains the Explanation? Quantitatively Assessing Feature
Attribution Methods [0.0]
We propose a novel evaluation metric -- the Focus -- designed to quantify the faithfulness of explanations.
We show the robustness of the metric through randomization experiments, and then use Focus to evaluate and compare three popular explainability techniques.
Our results find LRP and GradCAM to be consistent and reliable, while the latter remains most competitive even when applied to poorly performing models.
arXiv Detail & Related papers (2021-09-28T07:10:24Z) - A combined full-reference image quality assessment approach based on
convolutional activation maps [0.0]
The goal of full-reference image quality assessment (FR-IQA) is to predict the quality of an image as perceived by human observers with using its pristine, reference counterpart.
In this study, we explore a novel, combined approach which predicts the perceptual quality of a distorted image by compiling a feature vector from convolutional activation maps.
arXiv Detail & Related papers (2020-10-19T10:00:29Z) - Eigen-CAM: Class Activation Map using Principal Components [1.2691047660244335]
This paper builds on previous ideas to cope with the increasing demand for interpretable, robust, and transparent models.
The proposed Eigen-CAM computes and visualizes the principle components of the learned features/representations from the convolutional layers.
arXiv Detail & Related papers (2020-08-01T17:14:13Z) - Deep Keypoint-Based Camera Pose Estimation with Geometric Constraints [80.60538408386016]
Estimating relative camera poses from consecutive frames is a fundamental problem in visual odometry.
We propose an end-to-end trainable framework consisting of learnable modules for detection, feature extraction, matching and outlier rejection.
arXiv Detail & Related papers (2020-07-29T21:41:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.