Related papers: Foundation Models Boost Low-Level Perceptual Similarity Metrics

Foundation Models Boost Low-Level Perceptual Similarity Metrics

URL: http://arxiv.org/abs/2409.07650v1
Date: Wed, 11 Sep 2024 22:32:12 GMT
Title: Foundation Models Boost Low-Level Perceptual Similarity Metrics
Authors: Abhijay Ghildyal, Nabajeet Barman, Saman Zadtootaghaj,
Abstract summary: For full-reference image quality assessment (FR-IQA) using deep-learning approaches, the perceptual similarity score between a distorted image and a reference image is typically computed as a distance measure between features extracted from a pretrained CNN or more recently, a Transformer network. This work explores the potential of utilizing the intermediate features of these foundation models, which have largely been unexplored so far in the design of low-level perceptual similarity metrics.
Score: 6.226609932118124
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: For full-reference image quality assessment (FR-IQA) using deep-learning approaches, the perceptual similarity score between a distorted image and a reference image is typically computed as a distance measure between features extracted from a pretrained CNN or more recently, a Transformer network. Often, these intermediate features require further fine-tuning or processing with additional neural network layers to align the final similarity scores with human judgments. So far, most IQA models based on foundation models have primarily relied on the final layer or the embedding for the quality score estimation. In contrast, this work explores the potential of utilizing the intermediate features of these foundation models, which have largely been unexplored so far in the design of low-level perceptual similarity metrics. We demonstrate that the intermediate features are comparatively more effective. Moreover, without requiring any training, these metrics can outperform both traditional and state-of-the-art learned metrics by utilizing distance measures between the features.

Related papers

Visual Explanation via Similar Feature Activation for Metric Learning [23.559106251249872]
Class activation maps (CAM) have been extensively employed to explore the interpretability of softmax-based convolutional neural networks.<n>We propose a novel visual explanation method termed Similar Feature Activation Map (SFAM)<n>SFAM provides highly promising interpretable visual explanations for CNN models using Euclidean distance or cosine similarity as the similarity metric.
arXiv Detail & Related papers (2025-06-02T13:14:37Z)
Scene Perceived Image Perceptual Score (SPIPS): combining global and local perception for image quality assessment [0.0]
We propose a novel IQA approach that bridges the gap between deep learning methods and human perception. Our model disentangles deep features into high-level semantic information and low-level perceptual details, treating each stream separately. This hybrid design enables the model to assess both global context and intricate image details, better reflecting the human visual process.
arXiv Detail & Related papers (2025-04-24T04:06:07Z)
DiffSim: Taming Diffusion Models for Evaluating Visual Similarity [19.989551230170584]
This paper introduces the DiffSim method to measure visual similarity in generative models. By aligning features in the attention layers of the denoising U-Net, DiffSim evaluates both appearance and style similarity. We also introduce the Sref and IP benchmarks to evaluate visual similarity at the level of style and instance.
arXiv Detail & Related papers (2024-12-19T07:00:03Z)
On Layer-wise Representation Similarity: Application for Multi-Exit Models with a Single Classifier [20.17288970927518]
We study the similarity of representations between the hidden layers of individual transformers. We propose an aligned training approach to enhance the similarity between internal representations.
arXiv Detail & Related papers (2024-06-20T16:41:09Z)
When No-Reference Image Quality Models Meet MAP Estimation in Diffusion Latents [92.45867913876691]
No-reference image quality assessment (NR-IQA) models can effectively quantify perceived image quality. We show that NR-IQA models can be plugged into the maximum a posteriori (MAP) estimation framework for image enhancement.
arXiv Detail & Related papers (2024-03-11T03:35:41Z)
TOPIQ: A Top-down Approach from Semantics to Distortions for Image Quality Assessment [53.72721476803585]
Image Quality Assessment (IQA) is a fundamental task in computer vision that has witnessed remarkable progress with deep neural networks. We propose a top-down approach that uses high-level semantics to guide the IQA network to focus on semantically important local distortion regions. A key component of our approach is the proposed cross-scale attention mechanism, which calculates attention maps for lower level features.
arXiv Detail & Related papers (2023-08-06T09:08:37Z)
Layer-wise Linear Mode Connectivity [52.6945036534469]
Averaging neural network parameters is an intuitive method for the knowledge of two independent models. It is most prominently used in federated learning. We analyse the performance of the models that result from averaging single, or groups.
arXiv Detail & Related papers (2023-07-13T09:39:10Z)
DeepDC: Deep Distance Correlation as a Perceptual Image Quality Evaluator [53.57431705309919]
ImageNet pre-trained deep neural networks (DNNs) show notable transferability for building effective image quality assessment (IQA) models. We develop a novel full-reference IQA (FR-IQA) model based exclusively on pre-trained DNN features. We conduct comprehensive experiments to demonstrate the superiority of the proposed quality model on five standard IQA datasets.
arXiv Detail & Related papers (2022-11-09T14:57:27Z)
CONVIQT: Contrastive Video Quality Estimator [63.749184706461826]
Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms. Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner. Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning.
arXiv Detail & Related papers (2022-06-29T15:22:01Z)
Learning Transformer Features for Image Quality Assessment [53.51379676690971]
We propose a unified IQA framework that utilizes CNN backbone and transformer encoder to extract features. The proposed framework is compatible with both FR and NR modes and allows for a joint training scheme.
arXiv Detail & Related papers (2021-12-01T13:23:00Z)
Who Explains the Explanation? Quantitatively Assessing Feature Attribution Methods [0.0]
We propose a novel evaluation metric -- the Focus -- designed to quantify the faithfulness of explanations. We show the robustness of the metric through randomization experiments, and then use Focus to evaluate and compare three popular explainability techniques. Our results find LRP and GradCAM to be consistent and reliable, while the latter remains most competitive even when applied to poorly performing models.
arXiv Detail & Related papers (2021-09-28T07:10:24Z)
A combined full-reference image quality assessment approach based on convolutional activation maps [0.0]
The goal of full-reference image quality assessment (FR-IQA) is to predict the quality of an image as perceived by human observers with using its pristine, reference counterpart. In this study, we explore a novel, combined approach which predicts the perceptual quality of a distorted image by compiling a feature vector from convolutional activation maps.
arXiv Detail & Related papers (2020-10-19T10:00:29Z)
Eigen-CAM: Class Activation Map using Principal Components [1.2691047660244335]
This paper builds on previous ideas to cope with the increasing demand for interpretable, robust, and transparent models. The proposed Eigen-CAM computes and visualizes the principle components of the learned features/representations from the convolutional layers.
arXiv Detail & Related papers (2020-08-01T17:14:13Z)
Deep Keypoint-Based Camera Pose Estimation with Geometric Constraints [80.60538408386016]
Estimating relative camera poses from consecutive frames is a fundamental problem in visual odometry. We propose an end-to-end trainable framework consisting of learnable modules for detection, feature extraction, matching and outlier rejection.
arXiv Detail & Related papers (2020-07-29T21:41:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.