Invariance of deep image quality metrics to affine transformations
- URL: http://arxiv.org/abs/2407.17927v1
- Date: Thu, 25 Jul 2024 10:24:54 GMT
- Title: Invariance of deep image quality metrics to affine transformations
- Authors: Nuria Alabau-Bosque, Paula Daudén-Oliver, Jorge Vila-Tomás, Valero Laparra, Jesús Malo
- Abstract summary: We evaluate state-of-the-art deep image quality metrics by assessing their invariance to affine transformations.
We psychophysically measure an absolute detection threshold in that common representation and express it in the physical units of each affine transform.
We find that none of the state-of-the-art metrics shows human-like results under this strong test based on invisibility thresholds.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep architectures are the current state-of-the-art in predicting subjective image quality. Usually, these models are evaluated according to their ability to correlate with human opinion in databases with a range of distortions that may appear in digital media. However, these overlook affine transformations, which may better represent the changes that actually occur in images under natural conditions. Humans can be particularly invariant to these natural transformations, as opposed to the digital ones. In this work, we evaluate state-of-the-art deep image quality metrics by assessing their invariance to affine transformations, specifically: rotation, translation, scaling, and changes in spectral illumination. We propose a methodology to assign invisibility thresholds for any perceptual metric. This methodology involves transforming the distance measured by an arbitrary metric to a common distance representation based on available subjectively rated databases. We psychophysically measure an absolute detection threshold in that common representation and express it in the physical units of each affine transform for each metric. By doing so, we allow the analyzed metrics to be directly comparable with actual human thresholds. We find that none of the state-of-the-art metrics shows human-like results under this strong test based on invisibility thresholds. This means that tuning the models exclusively to predict the visibility of generic distortions may disregard other properties of human vision, such as invariances or invisibility thresholds.
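The threshold-assignment idea in the abstract can be sketched as a small search: once a metric's detection threshold has been calibrated into the metric's own scale (the paper does this via subjectively rated databases), one scans the magnitude of an affine transform and reports the smallest magnitude at which the metric's distance exceeds that threshold, i.e. the metric's "invisibility threshold" in physical units. The sketch below is a minimal illustration, not the paper's implementation: RMSE stands in for a deep metric, integer translation stands in for the affine transform, and the threshold value `tau` is an arbitrary placeholder rather than a calibrated one.

```python
import numpy as np

def rmse(a, b):
    """Stand-in perceptual metric: root-mean-square error.
    (A real study would use a deep metric such as LPIPS or DISTS.)"""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def translate(img, shift):
    """Affine transform under test: horizontal translation in pixels."""
    return np.roll(img, shift, axis=1)

def invisibility_threshold(img, metric, transform, tau, max_mag=64):
    """Smallest transform magnitude whose metric distance exceeds tau,
    expressed in the transform's physical units (here: pixels).
    A linear scan is used for clarity."""
    for mag in range(1, max_mag + 1):
        if metric(img, transform(img, mag)) > tau:
            return mag
    return None  # the transform stays 'invisible' up to max_mag

rng = np.random.default_rng(0)
img = rng.random((32, 32))
# tau would come from calibrating the metric against a subjectively
# rated database; 0.1 here is purely illustrative.
print(invisibility_threshold(img, rmse, translate, tau=0.1))
```

A human-like metric would yield thresholds comparable to psychophysical detection thresholds in these same physical units; the paper's finding is that current deep metrics do not.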
Related papers
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
- Describing Images $\textit{Fast and Slow}$: Quantifying and Predicting the Variation in Human Signals during Visuo-Linguistic Processes [4.518404103861656]
We study the nature of variation in visuo-linguistic signals, and find that they correlate with each other.
Given this result, we hypothesize that variation stems partly from the properties of the images, and explore whether image representations encoded by pretrained vision encoders can capture such variation.
Our results indicate that pretrained models do so to a weak-to-moderate degree, suggesting that the models lack biases about what makes a stimulus complex for humans and what leads to variations in human outputs.
arXiv Detail & Related papers (2024-02-02T12:11:16Z)
- Perceptual Scales Predicted by Fisher Information Metrics [0.6906005491572401]
Perception is often viewed as a process that transforms physical variables, external to an observer, into internal psychological variables.
The perceptual scale can be deduced from psychophysical measurements that consist in comparing the relative differences between stimuli.
Here, we demonstrate the value of measuring the perceptual scale of classical (spatial frequency, orientation) and less classical physical variables.
arXiv Detail & Related papers (2023-10-18T07:31:47Z)
- Subjective Face Transform using Human First Impressions [5.026535087391025]
This work uses generative models to find semantically meaningful edits to a face image that change perceived attributes.
We train on real and synthetic faces, evaluate for in-domain and out-of-domain images using predictive models and human ratings.
arXiv Detail & Related papers (2023-09-27T03:21:07Z)
- Privacy Assessment on Reconstructed Images: Are Existing Evaluation Metrics Faithful to Human Perception? [86.58989831070426]
We study the faithfulness of hand-crafted metrics to human perception of privacy information from reconstructed images.
We propose a learning-based measure called SemSim to evaluate the Semantic Similarity between the original and reconstructed images.
arXiv Detail & Related papers (2023-09-22T17:58:04Z)
- DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data [43.247597420676044]
Current perceptual similarity metrics operate at the level of pixels and patches.
These metrics compare images in terms of their low-level colors and textures, but fail to capture mid-level similarities and differences in image layout, object pose, and semantic content.
We develop a perceptual metric that assesses images holistically.
arXiv Detail & Related papers (2023-06-15T17:59:50Z)
- Unsupervised Learning Facial Parameter Regressor for Action Unit Intensity Estimation via Differentiable Renderer [51.926868759681014]
We present a framework to predict the facial parameters based on a bone-driven face model (BDFM) under different views.
The proposed framework consists of a feature extractor, a generator, and a facial parameter regressor.
arXiv Detail & Related papers (2020-08-20T09:49:13Z)
- Adversarial Semantic Data Augmentation for Human Pose Estimation [96.75411357541438]
We propose Semantic Data Augmentation (SDA), a method that augments images by pasting segmented body parts with various semantic granularity.
We also propose Adversarial Semantic Data Augmentation (ASDA), which exploits a generative network to dynamically predict tailored pasting configurations.
State-of-the-art results are achieved on challenging benchmarks.
arXiv Detail & Related papers (2020-08-03T07:56:04Z)
- Learning Disentangled Representations with Latent Variation Predictability [102.4163768995288]
This paper defines the variation predictability of latent disentangled representations.
Within an adversarial generation process, we encourage variation predictability by maximizing the mutual information between latent variations and corresponding image pairs.
We develop an evaluation metric that does not rely on the ground-truth generative factors to measure the disentanglement of latent representations.
arXiv Detail & Related papers (2020-07-25T08:54:26Z)
- NiLBS: Neural Inverse Linear Blend Skinning [59.22647012489496]
We introduce a method that uses a neural network parameterized by pose to invert the deformations produced by traditional skinning techniques.
The ability to invert these deformations allows values (e.g., distance function, signed distance function, occupancy) to be pre-computed at rest pose, and then efficiently queried when the character is deformed.
arXiv Detail & Related papers (2020-04-06T20:46:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.