Invariance of deep image quality metrics to affine transformations
- URL: http://arxiv.org/abs/2407.17927v2
- Date: Mon, 29 Jul 2024 11:55:53 GMT
- Title: Invariance of deep image quality metrics to affine transformations
- Authors: Nuria Alabau-Bosque, Paula Daudén-Oliver, Jorge Vila-Tomás, Valero Laparra, Jesús Malo
- Abstract summary: We evaluate state-of-the-art deep image quality metrics by assessing their invariance to affine transformations.
We propose a methodology to assign an invisibility threshold, below which a metric's distances are treated as zero, to any perceptual metric.
We find that none of the state-of-the-art metrics shows human-like results under this strong test based on invisibility thresholds.
- Score: 0.932065750652415
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep architectures are the current state-of-the-art in predicting subjective image quality. Usually, these models are evaluated according to their ability to correlate with human opinion in databases with a range of distortions that may appear in digital media. However, these evaluations overlook affine transformations, which may better represent the changes that actually happen to images in natural conditions. Humans can be particularly invariant to these natural transformations, as opposed to digital ones. In this work, we evaluate state-of-the-art deep image quality metrics by assessing their invariance to affine transformations, specifically: rotation, translation, scaling, and changes in spectral illumination. Here, invariance of a metric refers to the fact that certain distances should be neglected (considered to be zero) if their values are below a threshold. This is what we call the invisibility threshold of a metric. We propose a methodology to assign such invisibility thresholds for any perceptual metric. This methodology involves transformations to a distance space common to any metric, and psychophysical measurements of thresholds in this common space. By doing so, we allow the analyzed metrics to be directly compared with actual human thresholds. We find that none of the state-of-the-art metrics shows human-like results under this strong test based on invisibility thresholds. This means that tuning the models exclusively to predict the visibility of generic distortions may disregard other properties of human vision, such as invariances or invisibility thresholds.
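A minimal sketch of the test's logic, not the authors' code: transform an image by increasing affine magnitudes, score each original/transformed pair with a deep metric, and check where the distance crosses a visibility threshold. The `lpips` package serves as a stand-in deep metric, and `HUMAN_THRESHOLD` is an illustrative placeholder; the paper's actual methodology calibrates every metric into a common distance space against psychophysical measurements, a step this sketch omits.

```python
# Hedged sketch: probe a deep metric's sensitivity to rotation.
# `lpips` is a stand-in for the metrics tested in the paper, and
# HUMAN_THRESHOLD is a hypothetical value, not a measured psychophysical one.
import torch
import torchvision.transforms.functional as TF
import lpips  # pip install lpips

metric = lpips.LPIPS(net='alex')          # stand-in deep perceptual metric
img = torch.rand(1, 3, 256, 256) * 2 - 1  # toy image in [-1, 1], as lpips expects

HUMAN_THRESHOLD = 0.05                    # hypothetical invisibility threshold

for angle in [0.25, 0.5, 1.0, 2.0, 4.0]:  # rotation magnitudes in degrees
    rotated = TF.rotate(img, angle)       # affine transformation of the input
    with torch.no_grad():
        d = metric(img, rotated).item()   # metric distance for the pair
    label = 'visible' if d > HUMAN_THRESHOLD else 'invisible'
    print(f'rotation {angle:4.2f} deg: distance {d:.4f} -> {label}')
```

A human-like metric would report sub-threshold (effectively zero) distances for rotations people cannot see; the paper's finding is that current deep metrics do not behave this way.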
Related papers
- Perceptual Scales Predicted by Fisher Information Metrics [0.6906005491572401]
Perception is often viewed as a process that transforms physical variables, external to an observer, into internal psychological variables.
The perceptual scale can be deduced from psychophysical measurements that consist of comparing relative differences between stimuli.
Here, we demonstrate the value of measuring the perceptual scale of classical (spatial frequency, orientation) and less classical physical variables.
arXiv Detail & Related papers (2023-10-18T07:31:47Z)
- Subjective Face Transform using Human First Impressions [5.026535087391025]
This work uses generative models to find semantically meaningful edits to a face image that change perceived attributes.
We train on real and synthetic faces, and evaluate on in-domain and out-of-domain images using predictive models and human ratings.
arXiv Detail & Related papers (2023-09-27T03:21:07Z)
- Privacy Assessment on Reconstructed Images: Are Existing Evaluation Metrics Faithful to Human Perception? [86.58989831070426]
We study the faithfulness of hand-crafted metrics to human perception of privacy information from reconstructed images.
We propose a learning-based measure called SemSim to evaluate the Semantic Similarity between the original and reconstructed images.
arXiv Detail & Related papers (2023-09-22T17:58:04Z)
- Learning Transformations To Reduce the Geometric Shift in Object Detection [60.20931827772482]
We tackle geometric shifts emerging from variations in the image capture process.
We introduce a self-training approach that learns a set of geometric transformations to minimize these shifts.
We evaluate our method on two different shifts, i.e., a camera's field of view (FoV) change and a viewpoint change.
arXiv Detail & Related papers (2023-01-13T11:55:30Z)
- Shift-tolerant Perceptual Similarity Metric [5.326626090397465]
Existing perceptual similarity metrics assume an image and its reference are well aligned.
This paper studies the effect of small misalignment, specifically a small shift between the input and reference image, on existing metrics; a toy probe of this sensitivity is sketched after this entry.
We develop a new deep neural network-based perceptual similarity metric.
arXiv Detail & Related papers (2022-07-27T17:55:04Z)
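A toy probe of the shift sensitivity discussed in the entry above (an assumption-laden sketch, not the paper's protocol): shift an image by two pixels, which is barely visible to a human, and watch a pixelwise baseline metric report a large distance.

```python
# Hedged sketch: a 2-pixel shift is nearly invisible to a human observer,
# yet a naive pixelwise metric (MSE) reports a large distance on textures.
import torch

img = torch.rand(1, 3, 256, 256)               # toy image
shifted = torch.roll(img, shifts=2, dims=-1)   # shift 2 pixels horizontally

mse = torch.mean((img - shifted) ** 2).item()
print(f'MSE after a 2-pixel shift: {mse:.4f}') # large despite tiny misalignment
```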
- Unsupervised Learning Facial Parameter Regressor for Action Unit Intensity Estimation via Differentiable Renderer [51.926868759681014]
We present a framework to predict the facial parameters based on a bone-driven face model (BDFM) under different views.
The proposed framework consists of a feature extractor, a generator, and a facial parameter regressor.
arXiv Detail & Related papers (2020-08-20T09:49:13Z)
- Shift Equivariance in Object Detection [8.03777903218606]
Recent works have shown that CNN-based classifiers are not shift invariant.
It is unclear to what extent this could impact object detection, mainly because of the architectural differences between the two tasks and the dimensionality of the prediction space of modern detectors.
We propose an evaluation metric, built upon a greedy search of the lower and upper bounds of the mean average precision on a shifted image set.
arXiv Detail & Related papers (2020-08-13T10:02:02Z)
- Adversarial Semantic Data Augmentation for Human Pose Estimation [96.75411357541438]
We propose Semantic Data Augmentation (SDA), a method that augments images by pasting segmented body parts with various semantic granularity.
We also propose Adversarial Semantic Data Augmentation (ASDA), which exploits a generative network to dynamically predict tailored pasting configurations; a toy paste operation is sketched after this entry.
State-of-the-art results are achieved on challenging benchmarks.
arXiv Detail & Related papers (2020-08-03T07:56:04Z)
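A bare-bones paste operation in the spirit of SDA (a hedged sketch: the actual method pastes segmented body parts with learned, pose-aware configurations, not the random placement and synthetic data used here):

```python
# Hedged sketch: paste a masked patch (stand-in for a segmented body part)
# onto a target image at a random offset. Data and mask are synthetic.
import numpy as np

rng = np.random.default_rng(0)
target = rng.random((256, 256, 3))         # toy target image
patch = rng.random((48, 48, 3))            # stand-in for a segmented part
mask = np.ones((48, 48), dtype=bool)       # stand-in segmentation mask

y, x = rng.integers(0, 256 - 48, size=2)   # random paste location
region = target[y:y + 48, x:x + 48]        # view into the target
region[mask] = patch[mask]                 # paste only the masked pixels
```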
- Learning Disentangled Representations with Latent Variation Predictability [102.4163768995288]
This paper defines the variation predictability of latent disentangled representations.
Within an adversarial generation process, we encourage variation predictability by maximizing the mutual information between latent variations and corresponding image pairs.
We develop an evaluation metric that does not rely on the ground-truth generative factors to measure the disentanglement of latent representations.
arXiv Detail & Related papers (2020-07-25T08:54:26Z)
- NiLBS: Neural Inverse Linear Blend Skinning [59.22647012489496]
We introduce a method to invert the deformations produced by traditional skinning techniques, using a neural network parameterized by pose.
The ability to invert these deformations allows values (e.g., distance function, signed distance function, occupancy) to be pre-computed at rest pose and then efficiently queried when the character is deformed; a toy skinning example is sketched after this entry.
arXiv Detail & Related papers (2020-04-06T20:46:37Z)
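To ground the terms in the entry above, a toy forward linear blend skinning step with the naive blend-then-invert inverse (a sketch with made-up weights and transforms; NiLBS replaces this analytic inversion with a pose-conditioned network, since the true skinning weights at a posed point are unknown):

```python
# Hedged sketch: forward LBS of one point, then the naive inverse obtained by
# inverting the blended transform. Bones, weights, and transforms are toys.
import numpy as np

def blend(weights, transforms):
    """Blend per-bone 4x4 transforms with skinning weights -> one 4x4."""
    return np.tensordot(weights, transforms, axes=1)

T0 = np.eye(4)                               # bone 0: identity
T1 = np.eye(4); T1[:3, 3] = [0.0, 1.0, 0.0]  # bone 1: translate along y
transforms = np.stack([T0, T1])
weights = np.array([0.3, 0.7])               # skinning weights for one point

v_rest = np.array([0.5, 0.0, 0.0, 1.0])      # rest-pose point (homogeneous)
M = blend(weights, transforms)
v_posed = M @ v_rest                         # forward linear blend skinning

v_back = np.linalg.inv(M) @ v_posed          # naive inverse query
print(np.allclose(v_back, v_rest))           # True in this toy setup
```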
This list is automatically generated from the titles and abstracts of the papers on this site.