Using Skew to Assess the Quality of GAN-generated Image Features
- URL: http://arxiv.org/abs/2310.20636v2
- Date: Mon, 29 Apr 2024 23:13:32 GMT
- Title: Using Skew to Assess the Quality of GAN-generated Image Features
- Authors: Lorenzo Luzi, Helen Jenne, Ryan Murray, Carlos Ortiz Marrero,
- Abstract summary: The Fr'echetInception Distance (FID) has been widely adopted due to its conceptual simplicity, fast computation time, and strong correlation with human perception.
In this paper we explore the importance of third-moments in image feature data and use this information to define a new measure, which we call the Skew Inception Distance (SID)
- Score: 3.300324211572204
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid advancement of Generative Adversarial Networks (GANs) necessitates the need to robustly evaluate these models. Among the established evaluation criteria, the Fr\'{e}chetInception Distance (FID) has been widely adopted due to its conceptual simplicity, fast computation time, and strong correlation with human perception. However, FID has inherent limitations, mainly stemming from its assumption that feature embeddings follow a Gaussian distribution, and therefore can be defined by their first two moments. As this does not hold in practice, in this paper we explore the importance of third-moments in image feature data and use this information to define a new measure, which we call the Skew Inception Distance (SID). We prove that SID is a pseudometric on probability distributions, show how it extends FID, and present a practical method for its computation. Our numerical experiments support that SID either tracks with FID or, in some cases, aligns more closely with human perception when evaluating image features of ImageNet data. Our work also shows that principal component analysis can be used to speed up the computation time of both FID and SID. Although we focus on using SID on image features for GAN evaluation, SID is applicable much more generally, including for the evaluation of other generative models.
Related papers
- Analyzing the Feature Extractor Networks for Face Image Synthesis [0.0]
This study investigates the behavior of diverse feature extractors -- InceptionV3, CLIP, DINOv2, and ArcFace -- considering a variety of metrics -- FID, KID, Precision&Recall.
Experiments include deep-down analysis of the features: $L$ normalization, model attention during extraction, and domain distributions in the feature space.
arXiv Detail & Related papers (2024-06-04T09:41:40Z) - Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval [50.72924579220149]
Composed Image Retrieval (CIR) is a task that retrieves images similar to a query, based on a provided textual modification.
Current techniques rely on supervised learning for CIR models using labeled triplets of the reference image, text, target image.
We propose a new semi-supervised CIR approach where we search for a reference and its related target images in auxiliary data.
arXiv Detail & Related papers (2024-04-23T21:00:22Z) - Reviewing FID and SID Metrics on Generative Adversarial Networks [0.0]
The growth of generative adversarial network (GAN) models has increased the ability of image processing.
Previous research has shown the Fr'echet Inception Distance (FID) to be an effective metric when testing these image-to-image GANs in real-world applications.
This paper uses public datasets that consist of faccades, cityscapes, and maps within Pix2Pix and CycleGAN models.
After training these models are evaluated on both distance metrics which measure the generating performance of the trained models.
arXiv Detail & Related papers (2024-02-06T03:02:39Z) - Rethinking FID: Towards a Better Evaluation Metric for Image Generation [43.66036053597747]
Inception Distance estimates the distance between a distribution of Inception-v3 features of real images, and those of images generated by the algorithm.
We highlight important drawbacks of FID: Inception's poor representation of the rich and varied content generated by modern text-to-image models, incorrect normality assumptions, and poor sample complexity.
We propose an alternative new metric, CMMD, based on richer CLIP embeddings and the maximum mean discrepancy distance with the Gaussian RBF kernel.
arXiv Detail & Related papers (2023-11-30T19:11:01Z) - Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos [63.94040814459116]
Self-supervised methods have shown remarkable progress in learning high-level semantics and low-level temporal correspondence.
We propose a novel semantic-aware masked slot attention on top of the fused semantic features and correspondence maps.
We adopt semantic- and instance-level temporal consistency as self-supervision to encourage temporally coherent object-centric representations.
arXiv Detail & Related papers (2023-08-19T09:12:13Z) - The Role of ImageNet Classes in Fr\'echet Inception Distance [33.47601032254247]
Inception Distance (FID) is a metric for quantifying the distance between two distributions of images.
We observe that FID is essentially a distance between sets of ImageNet class probabilities.
Our results suggest caution against over-interpreting FID improvements, and underline the need for distribution metrics that are more perceptually uniform.
arXiv Detail & Related papers (2022-03-11T15:50:06Z) - Inter-class Discrepancy Alignment for Face Recognition [55.578063356210144]
We propose a unified framework calledInter-class DiscrepancyAlignment(IDA)
IDA-DAO is used to align the similarity scores considering the discrepancy between the images and its neighbors.
IDA-SSE can provide convincing inter-class neighbors by introducing virtual candidate images generated with GAN.
arXiv Detail & Related papers (2021-03-02T08:20:08Z) - Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision
Action Recognition [131.6328804788164]
We propose a framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in vision-sensor modality (videos)
The SAKDN uses multiple wearable-sensors as teacher modalities and uses RGB videos as student modality.
arXiv Detail & Related papers (2020-09-01T03:38:31Z) - Identity-Aware Attribute Recognition via Real-Time Distributed Inference
in Mobile Edge Clouds [53.07042574352251]
We design novel models for pedestrian attribute recognition with re-ID in an MEC-enabled camera monitoring system.
We propose a novel inference framework with a set of distributed modules, by jointly considering the attribute recognition and person re-ID.
We then devise a learning-based algorithm for the distributions of the modules of the proposed distributed inference framework.
arXiv Detail & Related papers (2020-08-12T12:03:27Z) - Ventral-Dorsal Neural Networks: Object Detection via Selective Attention [51.79577908317031]
We propose a new framework called Ventral-Dorsal Networks (VDNets)
Inspired by the structure of the human visual system, we propose the integration of a "Ventral Network" and a "Dorsal Network"
Our experimental results reveal that the proposed method outperforms state-of-the-art object detection approaches.
arXiv Detail & Related papers (2020-05-15T23:57:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.