Reliable Fidelity and Diversity Metrics for Generative Models
- URL: http://arxiv.org/abs/2002.09797v2
- Date: Sun, 28 Jun 2020 20:37:50 GMT
- Title: Reliable Fidelity and Diversity Metrics for Generative Models
- Authors: Muhammad Ferjad Naeem, Seong Joon Oh, Youngjung Uh, Yunjey Choi,
Jaejun Yoo
- Abstract summary: The most widely used metric for measuring the similarity between real and generated images has been the Fréchet Inception Distance (FID) score.
We show that even the latest versions of the precision and recall metrics are not yet reliable.
We propose density and coverage metrics that solve the above issues.
- Score: 30.941563781926202
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Devising indicative evaluation metrics for the image generation task remains
an open problem. The most widely used metric for measuring the similarity
between real and generated images has been the Fréchet Inception Distance
(FID) score. Because it does not differentiate the fidelity and diversity
aspects of the generated images, recent papers have introduced variants of
precision and recall metrics to diagnose those properties separately. In this
paper, we show that even the latest versions of the precision and recall metrics
are not reliable yet. For example, they fail to detect the match between two
identical distributions, they are not robust against outliers, and the
evaluation hyperparameters are selected arbitrarily. We propose density and
coverage metrics that solve the above issues. We analytically and
experimentally show that density and coverage provide more interpretable and
reliable signals for practitioners than the existing metrics. Code:
https://github.com/clovaai/generative-evaluation-prdc.
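For intuition, here is a minimal NumPy sketch of the density and coverage definitions described in the abstract: both statistics are built from k-nearest-neighbour spheres around the real samples only. The feature dimensionality, sample counts, and the value of k below are illustrative assumptions; the repository linked above contains the authors' reference implementation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def knn_radii(feats, k):
    """Radius of each sample's k-NN sphere: distance to its k-th nearest
    neighbour within the same set (self-distances excluded)."""
    d = cdist(feats, feats)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbour
    return np.sort(d, axis=1)[:, k - 1]  # k-th smallest remaining distance

def density_coverage(real, fake, k=5):
    """Density: average number of real k-NN spheres containing each fake
    sample, normalised by k. Coverage: fraction of real samples whose
    k-NN sphere contains at least one fake sample."""
    radii = knn_radii(real, k)                   # (N_real,)
    d = cdist(real, fake)                        # (N_real, N_fake)
    inside = d < radii[:, None]                  # fake j lies in the sphere of real i
    density = inside.sum() / (k * fake.shape[0])
    coverage = inside.any(axis=1).mean()
    return density, coverage

# Illustrative check with two i.i.d. samples from the same distribution:
# density should concentrate around 1 and coverage should be close to 1,
# the match that the abstract notes precision/recall can fail to detect.
rng = np.random.default_rng(0)
real = rng.normal(size=(512, 64))
fake = rng.normal(size=(512, 64))
print(density_coverage(real, fake, k=5))
```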
Related papers
- Semi-supervised Counting via Pixel-by-pixel Density Distribution Modelling [135.66138766927716]
This paper focuses on semi-supervised crowd counting, where only a small portion of the training data are labeled.
We formulate the pixel-wise density value to regress as a probability distribution, instead of a single deterministic value.
Our method outperforms the competitors by a large margin under various labeled-ratio settings.
arXiv Detail & Related papers (2024-02-23T12:48:02Z)
- Cobra Effect in Reference-Free Image Captioning Metrics [58.438648377314436]
A proliferation of reference-free methods, leveraging visual-language pre-trained models (VLMs), has emerged.
In this paper, we study if there are any deficiencies in reference-free metrics.
We employ GPT-4V as an evaluative tool to assess generated sentences; the results reveal that our approach achieves state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2024-02-18T12:36:23Z)
- Rethinking FID: Towards a Better Evaluation Metric for Image Generation [43.66036053597747]
The Fréchet Inception Distance (FID) estimates the distance between the distribution of Inception-v3 features of real images and that of images generated by the algorithm.
We highlight important drawbacks of FID: Inception's poor representation of the rich and varied content generated by modern text-to-image models, incorrect normality assumptions, and poor sample complexity.
We propose an alternative metric, CMMD, based on richer CLIP embeddings and the maximum mean discrepancy distance with the Gaussian RBF kernel (a minimal MMD sketch follows this entry).
arXiv Detail & Related papers (2023-11-30T19:11:01Z)
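As a companion to the CMMD entry above, here is a minimal NumPy sketch of a squared-MMD estimate between two sets of embeddings under a Gaussian RBF kernel. The bandwidth, embedding dimensionality, and the use of random vectors in place of CLIP features are assumptions for illustration, not the paper's exact estimator.

```python
import numpy as np
from scipy.spatial.distance import cdist

def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel matrix: k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    return np.exp(-cdist(x, y, metric="sqeuclidean") / (2.0 * sigma ** 2))

def mmd2_rbf(x, y, sigma=10.0):
    """Biased estimator of squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    return (rbf_kernel(x, x, sigma).mean()
            + rbf_kernel(y, y, sigma).mean()
            - 2.0 * rbf_kernel(x, y, sigma).mean())

# Illustrative usage with random stand-ins for real/generated image embeddings
rng = np.random.default_rng(0)
real_emb = rng.normal(0.0, 1.0, size=(500, 128))
fake_emb = rng.normal(0.2, 1.0, size=(500, 128))
print(mmd2_rbf(real_emb, fake_emb))
```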
- Probabilistic Precision and Recall Towards Reliable Evaluation of Generative Models [7.770029179741429]
We propose P-precision and P-recall (PP&PR), based on a probabilistic approach that addresses these problems.
We show that our PP&PR provide more reliable estimates for comparing fidelity and diversity than the existing metrics.
arXiv Detail & Related papers (2023-09-04T13:19:17Z)
- The Treasure Beneath Multiple Annotations: An Uncertainty-aware Edge Detector [70.43599299422813]
Existing methods fuse multiple annotations using a simple voting process, ignoring the inherent ambiguity of edges and labeling bias of annotators.
We propose a novel uncertainty-aware edge detector (UAED), which employs uncertainty to investigate the subjectivity and ambiguity of diverse annotations.
UAED achieves superior performance consistently across multiple edge detection benchmarks.
arXiv Detail & Related papers (2023-03-21T13:14:36Z)
- Identifying and Mitigating Flaws of Deep Perceptual Similarity Metrics [1.484528358552186]
This work investigates the benefits and flaws of the Deep Perceptual Similarity (DPS) metric.
The metrics are analyzed in depth to understand their strengths and weaknesses.
This work contributes with new insights into the flaws of DPS, and further suggests improvements to the metrics.
arXiv Detail & Related papers (2022-07-06T08:28:39Z)
- Rarity Score: A New Metric to Evaluate the Uncommonness of Synthesized Images [32.94581354719927]
We propose a new evaluation metric, called 'rarity score', to measure the individual rarity of each image.
Code will be publicly available online for the research community.
arXiv Detail & Related papers (2022-06-17T05:16:16Z)
- On the Relation between Quality-Diversity Evaluation and Distribution-Fitting Goal in Text Generation [86.11292297348622]
We show that a linear combination of quality and diversity constitutes a divergence metric between the generated distribution and the real distribution.
We propose CR/NRR as a substitute for the quality/diversity metric pair.
arXiv Detail & Related papers (2020-07-03T04:06:59Z)
- Learning to Evaluate Perception Models Using Planner-Centric Metrics [104.33349410009161]
We propose a principled metric for 3D object detection specifically for the task of self-driving.
We find that our metric penalizes many of the mistakes that other metrics penalize by design.
For human evaluation, we generate scenes in which standard metrics and our metric disagree and find that humans side with our metric 79% of the time.
arXiv Detail & Related papers (2020-04-19T02:14:00Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries (see the sketch after this entry).
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
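The sketch referenced in the Meta-Learned Confidence entry above: a generic, hand-rolled version of transductive prototype refinement in which unlabeled queries are folded into class prototypes with confidence weights. The paper meta-learns the confidence function; the fixed distance-softmax below, the temperature, and all names are illustrative assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def refine_prototypes(support, support_labels, queries, n_classes, temp=1.0):
    """Transductive prototype refinement: start from class means of the support
    set, then fold in unlabeled queries weighted by a per-query confidence.
    Here a fixed distance-softmax stands in for the meta-learned confidence."""
    # Initial prototypes: mean embedding of each class's support examples
    protos = np.stack([support[support_labels == c].mean(axis=0) for c in range(n_classes)])
    # Confidence of each query for each class from negative squared distances
    d2 = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(-1)   # (Q, C)
    conf = softmax(-d2 / temp, axis=1)                                # (Q, C)
    # Weighted update: support examples count with weight 1, queries with conf
    for c in range(n_classes):
        w_sum = (support_labels == c).sum() + conf[:, c].sum()
        weighted = support[support_labels == c].sum(axis=0) + (conf[:, c:c+1] * queries).sum(axis=0)
        protos[c] = weighted / w_sum
    return protos

# Illustrative 5-way, 5-shot episode with random embeddings
rng = np.random.default_rng(0)
sup = rng.normal(size=(25, 64))
sup_y = np.repeat(np.arange(5), 5)
qry = rng.normal(size=(50, 64))
print(refine_prototypes(sup, sup_y, qry, n_classes=5).shape)  # (5, 64)
```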
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.