Feature Likelihood Divergence: Evaluating the Generalization of
Generative Models Using Samples
- URL: http://arxiv.org/abs/2302.04440v4
- Date: Wed, 13 Mar 2024 00:48:30 GMT
- Title: Feature Likelihood Divergence: Evaluating the Generalization of
Generative Models Using Samples
- Authors: Marco Jiralerspong, Avishek Joey Bose, Ian Gemp, Chongli Qin, Yoram
Bachrach, Gauthier Gidel
- Abstract summary: Feature Likelihood Divergence provides a comprehensive trichotomic evaluation of generative models.
We empirically demonstrate the ability of FLD to identify cases of overfitting, even when previously proposed metrics fail.
- Score: 25.657798631897908
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The past few years have seen impressive progress in the development of deep
generative models capable of producing high-dimensional, complex, and
photo-realistic data. However, current methods for evaluating such models
remain incomplete: standard likelihood-based metrics do not always apply and
rarely correlate with perceptual fidelity, while sample-based metrics, such as
FID, are insensitive to overfitting, i.e., the inability to generalize beyond the
training set. To address these limitations, we propose a new metric called the
Feature Likelihood Divergence (FLD), a parametric sample-based metric that uses
density estimation to provide a comprehensive trichotomic evaluation accounting
for novelty (i.e., difference from the training samples), fidelity, and
diversity of generated samples. We empirically demonstrate the ability of FLD
to identify cases of overfitting, even when previously proposed metrics
fail. We also extensively evaluate FLD on various image datasets and model
classes, demonstrating its ability to match intuitions of previous metrics like
FID while offering a more comprehensive evaluation of generative models. Code
is available at https://github.com/marcojira/fld.
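To make the construction concrete, here is a minimal sketch of a feature-likelihood score; this is not the authors' implementation (the linked repository contains that). FLD fits a mixture of Gaussians with per-sample bandwidths to generated samples in a perceptual feature space such as Inception activations; the sketch below simplifies this to a single shared bandwidth selected on training features and uses random vectors in place of real features. The function names are illustrative.

    import numpy as np

    def mixture_loglik(queries, centers, sigma):
        # Mean log-likelihood of `queries` under an isotropic Gaussian mixture
        # with one equally weighted component per row of `centers`.
        d = queries.shape[1]
        sq_dists = ((queries[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        log_comp = -0.5 * sq_dists / sigma**2 - 0.5 * d * np.log(2 * np.pi * sigma**2)
        m = log_comp.max(axis=1, keepdims=True)  # stable log-mean-exp
        log_mix = m[:, 0] + np.log(np.exp(log_comp - m).mean(axis=1))
        return log_mix.mean()

    def feature_likelihood_score(gen_feats, train_feats, test_feats, sigmas):
        # Choose the bandwidth that makes training features likely under a
        # mixture centered on generated features, then score held-out test
        # features. A generator that copies the training set supports a tiny
        # bandwidth that fits train_feats well but assigns very low
        # likelihood to unseen test_feats.
        best = max(sigmas, key=lambda s: mixture_loglik(train_feats, gen_feats, s))
        return mixture_loglik(test_feats, gen_feats, best)

    # Toy usage: random vectors stand in for perceptual features.
    rng = np.random.default_rng(0)
    train = rng.normal(size=(500, 8))
    test = rng.normal(size=(500, 8))
    memorizing = train + rng.normal(scale=1e-3, size=train.shape)  # near-copies
    generalizing = rng.normal(size=(500, 8))                       # fresh samples
    print(feature_likelihood_score(memorizing, train, test, sigmas=[0.05, 0.2, 1.0]))
    print(feature_likelihood_score(generalizing, train, test, sigmas=[0.05, 0.2, 1.0]))

The design choice mirrored here is that the density is fit around generated samples but scored on held-out test data, so copying the training set is penalized rather than rewarded.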
Related papers
- MeLIAD: Interpretable Few-Shot Anomaly Detection with Metric Learning and Entropy-based Scoring [2.394081903745099]
We propose MeLIAD, a novel methodology for interpretable anomaly detection.
MeLIAD is based on metric learning and achieves interpretability by design, without relying on prior assumptions about the distribution of true anomalies.
Experiments on five public benchmark datasets, including quantitative and qualitative evaluation of interpretability, demonstrate that MeLIAD achieves improved anomaly detection and localization performance.
arXiv Detail & Related papers (2024-09-20T16:01:43Z)
- Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models [14.330863905963442]
We compare 17 modern metrics for evaluating the overall performance, fidelity, diversity, rarity, and memorization of generative models.
We find that the state-of-the-art perceptual realism of diffusion models as judged by humans is not reflected in commonly reported metrics such as FID.
Next, we investigate data memorization, and find that generative models do memorize training examples on simple, smaller datasets like CIFAR10, but not necessarily on more complex datasets like ImageNet.
arXiv Detail & Related papers (2023-06-07T18:00:00Z)
- A Study on the Evaluation of Generative Models [19.18642459565609]
Implicit generative models, which do not return likelihood values, have become prevalent in recent years.
In this work, we study the evaluation metrics of generative models by generating a high-quality synthetic dataset.
Our study shows that while FID and IS do correlate with several f-divergences, their ranking of close models can vary considerably.
arXiv Detail & Related papers (2022-06-22T09:27:31Z)
- Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion (CMI), where data diversity is explicitly modeled as an optimizable objective.
Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination.
Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
arXiv Detail & Related papers (2021-05-18T15:13:00Z)
- How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models [95.8037674226622]
We introduce a 3-dimensional evaluation metric that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion.
Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity.
arXiv Detail & Related papers (2021-02-17T18:25:30Z)
- Evaluating the Disentanglement of Deep Generative Models through Manifold Topology [66.06153115971732]
We present a method for quantifying disentanglement that only uses the generative model.
We empirically evaluate several state-of-the-art models across multiple datasets.
arXiv Detail & Related papers (2020-06-05T20:54:11Z)
- Evaluation Metrics for Conditional Image Generation [100.69766435176557]
We present two new metrics for evaluating generative models in the class-conditional image generation setting.
A theoretical analysis shows the motivation behind each proposed metric and links the novel metrics to their unconditional counterparts.
We provide an extensive empirical evaluation, comparing the metrics to their unconditional variants and to other metrics, and utilize them to analyze existing generative models.
arXiv Detail & Related papers (2020-04-26T12:15:16Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
- Towards GAN Benchmarks Which Require Generalization [48.075521136623564]
We argue that, for a benchmark not to be won by memorization, estimating the underlying evaluation function must require a large sample from the model.
We turn to neural network divergences (NNDs), which are defined in terms of a neural network trained to distinguish between distributions.
The resulting benchmarks cannot be "won" by training set memorization, while still being perceptually correlated and computable only from samples; a toy sketch of the idea follows below.
arXiv Detail & Related papers (2020-01-10T20:18:47Z)
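As a toy illustration of the NND idea above (a hedged sketch under simplifying assumptions, not the paper's benchmark), one can train a critic to separate real from generated samples and report its accuracy on a held-out split: accuracy near 0.5 means the critic cannot tell the two distributions apart, while accuracy near 1.0 signals an easily detected mismatch. Logistic regression stands in for the neural network critic, and nnd_score is an invented name.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def nnd_score(real, fake, seed=0):
        # Train a critic to separate real from fake samples, then report its
        # accuracy on a held-out split: ~0.5 means the distributions are
        # indistinguishable to the critic, ~1.0 means an obvious mismatch.
        rng = np.random.default_rng(seed)
        X = np.vstack([real, fake])
        y = np.concatenate([np.ones(len(real)), np.zeros(len(fake))])
        idx = rng.permutation(len(X))
        train_idx, test_idx = idx[: len(X) // 2], idx[len(X) // 2 :]
        critic = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        return critic.score(X[test_idx], y[test_idx])

    # Toy usage: a generator matching the data distribution is hard to
    # separate; a shifted one is easy.
    rng = np.random.default_rng(1)
    real = rng.normal(size=(1000, 8))
    matched = rng.normal(size=(1000, 8))
    shifted = rng.normal(loc=0.5, size=(1000, 8))
    print(nnd_score(real, matched))  # close to 0.5
    print(nnd_score(real, shifted))  # well above 0.5

In the paper's setting the critic is a neural network and the evaluation protocol is designed so that memorizing the training set cannot achieve a perfect score.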
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.