A Distributional Evaluation of Generative Image Models
- URL: http://arxiv.org/abs/2501.00744v1
- Date: Wed, 01 Jan 2025 06:23:18 GMT
- Title: A Distributional Evaluation of Generative Image Models
- Authors: Edric Tam, Barbara E Engelhardt,
- Abstract summary: We focus on evaluating image generative models, where studies often treat human evaluation as the gold standard.
We propose the Embedded Characteristic Score (ECS), a comprehensive metric for evaluating the distributional match between the learned and target sample distributions.
- Score: 2.520143908749992
- License:
- Abstract: Generative models are ubiquitous in modern artificial intelligence (AI) applications. Recent advances have led to a variety of generative modeling approaches that are capable of synthesizing highly realistic samples. Despite these developments, evaluating the distributional match between the synthetic samples and the target distribution in a statistically principled way remains a core challenge. We focus on evaluating image generative models, where studies often treat human evaluation as the gold standard. Commonly adopted metrics, such as the Fr\'echet Inception Distance (FID), do not sufficiently capture the differences between the learned and target distributions, because the assumption of normality ignores differences in the tails. We propose the Embedded Characteristic Score (ECS), a comprehensive metric for evaluating the distributional match between the learned and target sample distributions, and explore its connection with moments and tail behavior. We derive natural properties of ECS and show its practical use via simulations and an empirical study.
Related papers
- Analyzing Generative Models by Manifold Entropic Metrics [8.477943884416023]
We introduce a novel set of tractable information-theoretic evaluation metrics.
We compare various normalizing flow architectures and $beta$-VAEs on the EMNIST dataset.
The most interesting finding of our experiments is a ranking of model architectures and training procedures in terms of their inductive bias to converge to aligned and disentangled representations during training.
arXiv Detail & Related papers (2024-10-25T09:35:00Z) - MITA: Bridging the Gap between Model and Data for Test-time Adaptation [68.62509948690698]
Test-Time Adaptation (TTA) has emerged as a promising paradigm for enhancing the generalizability of models.
We propose Meet-In-The-Middle based MITA, which introduces energy-based optimization to encourage mutual adaptation of the model and data from opposing directions.
arXiv Detail & Related papers (2024-10-12T07:02:33Z) - On the Out of Distribution Robustness of Foundation Models in Medical
Image Segmentation [47.95611203419802]
Foundations for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach.
We compare the generalization performance to unseen domains of various pre-trained models after being fine-tuned on the same in-distribution dataset.
We further developed a new Bayesian uncertainty estimation for frozen models and used them as an indicator to characterize the model's performance on out-of-distribution data.
arXiv Detail & Related papers (2023-11-18T14:52:10Z) - Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z) - MAUVE Scores for Generative Models: Theory and Practice [95.86006777961182]
We present MAUVE, a family of comparison measures between pairs of distributions such as those encountered in the generative modeling of text or images.
We find that MAUVE can quantify the gaps between the distributions of human-written text and those of modern neural language models.
We demonstrate in the vision domain that MAUVE can identify known properties of generated images on par with or better than existing metrics.
arXiv Detail & Related papers (2022-12-30T07:37:40Z) - General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model like gradient descent in functional space.
GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z) - How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating
and Auditing Generative Models [95.8037674226622]
We introduce a 3-dimensional evaluation metric that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion.
Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity.
arXiv Detail & Related papers (2021-02-17T18:25:30Z) - Natural vs Balanced Distribution in Deep Learning on Whole Slide Images
for Cancer Detection [14.34071893713379]
We analyze the impact of natural and balanced distributions of the training set in deep learning (DL) models applied on histological images.
We show that keeping the WSIs data in their usual distribution for DL training produces fewer false positives (FPs) with comparable false negatives (FNs) than the artificially-obtained balanced distribution.
arXiv Detail & Related papers (2020-12-21T21:18:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.