Statistical Model Criticism of Variational Auto-Encoders
- URL: http://arxiv.org/abs/2204.03030v1
- Date: Wed, 6 Apr 2022 18:19:29 GMT
- Title: Statistical Model Criticism of Variational Auto-Encoders
- Authors: Claartje Barkhof and Wilker Aziz
- Abstract summary: We propose a framework for the statistical evaluation of variational auto-encoders (VAEs).
We test two instances of this framework in the context of modelling images of handwritten digits and a corpus of English text.
- Score: 15.005894753472894
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a framework for the statistical evaluation of variational
auto-encoders (VAEs) and test two instances of this framework in the context of
modelling images of handwritten digits and a corpus of English text. Our take
on evaluation is based on the idea of statistical model criticism, popular in
Bayesian data analysis, whereby a statistical model is evaluated in terms of
its ability to reproduce statistics of an unknown data generating process from
which we can obtain samples. A VAE learns not one, but two joint distributions
over a shared sample space, each exploiting a choice of factorisation that
makes sampling tractable in one of two directions (latent-to-data,
data-to-latent). We evaluate samples from these distributions, assessing their
(marginal) fit to the observed data and our choice of prior, and we also
evaluate samples through a pipeline that connects the two distributions
starting from a data sample, assessing whether together they exploit and reveal
latent factors of variation that are useful to a practitioner. We show that
this methodology offers possibilities for model selection qualitatively beyond
intrinsic evaluation metrics and at a finer granularity than commonly used
statistics can offer.
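To make the evaluation recipe concrete, here is a minimal sketch of a check in the spirit described above: draw replicate datasets from the latent-to-data direction (sample from the prior, then decode) and compare a summary statistic of the observed data against its distribution under those replicates. The decoder interface and the choice of statistic are illustrative assumptions, not the paper's exact test statistics.

```python
# Minimal sketch: does the VAE's latent-to-data distribution reproduce a chosen
# statistic of the observed data? `sample_prior`, `decode`, and the statistic
# are placeholders supplied by the user, not the paper's own components.
import numpy as np

def criticise(x_obs, sample_prior, decode, statistic, num_replicates=500, rng=None):
    """Compare statistic(x_obs) against its distribution under model samples.

    sample_prior(n, rng) -> latent codes z of shape (n, latent_dim)
    decode(z, rng)       -> data samples x of shape (n, data_dim)
    statistic(x)         -> scalar summary of a dataset
    Returns the observed statistic and a two-sided tail probability
    (a small value flags a discrepancy between model and data).
    """
    rng = rng or np.random.default_rng(0)
    t_obs = statistic(x_obs)
    t_rep = np.empty(num_replicates)
    for r in range(num_replicates):
        z = sample_prior(len(x_obs), rng)
        t_rep[r] = statistic(decode(z, rng))
    # Tail probability of the observed statistic under the model's sampling
    # distribution of the statistic (analogous to a posterior predictive check).
    p_upper = np.mean(t_rep >= t_obs)
    p_lower = np.mean(t_rep <= t_obs)
    return t_obs, 2.0 * min(p_upper, p_lower)
```

For MNIST-style images, `statistic` could be as simple as `lambda x: x.mean()` (mean pixel intensity); the paper's own statistics are richer than this illustration.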
Related papers
- PQMass: Probabilistic Assessment of the Quality of Generative Models
using Probability Mass Estimation [8.527898482146103]
We propose a comprehensive sample-based method for assessing the quality of generative models.
The proposed approach enables the estimation of the probability that two sets of samples are drawn from the same distribution.
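A hedged illustration of that idea: bin the two sample sets into a shared set of regions and test whether the probability mass they place on those regions is compatible. The region construction (nearest random reference point) and the chi-squared two-sample test below are generic stand-ins, not necessarily the paper's exact procedure.

```python
# Generic two-sample check on binned probability mass (a stand-in, not PQMass).
import numpy as np
from scipy.stats import chi2

def two_sample_mass_test(x, y, num_regions=50, rng=None):
    rng = rng or np.random.default_rng(0)
    pool = np.vstack([x, y])
    refs = pool[rng.choice(len(pool), size=num_regions, replace=False)]
    # Assign each sample to its nearest reference point (a Voronoi region).
    bin_x = np.argmin(np.linalg.norm(x[:, None] - refs[None], axis=-1), axis=1)
    bin_y = np.argmin(np.linalg.norm(y[:, None] - refs[None], axis=-1), axis=1)
    cx = np.bincount(bin_x, minlength=num_regions)
    cy = np.bincount(bin_y, minlength=num_regions)
    # Chi-squared two-sample statistic on the per-region counts.
    n, m = cx.sum(), cy.sum()
    expected_x = (cx + cy) * n / (n + m)
    expected_y = (cx + cy) * m / (n + m)
    mask = (cx + cy) > 0
    stat = np.sum((cx[mask] - expected_x[mask]) ** 2 / expected_x[mask]
                  + (cy[mask] - expected_y[mask]) ** 2 / expected_y[mask])
    dof = mask.sum() - 1
    return stat, chi2.sf(stat, dof)  # small p-value: distributions likely differ
```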
arXiv Detail & Related papers (2024-02-06T19:39:26Z)
- On the Distributed Evaluation of Generative Models [15.629121946912088]
We focus on the widely-used distance-based evaluation metrics, Fréchet Inception Distance (FID) and Kernel Inception Distance (KID).
In the case of KID metric, we prove that scoring a group of generative models using the clients' averaged KID score will result in the same ranking as that of a centralized KID evaluation over a collective reference set containing all the clients' data.
We provide examples in which two generative models are assigned the same FID score by each client in a distributed setting, while the centralized FID scores of the two models are significantly different.
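The averaging scheme can be sketched as follows, assuming features (e.g. Inception embeddings) have already been extracted; the cubic polynomial kernel is the standard KID choice, though per-paper details may differ.

```python
# Sketch: each client scores the generated features against its local reference
# features with KID, and the server averages the per-client scores.
import numpy as np

def poly_kernel(a, b):
    d = a.shape[1]
    return (a @ b.T / d + 1.0) ** 3

def kid(gen, ref):
    """Unbiased MMD^2 estimate with the cubic polynomial kernel."""
    kxx, kyy, kxy = poly_kernel(gen, gen), poly_kernel(ref, ref), poly_kernel(gen, ref)
    m, n = len(gen), len(ref)
    sum_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    sum_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return sum_xx + sum_yy - 2.0 * kxy.mean()

def federated_kid(gen, client_refs):
    # Per the summary, ranking models by this average matches ranking by a
    # centralized KID over the pooled reference data.
    return float(np.mean([kid(gen, ref) for ref in client_refs]))
```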
arXiv Detail & Related papers (2023-10-18T05:06:04Z)
- Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
- MAUVE Scores for Generative Models: Theory and Practice [95.86006777961182]
We present MAUVE, a family of comparison measures between pairs of distributions such as those encountered in the generative modeling of text or images.
We find that MAUVE can quantify the gaps between the distributions of human-written text and those of modern neural language models.
We demonstrate in the vision domain that MAUVE can identify known properties of generated images on par with or better than existing metrics.
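A simplified stand-in for this kind of comparison measure: jointly quantize the two embedding sets with k-means and compare the resulting histograms. MAUVE's actual score is an area under a divergence frontier; the symmetrised KL below is only a rough proxy, and the embedding model is assumed to be supplied by the user.

```python
# Rough proxy for a quantized comparison between two sets of embeddings
# (a simplification, not the MAUVE score itself).
import numpy as np
from sklearn.cluster import KMeans

def quantized_divergence(p_emb, q_emb, num_bins=50, eps=1e-8, seed=0):
    km = KMeans(n_clusters=num_bins, n_init=10, random_state=seed)
    labels = km.fit_predict(np.vstack([p_emb, q_emb]))
    p_hist = np.bincount(labels[: len(p_emb)], minlength=num_bins) + eps
    q_hist = np.bincount(labels[len(p_emb):], minlength=num_bins) + eps
    p_hist, q_hist = p_hist / p_hist.sum(), q_hist / q_hist.sum()
    kl_pq = np.sum(p_hist * np.log(p_hist / q_hist))
    kl_qp = np.sum(q_hist * np.log(q_hist / p_hist))
    return 0.5 * (kl_pq + kl_qp)  # 0 when the two histograms coincide
```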
arXiv Detail & Related papers (2022-12-30T07:37:40Z)
- fAux: Testing Individual Fairness via Gradient Alignment [2.5329739965085785]
We describe a new approach for testing individual fairness that avoids the requirements imposed by standard approaches.
We show that the proposed method effectively identifies discrimination on both synthetic and real-world datasets.
arXiv Detail & Related papers (2022-10-10T21:27:20Z)
- Data-SUITE: Data-centric identification of in-distribution incongruous
examples [81.21462458089142]
Data-SUITE is a data-centric framework to identify incongruous regions of in-distribution (ID) data.
We empirically validate Data-SUITE's performance and coverage guarantees.
arXiv Detail & Related papers (2022-02-17T18:58:31Z)
- A Unified Statistical Learning Model for Rankings and Scores with
Application to Grant Panel Review [1.240096657086732]
Rankings and scores are two common data types used by judges to express preferences and/or perceptions of quality in a collection of objects.
Numerous models exist to study data of each type separately, but no unified statistical model captures both data types simultaneously.
We propose the Mallows-Binomial model to close this gap, which combines a Mallows' $\phi$ ranking model with Binomial score models.
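For reference, the two ingredients named here have the standard forms below (a hedged sketch; the paper specifies how object-level quality parameters couple the ranking and score parts):

$$P(\pi \mid \pi_0, \phi) \;\propto\; \phi^{\,d(\pi,\,\pi_0)}, \qquad 0 < \phi \le 1,$$
$$P(x_j \mid p_j) = \binom{M}{x_j}\, p_j^{x_j} (1 - p_j)^{M - x_j}, \qquad x_j \in \{0, \dots, M\},$$

where $d$ is a ranking distance (e.g. Kendall's tau), $\pi_0$ a central ranking, and $p_j$ the quality parameter of object $j$.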
arXiv Detail & Related papers (2022-01-07T16:56:52Z)
- CARMS: Categorical-Antithetic-REINFORCE Multi-Sample Gradient Estimator [60.799183326613395]
We propose an unbiased estimator for categorical random variables based on multiple mutually negatively correlated (jointly antithetic) samples.
CARMS combines REINFORCE with copula-based sampling to avoid duplicate samples and reduce its variance, while keeping the estimator unbiased using importance sampling.
We evaluate CARMS on several benchmark datasets on a generative modeling task, as well as a structured output prediction task, and find it to outperform competing methods including a strong self-control baseline.
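The sketch below shows the plain multi-sample REINFORCE gradient for a categorical variable with a leave-one-out baseline, as a point of reference; it deliberately uses independent samples rather than CARMS's antithetic, copula-based sampling.

```python
# Plain multi-sample REINFORCE with a leave-one-out baseline (not CARMS itself,
# just the multi-sample scaffolding such estimators build on).
import numpy as np

def reinforce_loo_grad(logits, f, num_samples=8, rng=None):
    """Estimate d/d logits of E_{z ~ Cat(softmax(logits))}[f(z)]."""
    rng = rng or np.random.default_rng(0)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    z = rng.choice(len(probs), size=num_samples, p=probs)
    fz = np.array([f(k) for k in z])
    grad = np.zeros_like(probs)
    for k, fk in zip(z, fz):
        # Leave-one-out baseline keeps the estimator unbiased and cuts variance.
        baseline = (fz.sum() - fk) / (num_samples - 1)
        score = -probs.copy()      # d log p(k) / d logits = onehot(k) - softmax(logits)
        score[k] += 1.0
        grad += (fk - baseline) * score
    return grad / num_samples
```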
arXiv Detail & Related papers (2021-10-26T20:14:30Z)
- Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
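The fit-then-sample pattern can be illustrated with a generic density model standing in for the paper's PSD models (the Gaussian mixture below is chosen only for brevity, not as the paper's method).

```python
# Generic fit-then-sample stand-in for the two-step approach described above.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_then_sample(x, num_samples=1000, num_components=10, seed=0):
    model = GaussianMixture(n_components=num_components, random_state=seed).fit(x)
    samples, _ = model.sample(num_samples)
    return samples
```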
arXiv Detail & Related papers (2021-10-20T12:25:22Z)
- How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating
and Auditing Generative Models [95.8037674226622]
We introduce a 3-dimensional evaluation metric that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion.
Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity.
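As an illustration of sample-level precision-recall analysis, the sketch below uses a common k-NN manifold formulation (does each generated sample fall near the real data, and is each real sample covered by the generated data?); the paper's own fidelity/diversity/generalization metric is more refined than this.

```python
# k-NN manifold precision/recall for generated vs. real feature vectors
# (an illustration of precision-recall analysis, not the paper's exact metric).
import numpy as np

def _knn_radii(x, k=5):
    # Distance from each point to its k-th nearest neighbour within x.
    d = np.linalg.norm(x[:, None] - x[None], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.sort(d, axis=1)[:, k - 1]

def precision_recall(real, fake, k=5):
    radii_real = _knn_radii(real, k)
    radii_fake = _knn_radii(fake, k)
    d_fr = np.linalg.norm(fake[:, None] - real[None], axis=-1)  # fake-to-real distances
    precision = np.mean((d_fr <= radii_real[None, :]).any(axis=1))
    recall = np.mean((d_fr.T <= radii_fake[None, :]).any(axis=1))
    return float(precision), float(recall)
```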
arXiv Detail & Related papers (2021-02-17T18:25:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.