MAUVE Scores for Generative Models: Theory and Practice
- URL: http://arxiv.org/abs/2212.14578v2
- Date: Thu, 7 Dec 2023 06:38:10 GMT
- Title: MAUVE Scores for Generative Models: Theory and Practice
- Authors: Krishna Pillutla, Lang Liu, John Thickstun, Sean Welleck, Swabha Swayamdipta, Rowan Zellers, Sewoong Oh, Yejin Choi, Zaid Harchaoui
- Abstract summary: We present MAUVE, a family of comparison measures between pairs of distributions such as those encountered in the generative modeling of text or images.
We find that MAUVE can quantify the gaps between the distributions of human-written text and those of modern neural language models.
We demonstrate in the vision domain that MAUVE can identify known properties of generated images on par with or better than existing metrics.
- Score: 95.86006777961182
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative artificial intelligence has made significant strides, producing
text indistinguishable from human prose and remarkably photorealistic images.
Automatically measuring how close the generated data distribution is to the
target distribution is central to diagnosing existing models and developing
better ones. We present MAUVE, a family of comparison measures between pairs of
distributions such as those encountered in the generative modeling of text or
images. These scores are statistical summaries of divergence frontiers
capturing two types of errors in generative modeling. We explore three
approaches to statistically estimate these scores: vector quantization,
non-parametric estimation, and classifier-based estimation. We provide
statistical bounds for the vector quantization approach.
Empirically, we find that the proposed scores paired with a range of
$f$-divergences and statistical estimation methods can quantify the gaps
between the distributions of human-written text and those of modern neural
language models by correlating with human judgments and identifying known
properties of the generated texts. We demonstrate in the vision domain that
MAUVE can identify known properties of generated images on par with or better
than existing metrics. In conclusion, we present practical recommendations for
using MAUVE effectively with language and image modalities.
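The abstract describes the scores as summaries of a divergence frontier computed on quantized distributions. The following is a minimal sketch of that idea, not the authors' implementation. It assumes that both samples have already been quantized into histograms over the same bins, that the divergence is KL, and that the frontier is summarized by the area under an exponentially scaled curve. The function names, the scaling constant `c`, and the mixture grid are all illustrative assumptions.

```python
import math

def kl(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def divergence_frontier(p, q, num_points=25, c=5.0):
    """Frontier points for mixtures r = lam * p + (1 - lam) * q.

    Each point is (exp(-c * KL(q || r)), exp(-c * KL(p || r))), so the two
    coordinates reflect the two types of error. `c` is an assumed constant.
    """
    points = []
    for i in range(1, num_points + 1):
        lam = i / (num_points + 1)  # stays strictly inside (0, 1)
        r = [lam * pi + (1 - lam) * qi for pi, qi in zip(p, q)]
        points.append((math.exp(-c * kl(q, r)), math.exp(-c * kl(p, r))))
    return points

def mauve_like_score(p, q, num_points=25, c=5.0):
    """Area under the frontier (trapezoidal rule), a value in roughly [0, 1]."""
    # Close the curve with the limiting end points (0, 1) and (1, 0).
    pts = [(0.0, 1.0)] + sorted(divergence_frontier(p, q, num_points, c)) + [(1.0, 0.0)]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical histograms over two bins, for illustration only.
same = mauve_like_score([0.5, 0.5], [0.5, 0.5])
close = mauve_like_score([0.5, 0.5], [0.9, 0.1])
disjoint = mauve_like_score([1.0, 0.0], [0.0, 1.0])
```

In this sketch, identical histograms give a score near 1 and histograms with disjoint support give a score near 0. In practice the histograms would come from quantizing embeddings of real and generated samples, a step the sketch leaves out.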
Related papers
- Sub-graph Based Diffusion Model for Link Prediction [43.15741675617231]
Denoising Diffusion Probabilistic Models (DDPMs) represent a contemporary class of generative models with exceptional qualities.
We build a novel generative model for link prediction using a dedicated design to decompose the likelihood estimation process via the Bayesian formula.
Our proposed method presents numerous advantages: (1) transferability across datasets without retraining, (2) promising generalization on limited training data, and (3) robustness against graph adversarial attacks.
arXiv Detail & Related papers (2024-09-13T02:23:55Z)
- Improving Explainability of Softmax Classifiers Using a Prototype-Based Joint Embedding Method [0.0]
We propose a prototype-based approach for improving explainability of softmax classifiers.
By modifying the model architecture and training, we acquire the ability to sample for prototypical examples that contributed to the prediction.
We obtain a metric for uncertainty that is better able to detect out of distribution data than softmax confidence.
arXiv Detail & Related papers (2024-07-02T13:59:09Z)
- A Practical Guide to Sample-based Statistical Distances for Evaluating Generative Models in Science [7.2447605934304375]
We focus on four commonly used notions of statistical distances representing different methodologies.
We highlight the intuition behind each distance and explain their merits, scalability, complexity, and pitfalls.
We evaluate generative models from different scientific domains, namely a model of decision-making and a model generating medical images.
arXiv Detail & Related papers (2024-03-19T11:16:14Z)
- Open-Domain Text Evaluation via Contrastive Distribution Methods [75.59039812868681]
We introduce a novel method for evaluating open-domain text generation called Contrastive Distribution Methods.
Our experiments on coherence evaluation for multi-turn dialogue and commonsense evaluation for controllable generation demonstrate CDM's superior correlation with human judgment.
arXiv Detail & Related papers (2023-06-20T20:37:54Z)
- Mutual Information Divergence: A Unified Metric for Multimodal Generative Models [19.520177195241704]
We propose the negative Gaussian cross-mutual information using the CLIP features as a unified metric, termed Mutual Information Divergence (MID).
We extensively compare it with competing metrics using carefully-generated or human-annotated judgments in text-to-image generation and image captioning tasks.
The proposed MID significantly outperforms the competitive methods by having consistency across benchmarks, sample parsimony, and robustness toward the exploited CLIP model.
arXiv Detail & Related papers (2022-05-25T09:34:37Z)
- Bayesian Graph Contrastive Learning [55.36652660268726]
We propose a novel perspective on graph contrastive learning methods, showing that random augmentations lead to stochastic encoders.
Our proposed method represents each node by a distribution in the latent space in contrast to existing techniques which embed each node to a deterministic vector.
We show a considerable improvement in performance compared to existing state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2021-12-15T01:45:32Z)
- Distributional Depth-Based Estimation of Object Articulation Models [21.046351215949525]
We propose a method that efficiently learns distributions over articulation model parameters directly from depth images.
Our core contributions include a novel representation for distributions over rigid body transformations.
We introduce a novel deep learning based approach, DUST-net, that performs category-independent articulation model estimation.
arXiv Detail & Related papers (2021-08-12T17:44:51Z)
- A comprehensive comparative evaluation and analysis of Distributional Semantic Models [61.41800660636555]
We perform a comprehensive evaluation of type distributional vectors, either produced by static DSMs or obtained by averaging the contextualized vectors generated by BERT.
The results show that the alleged superiority of predict-based models is more apparent than real, and surely not ubiquitous.
We borrow from cognitive neuroscience the methodology of Representational Similarity Analysis (RSA) to inspect the semantic spaces generated by distributional models.
arXiv Detail & Related papers (2021-05-20T15:18:06Z)
- How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models [95.8037674226622]
We introduce a 3-dimensional evaluation metric that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion.
Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity.
arXiv Detail & Related papers (2021-02-17T18:25:30Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.