Related papers: Robust model training and generalisation with Studentising flows

Robust model training and generalisation with Studentising flows

URL: http://arxiv.org/abs/2006.06599v2
Date: Sat, 11 Jul 2020 12:50:13 GMT
Title: Robust model training and generalisation with Studentising flows
Authors: Simon Alexanderson, Gustav Eje Henter
Abstract summary: We discuss how these methods can be further improved based on insights from robust (in particular, resistant) statistics. We propose to endow flow-based models with fat-tailed latent distributions as a simple drop-in replacement for the Gaussian distribution. Experiments on several different datasets confirm the efficacy of the proposed approach.
Score: 22.757298187704745
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Normalising flows are tractable probabilistic models that leverage the power of deep learning to describe a wide parametric family of distributions, all while remaining trainable using maximum likelihood. We discuss how these methods can be further improved based on insights from robust (in particular, resistant) statistics. Specifically, we propose to endow flow-based models with fat-tailed latent distributions such as multivariate Student's $t$, as a simple drop-in replacement for the Gaussian distribution used by conventional normalising flows. While robustness brings many advantages, this paper explores two of them: 1) We describe how using fatter-tailed base distributions can give benefits similar to gradient clipping, but without compromising the asymptotic consistency of the method. 2) We also discuss how robust ideas lead to models with reduced generalisation gap and improved held-out data likelihood. Experiments on several different datasets confirm the efficacy of the proposed approach in both regards.

Related papers

Efficient, Multimodal, and Derivative-Free Bayesian Inference With Fisher-Rao Gradient Flows [10.153270126742369]
We study efficient approximate sampling for probability distributions known up to normalization constants. We specifically focus on a problem class arising in Bayesian inference for large-scale inverse problems in science and engineering applications.
arXiv Detail & Related papers (2024-06-25T04:07:22Z)
Generative Assignment Flows for Representing and Learning Joint Distributions of Discrete Data [2.6499018693213316]
We introduce a novel generative model for the representation of joint probability distributions of a possibly large number of discrete random variables. The embedding of the flow via the Segre map in the meta-simplex of all discrete joint distributions ensures that any target distribution can be represented in principle. Our approach has strong motivation from first principles of modeling coupled discrete variables.
arXiv Detail & Related papers (2024-06-06T21:58:33Z)
Theoretical Insights for Diffusion Guidance: A Case Study for Gaussian Mixture Models [59.331993845831946]
Diffusion models benefit from instillation of task-specific information into the score function to steer the sample generation towards desired properties. This paper provides the first theoretical study towards understanding the influence of guidance on diffusion models in the context of Gaussian mixture models.
arXiv Detail & Related papers (2024-03-03T23:15:48Z)
Structural Pruning for Diffusion Models [65.02607075556742]
We present Diff-Pruning, an efficient compression method tailored for learning lightweight diffusion models from pre-existing ones. Our empirical assessment, undertaken across several datasets highlights two primary benefits of our proposed method.
arXiv Detail & Related papers (2023-05-18T12:38:21Z)
Bayesian Hierarchical Models for Counterfactual Estimation [12.159830463756341]
We propose a probabilistic paradigm to estimate a diverse set of counterfactuals. We treat the perturbations as random variables endowed with prior distribution functions. A gradient based sampler with superior convergence characteristics efficiently computes the posterior samples.
arXiv Detail & Related papers (2023-01-21T00:21:11Z)
Sliced-Wasserstein normalizing flows: beyond maximum likelihood training [12.91637880428221]
normalizing flows generally suffer from several shortcomings including their tendency to generate unrealistic data. This paper proposes a new training paradigm based on a hybrid objective function combining the maximum likelihood principle (MLE) and a sliced-Wasserstein distance.
arXiv Detail & Related papers (2022-07-12T11:29:49Z)
Resampling Base Distributions of Normalizing Flows [0.0]
We introduce a base distribution for normalizing flows based on learned rejection sampling. We develop suitable learning algorithms using both maximizing the log-likelihood and the optimization of the reverse Kullback-Leibler divergence.
arXiv Detail & Related papers (2021-10-29T14:44:44Z)
Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness [61.827054365139645]
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference. We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z)
Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem. Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem. We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z)
Learning Diverse Representations for Fast Adaptation to Distribution Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task. We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z)
Bayesian Deep Learning and a Probabilistic Perspective of Generalization [56.69671152009899]
We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization. We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
arXiv Detail & Related papers (2020-02-20T15:13:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.