Robust model training and generalisation with Studentising flows
- URL: http://arxiv.org/abs/2006.06599v2
- Date: Sat, 11 Jul 2020 12:50:13 GMT
- Title: Robust model training and generalisation with Studentising flows
- Authors: Simon Alexanderson, Gustav Eje Henter
- Abstract summary: We discuss how normalising flows can be further improved based on insights from robust (in particular, resistant) statistics.
We propose to endow flow-based models with fat-tailed latent distributions as a simple drop-in replacement for the Gaussian distribution.
Experiments on several different datasets confirm the efficacy of the proposed approach.
- Score: 22.757298187704745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Normalising flows are tractable probabilistic models that leverage the power
of deep learning to describe a wide parametric family of distributions, all
while remaining trainable using maximum likelihood. We discuss how these
methods can be further improved based on insights from robust (in particular,
resistant) statistics. Specifically, we propose to endow flow-based models with
fat-tailed latent distributions such as multivariate Student's $t$, as a simple
drop-in replacement for the Gaussian distribution used by conventional
normalising flows. While robustness brings many advantages, this paper explores
two of them: 1) We describe how using fatter-tailed base distributions can give
benefits similar to gradient clipping, but without compromising the asymptotic
consistency of the method. 2) We also discuss how robust ideas lead to models
with reduced generalisation gap and improved held-out data likelihood.
Experiments on several different datasets confirm the efficacy of the proposed
approach in both regards.
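The gradient-clipping analogy in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a standard multivariate Student's $t$ latent with zero location, identity scale matrix, and a fixed degrees-of-freedom value `nu`, and all function names are hypothetical. It compares the gradient of the latent log-density with respect to the latent vector `z` for the Gaussian and the Student's $t$ cases. Under these assumptions the Gaussian gradient norm grows linearly with the distance from the origin, while the Student's $t$ gradient norm never exceeds `(nu + d) / (2 * sqrt(nu))` and decays for far-away points.

```python
import math
import numpy as np


def gaussian_logpdf(z):
    """Log-density of a standard Gaussian N(0, I) at the rows of z."""
    d = z.shape[-1]
    return -0.5 * np.sum(z ** 2, axis=-1) - 0.5 * d * math.log(2.0 * math.pi)


def student_t_logpdf(z, nu):
    """Log-density of a multivariate Student's t (zero location, identity scale)."""
    d = z.shape[-1]
    sq_norm = np.sum(z ** 2, axis=-1)
    const = (math.lgamma(0.5 * (nu + d)) - math.lgamma(0.5 * nu)
             - 0.5 * d * math.log(nu * math.pi))
    return const - 0.5 * (nu + d) * np.log1p(sq_norm / nu)


def gaussian_score(z):
    """Gradient of gaussian_logpdf with respect to z (unbounded in norm)."""
    return -z


def student_t_score(z, nu):
    """Gradient of student_t_logpdf with respect to z (bounded in norm)."""
    d = z.shape[-1]
    sq_norm = np.sum(z ** 2, axis=-1, keepdims=True)
    return -(nu + d) * z / (nu + sq_norm)


# Illustrative values only: 4-dimensional latent, nu = 5 (both are assumptions).
nu = 5.0
radii = np.array([0.1, 1.0, 10.0, 100.0, 1000.0])
direction = np.ones(4) / 2.0            # unit vector in 4 dimensions
zs = radii[:, None] * direction         # points at increasing distance from the origin

gauss_norms = np.linalg.norm(gaussian_score(zs), axis=-1)
t_norms = np.linalg.norm(student_t_score(zs, nu), axis=-1)
t_bound = (nu + zs.shape[-1]) / (2.0 * math.sqrt(nu))

for r, g, t in zip(radii, gauss_norms, t_norms):
    print(f"radius {r:8.1f}  Gaussian grad norm {g:10.3f}  Student-t grad norm {t:8.5f}")
```

In a flow, `z` would be the output of the invertible network, so an outlying data point that maps far from the origin contributes an arbitrarily large gradient under the Gaussian latent but only a bounded one under the Student's $t$ latent.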
Related papers
- Is There a Better Source Distribution than Gaussian? Exploring Source Distributions for Image Flow Matching [27.47409979324549]
Flow matching has emerged as a powerful generative modeling approach with flexible choices of source distribution. We propose a novel 2D simulation that captures high-dimensional geometric properties in an interpretable 2D setting. We propose a framework that combines norm-aligned training with directionally-pruned sampling.
arXiv Detail & Related papers (2025-12-20T02:44:54Z) - Discrete Diffusion Models: Novel Analysis and New Sampler Guarantees [70.88473359544084]
We introduce a new analytical approach for discrete diffusion models that removes the need for regularity assumptions. For the standard $\tau$-leaping method, we establish convergence guarantees in KL divergence that scale linearly with vocabulary size. Our approach is also more broadly applicable: it provides the first convergence guarantees for other widely used samplers.
arXiv Detail & Related papers (2025-09-20T17:42:29Z) - Exploring Representation Invariance in Finetuning [51.19872959859021]
Foundation models pretrained on large-scale natural images are widely adapted to various cross-domain low-resource downstream tasks. We argue that such tasks can be effectively adapted without sacrificing the benefits of pretrained representations. We introduce Representation Invariance FineTuning (RIFT), a regularization that maximizes the representation similarity between pretrained and finetuned models.
arXiv Detail & Related papers (2025-03-10T14:44:37Z) - Efficient, Multimodal, and Derivative-Free Bayesian Inference With Fisher-Rao Gradient Flows [10.153270126742369]
We study efficient approximate sampling for probability distributions known up to normalization constants.
We specifically focus on a problem class arising in Bayesian inference for large-scale inverse problems in science and engineering applications.
arXiv Detail & Related papers (2024-06-25T04:07:22Z) - Generative Assignment Flows for Representing and Learning Joint Distributions of Discrete Data [2.6499018693213316]
We introduce a novel generative model for the representation of joint probability distributions of a possibly large number of discrete random variables.
The embedding of the flow via the Segre map in the meta-simplex of all discrete joint distributions ensures that any target distribution can be represented in principle.
Our approach has strong motivation from first principles of modeling coupled discrete variables.
arXiv Detail & Related papers (2024-06-06T21:58:33Z) - Theoretical Insights for Diffusion Guidance: A Case Study for Gaussian Mixture Models [59.331993845831946]
Diffusion models benefit from instillation of task-specific information into the score function to steer the sample generation towards desired properties.
This paper provides the first theoretical study towards understanding the influence of guidance on diffusion models in the context of Gaussian mixture models.
arXiv Detail & Related papers (2024-03-03T23:15:48Z) - Structural Pruning for Diffusion Models [65.02607075556742]
We present Diff-Pruning, an efficient compression method tailored for learning lightweight diffusion models from pre-existing ones.
Our empirical assessment, undertaken across several datasets, highlights two primary benefits of our proposed method.
arXiv Detail & Related papers (2023-05-18T12:38:21Z) - Bayesian Hierarchical Models for Counterfactual Estimation [12.159830463756341]
We propose a probabilistic paradigm to estimate a diverse set of counterfactuals.
We treat the perturbations as random variables endowed with prior distribution functions.
A gradient based sampler with superior convergence characteristics efficiently computes the posterior samples.
arXiv Detail & Related papers (2023-01-21T00:21:11Z) - Sliced-Wasserstein normalizing flows: beyond maximum likelihood training [12.91637880428221]
Normalizing flows generally suffer from several shortcomings including their tendency to generate unrealistic data.
This paper proposes a new training paradigm based on a hybrid objective function combining the maximum likelihood principle (MLE) and a sliced-Wasserstein distance.
arXiv Detail & Related papers (2022-07-12T11:29:49Z) - Resampling Base Distributions of Normalizing Flows [0.0]
We introduce a base distribution for normalizing flows based on learned rejection sampling.
We develop suitable learning algorithms using both maximizing the log-likelihood and the optimization of the reverse Kullback-Leibler divergence.
arXiv Detail & Related papers (2021-10-29T14:44:44Z) - Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness [61.827054365139645]
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z) - Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z) - Learning Diverse Representations for Fast Adaptation to Distribution Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z) - Bayesian Deep Learning and a Probabilistic Perspective of Generalization [56.69671152009899]
We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization.
We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
arXiv Detail & Related papers (2020-02-20T15:13:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.