Sequential Function-Space Variational Inference via Gaussian Mixture Approximation
- URL: http://arxiv.org/abs/2503.07114v2
- Date: Tue, 27 May 2025 12:22:46 GMT
- Title: Sequential Function-Space Variational Inference via Gaussian Mixture Approximation
- Authors: Menghao Waiyan William Zhu, Pengcheng Hao, Ercan Engin Kuruoğlu
- Abstract summary: Continual learning in neural networks aims to learn new tasks without forgetting old tasks. We propose an SFSVI method based on a Gaussian mixture variational distribution. We find that in terms of final average accuracy, likelihood-focused Gaussian mixture SFSVI outperforms other sequential variational inference methods.
- Score: 0.6827423171182154
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual learning in neural networks aims to learn new tasks without forgetting old tasks. Sequential function-space variational inference (SFSVI) uses a Gaussian variational distribution to approximate the distribution of the outputs of the neural network at a finite number of selected inducing points. Since the posterior distribution of a neural network is multi-modal, a Gaussian distribution can match only one mode of the posterior, and a Gaussian mixture distribution can better approximate it. We propose an SFSVI method based on a Gaussian mixture variational distribution. We also compare different types of variational inference methods with a fixed pre-trained feature extractor (where continual learning is performed on the final layer) and without one (where continual learning is performed on all layers). We find that in terms of final average accuracy, likelihood-focused Gaussian mixture SFSVI outperforms other sequential variational inference methods, especially in the latter case.
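To make the core idea concrete, the following is a minimal sketch, not the authors' implementation: a Gaussian mixture variational distribution over the network outputs at a small set of inducing points, with a Monte Carlo estimate of the function-space KL term that such an objective would contain. The component count, inducing dimension, and isotropic Gaussian function-space prior are all illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's code): a Gaussian mixture
# variational distribution q(f) over the network outputs f at M inducing
# points, and a Monte Carlo estimate of KL[q || p] against a Gaussian prior.
import numpy as np

rng = np.random.default_rng(0)

M, C = 16, 3                          # inducing points, mixture components (assumed)
logits = np.zeros(C)                  # unnormalised mixture weights
mu = rng.normal(size=(C, M))          # per-component means of the outputs
log_sigma = np.full((C, M), -1.0)     # per-component log std devs (diagonal covariance)
prior_scale = 1.0                     # function-space prior N(0, prior_scale^2 I), assumed

def log_q(f):
    """Log-density of the mixture q(f) = sum_c pi_c N(f; mu_c, diag(sigma_c^2))."""
    log_pi = logits - np.logaddexp.reduce(logits)
    sigma = np.exp(log_sigma)
    comp = -0.5 * np.sum(((f - mu) / sigma) ** 2 + 2 * log_sigma + np.log(2 * np.pi), axis=1)
    return np.logaddexp.reduce(log_pi + comp)

def log_prior(f):
    """Log-density of the isotropic Gaussian function-space prior."""
    return -0.5 * np.sum((f / prior_scale) ** 2 + np.log(2 * np.pi * prior_scale ** 2))

def mc_kl(n_samples=256):
    """Monte Carlo estimate of KL[q || p] over the inducing outputs."""
    pi = np.exp(logits - np.logaddexp.reduce(logits))
    total = 0.0
    for _ in range(n_samples):
        c = rng.choice(C, p=pi)                               # pick a component
        f = mu[c] + np.exp(log_sigma[c]) * rng.normal(size=M)  # sample its outputs
        total += log_q(f) - log_prior(f)
    return total / n_samples

print("estimated function-space KL:", mc_kl())
```

The point of the sketch is that sampling from the mixture keeps the KL estimate straightforward even though the variational distribution itself is multi-modal, which is the property the abstract appeals to.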
Related papers
- Sampling in High-Dimensions using Stochastic Interpolants and Forward-Backward Stochastic Differential Equations [8.509310102094512]
We present a class of diffusion-based algorithms to draw samples from high-dimensional probability distributions.
Our approach relies on the interpolants framework to define a time-indexed collection of probability densities.
We demonstrate that our algorithm can effectively draw samples from distributions that conventional methods struggle to handle.
arXiv Detail & Related papers (2025-02-01T07:27:11Z) - Generative Conditional Distributions by Neural (Entropic) Optimal Transport [12.152228552335798]
We introduce a novel neural entropic optimal transport method designed to learn generative models of conditional distributions.
Our method relies on the minimax training of two neural networks.
Our experiments on real-world datasets show the effectiveness of our algorithm compared to state-of-the-art conditional distribution learning techniques.
arXiv Detail & Related papers (2024-06-04T13:45:35Z) - Learning Mixtures of Gaussians Using Diffusion Models [9.118706387430883]
We give a new algorithm for learning mixtures of $k$ Gaussians to TV error $\varepsilon$, with quasi-polynomial ($O(n^{\text{poly}\,\log\left(\frac{n+k}{\varepsilon}\right)})$) time and sample complexity.
Results extend to continuous mixtures of Gaussians where the mixing distribution is supported on a union of $k$ balls of constant radius.
arXiv Detail & Related papers (2024-04-29T17:00:20Z) - MGF: Mixed Gaussian Flow for Diverse Trajectory Prediction [72.70572835589158]
We propose constructing a mixed Gaussian prior for a normalizing flow model for trajectory prediction.
Our method achieves state-of-the-art performance in the evaluation of both trajectory alignment and diversity on the popular UCY/ETH and SDD datasets.
arXiv Detail & Related papers (2024-02-19T15:48:55Z) - Uncertainty Quantification via Stable Distribution Propagation [60.065272548502]
We propose a new approach for propagating stable probability distributions through neural networks.
Our method is based on local linearization, which we show to be an optimal approximation in terms of total variation distance for the ReLU non-linearity (a generic local-linearization sketch is given after this list).
arXiv Detail & Related papers (2024-02-13T09:40:19Z) - Distribution Shift Inversion for Out-of-Distribution Prediction [57.22301285120695]
We propose a portable Distribution Shift Inversion algorithm for Out-of-Distribution (OoD) prediction.
We show that our method provides a general performance gain when plugged into a wide range of commonly used OoD algorithms.
arXiv Detail & Related papers (2023-06-14T08:00:49Z) - Tensorizing flows: a tool for variational inference [0.0]
We introduce an extension of normalizing flows in which the Gaussian reference is replaced with a reference distribution constructed via a tensor network.
We show that by combining flows with tensor networks on difficult variational inference tasks, we can improve on the results obtained by using either tool without the other.
arXiv Detail & Related papers (2023-05-03T23:42:22Z) - Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z) - Natural Gradient Variational Inference with Gaussian Mixture Models [1.7948767405202701]
Variational Inference (VI) methods approximate the posterior with a distribution usually chosen from a simple family using optimization.
The main contribution of this work is a set of update rules for natural gradient variational inference with mixtures of Gaussians.
arXiv Detail & Related papers (2021-11-15T20:04:32Z) - Sampling-free Variational Inference for Neural Networks with Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference.
Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
arXiv Detail & Related papers (2021-03-15T16:16:18Z) - Sampling in Combinatorial Spaces with SurVAE Flow Augmented MCMC [83.48593305367523]
Hybrid Monte Carlo is a powerful Markov Chain Monte Carlo method for sampling from complex continuous distributions.
We introduce a new approach based on augmenting Monte Carlo methods with SurVAE Flows to sample from discrete distributions.
We demonstrate the efficacy of our algorithm on a range of examples from statistics, computational physics and machine learning, and observe improvements compared to alternative algorithms.
arXiv Detail & Related papers (2021-02-04T02:21:08Z) - Embedding Propagation: Smoother Manifold for Few-Shot Classification [131.81692677836202]
We propose to use embedding propagation as an unsupervised non-parametric regularizer for manifold smoothing in few-shot classification.
We empirically show that embedding propagation yields a smoother embedding manifold.
We show that embedding propagation consistently improves the accuracy of the models in multiple semi-supervised learning scenarios by up to 16 percentage points.
arXiv Detail & Related papers (2020-03-09T13:51:09Z) - Gaussianization Flows [113.79542218282282]
We propose a new type of normalizing flow model that enables both efficient computation of likelihoods and efficient inversion for sample generation.
Because of this guaranteed expressivity, they can capture multimodal target distributions without compromising the efficiency of sample generation.
arXiv Detail & Related papers (2020-03-04T08:15:06Z) - Stein Variational Inference for Discrete Distributions [70.19352762933259]
We propose a simple yet general framework that transforms discrete distributions to equivalent piecewise continuous distributions.
Our method outperforms traditional algorithms such as Gibbs sampling and discontinuous Hamiltonian Monte Carlo.
We demonstrate that our method provides a promising tool for learning ensembles of binarized neural networks (BNNs).
In addition, such a transform can be straightforwardly employed in gradient-free kernelized Stein discrepancy to perform a goodness-of-fit (GOF) test on discrete distributions.
arXiv Detail & Related papers (2020-03-01T22:45:41Z)
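As referenced in the "Uncertainty Quantification via Stable Distribution Propagation" entry above: local linearization of a non-linearity around the input mean is the standard first-order (delta-method) rule, and for ReLU it takes a particularly simple form. The sketch below is a generic illustration of that rule under a Gaussian input assumption, not the paper's stable-distribution propagation code.

```python
# Generic local-linearization (first-order delta method) of ReLU around the
# input mean: f(x) ~ relu(mu) + 1[mu > 0] * (x - mu), so a Gaussian input
# N(mu, var) maps to an output with mean relu(mu) and variance 1[mu > 0] * var.
# Illustrative only.
import numpy as np

def relu_linearized_moments(mu, var):
    """Propagate elementwise Gaussian mean/variance through ReLU via local linearization."""
    mu = np.asarray(mu, dtype=float)
    var = np.asarray(var, dtype=float)
    slope = (mu > 0).astype(float)        # derivative of ReLU evaluated at the mean
    return np.maximum(mu, 0.0), slope ** 2 * var

mean_out, var_out = relu_linearized_moments([-0.5, 0.2, 1.3], [0.1, 0.1, 0.1])
print(mean_out, var_out)   # [0.  0.2 1.3] [0.  0.1 0.1]
```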