Permutation Invariant Learning with High-Dimensional Particle Filters
- URL: http://arxiv.org/abs/2410.22695v1
- Date: Wed, 30 Oct 2024 05:06:55 GMT
- Title: Permutation Invariant Learning with High-Dimensional Particle Filters
- Authors: Akhilan Boopathy, Aneesh Muppidi, Peggy Yang, Abhiram Iyer, William Yue, Ila Fiete
- Abstract summary: Sequential learning in deep models often suffers from challenges such as catastrophic forgetting and loss of plasticity.
We introduce a novel permutation-invariant learning framework based on high-dimensional particle filters.
- Score: 8.878254892409005
- License:
- Abstract: Sequential learning in deep models often suffers from challenges such as catastrophic forgetting and loss of plasticity, largely due to the permutation dependence of gradient-based algorithms, where the order of training data impacts the learning outcome. In this work, we introduce a novel permutation-invariant learning framework based on high-dimensional particle filters. We theoretically demonstrate that particle filters are invariant to the sequential ordering of training minibatches or tasks, offering a principled solution to mitigate catastrophic forgetting and loss of plasticity. We develop an efficient particle filter for optimizing high-dimensional models, combining the strengths of Bayesian methods with gradient-based optimization. Through extensive experiments on continual supervised and reinforcement learning benchmarks, including SplitMNIST, SplitCIFAR100, and ProcGen, we empirically show that our method consistently improves performance, while reducing variance compared to standard baselines.
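To make the core mechanism concrete, the following is a minimal sketch of a particle filter applied to model parameters: each particle is a candidate parameter vector, and each minibatch acts as an observation that reweights the particles before resampling. This is an illustrative toy under assumed choices (a linear model, a Gaussian likelihood, multinomial resampling with jitter), not the paper's high-dimensional algorithm.

```python
import numpy as np

# Illustrative sketch only: a bootstrap particle filter over model
# parameters, where each minibatch acts as an observation. The linear
# model, Gaussian likelihood, and jitter scale are assumptions for
# illustration, not the paper's implementation.

rng = np.random.default_rng(0)

def log_likelihood(theta, X, y, noise_std=0.1):
    """Gaussian log-likelihood of a minibatch under a linear model."""
    resid = y - X @ theta
    return -0.5 * np.sum(resid**2) / noise_std**2

def particle_filter_step(particles, log_weights, X, y):
    """Reweight particles by the minibatch likelihood, then resample."""
    log_weights = log_weights + np.array(
        [log_likelihood(th, X, y) for th in particles]
    )
    log_weights -= log_weights.max()          # numerical stability
    w = np.exp(log_weights)
    w /= w.sum()
    # Multinomial resampling back to uniform weights, plus small jitter
    # to rejuvenate the particle set.
    idx = rng.choice(len(particles), size=len(particles), p=w)
    particles = particles[idx] + 0.01 * rng.standard_normal(particles.shape)
    return particles, np.zeros(len(particles))

# Toy data: d-dimensional linear regression, streamed as minibatches.
d, n_particles = 5, 200
theta_true = rng.standard_normal(d)
particles = rng.standard_normal((n_particles, d))
log_weights = np.zeros(n_particles)

for _ in range(50):                            # stream of minibatches
    X = rng.standard_normal((32, d))
    y = X @ theta_true + 0.1 * rng.standard_normal(32)
    particles, log_weights = particle_filter_step(particles, log_weights, X, y)

print("posterior mean error:", np.linalg.norm(particles.mean(0) - theta_true))
```

Because the posterior targeted by these reweighting steps depends only on the product of minibatch likelihoods, and that product is identical under any ordering of the minibatches, the filter's target distribution is permutation invariant, which is the property the paper exploits against forgetting and loss of plasticity.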
Related papers
- Differentiable Interacting Multiple Model Particle Filtering [24.26220422457388]
We propose a sequential Monte Carlo algorithm for parameter learning when the studied model exhibits random discontinuous jumps in behaviour.
We adopt the emerging framework of differentiable particle filtering, wherein parameters are trained by gradient descent.
We establish new theoretical results for the presented algorithms and demonstrate superior numerical performance compared to previous state-of-the-art algorithms; a minimal sketch of the differentiable-resampling idea follows this entry.
arXiv Detail & Related papers (2024-10-01T12:05:18Z)
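As a companion to the entry above, here is a minimal sketch of the soft-resampling trick commonly used to make particle filters differentiable: indices are drawn from a mixture of the particle weights and a uniform distribution, and the resampled weights are importance-corrected so gradients can flow back to the learnable components. The mixture coefficient, toy loss, and tensor shapes are illustrative assumptions, not this paper's implementation.

```python
import torch

# Minimal sketch of differentiable ("soft") resampling: sample from a
# mixture of the weights and a uniform distribution, then apply an
# importance correction so the new weights stay differentiable with
# respect to the originals. Illustrative assumption, not this paper's
# exact algorithm.

def soft_resample(particles, weights, alpha=0.5):
    n = weights.shape[0]
    # Mixture proposal: trade off weight-proportional and uniform sampling.
    q = alpha * weights + (1 - alpha) / n
    idx = torch.multinomial(q, n, replacement=True)
    new_particles = particles[idx]
    # Importance correction keeps gradients flowing through `weights`.
    new_weights = weights[idx] / q[idx]
    return new_particles, new_weights / new_weights.sum()

# Toy usage: weights produced by a learnable observation model.
torch.manual_seed(0)
log_w = torch.randn(100, requires_grad=True)
particles = torch.randn(100, 3)

w = torch.softmax(log_w, dim=0)
p2, w2 = soft_resample(particles, w)
loss = w2 @ p2[:, 0]             # some downstream differentiable loss
loss.backward()                  # gradient reaches log_w through resampling
print(log_w.grad.norm())
```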
- Regime Learning for Differentiable Particle Filters [19.35021771863565]
Differentiable particle filters are an emerging class of models that combine sequential Monte Carlo techniques with the flexibility of neural networks to perform state space inference.
No prior approaches effectively learn both the individual regimes and the switching process simultaneously.
We propose the neural network based regime learning differentiable particle filter (RLPF) to address this problem.
arXiv Detail & Related papers (2024-05-08T07:43:43Z)
- Learning Differentiable Particle Filter on the Fly [18.466658684464598]
Differentiable particle filters are an emerging class of sequential Bayesian inference techniques.
We propose an online learning framework for differentiable particle filters so that model parameters can be updated as data arrive.
arXiv Detail & Related papers (2023-12-10T17:54:40Z)
- Uncovering mesa-optimization algorithms in Transformers [61.06055590704677]
Some autoregressive models can learn as an input sequence is processed, without undergoing any parameter changes, and without being explicitly trained to do so.
We show that standard next-token prediction error minimization gives rise to a subsidiary learning algorithm that adjusts the model as new inputs are revealed.
Our findings explain in-context learning as a product of autoregressive loss minimization and inform the design of new optimization-based Transformer layers.
arXiv Detail & Related papers (2023-09-11T22:42:50Z)
- Low-rank extended Kalman filtering for online learning of neural networks from streaming data [71.97861600347959]
We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream.
The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior precision matrix.
In contrast to methods based on variational inference, our method is fully deterministic and does not require step-size tuning; a minimal diagonal-EKF sketch follows this entry.
arXiv Detail & Related papers (2023-05-31T03:48:49Z)
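To illustrate the flavor of EKF-based online learning referenced above, here is a minimal sketch using a purely diagonal posterior covariance on a toy nonlinear model; the paper's method instead maintains a richer low-rank plus diagonal decomposition. The model, noise levels, and dimensions are assumptions for illustration.

```python
import numpy as np

# Minimal sketch (an illustrative assumption, not the paper's algorithm):
# an extended Kalman filter with a *diagonal* posterior covariance for
# online learning of a nonlinear scalar model from a data stream.

def model(theta, x):
    """Toy nonlinear model: y = tanh(x . theta)."""
    return np.tanh(x @ theta)

def jacobian(theta, x):
    """Analytic gradient of the model output w.r.t. theta."""
    return (1.0 - np.tanh(x @ theta) ** 2) * x

def ekf_step(theta, P_diag, x, y, obs_var=0.1**2):
    """One EKF update with the covariance truncated to its diagonal."""
    H = jacobian(theta, x)                  # Jacobian (here a 1 x d vector)
    S = H @ (P_diag * H) + obs_var          # innovation variance (scalar)
    K = (P_diag * H) / S                    # Kalman gain
    theta = theta + K * (y - model(theta, x))
    P_diag = P_diag - (K * H) * P_diag      # diagonal of (I - K H) P
    return theta, P_diag

rng = np.random.default_rng(0)
d = 4
theta_true = rng.standard_normal(d)
theta, P_diag = np.zeros(d), np.ones(d)

for _ in range(500):                        # streaming observations
    x = rng.standard_normal(d)
    y = np.tanh(x @ theta_true) + 0.1 * rng.standard_normal()
    theta, P_diag = ekf_step(theta, P_diag, x, y)

print("parameter error:", np.linalg.norm(theta - theta_true))
```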
- An overview of differentiable particle filters for data-adaptive sequential Bayesian inference [19.09640071505051]
Particle filters (PFs) provide an efficient mechanism for solving non-linear sequential state estimation problems.
An emerging trend involves constructing components of particle filters using neural networks and optimising them by gradient descent.
Differentiable particle filters are a promising computational tool for performing inference on sequential data in complex, high-dimensional tasks.
arXiv Detail & Related papers (2023-02-19T18:03:53Z)
- Computational Doob's h-transforms for Online Filtering of Discretely Observed Diffusions [65.74069050283998]
We propose a computational framework to approximate Doob's $h$-transforms.
The proposed approach can be orders of magnitude more efficient than state-of-the-art particle filters.
arXiv Detail & Related papers (2022-06-07T15:03:05Z)
- Deep Learning for the Benes Filter [91.3755431537592]
We present a new numerical method based on the mesh-free neural network representation of the density of the solution of the Benes model.
We discuss the role of nonlinearity in the filtering model equations for the choice of the domain of the neural network.
arXiv Detail & Related papers (2022-03-09T14:08:38Z)
- Initialization and Regularization of Factorized Neural Layers [23.875225732697142]
We show how to initialize and regularize factorized layers in deep networks, and how these schemes improve performance on both translation and unsupervised pre-training; a spectral-initialization sketch follows this entry.
arXiv Detail & Related papers (2021-05-03T17:28:07Z)
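As a sketch of one way to initialize factorized layers in the spirit of the entry above, the snippet below draws a standard dense initialization and sets the low-rank factors from its truncated SVD, so that the factor product matches the dense matrix's leading spectrum. This is a hedged illustration of spectral-style initialization, not necessarily the paper's exact scheme.

```python
import numpy as np

# Illustrative sketch of spectral initialization for a factorized layer
# (an assumption about the scheme's spirit, not the paper's exact recipe):
# draw a standard full-rank init, then set the low-rank factors from its
# truncated SVD so that U @ V.T reproduces the leading singular structure.

def spectral_init(out_dim, in_dim, rank, rng):
    W = rng.standard_normal((out_dim, in_dim)) * np.sqrt(2.0 / in_dim)  # He init
    U_full, s, Vt = np.linalg.svd(W, full_matrices=False)
    U = U_full[:, :rank] * np.sqrt(s[:rank])   # split singular values
    V = Vt[:rank].T * np.sqrt(s[:rank])        # evenly between the factors
    return U, V                                # the layer then uses U @ V.T

rng = np.random.default_rng(0)
U, V = spectral_init(256, 128, rank=32, rng=rng)
print(U.shape, V.shape)  # (256, 32) (128, 32)
```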
- Deep Shells: Unsupervised Shape Correspondence with Optimal Transport [52.646396621449]
We propose a novel unsupervised learning approach to 3D shape correspondence.
We show that the proposed method significantly improves over the state-of-the-art on multiple datasets.
arXiv Detail & Related papers (2020-10-28T22:24:07Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We propose a unified framework that covers a host of such variations.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer; a minimal extrapolation-step sketch follows this entry.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
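For intuition about extrapolation-based updates of the kind this entry studies, here is a minimal extragradient-style step on a toy quadratic: the gradient is evaluated at a provisional lookahead point and then applied at the current iterate. The step sizes, objective, and exact update form are illustrative assumptions rather than the paper's algorithm.

```python
import numpy as np

# Minimal extragradient-style extrapolation step (an illustrative
# assumption about the family of updates, not the paper's exact method):
# take a provisional half-step, evaluate the gradient at the extrapolated
# point, then apply that gradient at the original point.

def extragradient_step(w, grad_fn, lr=0.1, extrapolation_lr=0.1):
    w_lookahead = w - extrapolation_lr * grad_fn(w)   # extrapolation step
    return w - lr * grad_fn(w_lookahead)              # corrected update

# Toy quadratic objective f(w) = 0.5 * ||w||^2, so grad f(w) = w.
w = np.array([5.0, -3.0])
for _ in range(100):
    w = extragradient_step(w, grad_fn=lambda v: v)
print(w)  # converges toward the minimizer at the origin
```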
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.