Low-rank extended Kalman filtering for online learning of neural
networks from streaming data
- URL: http://arxiv.org/abs/2305.19535v3
- Date: Wed, 28 Jun 2023 00:44:24 GMT
- Title: Low-rank extended Kalman filtering for online learning of neural
networks from streaming data
- Authors: Peter G. Chang, Gerardo Durán-Martín, Alexander Y Shestopaloff,
Matt Jones, Kevin Murphy
- Abstract summary: We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream.
The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior precision matrix.
In contrast to methods based on stochastic variational inference, our method is fully deterministic, and does not require step-size tuning.
- Score: 71.97861600347959
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose an efficient online approximate Bayesian inference algorithm for
estimating the parameters of a nonlinear function from a potentially
non-stationary data stream. The method is based on the extended Kalman filter
(EKF), but uses a novel low-rank plus diagonal decomposition of the posterior
precision matrix, which gives a per-step cost that is linear in the number of
model parameters. In contrast to methods based on stochastic variational
inference, our method is fully deterministic, and does not require step-size
tuning. We show experimentally that this results in much faster (more
sample-efficient) learning, which in turn yields more rapid adaptation to
changing distributions, and faster accumulation of reward when used as part of a
contextual bandit algorithm.
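To make the decomposition concrete, the sketch below illustrates one EKF measurement update when the posterior precision is represented as diag(d) + U U^T with a tall, skinny U, applying the Woodbury identity so each step is linear in the number of parameters P. This is our own minimal NumPy illustration (all names invented here), not the paper's exact LOFI recursions, which also update the diagonal term and include a predict step for non-stationary streams.

    import numpy as np

    def lowrank_ekf_update(mu, d, U, x, y, h, jac, r=1.0):
        """One EKF measurement update with the posterior precision
        approximated as diag(d) + U @ U.T, with U of shape (P, L), L << P.
        Hedged sketch: the paper's LOFI algorithm also updates the diagonal
        and has a predict step for non-stationarity; names here are ours."""
        H = jac(mu, x)                             # (m, P) Jacobian of h at mu
        L = U.shape[1]
        # The new observation adds H^T H / r to the precision; absorb it
        # into the low-rank factor, then truncate back to rank L by SVD.
        U_big = np.hstack([U, H.T / np.sqrt(r)])   # (P, L + m)
        W, s, _ = np.linalg.svd(U_big, full_matrices=False)
        U_new = W[:, :L] * s[:L]
        # Woodbury: apply (diag(d) + U U^T)^{-1} in O(P L) per vector.
        Dinv = 1.0 / d
        C = np.linalg.inv(np.eye(L) + (U_new.T * Dinv) @ U_new)
        def sigma_mv(v):                           # Sigma @ v
            w = Dinv * v
            return w - Dinv * (U_new @ (C @ (U_new.T @ w)))
        # Information-form mean update: mu += Sigma H^T (y - h(mu, x)) / r.
        mu_new = mu + sigma_mv(H.T @ ((y - h(mu, x)) / r))
        return mu_new, d, U_new

Since the SVD acts on a (P, L+m) factor, the per-step cost is O(P (L+m)^2), i.e. linear in P for fixed rank L, which is the scaling the abstract refers to.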
Related papers
- Particle-based Online Bayesian Sampling [24.290436348629452]
We study an Online Particle-based Variational Inference (OPVI) algorithm that uses a set of particles to represent the approximating distribution.
To reduce the gradient error introduced by the approximation, the method uses a sublinearly increasing batch size to lower the variance.
Experiments show that the proposed algorithm achieves better results than naively applying existing Bayesian sampling methods in the online setting.
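For orientation, the canonical particle-based variational update is the Stein variational gradient descent (SVGD) step sketched below; this is a generic illustration of a particle set representing the approximating distribution, not necessarily OPVI's exact update or its batch-size schedule.

    import numpy as np

    def svgd_step(particles, grad_logp, step=0.1, h=1.0):
        """One generic SVGD particle update with an RBF kernel. Sketch of
        'particles represent the approximating distribution' only; OPVI's
        online variant and increasing batch-size schedule differ."""
        diffs = particles[:, None, :] - particles[None, :, :]   # x_j - x_i
        K = np.exp(-(diffs ** 2).sum(-1) / (2 * h ** 2))        # k(x_j, x_i)
        gradK = -K[:, :, None] * diffs / h ** 2                 # grad wrt x_j
        phi = (K @ grad_logp(particles) + gradK.sum(axis=0)) / len(particles)
        return particles + step * phi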
arXiv Detail & Related papers (2023-02-28T17:46:32Z)
- Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are required, through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
- Computational Doob's h-transforms for Online Filtering of Discretely Observed Diffusions [65.74069050283998]
We propose a computational framework to approximate Doob's $h$-transforms.
The proposed approach can be orders of magnitude more efficient than state-of-the-art particle filters.
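For reference, in this setting Doob's $h$-function conditions the diffusion on the upcoming observation, and the conditioned dynamics take the standard form (our notation, not necessarily the paper's):

\[
h(t, x) \;=\; \mathbb{E}\!\left[ g\!\left(y_k \mid X_{t_k}\right) \,\middle|\, X_t = x \right],
\qquad
dX_t \;=\; \left( b(X_t) + \sigma\sigma^{\top}(X_t)\, \nabla_x \log h(t, X_t) \right) dt + \sigma(X_t)\, dW_t .
\]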
arXiv Detail & Related papers (2022-06-07T15:03:05Z)
- Memory-Efficient Backpropagation through Large Linear Layers [107.20037639738433]
In modern neural networks like Transformers, linear layers require significant memory to store activations during the backward pass.
This study proposes a memory reduction approach to perform backpropagation through linear layers.
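The memory cost arises because the weight gradient of a linear layer $y = xW$ is $x^\top \, \partial L/\partial y$, so the input activations $x$ must normally be stored. Below is a hedged sketch of the generic randomized remedy, keeping only a sampled fraction of the batch; the paper's estimator and its variance analysis are more refined.

    import numpy as np

    def sampled_weight_grad(x, g, keep=0.25, rng=None):
        """Unbiased randomized estimate of dL/dW = x.T @ g for y = x @ W,
        storing only a `keep` fraction of the activation rows. Generic
        sketch, not the paper's exact estimator."""
        rng = np.random.default_rng() if rng is None else rng
        n = x.shape[0]
        k = max(1, int(keep * n))
        idx = rng.choice(n, size=k, replace=False)
        # Rescale by n/k so the expectation equals the exact x.T @ g.
        return (n / k) * (x[idx].T @ g[idx])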
arXiv Detail & Related papers (2022-01-31T13:02:41Z)
- Learning Linearized Assignment Flows for Image Labeling [70.540936204654]
We introduce a novel algorithm for estimating optimal parameters of linearized assignment flows for image labeling.
We show how to efficiently evaluate this formula using a Krylov subspace and a low-rank approximation.
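The computational primitive behind such evaluations is the matrix-exponential action $\exp(A)b$, computed without ever forming $\exp(A)$: Krylov subspace methods and related schemes need only products $Aw$. A generic SciPy illustration (not the paper's solver):

    import numpy as np
    from scipy.sparse import random as sparse_random
    from scipy.sparse.linalg import expm_multiply

    # Compute v = exp(A) @ b for a large sparse A without forming exp(A).
    A = sparse_random(1000, 1000, density=1e-3, format="csr", random_state=0)
    b = np.ones(1000)
    v = expm_multiply(A, b)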
arXiv Detail & Related papers (2021-08-02T13:38:09Z)
- KaFiStO: A Kalman Filtering Framework for Stochastic Optimization [27.64040983559736]
We show that when training neural networks, the loss function changes over (iteration) time due to the randomized selection of a subset of the samples.
This randomization turns the optimization problem into a stochastic one.
We propose to consider the loss as a noisy observation with respect to some reference.
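To make the "loss as a noisy observation" idea concrete, here is a deliberately minimal scalar Kalman filter that tracks an underlying loss level from noisy mini-batch losses; KaFiStO's actual framework is considerably richer than this illustration.

    def filter_losses(losses, q=1e-4, r=1.0):
        """Scalar Kalman filter treating each mini-batch loss as a noisy
        observation of a slowly drifting underlying value (random-walk
        state). Minimal illustration of the idea; not KaFiStO itself."""
        m, p = losses[0], 1.0        # state estimate and its variance
        smoothed = []
        for z in losses:
            p += q                   # predict: random walk adds drift noise
            k = p / (p + r)          # Kalman gain
            m += k * (z - m)         # correct using the noisy loss
            p *= 1.0 - k             # posterior variance
            smoothed.append(m)
        return smoothed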
arXiv Detail & Related papers (2021-07-07T16:13:57Z)
- Fast and Robust Online Inference with Stochastic Gradient Descent via Random Scaling [0.9806910643086042]
We develop a new method of online inference for a vector of parameters estimated by the Polyak-Ruppert averaging procedure of stochastic gradient descent algorithms.
Our approach is fully operational with online data and is rigorously underpinned by a functional central limit theorem.
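Polyak-Ruppert averaging itself is a one-line addition to an SGD loop, as in the sketch below; the paper's random-scaling inference sits on top of the averaged iterate and is omitted here.

    import numpy as np

    def sgd_polyak_ruppert(grad, theta0, steps=1000, lr0=0.1):
        """SGD returning the Polyak-Ruppert average of the iterates.
        The paper's random-scaling confidence intervals, built on this
        average, are omitted from the sketch."""
        theta = np.array(theta0, dtype=float)
        theta_bar = theta.copy()
        for t in range(1, steps + 1):
            theta -= lr0 * t ** -0.75 * grad(theta)   # step size t^{-a}, a in (1/2, 1)
            theta_bar += (theta - theta_bar) / t      # running average of iterates
        return theta_bar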
arXiv Detail & Related papers (2021-06-06T15:38:37Z)
- SLIP: Learning to Predict in Unknown Dynamical Systems with Long-Term Memory [21.09861411069719]
We present an efficient and practical (polynomial time) algorithm for online prediction in unknown and partially observed linear dynamical systems.
Our algorithm competes with the Kalman filter in hindsight with only logarithmic regret.
Our theoretical and experimental results shed light on the conditions required for efficient probably approximately correct (PAC) learning of the Kalman filter from partially observed data.
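Competing with the Kalman filter in hindsight can be formalized, in the usual online-learning convention (our notation, not necessarily the paper's), as requiring that

\[
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} \lVert y_t - \hat{y}_t \rVert^2 \;-\; \sum_{t=1}^{T} \lVert y_t - \hat{y}_t^{\mathrm{KF}} \rVert^2
\]

grow only logarithmically in $T$, where $\hat{y}_t^{\mathrm{KF}}$ denotes the predictions of the Kalman filter equipped with the true (unknown) system parameters.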
arXiv Detail & Related papers (2020-10-12T17:50:21Z)
- Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
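The complication is that a binary activation has zero derivative almost everywhere, so no gradient flows through it. For contrast, the common baseline fix, the straight-through estimator, is sketched below; this is not the paper's sample-analytic method, which aims to improve on it.

    import numpy as np

    def binary_forward(x):
        """Binary activation. Its true derivative is zero almost everywhere,
        so naive backprop through it produces no learning signal."""
        return np.where(x >= 0.0, 1.0, -1.0)

    def binary_backward_ste(grad_out, x, clip=1.0):
        """Straight-through estimator: backprop as if the activation were
        the identity, masked where |x| exceeds `clip`. Baseline only; the
        paper proposes a more accurate sample-analytic estimator."""
        return grad_out * (np.abs(x) <= clip)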
arXiv Detail & Related papers (2020-06-04T21:51:21Z) - High-dimensional, multiscale online changepoint detection [7.502070498889449]
We introduce a new method for high-dimensional, online changepoint detection in settings where a $p$-variate Gaussian data stream may undergo a change in mean.
The algorithm is online in the sense that both its storage requirements and worst-case computational complexity per new observation are independent of the number of previous observations.
Simulations confirm the practical effectiveness of our proposal, which is implemented in the R package 'ocd', and we also demonstrate its utility on a seismology data set.
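"Online" here means constant memory and constant work per observation, independent of the history length. A univariate CUSUM detector with exactly that property is sketched below; this illustrates the online property only, not the ocd algorithm itself, which combines such statistics across the $p$ coordinates and multiple scales.

    def cusum_online(stream, delta=0.5, threshold=8.0):
        """One-sided CUSUM for an upward mean shift of size >= delta in a
        unit-variance stream: O(1) memory and work per observation.
        Univariate sketch, not the multiscale 'ocd' method."""
        s = 0.0
        for t, x in enumerate(stream):
            # Log-likelihood-ratio increment, N(0,1) vs N(delta,1).
            s = max(0.0, s + delta * (x - delta / 2.0))
            if s > threshold:
                return t             # declare a change at observation t
        return None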
arXiv Detail & Related papers (2020-03-07T21:54:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.