Low-rank extended Kalman filtering for online learning of neural
networks from streaming data
- URL: http://arxiv.org/abs/2305.19535v3
- Date: Wed, 28 Jun 2023 00:44:24 GMT
- Title: Low-rank extended Kalman filtering for online learning of neural
networks from streaming data
- Authors: Peter G. Chang, Gerardo Durán-Martín, Alexander Y Shestopaloff,
Matt Jones, Kevin Murphy
- Abstract summary: We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream.
The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior precision matrix.
In contrast to methods based on stochastic variational inference, our method is fully deterministic, and does not require step-size tuning.
- Score: 71.97861600347959
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose an efficient online approximate Bayesian inference algorithm for
estimating the parameters of a nonlinear function from a potentially
non-stationary data stream. The method is based on the extended Kalman filter
(EKF), but uses a novel low-rank plus diagonal decomposition of the posterior
precision matrix, which gives a per-step cost that is linear in the number of
model parameters. In contrast to methods based on stochastic variational
inference, our method is fully deterministic, and does not require step-size
tuning. We show experimentally that this results in much faster (more
sample-efficient) learning, which in turn yields more rapid adaptation to
changing distributions, and faster accumulation of reward when used as part of a
contextual bandit algorithm.
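To make the decomposition concrete, the sketch below illustrates one EKF measurement update when the posterior precision is represented as diag(d) + U U^T with a tall, skinny U, applying the Woodbury identity so each step is linear in the number of parameters P. This is our own minimal NumPy illustration (all names invented here), not the paper's exact LOFI recursions, which also update the diagonal term and include a predict step for non-stationary streams.

    import numpy as np

    def lowrank_ekf_update(mu, d, U, x, y, h, jac, r=1.0):
        """One EKF measurement update with the posterior precision
        approximated as diag(d) + U @ U.T, with U of shape (P, L), L << P.
        Hedged sketch: the paper's LOFI algorithm also updates the diagonal
        and has a predict step for non-stationarity; names here are ours."""
        H = jac(mu, x)                             # (m, P) Jacobian of h at mu
        L = U.shape[1]
        # The new observation adds H^T H / r to the precision; absorb it
        # into the low-rank factor, then truncate back to rank L by SVD.
        U_big = np.hstack([U, H.T / np.sqrt(r)])   # (P, L + m)
        W, s, _ = np.linalg.svd(U_big, full_matrices=False)
        U_new = W[:, :L] * s[:L]
        # Woodbury: apply (diag(d) + U U^T)^{-1} in O(P L) per vector.
        Dinv = 1.0 / d
        C = np.linalg.inv(np.eye(L) + (U_new.T * Dinv) @ U_new)
        def sigma_mv(v):                           # Sigma @ v
            w = Dinv * v
            return w - Dinv * (U_new @ (C @ (U_new.T @ w)))
        # Information-form mean update: mu += Sigma H^T (y - h(mu, x)) / r.
        mu_new = mu + sigma_mv(H.T @ ((y - h(mu, x)) / r))
        return mu_new, d, U_new

Since the SVD acts on a (P, L+m) factor, the per-step cost is O(P (L+m)^2), i.e. linear in P for fixed rank L, which is the scaling the abstract refers to.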
Related papers
- Particle-based Online Bayesian Sampling [24.290436348629452]
We study an Online Particle-based Variational Inference (OPVI) algorithm that uses a set of particles to represent the approximating distribution.
To reduce the gradient error introduced by the approximation, the method uses a sublinearly increasing batch size to lower the variance.
Experiments show that the proposed algorithm achieves better results than naively applying existing Bayesian sampling methods in the online setting.
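For orientation, the canonical particle-based variational update is the Stein variational gradient descent (SVGD) step sketched below; this is a generic illustration of a particle set representing the approximating distribution, not necessarily OPVI's exact update or its batch-size schedule.

    import numpy as np

    def svgd_step(particles, grad_logp, step=0.1, h=1.0):
        """One generic SVGD particle update with an RBF kernel. Sketch of
        'particles represent the approximating distribution' only; OPVI's
        online variant and increasing batch-size schedule differ."""
        diffs = particles[:, None, :] - particles[None, :, :]   # x_j - x_i
        K = np.exp(-(diffs ** 2).sum(-1) / (2 * h ** 2))        # k(x_j, x_i)
        gradK = -K[:, :, None] * diffs / h ** 2                 # grad wrt x_j
        phi = (K @ grad_logp(particles) + gradK.sum(axis=0)) / len(particles)
        return particles + step * phi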
arXiv Detail & Related papers (2023-02-28T17:46:32Z)
- Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are required, through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
- Computational Doob's h-transforms for Online Filtering of Discretely Observed Diffusions [65.74069050283998]
We propose a computational framework to approximate Doob's $h$-transforms.
The proposed approach can be orders of magnitude more efficient than state-of-the-art particle filters.
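For reference, in this setting Doob's $h$-function conditions the diffusion on the upcoming observation, and the conditioned dynamics take the standard form (our notation, not necessarily the paper's):

\[
h(t, x) \;=\; \mathbb{E}\!\left[ g\!\left(y_k \mid X_{t_k}\right) \,\middle|\, X_t = x \right],
\qquad
dX_t \;=\; \left( b(X_t) + \sigma\sigma^{\top}(X_t)\, \nabla_x \log h(t, X_t) \right) dt + \sigma(X_t)\, dW_t .
\]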
arXiv Detail & Related papers (2022-06-07T15:03:05Z)
- Memory-Efficient Backpropagation through Large Linear Layers [107.20037639738433]
In modern neural networks like Transformers, linear layers require significant memory to store activations during the backward pass.
This study proposes a memory reduction approach to perform backpropagation through linear layers.
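The memory cost arises because the weight gradient of a linear layer $y = xW$ is $x^\top \, \partial L/\partial y$, so the input activations $x$ must normally be stored. Below is a hedged sketch of the generic randomized remedy, keeping only a sampled fraction of the batch; the paper's estimator and its variance analysis are more refined.

    import numpy as np

    def sampled_weight_grad(x, g, keep=0.25, rng=None):
        """Unbiased randomized estimate of dL/dW = x.T @ g for y = x @ W,
        storing only a `keep` fraction of the activation rows. Generic
        sketch, not the paper's exact estimator."""
        rng = np.random.default_rng() if rng is None else rng
        n = x.shape[0]
        k = max(1, int(keep * n))
        idx = rng.choice(n, size=k, replace=False)
        # Rescale by n/k so the expectation equals the exact x.T @ g.
        return (n / k) * (x[idx].T @ g[idx])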
arXiv Detail & Related papers (2022-01-31T13:02:41Z)
- Learning Linearized Assignment Flows for Image Labeling [70.540936204654]
We introduce a novel algorithm for estimating optimal parameters of linearized assignment flows for image labeling.
We show how to efficiently evaluate this formula using a Krylov subspace and a low-rank approximation.
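The computational primitive behind such evaluations is the matrix-exponential action $\exp(A)b$, computed without ever forming $\exp(A)$: Krylov subspace methods and related schemes need only products $Aw$. A generic SciPy illustration (not the paper's solver):

    import numpy as np
    from scipy.sparse import random as sparse_random
    from scipy.sparse.linalg import expm_multiply

    # Compute v = exp(A) @ b for a large sparse A without forming exp(A).
    A = sparse_random(1000, 1000, density=1e-3, format="csr", random_state=0)
    b = np.ones(1000)
    v = expm_multiply(A, b)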
arXiv Detail & Related papers (2021-08-02T13:38:09Z)
- KaFiStO: A Kalman Filtering Framework for Stochastic Optimization [27.64040983559736]
We show that when training neural networks, the loss function changes over (iteration) time due to the randomized selection of a subset of the samples.
This randomization turns the optimization problem into a stochastic one.
We propose to consider the loss as a noisy observation with respect to some reference.
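To make the "loss as a noisy observation" idea concrete, here is a deliberately minimal scalar Kalman filter that tracks an underlying loss level from noisy mini-batch losses; KaFiStO's actual framework is considerably richer than this illustration.

    def filter_losses(losses, q=1e-4, r=1.0):
        """Scalar Kalman filter treating each mini-batch loss as a noisy
        observation of a slowly drifting underlying value (random-walk
        state). Minimal illustration of the idea; not KaFiStO itself."""
        m, p = losses[0], 1.0        # state estimate and its variance
        smoothed = []
        for z in losses:
            p += q                   # predict: random walk adds drift noise
            k = p / (p + r)          # Kalman gain
            m += k * (z - m)         # correct using the noisy loss
            p *= 1.0 - k             # posterior variance
            smoothed.append(m)
        return smoothed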
arXiv Detail & Related papers (2021-07-07T16:13:57Z)
- Fast and Robust Online Inference with Stochastic Gradient Descent via Random Scaling [0.9806910643086042]
We develop a new method of online inference for a vector of parameters estimated by the Polyak-Ruppert averaging procedure of stochastic gradient descent algorithms.
Our approach is fully operational with online data and is rigorously underpinned by a functional central limit theorem.
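Polyak-Ruppert averaging itself is a one-line addition to an SGD loop, as in the sketch below; the paper's random-scaling inference sits on top of the averaged iterate and is omitted here.

    import numpy as np

    def sgd_polyak_ruppert(grad, theta0, steps=1000, lr0=0.1):
        """SGD returning the Polyak-Ruppert average of the iterates.
        The paper's random-scaling confidence intervals, built on this
        average, are omitted from the sketch."""
        theta = np.array(theta0, dtype=float)
        theta_bar = theta.copy()
        for t in range(1, steps + 1):
            theta -= lr0 * t ** -0.75 * grad(theta)   # step size t^{-a}, a in (1/2, 1)
            theta_bar += (theta - theta_bar) / t      # running average of iterates
        return theta_bar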
arXiv Detail & Related papers (2021-06-06T15:38:37Z)
- SLIP: Learning to Predict in Unknown Dynamical Systems with Long-Term Memory [21.09861411069719]
We present an efficient and practical (polynomial time) algorithm for online prediction in unknown and partially observed linear dynamical systems.
Our algorithm competes with the Kalman filter in hindsight with only logarithmic regret.
Our theoretical and experimental results shed light on the conditions required for efficient probably approximately correct (PAC) learning of the Kalman filter from partially observed data.
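Competing with the Kalman filter in hindsight can be formalized, in the usual online-learning convention (our notation, not necessarily the paper's), as requiring that

\[
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} \lVert y_t - \hat{y}_t \rVert^2 \;-\; \sum_{t=1}^{T} \lVert y_t - \hat{y}_t^{\mathrm{KF}} \rVert^2
\]

grow only logarithmically in $T$, where $\hat{y}_t^{\mathrm{KF}}$ denotes the predictions of the Kalman filter equipped with the true (unknown) system parameters.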
arXiv Detail & Related papers (2020-10-12T17:50:21Z)
- Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
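The complication is that a binary activation has zero derivative almost everywhere, so no gradient flows through it. For contrast, the common baseline fix, the straight-through estimator, is sketched below; this is not the paper's sample-analytic method, which aims to improve on it.

    import numpy as np

    def binary_forward(x):
        """Binary activation. Its true derivative is zero almost everywhere,
        so naive backprop through it produces no learning signal."""
        return np.where(x >= 0.0, 1.0, -1.0)

    def binary_backward_ste(grad_out, x, clip=1.0):
        """Straight-through estimator: backprop as if the activation were
        the identity, masked where |x| exceeds `clip`. Baseline only; the
        paper proposes a more accurate sample-analytic estimator."""
        return grad_out * (np.abs(x) <= clip)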
arXiv Detail & Related papers (2020-06-04T21:51:21Z) - High-dimensional, multiscale online changepoint detection [7.502070498889449]
We introduce a new method for high-dimensional, online changepoint detection in settings where a $p$-variate Gaussian data stream may undergo a change in mean.
The algorithm is online in the sense that both its storage requirements and worst-case computational complexity per new observation are independent of the number of previous observations.
Simulations confirm the practical effectiveness of our proposal, which is implemented in the R package 'ocd', and we also demonstrate its utility on a seismology data set.
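"Online" here means constant memory and constant work per observation, independent of the history length. A univariate CUSUM detector with exactly that property is sketched below; this illustrates the online property only, not the ocd algorithm itself, which combines such statistics across the $p$ coordinates and multiple scales.

    def cusum_online(stream, delta=0.5, threshold=8.0):
        """One-sided CUSUM for an upward mean shift of size >= delta in a
        unit-variance stream: O(1) memory and work per observation.
        Univariate sketch, not the multiscale 'ocd' method."""
        s = 0.0
        for t, x in enumerate(stream):
            # Log-likelihood-ratio increment, N(0,1) vs N(delta,1).
            s = max(0.0, s + delta * (x - delta / 2.0))
            if s > threshold:
                return t             # declare a change at observation t
        return None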
arXiv Detail & Related papers (2020-03-07T21:54:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.