KaFiStO: A Kalman Filtering Framework for Stochastic Optimization
- URL: http://arxiv.org/abs/2107.03331v1
- Date: Wed, 7 Jul 2021 16:13:57 GMT
- Title: KaFiStO: A Kalman Filtering Framework for Stochastic Optimization
- Authors: Aram Davtyan, Sepehr Sameni, Llukman Cerkezi, Givi Meishvilli, Adam
Bielski, Paolo Favaro
- Abstract summary: We show that when training neural networks the loss function changes over (iteration) time due to the randomized selection of a subset of the samples.
This randomization turns the optimization problem into a stochastic one.
We propose to consider the loss as a noisy observation with respect to some reference optimum.
- Score: 27.64040983559736
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optimization is often cast as a deterministic problem, where the solution is
found through some iterative procedure such as gradient descent. However, when
training neural networks the loss function changes over (iteration) time due to
the randomized selection of a subset of the samples. This randomization turns
the optimization problem into a stochastic one. We propose to consider the loss
as a noisy observation with respect to some reference optimum. This
interpretation of the loss allows us to adopt Kalman filtering as an optimizer,
as its recursive formulation is designed to estimate unknown parameters from
noisy measurements. Moreover, we show that the Kalman Filter dynamical model
for the evolution of the unknown parameters can be used to capture the gradient
dynamics of advanced methods such as Momentum and Adam. We call this stochastic
optimization method KaFiStO. KaFiStO is an easy to implement, scalable, and
efficient method to train neural networks. We show that it also yields
parameter estimates that are on par with or better than existing optimization
algorithms across several neural network architectures and machine learning
tasks, such as computer vision and language modeling.
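Viewing the minibatch loss as a noisy measurement of a reference optimum suggests an extended-Kalman-style update. Below is a minimal numpy sketch of that idea on a toy least-squares problem, assuming a diagonal covariance, a zero reference loss value, and hand-picked noise levels; it is not the authors' implementation and omits the Momentum- and Adam-style dynamical models mentioned above.

```python
import numpy as np

# Minimal sketch (not the authors' code): treat the minibatch loss as a noisy
# scalar observation of a reference optimum value (here 0) and apply an
# extended-Kalman-style correction with a diagonal state covariance.

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))            # toy least-squares problem
x_true = rng.normal(size=5)
b = A @ x_true + 0.1 * rng.normal(size=100)

w = np.zeros(5)                          # state estimate (the parameters)
P = np.ones(5)                           # diagonal state covariance
Q, R = 1e-4, 1e-2                        # process / observation noise (assumed)
target = 0.0                             # reference optimum value of the loss

for step in range(500):
    idx = rng.choice(100, size=16, replace=False)   # random minibatch
    r = A[idx] @ w - b[idx]
    loss = 0.5 * np.mean(r ** 2)                    # noisy observation of the loss
    grad = A[idx].T @ r / len(idx)                  # linearization of the observation

    P = P + Q                                       # predict: diffuse the covariance
    S = grad @ (P * grad) + R                       # innovation variance (scalar)
    K = P * grad / S                                # Kalman gain
    w = w + K * (target - loss)                     # correct toward the reference loss
    P = (1.0 - K * grad) * P                        # covariance update

print("final full-batch loss:", 0.5 * np.mean((A @ w - b) ** 2))
```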
Related papers
- Fine-Tuning Adaptive Stochastic Optimizers: Determining the Optimal Hyperparameter $ε$ via Gradient Magnitude Histogram Analysis [0.7366405857677226]
We introduce a new framework based on the empirical probability density function of the gradient's magnitude, termed the "gradient magnitude histogram".
We propose a novel algorithm using gradient magnitude histograms to automatically estimate a refined and accurate search space for the optimal safeguard hyperparameter $ε$.
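A rough sketch of the idea, with an illustrative percentile rule that is not taken from the paper: collect gradient magnitudes during a short run, build their histogram, and read off a candidate scale for the safeguard $ε$.

```python
import numpy as np

# Hedged sketch: build an empirical histogram of gradient magnitudes and pick a
# low percentile as a candidate scale for the safeguard epsilon. The percentile
# rule here is illustrative; the paper's selection procedure may differ.

def epsilon_from_gradient_histogram(grads, bins=100, percentile=1.0):
    mags = np.concatenate([np.abs(g).ravel() for g in grads])
    hist, edges = np.histogram(mags, bins=bins)     # the gradient magnitude histogram
    return np.percentile(mags, percentile), (hist, edges)

# toy usage with synthetic "gradients" standing in for those of a real model
rng = np.random.default_rng(0)
fake_grads = [rng.lognormal(mean=-6.0, sigma=2.0, size=1000) for _ in range(50)]
eps, _ = epsilon_from_gradient_histogram(fake_grads)
print("candidate epsilon:", eps)
```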
arXiv Detail & Related papers (2023-11-20T04:34:19Z) - Low-rank extended Kalman filtering for online learning of neural
networks from streaming data [71.97861600347959]
We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream.
The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior precision matrix.
In contrast to methods based on variational inference, our method is fully deterministic, and does not require step-size tuning.
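For reference, the recursion below is a plain full-covariance EKF on a two-parameter toy model; the paper's low-rank-plus-diagonal posterior approximation, which is what makes the method scale, is not reproduced here, and the noise levels are assumptions.

```python
import numpy as np

# Hedged sketch: a plain extended Kalman filter that treats the parameters of a
# small nonlinear model as the state and updates them one observation at a time.
# The paper's low-rank-plus-diagonal posterior approximation is NOT reproduced;
# this is just the textbook EKF recursion on a two-parameter toy model.

rng = np.random.default_rng(0)
theta_true = np.array([2.0, 0.7])
f = lambda x, th: th[0] * np.tanh(th[1] * x)              # model prediction
jac = lambda x, th: np.array([np.tanh(th[1] * x),         # d f / d theta
                              th[0] * x / np.cosh(th[1] * x) ** 2])

theta = np.array([0.5, 0.5])                               # initial state estimate
P = np.eye(2)                                              # posterior covariance
Q, R = 1e-5 * np.eye(2), 0.05                              # noise levels (assumed)

for t in range(2000):                                      # streaming observations
    x = rng.uniform(-3.0, 3.0)
    y = f(x, theta_true) + np.sqrt(R) * rng.normal()
    P = P + Q                                              # predict
    H = jac(x, theta)                                      # linearize the observation
    S = H @ P @ H + R                                      # innovation variance
    K = P @ H / S                                          # Kalman gain
    theta = theta + K * (y - f(x, theta))                  # correct
    P = P - np.outer(K, H @ P)                             # covariance update

print("estimated parameters:", theta, "(true:", theta_true, ")")
```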
arXiv Detail & Related papers (2023-05-31T03:48:49Z) - Optimal Rates for Random Order Online Optimization [60.011653053877126]
We study the random order online optimization setting of Garber et al. (2020), where the loss functions may be chosen by an adversary, but are then presented online in a uniformly random order.
We show that the algorithms of Garber et al. (2020) achieve the optimal bounds and significantly improve their stability.
arXiv Detail & Related papers (2021-06-29T09:48:46Z) - Adaptive Importance Sampling for Finite-Sum Optimization and Sampling
with Decreasing Step-Sizes [4.355567556995855]
We propose Avare, a simple and efficient algorithm for adaptive importance sampling for finite-sum optimization and sampling with decreasing step-sizes.
Under standard technical conditions, we show that Avare achieves $\mathcal{O}(T^{2/3})$ and $\mathcal{O}(T^{5/6})$ dynamic regret for SGD and SGLD respectively when run with decreasing $\mathcal{O}(1/t)$ step sizes.
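A minimal sketch of the underlying estimator, assuming a toy least-squares finite sum and a crude reweighting heuristic in place of Avare's adaptive rule: sampling term $i$ with probability $p_i$ and rescaling by $1/(np_i)$ keeps the gradient estimate unbiased.

```python
import numpy as np

# Hedged sketch: importance-sampled SGD for a finite sum (1/n) * sum_i f_i(w).
# Sampling index i with probability p_i and rescaling by 1/(n * p_i) keeps the
# gradient estimate unbiased for any p > 0. The crude reweighting heuristic and
# the uniform mixing below are NOT Avare's adaptive rule.

rng = np.random.default_rng(0)
n, d = 100, 5
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d)

w = np.zeros(d)
norms = np.ones(n)                                   # rough per-term gradient-size tracker

for t in range(1, 5001):
    p = 0.5 * norms / norms.sum() + 0.5 / n          # mix with uniform to bound the weights
    i = rng.choice(n, p=p)
    g_i = A[i] * (A[i] @ w - b[i])                   # gradient of f_i at w
    w -= (0.5 / (t + 5)) * g_i / (n * p[i])          # decreasing step, unbiased estimate
    norms[i] = np.linalg.norm(g_i) + 1e-3            # favor terms with larger gradients

print("residual norm:", np.linalg.norm(A @ w - b))
```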
arXiv Detail & Related papers (2021-03-23T00:28:15Z) - Meta-Solver for Neural Ordinary Differential Equations [77.8918415523446]
We investigate how the variability in the space of solvers can improve the performance of neural ODEs.
We show that the right choice of solver parameterization can significantly affect neural ODE models in terms of robustness to adversarial attacks.
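As a concrete (and much simpler) example of a parameterized solver family: the two-stage explicit Runge-Kutta methods below are second order for any $α \neq 0$, with $α = 0.5$ giving the midpoint method and $α = 1$ Heun's method. The paper searches over such solver parameters for neural ODEs; the toy dynamics here are only illustrative.

```python
import numpy as np

# Hedged sketch of what "solver parameterization" can mean: an alpha-indexed
# family of two-stage explicit Runge-Kutta methods applied to toy dynamics.
# The paper's search over solver parameters for neural ODEs is not shown.

def rk2_alpha_step(f, t, y, h, alpha):
    k1 = f(t, y)
    k2 = f(t + alpha * h, y + alpha * h * k1)
    b2 = 1.0 / (2.0 * alpha)
    return y + h * ((1.0 - b2) * k1 + b2 * k2)

def integrate(f, y0, t0, t1, n_steps, alpha):
    y, h = np.asarray(y0, dtype=float), (t1 - t0) / n_steps
    for i in range(n_steps):
        y = rk2_alpha_step(f, t0 + i * h, y, h, alpha)
    return y

# toy linear dynamics dy/dt = -y, exact solution y(1) = exp(-1)
f = lambda t, y: -y
for alpha in (0.5, 0.75, 1.0):
    y1 = integrate(f, [1.0], 0.0, 1.0, 20, alpha)
    print(f"alpha={alpha}: error={abs(y1[0] - np.exp(-1.0)):.2e}")
```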
arXiv Detail & Related papers (2021-03-15T17:26:34Z) - Stochastic Gradient Variance Reduction by Solving a Filtering Problem [0.951828574518325]
Deep neural networks (DNNs) are typically optimized using stochastic gradient descent (SGD).
The estimation of the gradient from samples tends to be noisy and unreliable, resulting in large gradient variance and poor convergence.
We propose Filter Gradient Descent (FGD), an efficient optimization algorithm that produces consistent estimates of the gradient.
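A minimal sketch of the general recipe, filter then descend, on a toy quadratic; the fixed exponential filter below is only a stand-in for the filter the paper derives from a state-space model.

```python
import numpy as np

# Hedged sketch: smooth the noisy stochastic gradient with a simple
# per-coordinate filter before taking the descent step. The paper designs the
# filter by solving a filtering problem; a fixed exponential filter stands in here.

rng = np.random.default_rng(0)
d = 20
w = rng.normal(size=d)
g_filt = np.zeros(d)                            # filtered gradient state
beta, lr = 0.9, 0.1

def noisy_grad(w):
    return 2.0 * w + 0.5 * rng.normal(size=d)   # gradient of ||w||^2 plus noise

for step in range(300):
    g = noisy_grad(w)
    g_filt = beta * g_filt + (1.0 - beta) * g   # filtering step (EMA stand-in)
    w -= lr * g_filt                            # descend with the filtered gradient

print("final ||w||:", np.linalg.norm(w))
```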
arXiv Detail & Related papers (2020-12-22T23:48:42Z) - Self-Tuning Stochastic Optimization with Curvature-Aware Gradient
Filtering [53.523517926927894]
We explore the use of exact per-sample Hessian-vector products and gradients to construct self-tuning quadratics.
We prove that our model-based procedure converges in the noisy gradient setting.
This is an interesting step toward constructing self-tuning optimizers.
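The paper relies on exact per-sample Hessian-vector products from automatic differentiation; the finite-difference stand-in below only illustrates how a Hessian-vector product can be formed without materializing the Hessian.

```python
import numpy as np

# Hedged sketch: a Hessian-vector product without forming the Hessian, via a
# finite difference of gradients. The paper uses exact per-sample HVPs from
# automatic differentiation; this numpy stand-in only illustrates the idea.

def hvp(grad_fn, w, v, eps=1e-5):
    # H v  ~=  (grad(w + eps*v) - grad(w - eps*v)) / (2*eps)
    return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2.0 * eps)

# toy quadratic 0.5 * w^T A w with known Hessian A
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
A = A @ A.T                                   # symmetric positive semidefinite
grad_fn = lambda w: A @ w

w, v = rng.normal(size=5), rng.normal(size=5)
print("finite-difference HVP:", hvp(grad_fn, w, v))
print("exact Hessian-vector :", A @ v)
```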
arXiv Detail & Related papers (2020-11-09T22:07:30Z) - Robust, Accurate Stochastic Optimization for Variational Inference [68.83746081733464]
We show that common optimization methods lead to poor variational approximations if the problem is moderately large.
Motivated by these findings, we develop a more robust and accurate optimization framework by viewing the underlying algorithm as producing a Markov chain.
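One consequence of the Markov-chain view is that averaging the optimizer's iterates (Polyak-Ruppert style) is often more accurate than keeping only the last one; the toy sketch below shows just that averaging step, not the paper's convergence diagnostics or termination rule.

```python
import numpy as np

# Hedged sketch: viewing the optimizer's iterates as a Markov chain suggests
# averaging them instead of keeping only the last one. The paper adds
# convergence diagnostics on top of this idea; those are not shown here.

rng = np.random.default_rng(0)
w = np.zeros(3)
lr = 0.05
iterates = []

for t in range(2000):
    g = 2.0 * (w - 1.0) + rng.normal(size=3)    # noisy gradient of ||w - 1||^2
    w -= lr * g
    iterates.append(w.copy())

tail = np.array(iterates[len(iterates) // 2:])  # discard the first half as warmup
print("last iterate:", iterates[-1])
print("tail average:", tail.mean(axis=0))       # typically closer to the optimum (1, 1, 1)
```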
arXiv Detail & Related papers (2020-09-01T19:12:11Z) - NOVAS: Non-convex Optimization via Adaptive Stochastic Search for
End-to-End Learning and Control [22.120942106939122]
We propose the use of adaptive stochastic search as a building block for general, non-convex optimization operations within deep neural network architectures.
We benchmark it against two existing alternatives on a synthetic energy-based structured prediction task and showcase its use in optimal control applications.
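Adaptive stochastic search in the spirit of the cross-entropy method can be sketched in a few lines: sample candidates from a Gaussian, keep the best ones, and move the sampling distribution toward them. NOVAS embeds such an update as a building block inside network architectures, which the toy script below does not attempt; the objective and hyperparameters are illustrative.

```python
import numpy as np

# Hedged sketch of adaptive stochastic search in the spirit of the
# cross-entropy method: sample candidates from a Gaussian, keep the elites,
# and move the sampling distribution toward them. The paper's use of this
# update inside deep architectures is not shown here.

def objective(x):                        # non-convex toy objective (Rastrigin-like)
    return np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x), axis=-1)

rng = np.random.default_rng(0)
mu, sigma = np.full(2, 3.0), np.full(2, 2.0)

for it in range(50):
    samples = mu + sigma * rng.normal(size=(64, 2))
    elite = samples[np.argsort(objective(samples))[:8]]    # best 8 candidates
    mu = 0.7 * mu + 0.3 * elite.mean(axis=0)               # smoothed distribution update
    sigma = 0.7 * sigma + 0.3 * elite.std(axis=0)

print("found minimizer near:", mu, "objective:", objective(mu))
```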
arXiv Detail & Related papers (2020-06-22T03:40:36Z) - Stochastic batch size for adaptive regularization in deep network
optimization [63.68104397173262]
We propose a first-order optimization algorithm incorporating adaptive regularization, applicable to machine learning problems in deep learning frameworks.
We empirically demonstrate the effectiveness of our algorithm using an image classification task based on conventional network models applied to commonly used benchmark datasets.
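A minimal sketch of where a stochastic batch size enters a training loop, assuming a toy least-squares model and a uniform distribution over batch sizes; the paper's specific distribution and its link to the regularization strength are not reproduced.

```python
import numpy as np

# Hedged sketch: draw the minibatch size at random each iteration, since the
# batch size controls the gradient noise and hence acts like a regularizer.
# The paper's distribution over batch sizes and its schedule are not reproduced.

rng = np.random.default_rng(0)
n, d = 512, 10
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
w, lr = np.zeros(d), 0.05

for step in range(1000):
    batch = int(rng.integers(8, 129))                  # stochastic batch size
    idx = rng.choice(n, size=batch, replace=False)
    grad = A[idx].T @ (A[idx] @ w - b[idx]) / batch
    w -= lr * grad

print("train loss:", 0.5 * np.mean((A @ w - b) ** 2))
```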
arXiv Detail & Related papers (2020-04-14T07:54:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.