Multiplicative noise and heavy tails in stochastic optimization
- URL: http://arxiv.org/abs/2006.06293v1
- Date: Thu, 11 Jun 2020 09:58:01 GMT
- Title: Multiplicative noise and heavy tails in stochastic optimization
- Authors: Liam Hodgkinson, Michael W. Mahoney
- Abstract summary: empirical optimization is central to modern machine learning, but its role in its success is still unclear.
We show that it commonly arises in parameters of discrete multiplicative noise due to variance.
A detailed analysis is conducted in which we describe on key factors, including recent step size, and data, all exhibit similar results on state-of-the-art neural network models.
- Score: 62.993432503309485
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although stochastic optimization is central to modern machine learning, the
precise mechanisms underlying its success, and in particular, the precise role
of the stochasticity, still remain unclear. Modelling stochastic optimization
algorithms as discrete random recurrence relations, we show that multiplicative
noise, as it commonly arises due to variance in local rates of convergence,
results in heavy-tailed stationary behaviour in the parameters. A detailed
analysis is conducted for SGD applied to a simple linear regression problem,
followed by theoretical results for a much larger class of models (including
non-linear and non-convex) and optimizers (including momentum, Adam, and
stochastic Newton), demonstrating that our qualitative results hold much more
generally. In each case, we describe dependence on key factors, including step
size, batch size, and data variability, all of which exhibit similar
qualitative behavior to recent empirical results on state-of-the-art neural
network models from computer vision and natural language processing.
Furthermore, we empirically demonstrate how multiplicative noise and
heavy-tailed structure improve capacity for basin hopping and exploration of
non-convex loss surfaces, over commonly-considered stochastic dynamics with
only additive noise and light-tailed structure.
Related papers
- Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning [6.969949986864736]
Distributionally robust offline reinforcement learning (RL) seeks robust policy training against environment perturbation by modeling dynamics uncertainty.
We propose minimax optimal and computationally efficient algorithms realizing function approximation.
Our results uncover that function approximation in robust offline RL is essentially distinct from and probably harder than that in standard offline RL.
arXiv Detail & Related papers (2024-03-14T17:55:10Z) - Learning minimal representations of stochastic processes with
variational autoencoders [52.99137594502433]
We introduce an unsupervised machine learning approach to determine the minimal set of parameters required to describe a process.
Our approach enables for the autonomous discovery of unknown parameters describing processes.
arXiv Detail & Related papers (2023-07-21T14:25:06Z) - Adaptive Conditional Quantile Neural Processes [9.066817971329899]
Conditional Quantile Neural Processes (CQNPs) are a new member of the neural processes family.
We introduce an extension of quantile regression where the model learns to focus on estimating informative quantiles.
Experiments with real and synthetic datasets demonstrate substantial improvements in predictive performance.
arXiv Detail & Related papers (2023-05-30T06:19:19Z) - A Causality-Based Learning Approach for Discovering the Underlying
Dynamics of Complex Systems from Partial Observations with Stochastic
Parameterization [1.2882319878552302]
This paper develops a new iterative learning algorithm for complex turbulent systems with partial observations.
It alternates between identifying model structures, recovering unobserved variables, and estimating parameters.
Numerical experiments show that the new algorithm succeeds in identifying the model structure and providing suitable parameterizations for many complex nonlinear systems.
arXiv Detail & Related papers (2022-08-19T00:35:03Z) - The curse of overparametrization in adversarial training: Precise
analysis of robust generalization for random features regression [34.35440701530876]
We show that for adversarially trained random features models, high overparametrization can hurt robust generalization.
Our developed theory reveals the nontrivial effect of overparametrization on robustness and indicates that for adversarially trained random features models, high overparametrization can hurt robust generalization.
arXiv Detail & Related papers (2022-01-13T18:57:30Z) - Estimation of Bivariate Structural Causal Models by Variational Gaussian
Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z) - Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$ samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z) - Compositional Modeling of Nonlinear Dynamical Systems with ODE-based
Random Features [0.0]
We present a novel, domain-agnostic approach to tackling this problem.
We use compositions of physics-informed random features, derived from ordinary differential equations.
We find that our approach achieves comparable performance to a number of other probabilistic models on benchmark regression tasks.
arXiv Detail & Related papers (2021-06-10T17:55:13Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded on the complexity' of the fractal structure that underlies its generalization measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one hidden/layered neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on interplay between the deterministic convergence rate of the algorithm at the population level, and its degree of (instability) when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.