On the Theoretical Properties of Noise Correlation in Stochastic
Optimization
- URL: http://arxiv.org/abs/2209.09162v1
- Date: Mon, 19 Sep 2022 16:32:22 GMT
- Title: On the Theoretical Properties of Noise Correlation in Stochastic
Optimization
- Authors: Aurelien Lucchi, Frank Proske, Antonio Orvieto, Francis Bach, Hans
Kersting
- Abstract summary: We show that fPGD possesses exploration abilities that, in some cases, are favorable over those of PGD and Anti-PGD.
These results open the field to novel ways to exploit noise for training machine learning models.
- Score: 6.970991851511823
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Studying the properties of stochastic noise to optimize complex non-convex
functions has been an active area of research in the field of machine learning.
Prior work has shown that the noise of stochastic gradient descent improves
optimization by overcoming undesirable obstacles in the landscape. Moreover,
injecting artificial Gaussian noise has become a popular idea to quickly escape
saddle points. Indeed, in the absence of reliable gradient information, the
noise is used to explore the landscape, but it is unclear what type of noise is
optimal in terms of exploration ability. In order to narrow this gap in our
knowledge, we study a general type of continuous-time non-Markovian process,
based on fractional Brownian motion, that allows for the increments of the
process to be correlated. This generalizes processes based on Brownian motion,
such as the Ornstein-Uhlenbeck process. We demonstrate how to discretize such
processes, which gives rise to the new algorithm fPGD. This method is a
generalization of the known algorithms PGD and Anti-PGD. We study the
properties of fPGD both theoretically and empirically, demonstrating that it
possesses exploration abilities that, in some cases, are favorable over PGD and
Anti-PGD. These results open the field to novel ways to exploit noise for
training machine learning models.
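As a rough illustration of the discretization idea, the sketch below runs gradient descent with injected fractional-Gaussian-noise increments, so that a Hurst index H = 1/2 recovers uncorrelated (PGD-style) perturbations while H < 1/2 yields anticorrelated (Anti-PGD-style) ones. This is a minimal reading of the abstract, not the authors' algorithm: the function names, the Cholesky-based fBm sampler, and all hyperparameters are our own assumptions.

```python
import numpy as np

def fbm_increments(n_steps, hurst, rng):
    """Sample unit-step fractional-Brownian-motion increments via a
    Cholesky factorization of their covariance (naive O(n^2) sampler,
    fine for a sketch)."""
    k = np.arange(n_steps)
    # Autocovariance of unit-step fBm increments:
    # rho(k) = 0.5 * (|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H})
    rho = 0.5 * (np.abs(k + 1) ** (2 * hurst)
                 - 2 * np.abs(k) ** (2 * hurst)
                 + np.abs(k - 1) ** (2 * hurst))
    cov = rho[np.abs(k[:, None] - k[None, :])]
    chol = np.linalg.cholesky(cov + 1e-12 * np.eye(n_steps))
    return chol @ rng.standard_normal(n_steps)

def fpgd(grad, x0, lr=0.1, sigma=0.01, hurst=0.5, n_steps=500, seed=0):
    """Hypothetical fPGD sketch: gradient descent plus fBm-correlated
    perturbations. hurst=0.5 gives uncorrelated (PGD-like) noise;
    hurst<0.5 gives anticorrelated (Anti-PGD-like) noise."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    # One independent fBm increment path per coordinate.
    noise = np.stack([fbm_increments(n_steps, hurst, rng) for _ in x], axis=1)
    for step in range(n_steps):
        x = x - lr * grad(x) + sigma * noise[step]
    return x

# Toy usage: a simple quadratic bowl f(x) = ||x||^2.
x_final = fpgd(lambda x: 2 * x, x0=[1.0, -1.0], hurst=0.3)
```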
Related papers
- Random Exploration in Bayesian Optimization: Order-Optimal Regret and
Computational Efficiency [18.17090625880964]
We study the methodology of exploring the domain using random samples drawn from a distribution.
We show that this random exploration approach achieves the optimal error rates.
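For intuition only, the bare-bones version of "explore with random samples drawn from a distribution" is plain random search; the uniform sampling distribution below is our assumption, and the paper's Gaussian-process machinery is omitted entirely.

```python
import numpy as np

def random_exploration(f, lo, hi, n_queries=100, seed=0):
    """Query points drawn i.i.d. from a fixed distribution over the
    domain instead of maximizing an acquisition function."""
    rng = np.random.default_rng(seed)
    xs = rng.uniform(lo, hi, size=n_queries)
    ys = np.array([f(x) for x in xs])
    best = np.argmin(ys)
    return xs[best], ys[best]
```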
arXiv Detail & Related papers (2023-10-23T20:30:44Z)
- Representing Noisy Image Without Denoising [91.73819173191076]
Fractional-order Moments in Radon space (FMR) is designed to derive robust representations directly from noisy images.
Unlike earlier integer-order methods, ours is a more generic design that takes such classical methods as special cases.
arXiv Detail & Related papers (2023-01-18T10:13:29Z)
- Anticorrelated Noise Injection for Improved Generalization [6.970991851511823]
Injecting artificial noise into gradient descent (GD) is commonly employed to improve the performance of machine learning models.
It is, however, not known if this is optimal or whether other types of noise could provide better generalization performance.
We consider a variety of objective functions for which we find that GD with anticorrelated perturbations ("Anti-PGD") generalizes significantly better than GD and standard (uncorrelated) PGD.
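A minimal sketch of the anticorrelated-perturbation idea (our reading of the summary; names and hyperparameters are illustrative): inject the difference of consecutive i.i.d. Gaussian draws, so that successive perturbations are negatively correlated, whereas standard PGD injects each draw directly.

```python
import numpy as np

def anti_pgd(grad, x0, lr=0.1, sigma=0.01, n_steps=500, seed=0):
    """Gradient descent with anticorrelated perturbations: the injected
    noise at step k is xi_k - xi_{k-1} for i.i.d. Gaussian xi, so
    consecutive perturbations are negatively correlated."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    xi_prev = rng.standard_normal(x.shape)
    for _ in range(n_steps):
        xi = rng.standard_normal(x.shape)
        x = x - lr * grad(x) + sigma * (xi - xi_prev)
        xi_prev = xi
    return x
```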
arXiv Detail & Related papers (2022-02-06T18:52:21Z)
- Optimizing Information-theoretical Generalization Bounds via Anisotropic
Noise in SGLD [73.55632827932101]
We optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD.
We prove that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance.
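In symbols (our notation, not necessarily the paper's): writing C for the expected gradient covariance, the claim is that the bound-minimizing noise covariance is its matrix square root, up to a scaling fixed by the empirical-risk constraint.

```latex
\[
  C = \mathbb{E}\!\left[\nabla \ell(w)\,\nabla \ell(w)^{\top}\right],
  \qquad
  \Sigma^{\star} \propto C^{1/2}.
\]
```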
arXiv Detail & Related papers (2021-10-26T15:02:27Z)
- High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds that depend on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
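The paper's stepsize rules are its own contribution and are not reproduced here; as a minimal sketch, the snippet below shows only the standard norm-clipping operator that this line of work (including the clipped-SSTM paper further down) builds on. The function name is our own.

```python
import numpy as np

def clip_gradient(g, clip_level):
    """Norm clipping: leave g unchanged if ||g|| <= clip_level,
    otherwise rescale it to have norm exactly clip_level. This tames
    heavy-tailed gradient noise in high-probability analyses."""
    g = np.asarray(g, dtype=float)
    norm = np.linalg.norm(g)
    if norm <= clip_level:
        return g
    return g * (clip_level / norm)
```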
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
- Stochastic gradient descent with noise of machine learning type. Part
II: Continuous time analysis [0.0]
We show that in a certain noise regime, the optimization algorithm prefers 'flat' minima of the objective function, in a sense that differs from the flat-minimum selection of continuous-time SGD with homogeneous noise.
arXiv Detail & Related papers (2021-06-04T16:34:32Z)
- Stochastic gradient descent with noise of machine learning type. Part I:
Discrete time analysis [0.0]
Stochastic gradient descent (SGD) is one of the most popular algorithms in modern machine learning.
In this article, we discuss some of the common properties of energy landscapes and the noise encountered in machine learning problems.
arXiv Detail & Related papers (2021-05-04T17:52:20Z)
- Shape Matters: Understanding the Implicit Bias of the Noise Covariance [76.54300276636982]
Noise in gradient descent provides a crucial implicit regularization effect for training overparameterized models.
We show that parameter-dependent noise -- induced by mini-batches or label perturbation -- is far more effective than Gaussian noise.
Our analysis reveals that parameter-dependent noise introduces a bias towards local minima with smaller noise variance, whereas spherical Gaussian noise does not.
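As an assumed concrete instance for least squares, the sketch below resamples perturbed labels at every step, which injects gradient noise whose covariance is shaped by the data (and, for nonlinear models, by the current parameters) rather than being a fixed spherical Gaussian. Names and hyperparameters are illustrative.

```python
import numpy as np

def gd_label_noise(X, y, lr=0.01, noise_std=0.1, n_steps=1000, seed=0):
    """Least-squares GD with label perturbation: each step resamples
    noisy labels, inducing gradient noise with covariance shaped by
    X^T X rather than by the identity (spherical Gaussian)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(n_steps):
        y_tilde = y + noise_std * rng.standard_normal(n)  # perturbed labels
        grad = X.T @ (X @ w - y_tilde) / n
        w = w - lr * grad
    return w
```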
arXiv Detail & Related papers (2020-06-15T18:31:02Z)
- Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient
Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed distributed noise in gradients.
We prove new complexity bounds that outperform state-of-the-art results in this case.
We derive the first non-trivial high-probability complexity bounds for SGD with clipping without light-tails assumption on the noise.
arXiv Detail & Related papers (2020-05-21T17:05:27Z)
- Using deep learning to understand and mitigate the qubit noise
environment [0.0]
We address the challenge of extracting accurate noise spectra from time-dynamics measurements on qubits.
We demonstrate a neural network based methodology that allows for extraction of the noise spectrum associated with any qubit surrounded by an arbitrary bath.
Our results can be applied to a wide range of qubit platforms and provide a framework for improving qubit performance.
arXiv Detail & Related papers (2020-05-03T17:13:14Z)
- Active Model Estimation in Markov Decision Processes [108.46146218973189]
We study the problem of efficient exploration in order to learn an accurate model of an environment, modeled as a Markov decision process (MDP).
We show that our Markov-based algorithm outperforms both our original algorithm and the maximum entropy algorithm in the small sample regime.
arXiv Detail & Related papers (2020-03-06T16:17:24Z)