Generalization Error Bounds for Noisy, Iterative Algorithms via Maximal
Leakage
- URL: http://arxiv.org/abs/2302.14518v2
- Date: Wed, 19 Jul 2023 13:48:46 GMT
- Title: Generalization Error Bounds for Noisy, Iterative Algorithms via Maximal
Leakage
- Authors: Ibrahim Issa, Amedeo Roberto Esposito, Michael Gastpar
- Abstract summary: We adopt an information-theoretic framework to analyze the generalization behavior of a class of noisy learning algorithms.
We show how the assumptions on the update function affect the optimal choice of the noise.
- Score: 24.40306100502023
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We adopt an information-theoretic framework to analyze the generalization
behavior of the class of iterative, noisy learning algorithms. This class is
particularly suitable for study under information-theoretic metrics as the
algorithms are inherently randomized, and it includes commonly used algorithms
such as Stochastic Gradient Langevin Dynamics (SGLD). Herein, we use the
maximal leakage (equivalently, the Sibson mutual information of order infinity)
metric, as it is simple to analyze, and it implies both bounds on the
probability of having a large generalization error and on its expected value.
We show that, if the update function (e.g., gradient) is bounded in $L_2$-norm
and the additive noise is isotropic Gaussian noise, then one can obtain an
upper-bound on maximal leakage in semi-closed form. Furthermore, we demonstrate
how the assumptions on the update function affect the optimal (in the sense of
minimizing the induced maximal leakage) choice of the noise. Finally, we
compute explicit tight upper bounds on the induced maximal leakage for other
scenarios of interest.
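To make the setting concrete, the following sketch shows the kind of noisy, iterative update the analysis applies to: a bounded update direction (here a rescaled mini-batch gradient, an illustrative choice) followed by isotropic Gaussian noise, as in SGLD-style methods. The function names, the rescaling used to enforce the norm bound, and all constants are assumptions for illustration, not values from the paper.
```python
import numpy as np

def noisy_iterative_training(grad_fn, data, w0, steps=100, eta=0.05,
                             L=1.0, sigma=0.1, batch_size=32, seed=0):
    """Sketch of a noisy, iterative learning algorithm: a bounded update
    direction plus isotropic Gaussian noise (SGLD-style).  `data` is assumed
    to be a NumPy array of examples; `grad_fn(w, batch)` is the update
    function F evaluated on a mini-batch."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    n = len(data)
    for _ in range(steps):
        idx = rng.choice(n, size=min(batch_size, n), replace=False)
        g = grad_fn(w, data[idx])              # update function F(w, batch)
        norm = np.linalg.norm(g)
        if norm > L:                           # enforce the L2-norm bound ||F||_2 <= L
            g = g * (L / norm)
        w = w - eta * g + sigma * rng.standard_normal(w.shape)  # isotropic Gaussian noise
    return w
```
Roughly speaking, the induced maximal leakage can then be controlled in terms of quantities visible in this loop, such as the norm bound on the update, the noise scale, and the number of iterations.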
Related papers
- Information limits and Thouless-Anderson-Palmer equations for spiked matrix models with structured noise [19.496063739638924]
We consider the problem of Bayesian inference for a spiked matrix model with structured noise.
We show how to predict the statistical limits using an efficient algorithm inspired by the theory of adaptive Thouless-Anderson-Palmer equations.
arXiv Detail & Related papers (2024-05-31T16:38:35Z)
- Gradient-free optimization of highly smooth functions: improved analysis and a new algorithm [87.22224691317766]
This work studies problems with zero-order noisy oracle information under the assumption that the objective function is highly smooth.
We consider two kinds of zero-order projected gradient descent algorithms.
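As a rough illustration of this family of methods, here is a minimal two-point zero-order gradient estimator plugged into projected gradient descent; the paper's estimators additionally exploit higher-order smoothness via kernel weighting, which this sketch omits, and all names and parameters are hypothetical.
```python
import numpy as np

def two_point_grad_estimate(f, x, h, rng):
    """Generic two-point zero-order gradient estimate from noisy function
    values only (simplified; no higher-order kernel weighting)."""
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                     # random direction on the unit sphere
    return d * (f(x + h * u) - f(x - h * u)) / (2 * h) * u

def zero_order_projected_gd(f, project, x0, steps=200, eta=0.1, h=1e-2, seed=0):
    """Zero-order projected gradient descent using only a function-value oracle."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        g = two_point_grad_estimate(f, x, h, rng)
        x = project(x - eta * g)               # project back onto the feasible domain
    return x
```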
arXiv Detail & Related papers (2023-06-03T17:05:13Z)
- Smoothed Analysis of Sequential Probability Assignment [16.090378928208885]
We study information-theoretically optimal minmax rates as well as a framework for algorithmic reduction involving the maximum likelihood estimator oracle.
Our approach establishes a general-purpose reduction from minimax rates for sequential probability assignment for smoothed adversaries to minimax rates for transductive learning.
arXiv Detail & Related papers (2023-03-08T19:25:57Z)
- An Algebraically Converging Stochastic Gradient Descent Algorithm for Global Optimization [14.336473214524663]
A key component in the algorithm is the randomness based on the value of the objective function.
We prove the convergence of the algorithm with an algebraic rate, both in probability and in the parameter space.
We present several numerical examples to demonstrate the efficiency and robustness of the algorithm.
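A minimal sketch of that idea follows, assuming a hypothetical noise schedule in which the injected randomness scales with the gap between the current objective value and a known lower bound; the paper's actual schedule and parameter tuning differ.
```python
import numpy as np

def objective_adaptive_sgd(f, grad, x0, steps=1000, eta=1e-2,
                           noise_scale=0.5, f_floor=0.0, seed=0):
    """Gradient descent whose injected noise is modulated by the current
    objective value: noise shrinks as f(x) approaches the (assumed known)
    lower bound f_floor.  The schedule here is illustrative only."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        sigma = noise_scale * max(f(x) - f_floor, 0.0)   # objective-dependent noise level
        x = x - eta * grad(x) + sigma * rng.standard_normal(x.shape)
    return x
```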
arXiv Detail & Related papers (2022-04-12T16:27:49Z)
- Mirror Descent Strikes Again: Optimal Stochastic Convex Optimization under Infinite Noise Variance [17.199063087458907]
We quantify the convergence rate of the Mirror Descent algorithm with a class of uniformly convex mirror maps.
This algorithm does not require any explicit gradient clipping or normalization.
We complement our results with information-theoretic lower bounds showing that no other algorithm using only first-order oracles can achieve improved rates.
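As an illustration of a uniformly convex mirror map, the sketch below runs stochastic mirror descent with the $p$-norm potential $\psi(x) = \frac{1}{p}\|x\|_p^p$; the specific exponent and stepsizes that the paper ties to the noise's moment assumptions are not reproduced here.
```python
import numpy as np

def p_norm_mirror_descent(grad_oracle, x0, steps=500, eta=1e-2, p=3.0):
    """Stochastic mirror descent with psi(x) = (1/p) * ||x||_p^p, a standard
    uniformly convex mirror map (p >= 2).  Note there is no gradient clipping
    or normalization; robustness to heavy-tailed gradients comes from the
    geometry of the mirror map.  The exponent p is illustrative."""
    q = p / (p - 1.0)                                   # conjugate exponent
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        g = grad_oracle(x)                              # possibly heavy-tailed stochastic gradient
        y = np.sign(x) * np.abs(x) ** (p - 1) - eta * g # map to the dual space and step
        x = np.sign(y) * np.abs(y) ** (q - 1)           # map back via the gradient of psi*
    return x
```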
arXiv Detail & Related papers (2022-02-23T17:08:40Z)
- Optimizing Information-theoretical Generalization Bounds via Anisotropic Noise in SGLD [73.55632827932101]
We optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD.
We prove that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance.
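A minimal sketch of how that prescription could be used in practice, assuming access to per-example gradients; the overall scaling dictated by the empirical-risk constraint is omitted and the helper names are hypothetical.
```python
import numpy as np

def psd_power(M, alpha):
    """Fractional power of a symmetric PSD matrix via eigendecomposition."""
    evals, evecs = np.linalg.eigh(M)
    evals = np.clip(evals, 0.0, None)          # guard against tiny negative eigenvalues
    return (evecs * evals ** alpha) @ evecs.T

def anisotropic_sgld_noise(grads, rng):
    """Given per-example gradients (rows of `grads`), form the empirical
    gradient covariance C = E[g g^T].  The optimal noise covariance stated
    above is Sigma = C^{1/2} (up to the scale fixed by the empirical-risk
    constraint), so sampling N(0, Sigma) uses the factor Sigma^{1/2} = C^{1/4}."""
    C = grads.T @ grads / grads.shape[0]       # empirical gradient covariance
    factor = psd_power(C, 0.25)                # square root of C^{1/2}
    return factor @ rng.standard_normal(C.shape[0])
```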
arXiv Detail & Related papers (2021-10-26T15:02:27Z)
- High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with unfavorable dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
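The core mechanism here is gradient clipping; a minimal sketch with a constant stepsize and clipping level follows (the paper's proposed stepsize and clipping-level rules are not reproduced).
```python
import numpy as np

def clipped_sgd(grad_oracle, x0, steps=1000, eta=1e-2, clip_level=1.0):
    """Stochastic gradient method with gradient clipping, which controls the
    rare, very large gradients produced by heavy-tailed noise.  Constant
    eta and clip_level are placeholders for the paper's schedules."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        g = grad_oracle(x)                     # possibly heavy-tailed stochastic gradient
        norm = np.linalg.norm(g)
        if norm > clip_level:                  # clip the gradient to the chosen level
            g = g * (clip_level / norm)
        x = x - eta * g
    return x
```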
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
- ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel algorithm, referred to as Recursive One-Over-T SGD (ROOT-SGD), based on an easily implementable, recursive averaging of past stochastic gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense.
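A sketch of a recursive one-over-$t$ averaging estimator in this spirit, under the assumption that the same sample can be evaluated at the current and previous iterates; burn-in and stepsize choices from the paper are omitted.
```python
import numpy as np

def root_style_sgd(grad_fn, sample_fn, x0, steps=1000, eta=1e-2, seed=0):
    """Recursive one-over-t averaging of past stochastic gradients.
    `grad_fn(x, xi)` returns the stochastic gradient at x for sample xi;
    each step evaluates the *same* sample at the current and previous
    iterates and downweights the previous estimate by (1 - 1/t)."""
    rng = np.random.default_rng(seed)
    x_prev = np.asarray(x0, dtype=float).copy()
    v = grad_fn(x_prev, sample_fn(rng))            # initialize with a plain gradient
    x = x_prev - eta * v
    for t in range(2, steps + 1):
        xi = sample_fn(rng)                        # fresh sample shared by both evaluations
        v = grad_fn(x, xi) + (1.0 - 1.0 / t) * (v - grad_fn(x_prev, xi))
        x_prev, x = x, x - eta * v
    return x
```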
arXiv Detail & Related papers (2020-08-28T14:46:56Z)
- Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z)
- Convergence of adaptive algorithms for weakly convex constrained optimization [59.36386973876765]
We prove a $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with a mini-batch size of $1$, constant first- and second-order moment parameters, and possibly unbounded optimization domains.
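For reference, the Moreau envelope of $f$ with parameter $\lambda > 0$ and its gradient (well defined for weakly convex $f$ and small enough $\lambda$) are
$$ f_{\lambda}(x) = \min_{y}\Big\{ f(y) + \tfrac{1}{2\lambda}\,\|y - x\|^{2} \Big\}, \qquad \nabla f_{\lambda}(x) = \tfrac{1}{\lambda}\big(x - \mathrm{prox}_{\lambda f}(x)\big), $$
so a small $\|\nabla f_{\lambda}(x)\|$ certifies that $x$ is close to a near-stationary point of $f$.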
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
- Stochastic Approximation with Markov Noise: Analysis and applications in reinforcement learning [0.0]
We present for the first time a convergence analysis of two time-scale stochastic approximation driven by "controlled" Markov noise.
We analyze the behavior of our framework by relating it to limiting differential inclusions in both time scales.
We obtain the first informative error bounds on function approximation for the policy evaluation algorithm.
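For context, a generic two time-scale recursion of the type analyzed in this line of work takes the form
$$ x_{n+1} = x_n + a_n\, h(x_n, y_n, Z_n), \qquad y_{n+1} = y_n + b_n\, g(x_n, y_n, Z_n), $$
with stepsizes satisfying $b_n / a_n \to 0$ (so $y_n$ evolves on the slower time scale) and $(Z_n)$ a "controlled" Markov process whose transition kernel may depend on the current iterates; the paper's exact assumptions are not reproduced here.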
arXiv Detail & Related papers (2020-04-08T03:59:21Z)