Limitations of Information-Theoretic Generalization Bounds for Gradient
Descent Methods in Stochastic Convex Optimization
- URL: http://arxiv.org/abs/2212.13556v1
- Date: Tue, 27 Dec 2022 17:16:48 GMT
- Authors: Mahdi Haghifam, Borja Rodríguez-Gálvez, Ragnar Thobaben, Mikael Skoglund, Daniel M. Roy, Gintare Karolina Dziugaite
- Abstract summary: We consider the prospect of establishing minimax rates for gradient descent in the setting of convex optimization.
We consider a common tactic employed in studying gradient methods, whereby the final iterate is corrupted by Gaussian noise, producing a noisy "surrogate" algorithm.
Our results suggest that new ideas are required to analyze gradient descent using information-theoretic techniques.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To date, no "information-theoretic" frameworks for reasoning about
generalization error have been shown to establish minimax rates for gradient
descent in the setting of stochastic convex optimization. In this work, we
consider the prospect of establishing such rates via several existing
information-theoretic frameworks: input-output mutual information bounds,
conditional mutual information bounds and variants, PAC-Bayes bounds, and
recent conditional variants thereof. We prove that none of these bounds are
able to establish minimax rates. We then consider a common tactic employed in
studying gradient methods, whereby the final iterate is corrupted by Gaussian
noise, producing a noisy "surrogate" algorithm. We prove that minimax rates
cannot be established via the analysis of such surrogates. Our results suggest
that new ideas are required to analyze gradient descent using
information-theoretic techniques.
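The "noisy surrogate" tactic described above can be made concrete: run plain gradient descent on an empirical convex objective, then release the final iterate corrupted by isotropic Gaussian noise. Below is a minimal, hypothetical sketch; the toy least-squares objective, the dimensions, the step size, and the noise scale are all illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def gd_final_iterate(X, y, steps=100, lr=0.1):
    """Plain gradient descent on the empirical loss (1/2n)||Xw - y||^2."""
    w = np.zeros(X.shape[1])
    n = X.shape[0]
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n  # gradient of the empirical squared loss
        w -= lr * grad
    return w

# Toy stochastic convex problem: linear regression with Gaussian data.
n, d = 50, 5
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

w = gd_final_iterate(X, y)

# Surrogate algorithm: release W + N(0, sigma^2 I) instead of the iterate W.
sigma = 0.05
w_noisy = w + sigma * rng.normal(size=d)

print(np.linalg.norm(w - w_true))        # error of the deterministic iterate
print(np.linalg.norm(w_noisy - w_true))  # error of the noisy surrogate
```

The injected noise gives the surrogate a finite input-output mutual information, which is what makes the information-theoretic bounds applicable to it in the first place; the paper's negative result concerns what such an analysis can then establish.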
Related papers
- A KL-based Analysis Framework with Applications to Non-Descent Optimization Methods [5.779838187603272]
We propose a novel analysis framework for non-descent optimization methods based on the Kurdyka-Lojasiewicz property.
arXiv Detail & Related papers (2024-06-04T12:49:46Z) - Riemannian stochastic optimization methods avoid strict saddle points [68.80251170757647]
We show that the methods under study avoid strict saddle points / submanifolds with probability 1.
This result provides an important sanity check as it shows that, almost always, the limit state of an algorithm can only be a local minimizer.
arXiv Detail & Related papers (2023-11-04T11:12:24Z) - Convex and Non-convex Optimization Under Generalized Smoothness [69.69521650503431]
An analysis of convex and non-convex optimization methods often requires the Lipschitz continuity of the gradient, which limits the analysis to functions bounded by quadratics.
Recent work generalizes the gradient setting via the non-uniform smoothness condition.
arXiv Detail & Related papers (2023-06-02T04:21:59Z) - Robust empirical risk minimization via Newton's method [9.797319790710711]
A new variant of Newton's method for empirical risk minimization is studied.
The gradient and Hessian of the objective function are replaced by robust estimators.
An algorithm for obtaining robust Newton directions based on the conjugate gradient method is also proposed.
arXiv Detail & Related papers (2023-01-30T18:54:54Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - Optimizing Information-theoretical Generalization Bounds via Anisotropic
Noise in SGLD [73.55632827932101]
We optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD.
We prove that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance.
arXiv Detail & Related papers (2021-10-26T15:02:27Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Incremental Without Replacement Sampling in Nonconvex Optimization [0.0]
Minibatch decomposition methods for empirical risk minimization are commonly analysed in a stochastic approximation setting, also known as sampling with replacement.
On the other hand, modern implementations of such techniques are incremental: they rely on sampling without replacement, for which available analyses are much scarcer.
We provide convergence guarantees for the latter variant by analysing a versatile incremental gradient scheme.
arXiv Detail & Related papers (2020-07-15T09:17:29Z)
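Several of the papers above revolve around noise-injected gradient updates; the SGLD entry, for instance, reports that the optimal injected-noise covariance is the square root of the expected gradient covariance. Here is a minimal, hypothetical sketch of such an anisotropic SGLD step on a toy least-squares objective; the helper names, step sizes, and data are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def per_example_grads(w, X, y):
    """Per-example gradients of the half squared loss, one row per sample."""
    residuals = X @ w - y              # shape (n,)
    return residuals[:, None] * X      # shape (n, d)

def sgld_step(w, X, y, lr=0.01):
    G = per_example_grads(w, X, y)
    grad = G.mean(axis=0)
    # Empirical gradient covariance C; target noise covariance is
    # Sigma = C^{1/2}, so we draw noise as C^{1/4} @ (standard normal):
    # that noise then has covariance C^{1/4} C^{1/4} = C^{1/2}.
    C = np.cov(G, rowvar=False) + 1e-8 * np.eye(len(w))
    evals, evecs = np.linalg.eigh(C)
    quarter_root = evecs @ np.diag(np.maximum(evals, 0.0) ** 0.25) @ evecs.T
    noise = quarter_root @ rng.normal(size=len(w))
    return w - lr * grad + np.sqrt(2.0 * lr) * noise

# Toy data and a short anisotropic-SGLD run.
n, d = 200, 4
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_star + 0.1 * rng.normal(size=n)

w = np.zeros(d)
for _ in range(200):
    w = sgld_step(w, X, y)

print(np.linalg.norm(w - w_star))
```

Compared with isotropic SGLD, only the noise term changes: the `quarter_root` factor shapes the injected Gaussian so its covariance matches the prescribed square root of the gradient covariance.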
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.