The Statistical Complexity of Early-Stopped Mirror Descent
- URL: http://arxiv.org/abs/2002.00189v2
- Date: Thu, 27 Aug 2020 15:45:06 GMT
- Title: The Statistical Complexity of Early-Stopped Mirror Descent
- Authors: Tomas Va\v{s}kevi\v{c}ius, Varun Kanade, Patrick Rebeschini
- Abstract summary: We study the statistical guarantees on the excess risk achieved by early-stopped unconstrained mirror descent algorithms.
By completing an inequality that characterizes convexity for the squared loss, we identify an intrinsic link between offset Rademacher complexities and potential-based convergence analysis of mirror descent methods.
- Score: 27.393821783237186
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently there has been a surge of interest in understanding implicit
regularization properties of iterative gradient-based optimization algorithms.
In this paper, we study the statistical guarantees on the excess risk achieved
by early-stopped unconstrained mirror descent algorithms applied to the
unregularized empirical risk with the squared loss for linear models and kernel
methods. By completing an inequality that characterizes convexity for the
squared loss, we identify an intrinsic link between offset Rademacher
complexities and potential-based convergence analysis of mirror descent
methods. Our observation immediately yields excess risk guarantees for the path
traced by the iterates of mirror descent in terms of offset complexities of
certain function classes depending only on the choice of the mirror map,
initialization point, step-size, and the number of iterations. We apply our
theory to recover, in a clean and elegant manner via rather short proofs, some
of the recent results in the implicit regularization literature, while also
showing how to improve upon them in some settings.
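To make the analyzed procedure concrete, here is a minimal sketch of early-stopped unconstrained mirror descent on the unregularized empirical squared loss. The hypentropy mirror map (a choice used in related sparse-recovery work), the synthetic data, and every name and parameter value below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def mirror_descent_path(A, y, beta=1e-3, step=0.02, n_iters=2000):
    """Unconstrained mirror descent on the empirical squared loss
    L(x) = ||A x - y||^2 / n, with the hypentropy mirror map
    phi(x) = sum_i (x_i * arcsinh(x_i / beta) - sqrt(x_i**2 + beta**2)),
    whose gradient is arcsinh(x / beta), inverted by beta * sinh(.).
    Returns the whole iterate path; early stopping selects one iterate."""
    n, d = A.shape
    x = np.zeros(d)                                # start at the minimizer of phi
    path = [x.copy()]
    for _ in range(n_iters):
        grad = (2.0 / n) * A.T @ (A @ x - y)       # gradient of the squared loss
        dual = np.arcsinh(x / beta) - step * grad  # gradient step in mirror (dual) space
        x = beta * np.sinh(dual)                   # map back to primal space
        path.append(x.copy())
    return np.array(path)

# Illustrative run on synthetic sparse data: the best iterate along the
# path plays the role of the early-stopped estimator.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 200))
x_star = np.zeros(200)
x_star[:5] = 1.0
y = A @ x_star + 0.1 * rng.standard_normal(50)
path = mirror_descent_path(A, y)
errs = np.linalg.norm(path - x_star, axis=1)
print(f"best stopping time t = {errs.argmin()}, estimation error = {errs.min():.3f}")
```

Because the primal iterate is recovered through beta * sinh(.), a small beta biases early iterates toward sparse vectors; swapping in a different mirror map changes the function class whose offset complexity controls the excess risk along the path.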
Related papers
- A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent [57.64826450787237]
We show how to analyze the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions.
We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm.
Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
arXiv Detail & Related papers (2024-07-19T08:29:12Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Implicit Regularization for Group Sparsity [33.487964460794764]
We show that gradient descent over the squared regression loss, without any explicit regularization, biases towards solutions with a group sparsity structure.
We analyze the gradient dynamics of the corresponding regression problem in the general noise setting and obtain minimax-optimal error rates.
In the degenerate case of size-one groups, our approach gives rise to a new algorithm for sparse linear regression.
arXiv Detail & Related papers (2023-01-29T20:54:03Z) - Vector-Valued Least-Squares Regression under Output Regularity Assumptions [73.99064151691597]
We propose and analyse a reduced-rank method for solving least-squares regression problems with infinite dimensional output.
We derive learning bounds for our method and study in which settings statistical performance improves over the full-rank method.
arXiv Detail & Related papers (2022-11-16T15:07:00Z) - Robust Imitation via Mirror Descent Inverse Reinforcement Learning [18.941048578572577]
This paper proposes predicting a sequence of reward functions, which are iterative solutions to a constrained convex problem.
We prove that the proposed mirror descent update rule ensures robust minimization of a Bregman divergence (the generic Bregman proximal form of such an update is sketched after this list).
Our IRL method was applied on top of an adversarial framework, and it outperformed existing adversarial methods in an extensive suite of benchmarks.
arXiv Detail & Related papers (2022-10-20T12:25:21Z) - Optimal Rates for Random Order Online Optimization [60.011653053877126]
We study the random-order online optimization setting of Garber et al. (2020), where the loss functions may be chosen by an adversary but are then presented online in a uniformly random order.
We show how to achieve the optimal bounds and significantly improve the stability of the algorithms of Garber et al. (2020).
arXiv Detail & Related papers (2021-06-29T09:48:46Z) - Implicit Regularization in Matrix Sensing via Mirror Descent [29.206451882562867]
We study discrete-time mirror descent applied to the unregularized empirical risk in matrix sensing.
We show that gradient descent with full-rank factorized parametrization is a first-order approximation to mirror descent.
arXiv Detail & Related papers (2021-05-28T13:46:47Z) - Nearly Minimax-Optimal Rates for Noisy Sparse Phase Retrieval via Early-Stopped Mirror Descent [29.206451882562867]
This paper studies early-stopped mirror descent applied to noisy sparse phase retrieval.
The simple algorithm does not rely on explicit regularization or thresholding steps to promote sparsity.
arXiv Detail & Related papers (2021-05-08T11:22:19Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central to preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - A Continuous-Time Mirror Descent Approach to Sparse Phase Retrieval [24.17778927729799]
We analyze continuous-time mirror descent applied to sparse phase retrieval, the problem of recovering sparse signals from a set of magnitude-only measurements.
We provide a convergence analysis of the algorithm for this problem.
arXiv Detail & Related papers (2020-10-20T10:03:44Z) - Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
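A recurring primitive in the entries above (and in the mirror descent IRL paper's Bregman guarantee) is that one mirror descent step is exactly a Bregman proximal step: x_{t+1} = argmin_u <g_t, u> + D_phi(u, x_t) / eta. The following one-dimensional numeric check of this equivalence is a sketch under illustrative values, again with the hypentropy map.

```python
import numpy as np

# 1-D hypentropy mirror map and its Bregman divergence; beta is illustrative.
beta = 0.5
phi = lambda u: u * np.arcsinh(u / beta) - np.sqrt(u**2 + beta**2)
grad_phi = lambda u: np.arcsinh(u / beta)          # phi'(u)
grad_phi_inv = lambda z: beta * np.sinh(z)         # (phi')^{-1}(z)

def bregman(u, x):
    # D_phi(u, x) = phi(u) - phi(x) - phi'(x) * (u - x)
    return phi(u) - phi(x) - grad_phi(x) * (u - x)

x, g, eta = 0.7, 1.3, 0.2   # current iterate, loss gradient, step size (all made up)

# Closed-form mirror descent step: dual gradient step, then map back.
closed_form = grad_phi_inv(grad_phi(x) - eta * g)

# The same step as a Bregman proximal problem, solved by brute force on a grid:
#   argmin_u  g * u + D_phi(u, x) / eta
grid = np.linspace(-5.0, 5.0, 400_001)
numeric = grid[np.argmin(g * grid + bregman(grid, x) / eta)]
print(closed_form, numeric)   # agree up to the grid resolution (~2.5e-5)
```

This equivalence is what lets potential-based (dual-space) convergence arguments be read as statements about the primal iterates, the mechanism the main abstract links to offset Rademacher complexities.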
This list is automatically generated from the titles and abstracts of the papers on this site.