Improved Learning Rates for Stochastic Optimization: Two Theoretical
Viewpoints
- URL: http://arxiv.org/abs/2107.08686v1
- Date: Mon, 19 Jul 2021 08:46:14 GMT
- Title: Improved Learning Rates for Stochastic Optimization: Two Theoretical
Viewpoints
- Authors: Shaojie Li and Yong Liu
- Abstract summary: Generalization performance of stochastic optimization occupies a central place in machine learning.
In this paper, we investigate the excess risk performance and work towards improved learning rates for two popular approaches to stochastic optimization.
Motivated by these problems, we aim to provide improved rates under milder assumptions in convex learning and to derive faster rates in nonconvex learning.
- Score: 7.33244617309908
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalization performance of stochastic optimization occupies a central place
in machine learning. In this paper, we investigate the excess risk performance and work
towards improved learning rates for two popular approaches to stochastic optimization:
empirical risk minimization (ERM) and stochastic gradient descent (SGD). Although there
exist plentiful generalization analyses of ERM and SGD for supervised learning, current
theoretical understanding of ERM and SGD either relies on strong assumptions in convex
learning, e.g., the strong convexity condition, or shows slow rates and remains less
studied in nonconvex learning. Motivated by these problems, we aim to provide improved
rates under milder assumptions in convex learning and to derive faster rates in nonconvex
learning. Notably, our analysis spans two popular theoretical viewpoints: stability and
uniform convergence. To be specific, in the stability regime, we present high-probability
rates of order $\mathcal{O}(1/n)$ w.r.t. the sample size $n$, rather than rates in
expectation, for ERM and SGD under milder assumptions in convex learning, and similar
high-probability rates of order $\mathcal{O}(1/n)$ in nonconvex learning. Furthermore,
this type of learning rate is improved to the faster order $\mathcal{O}(1/n^2)$ in the
uniform convergence regime. To the best of our knowledge, the learning rates presented in
this paper for ERM and SGD are all state-of-the-art.
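To make the two estimators concrete: the excess risk of an output $\hat{w}$ is $F(\hat{w}) - \inf_{w} F(w)$, where $F$ is the population risk, and the paper's bounds describe how fast this quantity shrinks with the sample size $n$. The sketch below is not taken from the paper; it simply contrasts ERM and plain SGD on a synthetic convex least-squares problem, and the data generation, step size, and helper names are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's setting): contrasts the two
# estimators whose excess risk the paper analyzes -- empirical risk
# minimization (ERM) and stochastic gradient descent (SGD) -- on a convex
# least-squares objective with synthetic data.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic convex learning problem: y = <w*, x> + noise, with n samples in d dims.
n, d = 1000, 10
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_star + 0.1 * rng.normal(size=n)

def erm(X, y):
    """ERM: minimize the empirical squared loss exactly (least squares)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def sgd(X, y, lr=0.01, epochs=5):
    """Plain SGD on the same empirical objective, one sample per step."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            grad = (X[i] @ w - y[i]) * X[i]  # gradient of 0.5*(x_i^T w - y_i)^2
            w -= lr * grad
    return w

def excess_risk(w, w_star):
    """Population excess risk E[(x^T w - y)^2] - E[(x^T w* - y)^2];
    for standard-normal features this reduces to ||w - w*||^2."""
    return float(np.sum((w - w_star) ** 2))

print("ERM excess risk:", excess_risk(erm(X, y), w_star))
print("SGD excess risk:", excess_risk(sgd(X, y), w_star))
```

Under this synthetic setup, both printed quantities shrink as $n$ grows; the paper's results concern how fast such excess risks decay ($\mathcal{O}(1/n)$ or $\mathcal{O}(1/n^2)$) and under which assumptions, which the sketch does not attempt to verify.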
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization problem, where the inner loss function becomes a smooth probability distribution and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Beyond Expectations: Learning with Stochastic Dominance Made Practical [88.06211893690964]
Stochastic dominance models risk-averse preferences for decision making with uncertain outcomes.
Despite being theoretically appealing, the application of stochastic dominance in machine learning has been scarce.
We first generalize the dominance concept to enable feasible comparisons between any arbitrary pair of random variables.
We then develop a simple and efficient approach for finding the optimal solution in terms of dominance.
arXiv Detail & Related papers (2024-02-05T03:21:23Z) - PROMISE: Preconditioned Stochastic Optimization Methods by Incorporating Scalable Curvature Estimates [17.777466668123886]
We introduce PROMISE ($\textbf{Pr}$econditioned Stochastic $\textbf{O}$ptimization $\textbf{M}$ethods by $\textbf{I}$ncorporating $\textbf{S}$calable Curvature $\textbf{E}$stimates), a suite of sketching-based preconditioned stochastic gradient algorithms.
PROMISE includes preconditioned versions of SVRG, SAGA, and Katyusha.
arXiv Detail & Related papers (2023-09-05T07:49:10Z) - On the Stability and Generalization of Triplet Learning [55.75784102837832]
Triplet learning, i.e. learning from triplet data, has attracted much attention in computer vision tasks.
This paper investigates the generalization guarantees of triplet learning by leveraging the stability analysis.
arXiv Detail & Related papers (2023-02-20T07:32:50Z) - Benign Underfitting of Stochastic Gradient Descent [72.38051710389732]
We study to what extent stochastic gradient descent (SGD) may be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to the training data.
We analyze the closely related with-replacement SGD, for which an analogous phenomenon does not occur, and prove that its population risk does in fact converge at the optimal rate.
arXiv Detail & Related papers (2022-02-27T13:25:01Z) - Learning Rates for Nonconvex Pairwise Learning [7.33244617309908]
Pairwise learning is receiving increasing attention since it covers many important machine learning tasks.
This paper provides improved learning rates for nonconvex pairwise learning.
arXiv Detail & Related papers (2021-11-09T16:12:20Z) - A High Probability Analysis of Adaptive SGD with Momentum [22.9530287983179]
Stochastic Gradient Descent (SGD) and its variants are the most used algorithms in machine learning applications.
We show for the first time the convergence of the gradients to zero with high probability in the smooth nonconvex setting for Delayed AdaGrad with momentum.
arXiv Detail & Related papers (2020-07-28T15:06:22Z) - Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave
Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on stochastic methods for solving smooth non-convex strongly-concave min-max problems, which have received increasing attention due to applications in deep learning (e.g., deep AUC maximization).
arXiv Detail & Related papers (2020-06-12T00:32:21Z) - On Learning Rates and Schr\"odinger Operators [105.32118775014015]
We present a general theoretical analysis of the effect of the learning rate.
We find that the learning rate tends to zero for a broad class of functions beyond neural networks.
arXiv Detail & Related papers (2020-04-15T09:52:37Z)