On Acceleration of Gradient-Based Empirical Risk Minimization using
Local Polynomial Regression
- URL: http://arxiv.org/abs/2204.07702v1
- Date: Sat, 16 Apr 2022 02:39:45 GMT
- Title: On Acceleration of Gradient-Based Empirical Risk Minimization using
Local Polynomial Regression
- Authors: Ekaterina Trimbach, Edward Duc Hien Nguyen, and C\'esar A. Uribe
- Abstract summary: We study the acceleration of the Local Polynomial Interpolation-based Gradient Descent method (LPI-GD) recently proposed for the approximate solution of empirical risk minimization problems (ERM).
We focus on loss functions that are strongly convex and smooth with condition number $\sigma$.
We propose two accelerated methods for the problem based on LPI-GD and show an oracle complexity of $\tilde{O}\left(\sqrt{\sigma} m^d \log(1/\varepsilon)\right)$.
- Score: 0.491574468325115
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the acceleration of the Local Polynomial Interpolation-based
Gradient Descent method (LPI-GD) recently proposed for the approximate solution
of empirical risk minimization problems (ERM). We focus on loss functions that
are strongly convex and smooth with condition number $\sigma$. We additionally
assume the loss function is $\eta$-H\"older continuous with respect to the
data. The oracle complexity of LPI-GD is $\tilde{O}\left(\sigma m^d
\log(1/\varepsilon)\right)$ for a desired accuracy $\varepsilon$, where $d$ is
the dimension of the parameter space, and $m$ is the cardinality of an
approximation grid. The factor $m^d$ can be shown to scale as
$O((1/\varepsilon)^{d/2\eta})$. LPI-GD has been shown to have better oracle
complexity than gradient descent (GD) and stochastic gradient descent (SGD) for
certain parameter regimes. We propose two accelerated methods for the ERM
problem based on LPI-GD and show an oracle complexity of
$\tilde{O}\left(\sqrt{\sigma} m^d \log(1/\varepsilon)\right)$. Moreover, we
provide the first empirical study on local polynomial interpolation-based
gradient methods and corroborate that LPI-GD has better performance than GD and
SGD in some scenarios, and the proposed methods achieve acceleration.
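To make the role of the condition number concrete, below is a minimal sketch (not the authors' exact algorithm) of Nesterov's accelerated method driven by an approximate gradient oracle. In LPI-GD that oracle is built by local polynomial interpolation over the $m^d$-point grid; here `approx_grad` is a hypothetical stand-in, and the $\sqrt{\sigma}$ factor in the accelerated bound reflects the $O(\sqrt{\sigma}\log(1/\varepsilon))$ iteration count of this scheme.

```python
import numpy as np

def accelerated_gd(approx_grad, x0, L, mu, n_iters):
    """Nesterov's accelerated gradient method for an L-smooth, mu-strongly
    convex objective, driven by an approximate gradient oracle. With
    condition number sigma = L / mu, reaching accuracy eps takes
    O(sqrt(sigma) * log(1/eps)) iterations, versus O(sigma * log(1/eps))
    for plain gradient descent.
    """
    x, y = x0.copy(), x0.copy()
    kappa = L / mu
    momentum = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
    for _ in range(n_iters):
        x_next = y - (1.0 / L) * approx_grad(y)  # gradient step at extrapolated point
        y = x_next + momentum * (x_next - x)     # momentum extrapolation
        x = x_next
    return x

# Hypothetical usage on a strongly convex quadratic 0.5 * x^T A x:
A = np.diag([1.0, 10.0])            # L = 10, mu = 1, so sigma = 10
grad = lambda x: A @ x              # stand-in for the interpolated oracle
x_min = accelerated_gd(grad, np.ones(2), L=10.0, mu=1.0, n_iters=100)
```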
Related papers
- Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - Span-Based Optimal Sample Complexity for Average Reward MDPs [6.996002801232415]
We study the sample complexity of learning an $\varepsilon$-optimal policy in an average-reward Markov decision process (MDP) under a generative model.
We establish the complexity bound $\widetilde{O}\left(SA\frac{H}{(1-\gamma)^2\varepsilon^2}\right)$, where $H$ is the span of the bias function of the optimal policy and $SA$ is the cardinality of the state-action space.
arXiv Detail & Related papers (2023-11-22T15:34:44Z) - Near Sample-Optimal Reduction-based Policy Learning for Average Reward
MDP [58.13930707612128]
This work considers the sample complexity of obtaining an $\varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP).
We prove an upper bound of $\widetilde{O}\left(H \varepsilon^{-3} \ln \frac{1}{\delta}\right)$ samples per state-action pair, where $H := \mathrm{sp}(h^*)$ is the span of the bias of any optimal policy, $\varepsilon$ is the accuracy, and $\delta$ is the failure probability.
arXiv Detail & Related papers (2022-12-01T15:57:58Z) - Adaptive Stochastic Variance Reduction for Non-convex Finite-Sum
Minimization [52.25843977506935]
We propose an adaptive variance-reduction method, called AdaSpider, for $L$-smooth, non-convex functions with a finite-sum structure.
In doing so, we are able to compute an $\epsilon$-stationary point with $\tilde{O}\left(n + \sqrt{n}/\epsilon^2\right)$ calls to the gradient oracle.
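For context, here is a minimal sketch of the SPIDER-style variance-reduced gradient estimator that this family of methods builds on; the paper's adaptive step sizes are not reproduced, and `grad_i` is a hypothetical per-term gradient oracle for a finite sum of $n$ terms.

```python
import numpy as np

def spider_descent(grad_i, n, x0, step, epoch_len, n_iters, batch=32, seed=0):
    """Generic SPIDER-style variance reduction for finite sums: keep a
    running gradient estimate v, refresh it with a full gradient every
    epoch_len steps, and otherwise update it with minibatch gradient
    differences between consecutive iterates. AdaSpider replaces the
    fixed step size with an adaptive one (not reproduced here).
    """
    rng = np.random.default_rng(seed)
    full_grad = lambda x: np.mean([grad_i(x, i) for i in range(n)], axis=0)
    x_prev = x0.copy()
    x = x0.copy()
    v = full_grad(x)
    for t in range(n_iters):
        if t % epoch_len == 0:
            v = full_grad(x)                      # periodic full refresh
        else:
            idx = rng.integers(0, n, size=batch)  # minibatch difference update
            v = v + np.mean([grad_i(x, i) - grad_i(x_prev, i) for i in idx], axis=0)
        x_prev, x = x, x - step * v
    return x
```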
arXiv Detail & Related papers (2022-11-03T14:41:46Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov decision processes (MDPs) in the fixed-confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Gradient Descent for Low-Rank Functions [36.56489593549855]
In machine learning tasks, e.g., training deep neural networks, the loss function varies significantly in only a few directions of the input.
Our proposed \emph{LowRank Gradient Descent} finds an $\epsilon$-approximate stationary point of a $p$-dimensional function by first identifying its significant directions, compared with the $\mathcal{O}(p\log(1/\epsilon))$ gradient evaluations of GD and the $\mathcal{O}(p/\epsilon^2)$ of SGD, as sketched below.
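As a rough illustration of the low-rank idea (a sketch that assumes the significant directions are already known as the orthonormal columns of a matrix `U`; the paper's procedure for identifying them is not reproduced), the gradient can be estimated from $r$ directional derivatives instead of all $p$ coordinates:

```python
import numpy as np

def low_rank_gd(f, x0, U, step, n_iters, h=1e-6):
    """Gradient descent with a low-rank gradient estimate: approximate
    grad f(x) by U @ (U^T grad f(x)), computing only r directional
    derivatives (by finite differences) along the columns of U per step,
    rather than a full p-dimensional gradient.
    """
    x = x0.copy()
    for _ in range(n_iters):
        # directional derivative of f along each significant direction
        coeffs = np.array([(f(x + h * u) - f(x)) / h for u in U.T])
        g = U @ coeffs          # low-rank gradient estimate in R^p
        x = x - step * g
    return x
```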
arXiv Detail & Related papers (2022-06-16T15:58:05Z) - A Variance-Reduced Stochastic Accelerated Primal Dual Algorithm [3.2958527541557525]
Such problems arise frequently in machine learning in the context of robust empirical risk minimization.
We consider the stochastic accelerated primal-dual (SAPD) algorithm as a robust method against gradient noise.
We show that our method improves upon SAPD both in practice and in theory.
arXiv Detail & Related papers (2022-02-19T22:12:30Z) - Faster Perturbed Stochastic Gradient Methods for Finding Local Minima [92.99933928528797]
We propose \texttt{Pullback}, a faster perturbed stochastic gradient framework for finding local minima.
We show that Pullback with stochastic gradient estimators such as SARAH/SPIDER and STORM can find $(\epsilon, \epsilon_H)$-approximate local minima within $\tilde{O}(\epsilon^{-3} + \epsilon_H^{-6})$ stochastic gradient evaluations.
The core idea of our framework is a "step-size pullback" scheme to control the average movement of the gradient evaluations.
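For background, a minimal sketch of the classical perturbation idea that such frameworks refine: take gradient steps, and when the gradient is small (so the iterate may be near a saddle point) inject a small random kick. Pullback's step-size control and estimator choices are not reproduced here.

```python
import numpy as np

def perturbed_gd(grad, x0, step, n_iters, g_thresh=1e-3, radius=1e-2, seed=0):
    """Perturbed gradient descent: ordinary steps while the gradient is
    large; when it falls below g_thresh, add noise sampled uniformly
    from a ball of the given radius to help escape saddle points.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(n_iters):
        g = grad(x)
        if np.linalg.norm(g) > g_thresh:
            x = x - step * g
        else:
            direction = rng.normal(size=x.shape)
            direction /= np.linalg.norm(direction)
            r = radius * rng.random() ** (1.0 / x.size)  # uniform in the ball
            x = x + r * direction
    return x
```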
arXiv Detail & Related papers (2021-10-25T07:20:05Z) - Towards Noise-adaptive, Problem-adaptive Stochastic Gradient Descent [7.176107039687231]
We design step-size schemes that make stochastic gradient descent (SGD) adaptive to (i) the noise.
We prove that $T$ iterations of SGD with Nesterov acceleration can be near-optimal.
Compared to other step-size schemes, we demonstrate the effectiveness of a novel exponential step-size scheme.
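A minimal sketch of an exponential step-size schedule of the kind studied here (the paper's constants and noise-adaptive tuning are its contribution and are not reproduced; `stoch_grad` is a hypothetical stochastic gradient oracle):

```python
import numpy as np

def sgd_exponential_steps(stoch_grad, x0, eta0, alpha, T):
    """SGD with step size eta_t = eta0 * alpha**(t / T), alpha in (0, 1):
    the step decays smoothly from eta0 down to eta0 * alpha over the
    T iterations, large early for fast progress, small late to average
    out gradient noise.
    """
    x = x0.copy()
    for t in range(T):
        eta_t = eta0 * alpha ** (t / T)
        x = x - eta_t * stoch_grad(x)
    return x
```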
arXiv Detail & Related papers (2021-10-21T19:22:14Z) - Gradient-Based Empirical Risk Minimization using Local Polynomial
Regression [39.29885444997579]
A major goal of this literature has been to compare different algorithms, such as gradient descent (GD) or stochastic gradient descent (SGD).
We demonstrate that when the loss function is smooth in the data, we can learn the oracle at every iteration and beat the oracle complexities of both GD and SGD.
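A toy, one-dimensional illustration of the interpolation trick (the paper fits local polynomials over a $d$-dimensional grid in data space; this piecewise-linear scalar version is only an analogy): exact per-sample gradients are evaluated at $m$ grid points, and the gradients at the $n$ data points are interpolated, so each iteration costs $m$ oracle calls instead of $n$.

```python
import numpy as np

def interpolated_full_gradient(grad_sample, w, data, m):
    """Estimate the full ERM gradient at parameters w, for scalar data
    in [0, 1], by evaluating the per-sample gradient grad_sample(w, z)
    only at m grid points and piecewise-linearly interpolating it at
    the actual data points. Valid when the gradient is smooth in z.
    """
    grid = np.linspace(0.0, 1.0, m)
    grid_grads = np.stack([grad_sample(w, z) for z in grid])  # m oracle calls
    # interpolate each gradient coordinate at the n data points
    est = np.stack([np.interp(data, grid, grid_grads[:, j])
                    for j in range(grid_grads.shape[1])], axis=1)
    return est.mean(axis=0)  # average over the n data points
```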
arXiv Detail & Related papers (2020-11-04T20:10:31Z) - Faster Convergence of Stochastic Gradient Langevin Dynamics for
Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
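For reference, the basic SGLD update the analysis concerns (a generic sketch; `grad_log_density` may itself be a stochastic minibatch estimate):

```python
import numpy as np

def sgld(grad_log_density, x0, step, n_iters, seed=0):
    """Stochastic gradient Langevin dynamics: a gradient-ascent step on
    the log density plus injected Gaussian noise of variance 2 * step,
    so the iterates approximately sample from the target distribution
    instead of collapsing onto a mode.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    samples = []
    for _ in range(n_iters):
        noise = rng.normal(size=x.shape)
        x = x + step * grad_log_density(x) + np.sqrt(2.0 * step) * noise
        samples.append(x.copy())
    return np.array(samples)
```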
arXiv Detail & Related papers (2020-10-19T15:23:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.