Better scalability under potentially heavy-tailed feedback
- URL: http://arxiv.org/abs/2012.07346v1
- Date: Mon, 14 Dec 2020 08:56:04 GMT
- Title: Better scalability under potentially heavy-tailed feedback
- Authors: Matthew J. Holland
- Abstract summary: We study scalable alternatives to robust gradient descent (RGD) techniques that can be used when the losses and/or gradients can be heavy-tailed.
We focus computational effort on robustly choosing a strong candidate based on a collection of cheap sub-processes which can be run in parallel.
The exact selection process depends on the convexity of the underlying objective, but in all cases, our selection technique amounts to a robust form of boosting the confidence of weak learners.
- Score: 6.903929927172917
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study scalable alternatives to robust gradient descent (RGD) techniques
that can be used when the losses and/or gradients can be heavy-tailed, though
this will be unknown to the learner. The core technique is simple: instead of
trying to robustly aggregate gradients at each step, which is costly and leads
to sub-optimal dimension dependence in risk bounds, we instead focus
computational effort on robustly choosing (or newly constructing) a strong
candidate based on a collection of cheap stochastic sub-processes which can be
run in parallel. The exact selection process depends on the convexity of the
underlying objective, but in all cases, our selection technique amounts to a
robust form of boosting the confidence of weak learners. In addition to formal
guarantees, we also provide empirical analysis of robustness to perturbations
to experimental conditions, under both sub-Gaussian and heavy-tailed data,
along with applications to a variety of benchmark datasets. The overall
take-away is an extensible procedure that is simple to implement and trivial to
parallelize, which keeps the formal merits of RGD methods but scales much
better to large learning problems.
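To make the divide-and-select idea above concrete, the following is a minimal sketch, not the paper's exact procedure: the squared-loss linear model, the single-pass SGD sub-process, and the median-distance selection rule are illustrative assumptions standing in for "cheap stochastic sub-processes run in parallel" and "robustly choosing a strong candidate."

```python
import numpy as np

def sgd_subprocess(X, y, lr=0.1, seed=0):
    """One cheap sub-process: a single SGD pass over its data shard
    (squared loss, linear model). The sub-processes are independent,
    so in practice each shard can go to a separate worker."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for i in rng.permutation(len(y)):
        grad = (X[i] @ w - y[i]) * X[i]
        w -= lr * grad
    return w

def robust_select(candidates):
    """Pick the candidate that agrees with the 'majority': minimize
    the median distance to the other candidates. A heavy-tailed shard
    can ruin its own candidate, but not the median-based vote."""
    W = np.asarray(candidates)
    dists = np.linalg.norm(W[:, None, :] - W[None, :, :], axis=2)
    scores = np.median(dists, axis=1)
    return W[int(np.argmin(scores))]

# Usage: partition the data into k disjoint shards, run the cheap
# sub-processes (in parallel, in principle), then select robustly.
rng = np.random.default_rng(42)
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(3000, 3))
y = X @ w_true + rng.standard_t(df=2, size=3000)  # heavy-tailed noise
k = 10
shards = np.array_split(np.arange(len(y)), k)
cands = [sgd_subprocess(X[idx], y[idx], seed=s) for s, idx in enumerate(shards)]
w_hat = robust_select(cands)
```

The selection step is where the robustness lives: each sub-process is cheap and non-robust, and a few of them may be corrupted by heavy-tailed samples, but a candidate only wins the vote if most other candidates landed near it.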
Related papers
- Dealing with unbounded gradients in stochastic saddle-point optimization [9.983014605039658]
We study the performance of first-order methods for finding saddle points of convex-concave functions.
A notorious challenge is that the gradients can grow arbitrarily large during optimization.
We propose a simple and effective regularization technique that stabilizes the iterates and yields meaningful performance guarantees.
arXiv Detail & Related papers (2024-02-21T16:13:49Z)
- Policy Gradient for Rectangular Robust Markov Decision Processes [62.397882389472564]
We introduce robust policy gradient (RPG), a policy-based method that efficiently solves rectangular robust Markov decision processes (MDPs).
Our resulting RPG can be estimated from data with the same time complexity as its non-robust equivalent.
arXiv Detail & Related papers (2023-01-31T12:40:50Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violations in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of weak learning methods in such decision processes.
arXiv Detail & Related papers (2021-08-22T16:00:45Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds that depend on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Scalable Marginal Likelihood Estimation for Model Selection in Deep
Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z) - Learning while Respecting Privacy and Robustness to Distributional
Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z) - The Strength of Nesterov's Extrapolation in the Individual Convergence
of Nonsmooth Optimization [0.0]
We prove that Nesterov's extrapolation has the strength to make the individual convergence of gradient descent methods optimal for nonsmooth problems.
We give an extension of the derived algorithms to solve regularized learning tasks with nonsmooth losses.
Our method is applicable as an efficient tool for solving large-scale $\ell_1$-regularized hinge-loss learning problems.
arXiv Detail & Related papers (2020-06-08T03:35:41Z) - Improved scalability under heavy tails, without strong convexity [9.36599317326032]
We study a simple algorithmic strategy that can be leveraged when both losses and gradients can be heavy-tailed.
We show that under heavy-tailed losses, the proposed procedure cannot simply be replaced with naive cross-validation.
We have a scalable method with transparent guarantees, which performs well without prior knowledge of how "convenient" the feedback it receives will be.
arXiv Detail & Related papers (2020-06-02T03:12:17Z) - Better scalability under potentially heavy-tailed gradients [9.36599317326032]
We study a scalable alternative to robust gradient descent (RGD) techniques that can be used when the gradients can be heavy-tailed.
The core technique is simple: instead of trying to robustly aggregate gradients at each step, we choose a candidate which does not diverge too far from the majority of cheap sub-processes run for a single pass over partitioned data.
arXiv Detail & Related papers (2020-06-01T08:16:56Z) - A Graduated Filter Method for Large Scale Robust Estimation [32.08441889054456]
We introduce a novel solver for robust estimation that possesses a strong ability to escape poor local minima.
Our algorithm builds on graduated optimization methods to solve problems having many poor local minima.
arXiv Detail & Related papers (2020-03-20T02:51:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.