Improved scalability under heavy tails, without strong convexity
- URL: http://arxiv.org/abs/2006.01364v2
- Date: Tue, 15 Dec 2020 04:46:42 GMT
- Title: Improved scalability under heavy tails, without strong convexity
- Authors: Matthew J. Holland
- Abstract summary: We study a simple algorithmic strategy that can be leveraged when both losses and gradients can be heavy-tailed.
We show that under heavy-tailed losses, the proposed procedure cannot simply be replaced with naive cross-validation.
We have a scalable method with transparent guarantees, which performs well without prior knowledge of how "convenient" the feedback it receives will be.
- Score: 9.36599317326032
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-world data is laden with outlying values. The challenge for machine
learning is that the learner typically has no prior knowledge of whether the
feedback it receives (losses, gradients, etc.) will be heavy-tailed or not. In
this work, we study a simple algorithmic strategy that can be leveraged when
both losses and gradients can be heavy-tailed. The core technique introduces a
simple robust validation sub-routine, which is used to boost the confidence of
inexpensive gradient-based sub-processes. Compared with recent robust gradient
descent methods from the literature, dimension dependence (both risk bounds and
cost) is substantially improved, without relying upon strong convexity or
expensive per-step robustification. Empirically, we also show that under
heavy-tailed losses, the proposed procedure cannot simply be replaced with
naive cross-validation. Taken together, we have a scalable method with
transparent guarantees, which performs well without prior knowledge of how
"convenient" the feedback it receives will be.
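The "robust validation sub-routine" used to boost the confidence of cheap sub-processes can be illustrated with a small sketch. This is a hypothetical illustration, not the author's code: the helper names (`median_of_means`, `robust_select`) and the toy location-estimation task are assumptions, with a median-of-means estimate standing in for the paper's robust validation step.

```python
import numpy as np

def median_of_means(values, k=5):
    """Robust mean estimate: split into k blocks, average each, take the median."""
    rng = np.random.default_rng(0)
    blocks = np.array_split(rng.permutation(values), k)
    return np.median([block.mean() for block in blocks])

def robust_select(candidates, val_losses, k=5):
    """Return the candidate whose robust validation risk is smallest.

    `val_losses(c)` returns the per-example validation losses of candidate c.
    """
    risks = [median_of_means(val_losses(c), k) for c in candidates]
    return candidates[int(np.argmin(risks))]

# Toy demo: pick the best location estimate under heavy-tailed squared loss.
rng = np.random.default_rng(1)
val = rng.standard_t(df=3, size=2000)   # heavy-tailed validation sample
cands = [-2.0, 0.0, 0.05, 3.0]          # outputs of cheap sub-processes
best = robust_select(cands, lambda c: (val - c) ** 2)
```

The point of the median-of-means step is that a single outlying validation loss cannot swing the risk estimate, which is exactly why (per the abstract) naive cross-validation cannot simply replace this procedure under heavy-tailed losses.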
Related papers
- On the Convergence of Gradient Descent for Large Learning Rates [55.33626480243135]
We show that convergence is impossible when a fixed step size is used.
We provide a proof of this in the case of linear neural networks with a squared loss.
We also prove the impossibility of convergence for more general losses without requiring strong assumptions such as Lipschitz continuity for the gradient.
arXiv Detail & Related papers (2024-02-20T16:01:42Z) - Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods propose to replay the data of experienced tasks when learning new tasks.
However, replaying raw data is often infeasible in practice due to memory constraints or data privacy issues.
As a replacement, data-free data replay methods are proposed by inverting samples from the classification model.
arXiv Detail & Related papers (2024-01-12T12:51:12Z) - Sampling-based Fast Gradient Rescaling Method for Highly Transferable Adversarial Attacks [18.05924632169541]
We propose a Sampling-based Fast Gradient Rescaling Method (S-FGRM).
Specifically, we use data rescaling to substitute the sign function without extra computational cost.
Our method could significantly boost the transferability of gradient-based attacks and outperform the state-of-the-art baselines.
arXiv Detail & Related papers (2023-07-06T07:52:42Z) - SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning [101.86916775218403]
This paper revisits the popular pseudo-labeling methods via a unified sample weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
arXiv Detail & Related papers (2023-01-26T03:53:25Z) - A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z) - Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination [82.52105963476703]
A recurring theme in statistical learning, online learning, and beyond is that faster convergence rates are possible for problems with low noise.
First-order guarantees are relatively well understood in statistical and online learning.
We show that the logarithmic loss and an information-theoretic quantity called the triangular discrimination play a fundamental role in obtaining first-order guarantees.
arXiv Detail & Related papers (2021-07-05T19:20:34Z) - Robust learning with anytime-guaranteed feedback [6.903929927172917]
Gradient-based learning algorithms are driven by queried feedback, often with almost no performance guarantees.
Here we explore a modified "anytime online-to-batch" mechanism which admits high-probability error bounds.
In practice, we show noteworthy gains on real-world data applications.
arXiv Detail & Related papers (2021-05-24T07:31:52Z) - Better scalability under potentially heavy-tailed feedback [6.903929927172917]
We study scalable alternatives to robust gradient descent (RGD) techniques that can be used when the losses and/or gradients can be heavy-tailed.
We focus computational effort on robustly choosing a strong candidate based on a collection of cheap sub-processes which can be run in parallel.
The exact selection process depends on the convexity of the underlying objective, but in all cases, our selection technique amounts to a robust form of boosting the confidence of weak learners.
arXiv Detail & Related papers (2020-12-14T08:56:04Z) - SSGD: A safe and efficient method of gradient descent [0.5099811144731619]
The gradient descent method plays an important role in solving various optimization problems.
The super gradient descent approach updates parameters while concealing the length of the gradient.
Our algorithm can defend against attacks on the gradient.
arXiv Detail & Related papers (2020-12-03T17:09:20Z) - Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels [92.98756432746482]
We study a weakly supervised problem called learning with complementary labels.
We show that the quality of gradient estimation matters more in risk minimization.
We propose a novel surrogate complementary loss (SCL) framework that trades zero bias for reduced variance.
arXiv Detail & Related papers (2020-07-05T04:19:37Z) - Better scalability under potentially heavy-tailed gradients [9.36599317326032]
We study a scalable alternative to robust gradient descent (RGD) techniques that can be used when the gradients can be heavy-tailed.
The core technique is simple: instead of trying to robustly aggregate gradients at each step, we choose a candidate which does not diverge too far from the majority of cheap sub-processes run for a single pass over partitioned data.
arXiv Detail & Related papers (2020-06-01T08:16:56Z)
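The "choose a candidate which does not diverge too far from the majority" step in the last entry above can be sketched as a small distance-based selection rule. This is a hypothetical illustration under assumed names (`select_central`): it scores each candidate by its median distance to the others and keeps the most central one, a simple robust-aggregation stand-in for the paper's procedure.

```python
import numpy as np

def select_central(candidates):
    """Pick the candidate closest to the majority: the one whose median
    Euclidean distance to all candidates (itself included) is smallest."""
    C = np.asarray(candidates, dtype=float)
    # Pairwise distance matrix, shape (n_candidates, n_candidates).
    dists = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=-1)
    scores = np.median(dists, axis=1)
    return C[int(np.argmin(scores))]

# Toy demo: three sub-processes agree near (1, 1); one diverged badly.
cands = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9), (10.0, -5.0)]
chosen = select_central(cands)
```

Because the score is a median, a single diverging sub-process (here, the fourth candidate) receives a large score and is never selected, no matter how far it strays.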
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.