MBGDT: Robust Mini-Batch Gradient Descent
- URL: http://arxiv.org/abs/2206.07139v1
- Date: Tue, 14 Jun 2022 19:52:23 GMT
- Title: MBGDT: Robust Mini-Batch Gradient Descent
- Authors: Hanming Wang, Haozheng Luo, Yue Wang
- Abstract summary: We introduce a new method built on a base learner, such as Bayesian regression or gradient descent, to address the vulnerability of the model to outliers.
Because mini-batch gradient descent allows for more robust convergence, we develop a method based on mini-batch gradient descent, called Mini-Batch Gradient Descent with Trimming (MBGDT).
Our method shows state-of-the-art performance and greater robustness than several baselines when applied to our designed dataset.
- Score: 4.141960931064351
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In high dimensions, most machine learning methods are fragile even when
there are only a few outliers. To address this, we introduce a new method with a
base learner, such as Bayesian regression or stochastic gradient descent, to
address this vulnerability in the model. Because mini-batch gradient descent
allows for a more robust convergence than batch gradient descent, we develop a
method based on mini-batch gradient descent, called Mini-Batch Gradient Descent
with Trimming (MBGDT). Our method shows state-of-the-art performance and greater
robustness than several baselines when applied to our designed dataset.
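The abstract does not spell out how the trimming interacts with the mini-batch updates, so the following is only a minimal sketch of one plausible reading: mini-batch gradient descent on a squared loss in which each batch drops its highest-residual samples (treated as presumed outliers) before the gradient step. The function name `mbgdt`, the linear-regression base learner, and the `trim_frac` parameter are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mbgdt(X, y, lr=0.01, batch_size=32, trim_frac=0.1, epochs=50, seed=0):
    """Hypothetical sketch of mini-batch gradient descent with trimming
    for linear regression: in each mini-batch, the samples with the
    largest squared residuals are dropped before the gradient step."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    keep = batch_size - int(trim_frac * batch_size)  # samples kept per batch
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            resid = Xb @ w - yb
            # Trim: keep only the samples with the smallest residuals.
            kept = np.argsort(resid ** 2)[:keep]
            grad = 2 * Xb[kept].T @ resid[kept] / max(len(kept), 1)
            w -= lr * grad
    return w
```

With `trim_frac = 0` this reduces to ordinary mini-batch gradient descent, which is a convenient sanity check.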
Related papers
- Discrete error dynamics of mini-batch gradient descent for least squares regression [4.159762735751163]
We study the dynamics of mini-batch gradient descent for least squares regression when sampling without replacement.
We also study discretization effects that a continuous-time gradient flow analysis cannot detect, and show that mini-batch gradient descent converges to a step-size-dependent solution.
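For readers unfamiliar with the distinction, the snippet below contrasts the two ways of forming mini-batches; it is a generic illustration of the sampling scheme the paper analyzes (without replacement), not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, batch = 100, 10

# Sampling WITHOUT replacement: shuffle once per epoch, then walk
# through disjoint mini-batches (the regime analyzed in the paper).
epoch_order = rng.permutation(n)
batches_without = [epoch_order[i:i + batch] for i in range(0, n, batch)]

# Sampling WITH replacement: every mini-batch is drawn independently,
# so a sample may appear in several batches of the same epoch.
batches_with = [rng.integers(0, n, size=batch) for _ in range(n // batch)]
```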
arXiv Detail & Related papers (2024-06-06T02:26:14Z)
- Careful with that Scalpel: Improving Gradient Surgery with an EMA [32.73961859864032]
We show how one can improve performance by blending the gradients beyond a simple sum.
We demonstrate that our method, Bloop, can lead to much better performance in NLP and vision experiments.
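The summary only says that the gradients are blended "beyond a simple sum" with the help of an EMA, so the sketch below is a heavily hedged guess at the general shape of such an update: an exponential moving average of the main gradient is used to strip the auxiliary gradient of its component along that average before the two are combined. The function `blend_gradients` and all hyper-parameters are hypothetical; consult the paper for the actual Bloop rule.

```python
import numpy as np

def blend_gradients(g_main, g_aux, ema, beta=0.99, alpha=0.5, eps=1e-12):
    """Hedged sketch of EMA-based gradient blending: keep an exponential
    moving average of the main-task gradient and remove from the
    auxiliary gradient its component along that average before summing.
    The projection rule and hyper-parameters are illustrative guesses."""
    ema = beta * ema + (1 - beta) * g_main
    direction = ema / (np.linalg.norm(ema) + eps)
    g_aux_orth = g_aux - (g_aux @ direction) * direction
    return g_main + alpha * g_aux_orth, ema
```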
arXiv Detail & Related papers (2024-02-05T13:37:00Z)
- A Negative Result on Gradient Matching for Selective Backprop [8.463693396893731]
Training deep neural networks is a massive computational burden.
One approach to speed up the training process is Selective Backprop.
We build on this approach by choosing the (weighted) subset which best matches the mean gradient over the entire minibatch.
We find that both the loss-based as well as the gradient-matching strategy fail to consistently outperform the random baseline.
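To make the two selection rules concrete, here is a hedged numpy sketch for a linear least-squares model: a loss-based rule that keeps the highest-loss examples, and a greedy stand-in for gradient matching that grows a subset whose average gradient stays close to the full mini-batch mean gradient. The helper names and the greedy procedure are illustrative assumptions, not the paper's exact (weighted) subset selection.

```python
import numpy as np

def per_example_grads(Xb, yb, w):
    # Gradient of the squared loss for each example, shape (batch, dim).
    resid = Xb @ w - yb
    return 2 * resid[:, None] * Xb

def select_by_loss(Xb, yb, w, k):
    # Loss-based rule: keep the k highest-loss examples.
    losses = (Xb @ w - yb) ** 2
    return np.argsort(losses)[-k:]

def select_by_gradient_matching(Xb, yb, w, k):
    # Greedy stand-in for gradient matching: pick k examples whose
    # average gradient stays closest to the full mini-batch mean gradient.
    G = per_example_grads(Xb, yb, w)
    target = G.mean(axis=0)
    chosen, current = [], np.zeros_like(target)
    for _ in range(min(k, len(G))):
        best, best_err = None, np.inf
        for i in range(len(G)):
            if i in chosen:
                continue
            trial = (current * len(chosen) + G[i]) / (len(chosen) + 1)
            err = np.linalg.norm(trial - target)
            if err < best_err:
                best, best_err = i, err
        chosen.append(best)
        current = (current * (len(chosen) - 1) + G[best]) / len(chosen)
    return np.array(chosen)
```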
arXiv Detail & Related papers (2023-12-08T13:03:10Z)
- Aiming towards the minimizers: fast convergence of SGD for overparametrized problems [25.077446336619378]
We propose a regularity regime which endows the stochastic gradient method with the same worst-case complexity as the deterministic gradient method.
All existing guarantees require the stochastic gradient method to take small steps, thereby resulting in a much slower linear rate of convergence.
We demonstrate that our condition holds when training sufficiently wide feedforward neural networks with a linear output layer.
arXiv Detail & Related papers (2023-06-05T05:21:01Z)
- Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
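As background for the forward-gradient idea, the sketch below estimates the gradient of a least-squares objective from a single directional derivative along a random direction (the weight-perturbation variant the paper improves on); perturbing activations instead, as proposed there, reduces the variance of this kind of estimator. The example is a generic illustration, not the authors' code.

```python
import numpy as np

def forward_gradient(X, y, w, rng):
    """Unbiased forward-gradient estimate for f(w) = ||Xw - y||^2:
    sample a random direction v, compute the directional derivative
    f'(w; v) in closed form, and return f'(w; v) * v."""
    v = rng.standard_normal(w.shape)
    resid = X @ w - y
    directional = 2 * resid @ (X @ v)   # f'(w; v) = <grad f(w), v>
    return directional * v              # E[estimate] = grad f(w)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
w_true = rng.standard_normal(5)
y = X @ w_true
w = np.zeros(5)
for _ in range(2000):
    w -= 1e-3 * forward_gradient(X, y, w, rng)   # forward-gradient descent steps
```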
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
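To illustrate what dropping the Metropolis-Hastings correction buys, here is a hedged numpy sketch of annealed importance sampling between a standard normal prior and an unnormalized Gaussian target, using unadjusted Langevin transitions only: no discrete accept/reject step remains, so the whole estimator is smooth in the model parameters. The densities, schedule, and step size are arbitrary illustrative choices, not the algorithm from the paper.

```python
import numpy as np

def log_prior(x):                 # standard normal, normalized
    return -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)

def log_target_unnorm(x):         # unnormalized N(2, 0.5^2); true log Z is about 0.226
    return -0.5 * ((x - 2.0) / 0.5) ** 2

def ais_without_mh(n_chains=2000, n_steps=100, step=0.05, seed=0):
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, n_steps + 1)
    x = rng.standard_normal(n_chains)          # exact samples from the prior
    log_w = np.zeros(n_chains)
    for k in range(1, n_steps + 1):
        # Incremental importance weight between consecutive annealed densities.
        log_w += (betas[k] - betas[k - 1]) * (log_target_unnorm(x) - log_prior(x))
        # Unadjusted Langevin move targeting the k-th annealed density
        # (no Metropolis-Hastings accept/reject, hence no non-smooth step).
        grad = (1.0 - betas[k]) * (-x) + betas[k] * (-(x - 2.0) / 0.25)
        x = x + 0.5 * step * grad + np.sqrt(step) * rng.standard_normal(n_chains)
    # Estimate of log(Z_target / Z_prior); Z_prior = 1 for the normalized prior.
    return np.log(np.mean(np.exp(log_w)))
```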
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Staircase Sign Method for Boosting Adversarial Attacks [123.19227129979943]
Crafting adversarial examples for transfer-based attacks is challenging and remains a research hotspot.
We propose a novel Staircase Sign Method (S$^2$M) to alleviate this issue, thus boosting transfer-based attacks.
Our method can be generally integrated into any transfer-based attack, and the computational overhead is negligible.
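A hedged sketch of the staircase idea: instead of mapping every gradient entry to +/-1, entries are bucketed by the percentile of their magnitude and scaled by an increasing staircase weight. The bucketing and weights below are illustrative assumptions, not the exact S$^2$M scheme from the paper.

```python
import numpy as np

def staircase_sign(grad, k=4):
    """Replace sign(grad) with a percentile-based staircase weighting
    of the gradient's sign (illustrative weights, mean roughly 1)."""
    mags = np.abs(grad)
    # Percentile edges splitting |grad| into k bands.
    edges = np.percentile(mags, np.linspace(0, 100, k + 1)[1:-1])
    band = np.digitize(mags, edges)            # band index 0 .. k-1
    weights = (2 * band + 1) / k               # staircase values in (0, 2)
    return np.sign(grad) * weights
```

In an iterative transfer-based attack, this output would take the place of sign(grad) in the perturbation update.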
arXiv Detail & Related papers (2021-04-20T02:31:55Z)
- Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost.
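The weighting idea can be sketched as follows: within each mini-batch, a sample's weight is a softmax of the individual losses scaled by a temperature, so hard examples get emphasized (or, with a negative temperature, suppressed for label-noise robustness). The function names and the exact form of the weights are assumptions based on this summary; see the paper for the precise ABSGD update.

```python
import numpy as np

def absgd_weights(losses, lam=1.0):
    """Hedged sketch of per-sample importance weights in a mini-batch:
    a softmax of the individual losses scaled by a temperature lam."""
    z = np.asarray(losses) / lam
    z = z - z.max()                       # numerical stability
    p = np.exp(z)
    return p / p.sum()

def absgd_step(per_sample_grads, losses, velocity, lr=0.1, momentum=0.9, lam=1.0):
    # Momentum SGD step using the weighted mini-batch gradient.
    w = absgd_weights(losses, lam)
    grad = (w[:, None] * per_sample_grads).sum(axis=0)
    velocity = momentum * velocity + grad
    return velocity, -lr * velocity       # new velocity and parameter update
```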
arXiv Detail & Related papers (2020-12-13T03:41:52Z)
- SSGD: A safe and efficient method of gradient descent [0.5099811144731619]
The gradient descent method plays an important role in solving various optimization problems.
The super gradient descent approach updates parameters by concealing the length of the gradient.
Our algorithm can defend against attacks on the gradient.
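One plausible reading of "concealing the length of the gradient" is to update with the gradient's direction only, so the step size is controlled by the learning rate rather than by the gradient norm. The sketch below is an illustrative assumption, not the paper's exact SSGD rule.

```python
import numpy as np

def concealed_length_step(w, grad, lr=0.01, eps=1e-12):
    """Update using only the gradient's direction, hiding its length."""
    direction = grad / (np.linalg.norm(grad) + eps)
    return w - lr * direction
```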
arXiv Detail & Related papers (2020-12-03T17:09:20Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
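The summary does not describe the extrapolation scheme itself, so here is only a generic extragradient-style sketch: take a trial (extrapolation) step first, then update the original iterate using the gradient evaluated at the trial point. The function `extragradient_step` is an illustrative stand-in, not the specific algorithm evaluated in the paper.

```python
import numpy as np

def extragradient_step(w, grad_fn, lr=0.1):
    """Generic extragradient-style extrapolation step."""
    w_trial = w - lr * grad_fn(w)          # extrapolation (look-ahead) step
    return w - lr * grad_fn(w_trial)       # update using the trial-point gradient
```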
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Carathéodory Sampling for Stochastic Gradient Descent [79.55586575988292]
We present an approach that is inspired by classical results of Tchakaloff and Carathéodory about measure reduction.
We adaptively select the descent steps where the measure reduction is carried out.
We combine this with Block Coordinate Descent so that measure reduction can be done very cheaply.
arXiv Detail & Related papers (2020-06-02T17:52:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.