Related papers: TAdam: A Robust Stochastic Gradient Optimizer

TAdam: A Robust Stochastic Gradient Optimizer

URL: http://arxiv.org/abs/2003.00179v2
Date: Tue, 3 Mar 2020 03:50:48 GMT
Title: TAdam: A Robust Stochastic Gradient Optimizer
Authors: Wendyam Eric Lionel Ilboudo, Taisuke Kobayashi, and Kenji Sugimoto
Abstract summary: Machine learning algorithms aim to find patterns from observations, which may include some noise, especially in robotics domain. To perform well even with such noise, we expect them to be able to detect outliers and discard them when needed. We propose a new gradient optimization method, whose robustness is directly built in the algorithm, using the robust student-t distribution as its core idea.
Score: 6.973803123972298
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Machine learning algorithms aim to find patterns from observations, which may include some noise, especially in robotics domain. To perform well even with such noise, we expect them to be able to detect outliers and discard them when needed. We therefore propose a new stochastic gradient optimization method, whose robustness is directly built in the algorithm, using the robust student-t distribution as its core idea. Adam, the popular optimization method, is modified with our method and the resultant optimizer, so-called TAdam, is shown to effectively outperform Adam in terms of robustness against noise on diverse task, ranging from regression and classification to reinforcement learning problems. The implementation of our algorithm can be found at https://github.com/Mahoumaru/TAdam.git

Related papers

Outlier-Robust Training of Machine Learning Models [21.352210662488112]
We propose an Adaptive Alternation Algorithm for training machine learning models with outliers. The algorithm iteratively trains the model by using a weighted version of the non-robust loss, while updating the weights at each. Considering arbitrary outliers (i.e., with no distributional assumption on the outliers), we show that the use of robust loss kernels sigma increases the region of convergence.
arXiv Detail & Related papers (2024-12-31T04:19:53Z)
Towards Simple and Provable Parameter-Free Adaptive Gradient Methods [56.060918447252625]
We present AdaGrad++ and Adam++, novel and simple parameter-free variants of AdaGrad and Adam with convergence guarantees. We prove that AdaGrad++ achieves comparable convergence rates to AdaGrad in convex optimization without predefined learning rate assumptions. Similarly, Adam++ matches the convergence rate of Adam without relying on any conditions on the learning rates.
arXiv Detail & Related papers (2024-12-27T04:22:02Z)
Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems. Such problems are encountered in medicine, physics, and machine learning. We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z)
MARS: Unleashing the Power of Variance Reduction for Training Large Models [56.47014540413659]
We propose a unified training framework for deep neural networks. We introduce three instances of MARS that leverage preconditioned gradient optimization. Results indicate that the implementation of MARS consistently outperforms Adam.
arXiv Detail & Related papers (2024-11-15T18:57:39Z)
Pivotal Auto-Encoder via Self-Normalizing ReLU [20.76999663290342]
We formalize single hidden layer sparse auto-encoders as a transform learning problem. We propose an optimization problem that leads to a predictive model invariant to the noise level at test time. Our experimental results demonstrate that the trained models yield a significant improvement in stability against varying types of noise.
arXiv Detail & Related papers (2024-06-23T09:06:52Z)
On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions [4.9495085874952895]
Adaptive Momentum Estimation (Adam) algorithm is highly effective in various deep learning tasks. We show that Adam can find a stationary point variance with a rate in high iterations under this general noise model.
arXiv Detail & Related papers (2024-02-06T13:19:26Z)
Semi-Bandit Learning for Monotone Stochastic Optimization [20.776114616154242]
We provide a generic online learning algorithm for a class of "monotone" problems. Our framework applies to several fundamental problems in optimization such as prophet, Pandora's box knapsack, inequality matchings and submodular optimization.
arXiv Detail & Related papers (2023-12-24T07:46:37Z)
StochGradAdam: Accelerating Neural Networks Training with Stochastic Gradient Sampling [0.0]
We introduce StochGradAdam, a novel extension of the Adam algorithm, incorporating gradient sampling techniques. StochGradAdam achieves comparable or superior performance to Adam, even when using fewer gradient updates per iteration. The results suggest that this approach is particularly effective for large-scale models and datasets.
arXiv Detail & Related papers (2023-10-25T22:45:31Z)
Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS) We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noises. We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z)
Learning the Positions in CountSketch [49.57951567374372]
We consider sketching algorithms which first compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem. In this work, we propose the first learning-based algorithms that also optimize the locations of the non-zero entries.
arXiv Detail & Related papers (2023-06-11T07:28:35Z)
Robust Meta-learning with Sampling Noise and Label Noise via Eigen-Reptile [78.1212767880785]
meta-learner is prone to overfitting since there are only a few available samples. When handling the data with noisy labels, the meta-learner could be extremely sensitive to label noise. We present Eigen-Reptile (ER) that updates the meta- parameters with the main direction of historical task-specific parameters.
arXiv Detail & Related papers (2022-06-04T08:48:02Z)
Gradient Descent, Stochastic Optimization, and Other Tales [8.034728173797953]
This tutorial doesn't shy away from addressing both the formal and informal aspects of gradient descent and optimization methods. Gradient descent is one of the most popular algorithms to perform optimization and by far the most common way to optimize machine learning tasks. In deep neural networks, the gradient followed by a single sample or a batch of samples is employed to save computational resources and escape from saddle points.
arXiv Detail & Related papers (2022-05-02T12:06:53Z)
Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms. The learned algorithms are domain-agnostic and can generalize to new environments not seen during training. We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z)
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning [91.13797346047984]
We introduce ADAHESSIAN, a second order optimization algorithm which dynamically incorporates the curvature of the loss function via ADAptive estimates. We show that ADAHESSIAN achieves new state-of-the-art results by a large margin as compared to other adaptive optimization methods.
arXiv Detail & Related papers (2020-06-01T05:00:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.