TAdam: A Robust Stochastic Gradient Optimizer
- URL: http://arxiv.org/abs/2003.00179v2
- Date: Tue, 3 Mar 2020 03:50:48 GMT
- Title: TAdam: A Robust Stochastic Gradient Optimizer
- Authors: Wendyam Eric Lionel Ilboudo, Taisuke Kobayashi, and Kenji Sugimoto
- Abstract summary: Machine learning algorithms aim to find patterns from observations, which may include some noise, especially in the robotics domain.
To perform well even with such noise, we expect them to be able to detect outliers and discard them when needed.
We propose a new gradient optimization method whose robustness is built directly into the algorithm, using the robust Student-t distribution as its core idea.
- Score: 6.973803123972298
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning algorithms aim to find patterns from observations, which may
include some noise, especially in the robotics domain. To perform well even with
such noise, we expect them to be able to detect outliers and discard them when
needed. We therefore propose a new stochastic gradient optimization method,
whose robustness is built directly into the algorithm, using the robust Student-t
distribution as its core idea. Adam, the popular optimization method, is
modified with our method, and the resulting optimizer, called TAdam, is shown
to effectively outperform Adam in terms of robustness against noise on diverse
tasks, ranging from regression and classification to reinforcement learning
problems. The implementation of our algorithm can be found at
https://github.com/Mahoumaru/TAdam.git
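The core idea in the abstract, replacing Adam's moving average of gradients with an estimate that discounts outliers, can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository above for that). The function names, the fixed variance estimate `var`, the default `dof`, and the weight-sum decay rule are all assumptions made for illustration. The sketch compares a plain exponential moving average (EMA) with a Student-t style weighted mean on a scalar gradient stream containing one outlier.

```python
def ema_update(mean, x, beta=0.9):
    """Standard exponential moving average, as used for Adam's first moment."""
    return beta * mean + (1.0 - beta) * x


def student_t_update(mean, weight_sum, x, var, beta=0.9, dof=1.0, dim=1):
    """Weighted-mean update that down-weights samples far from the current mean.

    Hypothetical helper: names, defaults, and the weight-sum decay are
    illustrative assumptions, not the authors' code.
    """
    # Squared distance from the running mean, scaled by a variance estimate.
    dist = (x - mean) ** 2 / var
    # Student-t style weight: at most (dof + dim) / dof, reached when x equals
    # the mean, and approaching 0 as the sample moves far away.
    w = (dof + dim) / (dof + dist)
    new_mean = (weight_sum * mean + w * x) / (weight_sum + w)
    # Decay chosen so that constant weights w == 1 reproduce the EMA above.
    new_weight_sum = (2.0 * beta - 1.0) / beta * weight_sum + w
    return new_mean, new_weight_sum


def run_demo(beta=0.9):
    """Feed 20 clean gradients of 1.0 and one outlier of 100.0 to both estimators."""
    stream = [1.0] * 20 + [100.0]
    ema = 1.0
    robust, weight_sum = 1.0, beta / (1.0 - beta)
    for g in stream:
        ema = ema_update(ema, g, beta)
        robust, weight_sum = student_t_update(
            robust, weight_sum, g, var=1.0, beta=beta
        )
    return ema, robust
```

Calling `run_demo()` should leave the EMA pulled strongly toward the outlier, while the weighted estimate stays close to 1. With a very large `dof` the weights approach 1 and the update behaves like the ordinary EMA, which is why the sketch can be read as a robust generalization of Adam's first-moment estimate.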
Related papers
- Pivotal Auto-Encoder via Self-Normalizing ReLU [20.76999663290342]
We formalize single hidden layer sparse auto-encoders as a transform learning problem.
We propose an optimization problem that leads to a predictive model invariant to the noise level at test time.
Our experimental results demonstrate that the trained models yield a significant improvement in stability against varying types of noise.
arXiv Detail & Related papers (2024-06-23T09:06:52Z)
- On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions [4.9495085874952895]
The Adaptive Momentum Estimation (Adam) algorithm is highly effective in various deep learning tasks.
We show that Adam can find a stationary point variance with a rate in high iterations under this general noise model.
arXiv Detail & Related papers (2024-02-06T13:19:26Z)
- Semi-Bandit Learning for Monotone Stochastic Optimization [20.776114616154242]
We provide a generic online learning algorithm for a class of "monotone" problems.
Our framework applies to several fundamental problems in optimization such as prophet inequality, Pandora's box, knapsack, matchings and submodular optimization.
arXiv Detail & Related papers (2023-12-24T07:46:37Z)
- StochGradAdam: Accelerating Neural Networks Training with Stochastic Gradient Sampling [0.0]
We introduce StochGradAdam, a novel extension of the Adam algorithm, incorporating gradient sampling techniques.
StochGradAdam achieves comparable or superior performance to Adam, even when using fewer gradient updates per iteration.
The results suggest that this approach is particularly effective for large-scale models and datasets.
arXiv Detail & Related papers (2023-10-25T22:45:31Z)
- Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noise.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z)
- Learning the Positions in CountSketch [49.57951567374372]
We consider sketching algorithms which first compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem.
In this work, we propose the first learning-based algorithms that also optimize the locations of the non-zero entries.
arXiv Detail & Related papers (2023-06-11T07:28:35Z)
- Robust Meta-learning with Sampling Noise and Label Noise via Eigen-Reptile [78.1212767880785]
The meta-learner is prone to overfitting since there are only a few available samples.
When handling data with noisy labels, the meta-learner can be extremely sensitive to label noise.
We present Eigen-Reptile (ER), which updates the meta-parameters with the main direction of historical task-specific parameters.
arXiv Detail & Related papers (2022-06-04T08:48:02Z)
- Gradient Descent, Stochastic Optimization, and Other Tales [8.034728173797953]
This tutorial doesn't shy away from addressing both the formal and informal aspects of gradient descent and optimization methods.
Gradient descent is one of the most popular algorithms to perform optimization and by far the most common way to optimize machine learning tasks.
In deep neural networks, the gradient followed by a single sample or a batch of samples is employed to save computational resources and escape from saddle points.
arXiv Detail & Related papers (2022-05-02T12:06:53Z)
- Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z)
- ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning [91.13797346047984]
We introduce ADAHESSIAN, a second order optimization algorithm which dynamically incorporates the curvature of the loss function via ADAptive estimates.
We show that ADAHESSIAN achieves new state-of-the-art results by a large margin as compared to other adaptive optimization methods.
arXiv Detail & Related papers (2020-06-01T05:00:51Z)
- Meta-learning with Stochastic Linear Bandits [120.43000970418939]
We consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is a squared Euclidean distance to a bias vector.
We show, both theoretically and experimentally, that when the number of tasks grows and the variance of the task distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
arXiv Detail & Related papers (2020-05-18T08:41:39Z)