SADAM: Stochastic Adam, A Stochastic Operator for First-Order
Gradient-based Optimizer
- URL: http://arxiv.org/abs/2205.10247v1
- Date: Fri, 20 May 2022 15:20:19 GMT
- Title: SADAM: Stochastic Adam, A Stochastic Operator for First-Order
Gradient-based Optimizer
- Authors: Wei Zhang, Yu Bao
- Abstract summary: We propose, analyze, and generalize a strategy performed as an operator for a first-order descent algorithm.
Unlike existing algorithms, the proposed strategy does not require any batches and sampling techniques.
- Score: 8.93274096260726
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, to efficiently help escape the stationary and saddle points, we
propose, analyze, and generalize a stochastic strategy performed as an operator
for a first-order gradient descent algorithm in order to increase the target
accuracy and reduce time consumption. Unlike existing algorithms, the proposed
stochastic strategy does not require any batching or sampling techniques,
enabling efficient implementation and maintaining the initial first-order
optimizer's convergence rate, while markedly improving the target accuracy
achieved when optimizing the target functions. In short, the proposed
strategy is generalized, applied to Adam, and validated via the decomposition
of biomedical signals using Deep Matrix Fitting and another four peer
optimizers. The validation results show that the proposed random strategy can
be easily generalized for first-order optimizers and efficiently improve the
target accuracy.
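The abstract describes a stochastic operator wrapped around a first-order optimizer (here Adam) to escape stationary and saddle points without batching or sampling. The paper's exact operator is not given in this summary, so the sketch below is a minimal illustrative assumption: a standard Adam step in which zero-mean noise is injected into the gradient whenever it is nearly zero. The trigger (`stall_tol`) and noise form (`noise_scale`) are hypothetical parameters, not the authors' formulation.

```python
import numpy as np

def sadam_step(theta, grad, m, v, t, lr=1e-2, beta1=0.9, beta2=0.999,
               eps=1e-8, noise_scale=1e-2, stall_tol=1e-6, rng=None):
    """One Adam update wrapped in a hypothetical stochastic operator:
    near a stationary/saddle point (tiny gradient norm), inject
    zero-mean Gaussian noise to help the iterate escape."""
    rng = np.random.default_rng() if rng is None else rng
    if np.linalg.norm(grad) < stall_tol:            # near-stationary: perturb
        grad = grad + noise_scale * rng.standard_normal(grad.shape)
    m = beta1 * m + (1 - beta1) * grad              # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2         # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                    # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# usage: minimize f(x) = x^2 starting away from the optimum
theta = np.array([1.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 2001):
    grad = 2.0 * theta                              # gradient of x^2
    theta, m, v = sadam_step(theta, grad, m, v, t)
```

Because the perturbation only fires when the gradient is already negligible, the wrapper leaves the base optimizer's convergence behavior intact away from stationary points, which matches the abstract's claim that the operator preserves the initial first-order optimizer's convergence rate.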
Related papers
- Beyond Single-Model Views for Deep Learning: Optimization versus
Generalizability of Stochastic Optimization Algorithms [13.134564730161983]
This paper adopts a novel approach to deep learning optimization, focusing on stochastic gradient descent (SGD) and its variants.
We show that SGD and its variants perform on par with flat-minima-seeking methods such as SAM, albeit with half the gradient evaluations.
Our study uncovers several key findings regarding the relationship between training loss and hold-out accuracy, as well as the comparable performance of SGD and noise-enabled variants.
arXiv Detail & Related papers (2024-03-01T14:55:22Z) - Enhancing Gaussian Process Surrogates for Optimization and Posterior Approximation via Random Exploration [2.984929040246293]
We propose novel noise-free Bayesian optimization strategies that rely on a random exploration step to enhance the accuracy of Gaussian process surrogate models.
The new algorithms retain the ease of implementation of the classical GP-UCB, while the additional exploration step facilitates their convergence.
arXiv Detail & Related papers (2024-01-30T14:16:06Z) - Exploring the Algorithm-Dependent Generalization of AUPRC Optimization
with List Stability [107.65337427333064]
Optimization of the Area Under the Precision-Recall Curve (AUPRC) is a crucial problem for machine learning.
In this work, we present the first trial of algorithm-dependent generalization analysis for AUPRC optimization.
Experiments on three image retrieval datasets speak to the effectiveness and soundness of our framework.
arXiv Detail & Related papers (2022-09-27T09:06:37Z) - Optimistic Optimisation of Composite Objective with Exponentiated Update [2.1700203922407493]
The algorithms can be interpreted as the combination of the exponentiated gradient and $p$-norm algorithm.
They achieve a sequence-dependent regret upper bound, matching the best-known bounds for sparse target decision variables.
arXiv Detail & Related papers (2022-08-08T11:29:55Z) - Outlier-Robust Sparse Estimation via Non-Convex Optimization [73.18654719887205]
We explore the connection between high-dimensional robust statistics and non-convex optimization in the presence of sparsity constraints.
We develop novel and simple optimization formulations for these problems.
As a corollary, we obtain that any first-order method that efficiently converges to a stationary point yields an efficient algorithm for these tasks.
arXiv Detail & Related papers (2021-09-23T17:38:24Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - A Stochastic Sequential Quadratic Optimization Algorithm for Nonlinear
Equality Constrained Optimization with Rank-Deficient Jacobians [11.03311584463036]
A sequential quadratic optimization algorithm is proposed for solving smooth nonlinear equality constrained optimization problems.
Results of numerical experiments demonstrate that the algorithm offers superior performance when compared to popular alternatives.
arXiv Detail & Related papers (2021-06-24T13:46:52Z) - Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for solving many machine learning problems.
We propose stocBiO, a novel stochastic bilevel optimizer built on a sample-efficient hypergradient estimator.
arXiv Detail & Related papers (2020-10-15T18:09:48Z) - Projection-Free Adaptive Gradients for Large-Scale Optimization [22.0439695290991]
Frank-Wolfe algorithms occupy a unique position as they alleviate both computational burdens by querying only approximate first-order information from the objective.
We show that our method can improve the performance of adaptive algorithms for constrained optimization.
arXiv Detail & Related papers (2020-09-29T15:56:12Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove a $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with a mini-batch size of $1$, constant first- and second-order moment parameters, and possibly unbounded optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Stochastic batch size for adaptive regularization in deep network
optimization [63.68104397173262]
We propose a first-order optimization algorithm incorporating adaptive regularization applicable to machine learning problems in deep learning framework.
We empirically demonstrate the effectiveness of our algorithm using an image classification task based on conventional network models applied to commonly used benchmark datasets.
arXiv Detail & Related papers (2020-04-14T07:54:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.