AdaSwarm: Augmenting Gradient-Based optimizers in Deep Learning with
Swarm Intelligence
- URL: http://arxiv.org/abs/2006.09875v5
- Date: Wed, 19 May 2021 07:47:55 GMT
- Title: AdaSwarm: Augmenting Gradient-Based optimizers in Deep Learning with
Swarm Intelligence
- Authors: Rohan Mohapatra, Snehanshu Saha, Carlos A. Coello Coello, Anwesh
Bhattacharya, Soma S. Dhavala and Sriparna Saha
- Abstract summary: This paper introduces AdaSwarm, a novel gradient-free optimizer with similar or even better performance than the Adam optimizer adopted in neural networks.
We show that the gradient of any function, differentiable or not, can be approximated by using the parameters of EMPSO.
We also show that AdaSwarm is able to handle a variety of loss functions during backpropagation, including the maximum absolute error (MAE).
- Score: 19.573380763700715
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces AdaSwarm, a novel gradient-free optimizer which has
similar or even better performance than the Adam optimizer adopted in neural
networks. In order to support our proposed AdaSwarm, a novel Exponentially
weighted Momentum Particle Swarm Optimizer (EMPSO), is proposed. The ability of
AdaSwarm to tackle optimization problems is attributed to its capability to
perform good gradient approximations. We show that the gradient of any
function, differentiable or not, can be approximated by using the parameters of
EMPSO. This is a novel technique to simulate GD which lies at the boundary
between numerical methods and swarm intelligence. Mathematical proofs of the
gradient approximation produced are also provided. AdaSwarm competes closely
with several state-of-the-art (SOTA) optimizers. We also show that AdaSwarm is
able to handle a variety of loss functions during backpropagation, including
the maximum absolute error (MAE).
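
To make the gradient-approximation idea concrete, here is a minimal NumPy sketch (illustrative only; the constants and exact formula in the paper may differ) that estimates the derivative of a non-differentiable loss such as the MAE from EMPSO's attraction terms, i.e. from how strongly the swarm pulls the current point toward its personal-best and global-best positions:

```python
import numpy as np

def empso_gradient_estimate(x, pbest, gbest, c1=2.0, c2=2.0, rng=None):
    """Approximate df/dx at x from the swarm's attraction terms.

    Illustrative sketch in the spirit of AdaSwarm's EMPSO-based gradient
    approximation; the paper's exact constants and scaling may differ.
    """
    rng = np.random.default_rng() if rng is None else rng
    r1, r2 = rng.random(), rng.random()
    # Attraction toward better positions points roughly downhill,
    # so its negative serves as a gradient estimate.
    attraction = c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return -attraction

# Toy setting: MAE-style loss |x - 3| (non-differentiable at x = 3).
x = 5.0                      # current parameter value
pbest, gbest = 3.4, 3.1      # best positions found so far by a (toy) swarm

print(empso_gradient_estimate(x, pbest, gbest))  # positive, like the true subgradient
print(np.sign(x - 3.0))                          # true MAE subgradient at x = 5: +1
```

In AdaSwarm this kind of estimate stands in for the analytic derivative of the loss, and the rest of backpropagation proceeds as usual; in this toy example only the descent direction, not the magnitude, is meaningful.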
Related papers
- ELRA: Exponential learning rate adaption gradient descent optimization
method [83.88591755871734]
We present a novel, fast (exponential rate), ab initio (hyper-parameter-free) gradient-based adaptation method.
The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly in $n$.
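
As a rough illustration of learning-rate adaptation by situational awareness (the growth/shrink rule below is an assumption for illustration, not the rule from this paper): grow the step size while successive gradients keep agreeing in direction, and shrink it when they start to disagree.

```python
import numpy as np

def adaptive_gd(grad, x0, alpha=1e-3, grow=1.1, shrink=0.5, steps=200):
    """Toy gradient descent whose step size alpha adapts multiplicatively.

    Illustrative only: the adaptation rule is an assumption, not ELRA's.
    """
    x, prev_g = np.asarray(x0, dtype=float), None
    for _ in range(steps):
        g = grad(x)
        if prev_g is not None:
            # "Situational awareness": do consecutive gradients agree?
            alpha *= grow if np.dot(g, prev_g) > 0 else shrink
        x, prev_g = x - alpha * g, g
    return x, alpha

# Quadratic test problem with its minimum at the origin.
x_opt, final_alpha = adaptive_gd(lambda x: 2.0 * x, x0=[5.0, -3.0])
print(x_opt, final_alpha)
```

The per-step cost is one gradient and one dot product, so the scheme stays linear in the problem dimension.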
arXiv Detail & Related papers (2023-09-12T14:36:13Z) - Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable to gradient-based optimization and make it possible to trade off estimation accuracy against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z) - Reducing the Variance of Gaussian Process Hyperparameter Optimization
with Preconditioning [54.01682318834995]
Preconditioning is a highly effective step for any iterative method involving matrix-vector multiplication.
We prove that preconditioning has an additional benefit that has been previously unexplored.
It can simultaneously reduce variance at essentially negligible cost.
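
For context, GP marginal-likelihood evaluation relies on iterative solves with the kernel matrix, which is where matrix-vector multiplication and preconditioning come in. A generic sketch of such a solve with a simple diagonal (Jacobi) preconditioner (not the preconditioners studied in the paper):

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))

# Squared-exponential kernel matrix plus observation noise (a typical GP covariance).
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq_dists) + 1e-2 * np.eye(len(X))
y = rng.normal(size=len(X))

# Jacobi (diagonal) preconditioner: a cheap stand-in for the more elaborate
# preconditioners analyzed in the paper.
M = LinearOperator(K.shape, matvec=lambda v: v / np.diag(K))

alpha, info = cg(K, y, M=M)          # iteratively solves K alpha = y
print(info, np.linalg.norm(K @ alpha - y))
```

The paper's point is that a good preconditioner not only speeds up such iterative solves but also reduces the variance of the stochastic estimates used during hyperparameter optimization, at essentially negligible extra cost.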
arXiv Detail & Related papers (2021-07-01T06:43:11Z) - Implicit differentiation for fast hyperparameter selection in non-smooth
convex learning [87.60600646105696]
We study first-order methods when the inner optimization problem is convex but non-smooth.
We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yield sequences of Jacobians converging toward the exact Jacobian.
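
A compact example of this idea for the Lasso, a standard non-smooth convex problem used here purely as an illustration: run proximal gradient descent (ISTA) and propagate the derivative of the iterate with respect to the regularization parameter lambda alongside the iterations, using the almost-everywhere derivative of the soft-thresholding operator.

```python
import numpy as np

def ista_with_forward_jacobian(A, b, lam, n_iter=500):
    """Proximal gradient descent (ISTA) for the Lasso, jointly propagating dx/dlam.

    Forward-mode differentiation of the iterations; a hyper-gradient of a
    validation loss would then follow by the chain rule.
    """
    n = A.shape[1]
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1 / Lipschitz constant of the smooth part
    x, J = np.zeros(n), np.zeros(n)             # iterate and its derivative w.r.t. lam
    for _ in range(n_iter):
        z = x - step * A.T @ (A @ x - b)        # gradient step on the smooth part
        dz = J - step * A.T @ (A @ J)           # forward-mode derivative of that step
        thr = step * lam
        active = np.abs(z) > thr                # support kept by the soft-threshold
        x = np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)
        # d/dlam of soft_threshold(z, step * lam): chain rule through z and the threshold.
        J = np.where(active, dz - step * np.sign(z), 0.0)
    return x, J

rng = np.random.default_rng(0)
A, b = rng.normal(size=(50, 20)), rng.normal(size=50)
x_hat, dx_dlam = ista_with_forward_jacobian(A, b, lam=0.5)
print(x_hat[:5], dx_dlam[:5])
```

As the summary states, the propagated quantity converges toward the exact Jacobian of the solution map with respect to the hyperparameter, which is what makes first-order hyperparameter selection possible.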
arXiv Detail & Related papers (2021-05-04T17:31:28Z) - A Swarm Variant for the Schr\"odinger Solver [0.0]
This paper introduces the application of Exponentially Averaged Momentum Particle Swarm Optimization (EM-PSO) as a derivative-free optimizer for neural networks.
It adopts PSO's major advantages, such as search-space exploration and higher robustness to local minima, compared to gradient-based optimizers such as Adam.
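
A minimal sketch of such a swarm with an exponentially averaged momentum term in place of the usual inertia weight, used as a derivative-free optimizer on a non-differentiable toy function (the exact EM-PSO recursion and constants in the paper may differ):

```python
import numpy as np

def em_pso(f, dim, n_particles=30, iters=200, beta=0.9, c1=0.8, c2=0.9, seed=0):
    """Derivative-free minimization with an exponentially averaged momentum PSO.

    Sketch only: the constants and the exact momentum recursion are assumptions.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0, size=(n_particles, dim))
    v = np.zeros_like(x)
    m = np.zeros_like(x)                          # exponentially averaged momentum
    pbest, pbest_val = x.copy(), np.apply_along_axis(f, 1, x)
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        m = beta * m + (1.0 - beta) * v           # accumulate past velocities
        v = beta * m + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        val = np.apply_along_axis(f, 1, x)
        improved = val < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], val[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

# Non-differentiable, multimodal test function: no gradients are ever queried.
f = lambda z: np.abs(z - 1.5).sum() + 0.3 * np.sin(5.0 * z).sum()
print(em_pso(f, dim=2))
```

Because only function values are used, the same loop applies unchanged to objectives with no usable analytic gradient.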
arXiv Detail & Related papers (2021-04-10T15:51:36Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work is on zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that, with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
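
To illustrate the flavor of the approach (a generic sketch, not the framework proposed in the paper): estimate the gradient from function queries alone, sampling a few coordinates per step with probabilities given by importance weights and using central finite differences along those coordinates.

```python
import numpy as np

def zo_coordinate_grad(f, x, probs, k=3, mu=1e-4, rng=None):
    """Zeroth-order gradient estimate from function queries only.

    Samples k coordinates with probabilities `probs` (importance sampling);
    the 1/(probs*k) factor reweights the central finite differences so the
    estimate is unbiased up to finite-difference error. Generic sketch only.
    """
    rng = np.random.default_rng() if rng is None else rng
    g = np.zeros_like(x)
    for i in rng.choice(len(x), size=k, replace=True, p=probs):
        e = np.zeros_like(x)
        e[i] = mu
        g[i] += (f(x + e) - f(x - e)) / (2.0 * mu) / (probs[i] * k)
    return g

# Black-box objective: its gradient is never queried directly.
f = lambda x: ((x - np.arange(5)) ** 2).sum()

x, probs = np.zeros(5), np.full(5, 0.2)   # uniform importance weights for the demo
for _ in range(300):
    x -= 0.05 * zo_coordinate_grad(f, x, probs)
print(x)                                  # approaches [0, 1, 2, 3, 4]
```

Skewing `probs` toward coordinates that matter more is what makes this importance sampling; each step needs only 2k function queries and no first-order information.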
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Hyper-parameter estimation method with particle swarm optimization [0.8883733362171032]
The PSO method cannot be directly used for the problem of hyper-parameter estimation.
The proposed method uses the swarm method to optimize the acquisition function.
The results on several problems are improved.
arXiv Detail & Related papers (2020-11-24T07:51:51Z) - CoolMomentum: A Method for Stochastic Optimization by Langevin Dynamics
with Simulated Annealing [23.87373187143897]
Deep learning applications require global optimization of non-convex objective functions, which have multiple local minima.
We show that the technique used to resolve the same problem in physical simulations, Langevin dynamics with simulated annealing, can be adapted into a stochastic optimization algorithm.
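
The analogy can be sketched as momentum SGD in which the momentum coefficient is gradually cooled over training, in the spirit of simulated annealing; the specific schedule below is illustrative, not the one derived in the paper.

```python
import numpy as np

def cooled_momentum_sgd(grad, x0, lr=0.05, rho0=0.99, steps=500, seed=0):
    """Momentum SGD whose momentum coefficient decays over time ("cooling").

    Illustrative schedule only; CoolMomentum derives its own cooling rule
    from Langevin dynamics with simulated annealing.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for t in range(steps):
        rho = rho0 * (1.0 - t / steps)                  # momentum "temperature" goes to zero
        g = grad(x) + 0.1 * rng.normal(size=x.shape)    # noisy (stochastic) gradient
        v = rho * v - lr * g
        x = x + v
    return x

# Gradient of the non-convex objective 0.5 * x**2 + 1.5 * cos(2 * x),
# which has several local minima.
grad = lambda x: x - 3.0 * np.sin(2.0 * x)
print(cooled_momentum_sgd(grad, x0=[4.0]))
```

Early on, the large momentum lets the iterate travel across basins; as the coefficient cools, the dynamics settle into a single minimum.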
arXiv Detail & Related papers (2020-05-29T14:44:24Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
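
The optimistic idea these variants build on can be illustrated with the plain optimistic gradient update for a min-max problem, shown here without the adaptive per-coordinate step sizes the paper analyzes: each player extrapolates using its previous gradient, which damps the rotation that makes simultaneous gradient descent-ascent cycle.

```python
def optimistic_gda(steps=2000, eta=0.1):
    """Optimistic gradient descent-ascent on the bilinear game min_x max_y x*y.

    The update uses 2*g_t - g_{t-1} (the optimistic extrapolation); plain
    simultaneous GDA diverges on this game, while this converges to (0, 0).
    The paper's algorithms additionally use adaptive (Adagrad-style) steps.
    """
    x, y = 1.0, 1.0
    gx_prev, gy_prev = 0.0, 0.0
    for _ in range(steps):
        gx, gy = y, x                  # d/dx (x*y) and d/dy (x*y)
        x -= eta * (2 * gx - gx_prev)  # descent step for the min player
        y += eta * (2 * gy - gy_prev)  # ascent step for the max player
        gx_prev, gy_prev = gx, gy
    return x, y

print(optimistic_gda())   # close to the equilibrium (0.0, 0.0)
```

In GAN training the two players are the generator and the discriminator, and the adaptive variants replace the fixed step eta with per-coordinate step sizes accumulated from past gradients.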
arXiv Detail & Related papers (2019-12-26T22:10:10Z)