AdaSwarm: Augmenting Gradient-Based optimizers in Deep Learning with
Swarm Intelligence
- URL: http://arxiv.org/abs/2006.09875v5
- Date: Wed, 19 May 2021 07:47:55 GMT
- Title: AdaSwarm: Augmenting Gradient-Based optimizers in Deep Learning with
Swarm Intelligence
- Authors: Rohan Mohapatra, Snehanshu Saha, Carlos A. Coello Coello, Anwesh
Bhattacharya, Soma S. Dhavala and Sriparna Saha
- Abstract summary: This paper introduces AdaSwarm, a novel gradient-free optimizer with similar or even better performance than the Adam optimizer adopted in neural networks.
We show that the gradient of any function, differentiable or not, can be approximated by using the parameters of EMPSO.
We also show that AdaSwarm is able to handle a variety of loss functions during backpropagation, including the maximum absolute error (MAE).
- Score: 19.573380763700715
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces AdaSwarm, a novel gradient-free optimizer which has
similar or even better performance than the Adam optimizer adopted in neural
networks. In order to support our proposed AdaSwarm, a novel Exponentially
weighted Momentum Particle Swarm Optimizer (EMPSO), is proposed. The ability of
AdaSwarm to tackle optimization problems is attributed to its capability to
perform good gradient approximations. We show that the gradient of any
function, differentiable or not, can be approximated by using the parameters of
EMPSO. This is a novel technique to simulate GD which lies at the boundary
between numerical methods and swarm intelligence. Mathematical proofs of the
gradient approximation produced are also provided. AdaSwarm competes closely
with several state-of-the-art (SOTA) optimizers. We also show that AdaSwarm is
able to handle a variety of loss functions during backpropagation, including
the maximum absolute error (MAE).
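
To make the gradient-approximation idea concrete, here is a minimal NumPy sketch (illustrative only; the constants and exact formula in the paper may differ) that estimates the derivative of a non-differentiable loss such as the MAE from EMPSO's attraction terms, i.e. from how strongly the swarm pulls the current point toward its personal-best and global-best positions:

```python
import numpy as np

def empso_gradient_estimate(x, pbest, gbest, c1=2.0, c2=2.0, rng=None):
    """Approximate df/dx at x from the swarm's attraction terms.

    Illustrative sketch in the spirit of AdaSwarm's EMPSO-based gradient
    approximation; the paper's exact constants and scaling may differ.
    """
    rng = np.random.default_rng() if rng is None else rng
    r1, r2 = rng.random(), rng.random()
    # Attraction toward better positions points roughly downhill,
    # so its negative serves as a gradient estimate.
    attraction = c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return -attraction

# Toy setting: MAE-style loss |x - 3| (non-differentiable at x = 3).
x = 5.0                      # current parameter value
pbest, gbest = 3.4, 3.1      # best positions found so far by a (toy) swarm

print(empso_gradient_estimate(x, pbest, gbest))  # positive, like the true subgradient
print(np.sign(x - 3.0))                          # true MAE subgradient at x = 5: +1
```

In AdaSwarm this kind of estimate stands in for the analytic derivative of the loss, and the rest of backpropagation proceeds as usual; in this toy example only the descent direction, not the magnitude, is meaningful.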
Related papers
- ELRA: Exponential learning rate adaption gradient descent optimization
method [83.88591755871734]
We present a novel, fast (exponential rate), ab initio (hyper-parameter-free) gradient-based adaptation method.
The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly in $n$.
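
As a rough illustration of learning-rate adaptation by situational awareness (the growth/shrink rule below is an assumption for illustration, not the rule from this paper): grow the step size while successive gradients keep agreeing in direction, and shrink it when they start to disagree.

```python
import numpy as np

def adaptive_gd(grad, x0, alpha=1e-3, grow=1.1, shrink=0.5, steps=200):
    """Toy gradient descent whose step size alpha adapts multiplicatively.

    Illustrative only: the adaptation rule is an assumption, not ELRA's.
    """
    x, prev_g = np.asarray(x0, dtype=float), None
    for _ in range(steps):
        g = grad(x)
        if prev_g is not None:
            # "Situational awareness": do consecutive gradients agree?
            alpha *= grow if np.dot(g, prev_g) > 0 else shrink
        x, prev_g = x - alpha * g, g
    return x, alpha

# Quadratic test problem with its minimum at the origin.
x_opt, final_alpha = adaptive_gd(lambda x: 2.0 * x, x0=[5.0, -3.0])
print(x_opt, final_alpha)
```

The per-step cost is one gradient and one dot product, so the scheme stays linear in the problem dimension.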
arXiv Detail & Related papers (2023-09-12T14:36:13Z) - Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable to gradient-based optimization and make it possible to trade off estimation accuracy against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z) - Reducing the Variance of Gaussian Process Hyperparameter Optimization
with Preconditioning [54.01682318834995]
Preconditioning is a highly effective step for any iterative method involving matrix-vector multiplication.
We prove that preconditioning has an additional benefit that has been previously unexplored.
It can simultaneously reduce variance at essentially negligible cost.
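
For context, GP marginal-likelihood evaluation relies on iterative solves with the kernel matrix, which is where matrix-vector multiplication and preconditioning come in. A generic sketch of such a solve with a simple diagonal (Jacobi) preconditioner (not the preconditioners studied in the paper):

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))

# Squared-exponential kernel matrix plus observation noise (a typical GP covariance).
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq_dists) + 1e-2 * np.eye(len(X))
y = rng.normal(size=len(X))

# Jacobi (diagonal) preconditioner: a cheap stand-in for the more elaborate
# preconditioners analyzed in the paper.
M = LinearOperator(K.shape, matvec=lambda v: v / np.diag(K))

alpha, info = cg(K, y, M=M)          # iteratively solves K alpha = y
print(info, np.linalg.norm(K @ alpha - y))
```

The paper's point is that a good preconditioner not only speeds up such iterative solves but also reduces the variance of the stochastic estimates used during hyperparameter optimization, at essentially negligible extra cost.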
arXiv Detail & Related papers (2021-07-01T06:43:11Z) - Implicit differentiation for fast hyperparameter selection in non-smooth
convex learning [87.60600646105696]
We study first-order methods when the inner optimization problem is convex but non-smooth.
We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yield sequences of Jacobians converging toward the exact Jacobian.
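
A compact example of this idea for the Lasso, a standard non-smooth convex problem used here purely as an illustration: run proximal gradient descent (ISTA) and propagate the derivative of the iterate with respect to the regularization parameter lambda alongside the iterations, using the almost-everywhere derivative of the soft-thresholding operator.

```python
import numpy as np

def ista_with_forward_jacobian(A, b, lam, n_iter=500):
    """Proximal gradient descent (ISTA) for the Lasso, jointly propagating dx/dlam.

    Forward-mode differentiation of the iterations; a hyper-gradient of a
    validation loss would then follow by the chain rule.
    """
    n = A.shape[1]
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1 / Lipschitz constant of the smooth part
    x, J = np.zeros(n), np.zeros(n)             # iterate and its derivative w.r.t. lam
    for _ in range(n_iter):
        z = x - step * A.T @ (A @ x - b)        # gradient step on the smooth part
        dz = J - step * A.T @ (A @ J)           # forward-mode derivative of that step
        thr = step * lam
        active = np.abs(z) > thr                # support kept by the soft-threshold
        x = np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)
        # d/dlam of soft_threshold(z, step * lam): chain rule through z and the threshold.
        J = np.where(active, dz - step * np.sign(z), 0.0)
    return x, J

rng = np.random.default_rng(0)
A, b = rng.normal(size=(50, 20)), rng.normal(size=50)
x_hat, dx_dlam = ista_with_forward_jacobian(A, b, lam=0.5)
print(x_hat[:5], dx_dlam[:5])
```

As the summary states, the propagated quantity converges toward the exact Jacobian of the solution map with respect to the hyperparameter, which is what makes first-order hyperparameter selection possible.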
arXiv Detail & Related papers (2021-05-04T17:31:28Z) - A Swarm Variant for the Schr\"odinger Solver [0.0]
This paper introduces the application of Exponentially Averaged Momentum Particle Swarm Optimization (EM-PSO) as a derivative-free optimizer for neural networks.
It adopts PSO's major advantages, such as search-space exploration and higher robustness to local minima, compared to gradient-based optimizers such as Adam.
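
A minimal sketch of such a swarm with an exponentially averaged momentum term in place of the usual inertia weight, used as a derivative-free optimizer on a non-differentiable toy function (the exact EM-PSO recursion and constants in the paper may differ):

```python
import numpy as np

def em_pso(f, dim, n_particles=30, iters=200, beta=0.9, c1=0.8, c2=0.9, seed=0):
    """Derivative-free minimization with an exponentially averaged momentum PSO.

    Sketch only: the constants and the exact momentum recursion are assumptions.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0, size=(n_particles, dim))
    v = np.zeros_like(x)
    m = np.zeros_like(x)                          # exponentially averaged momentum
    pbest, pbest_val = x.copy(), np.apply_along_axis(f, 1, x)
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        m = beta * m + (1.0 - beta) * v           # accumulate past velocities
        v = beta * m + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        val = np.apply_along_axis(f, 1, x)
        improved = val < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], val[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

# Non-differentiable, multimodal test function: no gradients are ever queried.
f = lambda z: np.abs(z - 1.5).sum() + 0.3 * np.sin(5.0 * z).sum()
print(em_pso(f, dim=2))
```

Because only function values are used, the same loop applies unchanged to objectives with no usable analytic gradient.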
arXiv Detail & Related papers (2021-04-10T15:51:36Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work is on zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that, with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
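
To illustrate the flavor of the approach (a generic sketch, not the framework proposed in the paper): estimate the gradient from function queries alone, sampling a few coordinates per step with probabilities given by importance weights and using central finite differences along those coordinates.

```python
import numpy as np

def zo_coordinate_grad(f, x, probs, k=3, mu=1e-4, rng=None):
    """Zeroth-order gradient estimate from function queries only.

    Samples k coordinates with probabilities `probs` (importance sampling);
    the 1/(probs*k) factor reweights the central finite differences so the
    estimate is unbiased up to finite-difference error. Generic sketch only.
    """
    rng = np.random.default_rng() if rng is None else rng
    g = np.zeros_like(x)
    for i in rng.choice(len(x), size=k, replace=True, p=probs):
        e = np.zeros_like(x)
        e[i] = mu
        g[i] += (f(x + e) - f(x - e)) / (2.0 * mu) / (probs[i] * k)
    return g

# Black-box objective: its gradient is never queried directly.
f = lambda x: ((x - np.arange(5)) ** 2).sum()

x, probs = np.zeros(5), np.full(5, 0.2)   # uniform importance weights for the demo
for _ in range(300):
    x -= 0.05 * zo_coordinate_grad(f, x, probs)
print(x)                                  # approaches [0, 1, 2, 3, 4]
```

Skewing `probs` toward coordinates that matter more is what makes this importance sampling; each step needs only 2k function queries and no first-order information.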
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Hyper-parameter estimation method with particle swarm optimization [0.8883733362171032]
The PSO method cannot be directly used for the problem of hyper-parameter estimation.
The proposed method uses the swarm method to optimize the acquisition function.
The results on several problems are improved.
arXiv Detail & Related papers (2020-11-24T07:51:51Z) - CoolMomentum: A Method for Stochastic Optimization by Langevin Dynamics
with Simulated Annealing [23.87373187143897]
Deep learning applications require global optimization of non-convex objective functions, which have multiple local minima.
We show that the technique used to resolve the same problem in physical simulations, Langevin dynamics with simulated annealing, can be adapted into a stochastic optimization algorithm.
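
The analogy can be sketched as momentum SGD in which the momentum coefficient is gradually cooled over training, in the spirit of simulated annealing; the specific schedule below is illustrative, not the one derived in the paper.

```python
import numpy as np

def cooled_momentum_sgd(grad, x0, lr=0.05, rho0=0.99, steps=500, seed=0):
    """Momentum SGD whose momentum coefficient decays over time ("cooling").

    Illustrative schedule only; CoolMomentum derives its own cooling rule
    from Langevin dynamics with simulated annealing.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for t in range(steps):
        rho = rho0 * (1.0 - t / steps)                  # momentum "temperature" goes to zero
        g = grad(x) + 0.1 * rng.normal(size=x.shape)    # noisy (stochastic) gradient
        v = rho * v - lr * g
        x = x + v
    return x

# Gradient of the non-convex objective 0.5 * x**2 + 1.5 * cos(2 * x),
# which has several local minima.
grad = lambda x: x - 3.0 * np.sin(2.0 * x)
print(cooled_momentum_sgd(grad, x0=[4.0]))
```

Early on, the large momentum lets the iterate travel across basins; as the coefficient cools, the dynamics settle into a single minimum.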
arXiv Detail & Related papers (2020-05-29T14:44:24Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
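
The optimistic idea these variants build on can be illustrated with the plain optimistic gradient update for a min-max problem, shown here without the adaptive per-coordinate step sizes the paper analyzes: each player extrapolates using its previous gradient, which damps the rotation that makes simultaneous gradient descent-ascent cycle.

```python
def optimistic_gda(steps=2000, eta=0.1):
    """Optimistic gradient descent-ascent on the bilinear game min_x max_y x*y.

    The update uses 2*g_t - g_{t-1} (the optimistic extrapolation); plain
    simultaneous GDA diverges on this game, while this converges to (0, 0).
    The paper's algorithms additionally use adaptive (Adagrad-style) steps.
    """
    x, y = 1.0, 1.0
    gx_prev, gy_prev = 0.0, 0.0
    for _ in range(steps):
        gx, gy = y, x                  # d/dx (x*y) and d/dy (x*y)
        x -= eta * (2 * gx - gx_prev)  # descent step for the min player
        y += eta * (2 * gy - gy_prev)  # ascent step for the max player
        gx_prev, gy_prev = gx, gy
    return x, y

print(optimistic_gda())   # close to the equilibrium (0.0, 0.0)
```

In GAN training the two players are the generator and the discriminator, and the adaptive variants replace the fixed step eta with per-coordinate step sizes accumulated from past gradients.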
arXiv Detail & Related papers (2019-12-26T22:10:10Z)