A Comparison of Optimization Algorithms for Deep Learning
- URL: http://arxiv.org/abs/2007.14166v1
- Date: Tue, 28 Jul 2020 12:42:28 GMT
- Title: A Comparison of Optimization Algorithms for Deep Learning
- Authors: Derya Soydaner
- Abstract summary: In this study, widely used optimization algorithms for deep learning are examined in detail.
To this end, these algorithms, called adaptive gradient methods, are implemented for both supervised and unsupervised tasks.
The behaviour of the algorithms during training and results on four image datasets are compared.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, we have witnessed the rise of deep learning. Deep neural
networks have proved their success in many areas. However, the optimization of
these networks has become more difficult as neural networks grow deeper and
datasets become larger. Therefore, more advanced optimization algorithms have
been proposed over the past years. In this study, widely used optimization
algorithms for deep learning are examined in detail. To this end, these
algorithms, called adaptive gradient methods, are implemented for both supervised
and unsupervised tasks. The behaviour of the algorithms during training and
results on four image datasets, namely, MNIST, CIFAR-10, Kaggle Flowers and
Labeled Faces in the Wild are compared by pointing out their differences
against basic optimization algorithms.
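For readers unfamiliar with the distinction the abstract draws, the difference between a basic optimizer and an adaptive gradient method shows up directly in the parameter update rule. Below is a minimal NumPy sketch of plain gradient descent next to Adam, one of the adaptive methods the study covers; it is an illustration only, not the paper's code, and the toy quadratic objective is our own.

```python
import numpy as np

def sgd_update(w, grad, lr=0.01):
    """Basic gradient descent: one global step size for every parameter."""
    return w - lr * grad

def adam_update(w, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: per-parameter step sizes from running, bias-corrected estimates
    of the first and second moments of the gradient."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad       # first moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2  # second moment
    m_hat = state["m"] / (1 - beta1 ** state["t"])             # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

# Toy objective f(w) = ||w||^2 / 2, whose gradient is simply w.
w_sgd = np.array([1.0, -2.0])
w_adam = np.array([1.0, -2.0])
state = {"t": 0, "m": np.zeros(2), "v": np.zeros(2)}
for _ in range(200):
    w_sgd = sgd_update(w_sgd, grad=w_sgd)
    w_adam = adam_update(w_adam, grad=w_adam, state=state)
print(w_sgd, w_adam)  # both approach the minimum at the origin
```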
Related papers
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, namely the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on backpropagation (BP).
Unlike FF, our framework directly outputs label distributions at each cascaded block and does not require the generation of additional negative samples.
In our framework, each block can be trained independently, so it can easily be deployed on parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
- Optimization Methods in Deep Learning: A Comprehensive Overview [0.0]
Deep learning has achieved remarkable success in various fields such as image recognition, natural language processing, and speech recognition.
The effectiveness of deep learning largely depends on the optimization methods used to train deep neural networks.
We provide an overview of first-order optimization methods such as Gradient Descent, Adagrad, Adadelta, and RMSprop, as well as recent momentum-based and adaptive gradient methods such as Nesterov accelerated gradient, Adam, Nadam, AdaMax, and AMSGrad.
arXiv Detail & Related papers (2023-02-19T13:01:53Z)
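As a quick reference for two of the methods named in the overview above, their standard textbook update rules can be written as follows (the notation is ours, not the overview's): Nesterov accelerated gradient looks ahead along the momentum direction before evaluating the gradient, while RMSprop divides the step by a running average of squared gradients.

```latex
% Nesterov accelerated gradient: velocity v_t, momentum \mu, learning rate \eta
v_{t+1} = \mu v_t - \eta \, \nabla f(\theta_t + \mu v_t), \qquad
\theta_{t+1} = \theta_t + v_{t+1}

% RMSprop: running average of squared gradients with decay \rho
E[g^2]_t = \rho \, E[g^2]_{t-1} + (1 - \rho)\, g_t^2, \qquad
\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{E[g^2]_t} + \epsilon}\, g_t
```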
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network (NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, defined by minimizing the population loss, that are more suitable for active learning than the metric used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- Learning with Differentiable Algorithms [6.47243430672461]
This thesis explores combining classic algorithms and machine learning systems like neural networks.
The thesis formalizes the idea of algorithmic supervision, which allows a neural network to learn from or in conjunction with an algorithm.
In addition, this thesis proposes differentiable algorithms, such as differentiable sorting networks, differentiable sorting gates, and differentiable logic gate networks.
arXiv Detail & Related papers (2022-09-01T17:30:00Z)
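One concrete building block behind the differentiable sorting networks mentioned in the thesis summary above is a conditional swap whose hard comparison is relaxed by a sigmoid. The sketch below shows that relaxation inside an odd-even transposition network; it illustrates the general construction, not the thesis's specific relaxations or its logic gate networks.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_swap(a, b, steepness=10.0):
    """Differentiable conditional swap: returns (soft min, soft max).
    A sigmoid of the difference replaces the hard comparison, so the
    operation has nonzero gradients everywhere."""
    alpha = sigmoid(steepness * (b - a))   # ~1 when a < b, ~0 when a > b
    return alpha * a + (1 - alpha) * b, alpha * b + (1 - alpha) * a

def soft_sort(values, steepness=10.0):
    """Odd-even transposition sorting network built from soft swaps."""
    x = [float(v) for v in values]
    n = len(x)
    for round_ in range(n):
        for i in range(round_ % 2, n - 1, 2):
            x[i], x[i + 1] = soft_swap(x[i], x[i + 1], steepness)
    return np.array(x)

print(soft_sort([3.0, 1.0, 2.0]))  # approximately [1, 2, 3]
```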
- On the Convergence of Distributed Stochastic Bilevel Optimization Algorithms over a Network [55.56019538079826]
Bilevel optimization has been applied to a wide variety of machine learning models.
Most existing algorithms are restricted to the single-machine setting, so they cannot handle distributed data.
We develop novel decentralized bilevel optimization algorithms based on a gradient tracking communication mechanism and two different gradient estimators.
arXiv Detail & Related papers (2022-06-30T05:29:52Z)
- Training Neural Networks using SAT solvers [1.0152838128195465]
We propose an algorithm that uses SAT solvers to explore global optimisation for training a neural network.
In the experiments, we demonstrate the effectiveness of our algorithm against the Adam optimiser on tasks such as parity learning.
arXiv Detail & Related papers (2022-06-10T01:31:12Z)
- Analytically Tractable Inference in Deep Neural Networks [0.0]
The Tractable Approximate Gaussian Inference (TAGI) algorithm was shown to be a viable and scalable alternative to backpropagation for shallow fully-connected neural networks.
We demonstrate how TAGI matches or exceeds the performance of backpropagation for training classic deep neural network architectures.
arXiv Detail & Related papers (2021-03-09T14:51:34Z)
- Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance on other classical control tasks, gridworld-type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z)
- Towards Understanding the Behaviors of Optimal Deep Active Learning Algorithms [19.65665942630067]
Active learning (AL) algorithms may achieve better performance with less data because the model guides the data selection process.
There has been little study of what optimal AL looks like, which would help researchers understand where their models fall short.
We present a simulated annealing algorithm to search for this optimal oracle and analyze it for several tasks.
arXiv Detail & Related papers (2020-12-29T22:56:42Z)
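The abstract above names simulated annealing as the search procedure for the optimal data-selection oracle. Below is a generic, hedged sketch of such a search over binary selection masks; the paper's actual state space, objective, and cooling schedule are not given here, so the score function and schedule are placeholders.

```python
import numpy as np

def simulated_annealing(score, mask, steps=2000, t0=1.0, seed=0):
    """Generic simulated annealing over binary selection masks.

    score: callable mask -> float (higher is better), e.g. the validation
           accuracy of a model trained on the selected examples.
    A sketch of the search procedure only, not the paper's exact setup."""
    rng = np.random.default_rng(seed)
    best = current = mask.copy()
    best_s = current_s = score(current)
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-9   # linear cooling schedule
        candidate = current.copy()
        candidate[rng.integers(len(candidate))] ^= 1   # flip one selection bit
        cand_s = score(candidate)
        # Accept improvements always, worse moves with a temperature-dependent probability.
        if cand_s > current_s or rng.random() < np.exp((cand_s - current_s) / temp):
            current, current_s = candidate, cand_s
            if current_s > best_s:
                best, best_s = current, current_s
    return best, best_s

# Toy usage: the placeholder score prefers masks selecting about half the pool.
toy_score = lambda m: -abs(m.sum() - len(m) // 2)
best_mask, best_score = simulated_annealing(toy_score, np.zeros(20, dtype=int))
```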
- Learning to Stop While Learning to Predict [85.7136203122784]
Many algorithm-inspired deep models are restricted to a "fixed depth" for all inputs.
Similar to algorithms, the optimal depth of a deep architecture may be different for different input instances.
In this paper, we tackle this varying depth problem using a steerable architecture.
We show that the learned deep model along with the stopping policy improves the performances on a diverse set of tasks.
arXiv Detail & Related papers (2020-06-09T07:22:01Z)
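To make the varying-depth idea above concrete, the sketch below runs an input through a stack of blocks and lets a stopping score decide how many blocks to apply per input. This is a generic early-exit illustration with a heuristic score, not the paper's steerable architecture or its jointly learned stopping policy.

```python
import numpy as np

def adaptive_depth_forward(x, layers, stop_scores, threshold=0.5):
    """Run a stack of layers, letting each input halt early.

    layers: list of callables h -> h (the blocks)
    stop_scores: list of callables h -> probability of stopping after that block
    Different inputs may therefore use different effective depths."""
    h = x
    for depth, (layer, stop) in enumerate(zip(layers, stop_scores), start=1):
        h = layer(h)
        if stop(h) > threshold:   # input-dependent stopping decision
            return h, depth
    return h, len(layers)

# Toy usage: 4 random tanh blocks with a norm-based stopping heuristic.
rng = np.random.default_rng(0)
Ws = [rng.normal(scale=0.5, size=(8, 8)) for _ in range(4)]
layers = [lambda h, W=W: np.tanh(h @ W) for W in Ws]
stop_scores = [lambda h: 1.0 / (1.0 + np.linalg.norm(h))] * 4
out, used_depth = adaptive_depth_forward(rng.normal(size=8), layers, stop_scores)
print(used_depth)
```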
- Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications.
In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training.
Based on our analysis, we bridge the gap and illustrate the theoretical insights for three popular large-batch training techniques.
arXiv Detail & Related papers (2020-02-04T23:03:12Z)
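The CLARS entry above builds on layer-wise adaptive rate scaling for large-batch training. As a hedged illustration of that general idea (in the style of LARS, not the paper's CLARS algorithm), each layer's step size can be rescaled by the ratio of its weight norm to its gradient norm:

```python
import numpy as np

def layerwise_scaled_update(params, grads, base_lr=0.1, trust_coef=0.001, eps=1e-9):
    """One step of generic layer-wise adaptive rate scaling (LARS-style).

    Each layer's step is rescaled by ||w|| / ||g|| so that large-batch
    gradients do not produce disproportionately large updates in any layer.
    A sketch of the general idea only, not the paper's CLARS algorithm."""
    new_params = []
    for w, g in zip(params, grads):
        local_lr = trust_coef * np.linalg.norm(w) / (np.linalg.norm(g) + eps)
        new_params.append(w - base_lr * local_lr * g)
    return new_params

# Toy usage with two "layers" whose gradients differ greatly in scale.
params = [np.ones((4, 4)), np.ones(4)]
grads = [np.full((4, 4), 10.0), np.full(4, 0.1)]
params = layerwise_scaled_update(params, grads)
```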