An Elementary Approach to Convergence Guarantees of Optimization
Algorithms for Deep Networks
- URL: http://arxiv.org/abs/2002.09051v2
- Date: Wed, 30 Dec 2020 02:47:52 GMT
- Title: An Elementary Approach to Convergence Guarantees of Optimization
Algorithms for Deep Networks
- Authors: Vincent Roulet and Zaid Harchaoui
- Abstract summary: We present an approach to obtain convergence guarantees of optimization algorithms for deep networks based on elementary arguments and computations.
We provide a systematic way to compute estimates of the smoothness constants that govern the convergence behavior of first-order optimization algorithms used to train deep networks.
- Score: 2.715884199292287
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an approach to obtain convergence guarantees of optimization
algorithms for deep networks based on elementary arguments and computations.
The convergence analysis revolves around the analytical and computational
structures of optimization oracles central to the implementation of deep
networks in machine learning software. We provide a systematic way to compute
estimates of the smoothness constants that govern the convergence behavior of
first-order optimization algorithms used to train deep networks. A diverse set
of example components and architectures arising in modern deep networks
intersperse the exposition to illustrate the approach.
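As a rough illustration of the quantities involved (not the paper's architecture-aware, analytical estimates), the sketch below estimates a smoothness constant L of a toy objective numerically, via power iteration on finite-difference Hessian-vector products, and then uses the classical 1/L step size for gradient descent. All objects in it are illustrative assumptions.
```python
# Minimal sketch, assuming a toy least-squares objective: estimate a (local)
# smoothness constant L numerically and use it to set a 1/L gradient step.
# This is a generic numerical stand-in, not the paper's analytical recursion.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 10))
b = rng.normal(size=20)

def grad(w):
    # Gradient of the smooth objective f(w) = 0.5 * ||A w - b||^2.
    return A.T @ (A @ w - b)

def estimate_smoothness(w, iters=50, eps=1e-4):
    # Power iteration on v -> (grad(w + eps*v) - grad(w)) / eps, a finite-difference
    # Hessian-vector product; its norm converges to the largest Hessian eigenvalue,
    # i.e. the (local) smoothness constant L.
    v = rng.normal(size=w.shape)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        hv = (grad(w + eps * v) - grad(w)) / eps
        lam = np.linalg.norm(hv)
        v = hv / (lam + 1e-12)
    return lam

w = np.zeros(10)
L = estimate_smoothness(w)          # here L is close to the top eigenvalue of A.T @ A
for _ in range(200):
    w = w - (1.0 / L) * grad(w)     # gradient descent with the classical 1/L step

print(f"estimated L = {L:.3f}, final objective = {0.5 * np.linalg.norm(A @ w - b)**2:.4f}")
```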
Related papers
- Component-based Sketching for Deep ReLU Nets [55.404661149594375]
We develop a sketching scheme based on deep net components for various tasks.
We transform deep net training into a linear empirical risk minimization problem.
We show that the proposed component-based sketching provides almost optimal rates in approximating saturated functions.
arXiv Detail & Related papers (2024-09-21T15:30:43Z) - Lower Bounds and Optimal Algorithms for Non-Smooth Convex Decentralized Optimization over Time-Varying Networks [57.24087627267086]
We consider the task of minimizing the sum of convex functions stored in a decentralized manner across the nodes of a communication network.
Lower bounds on the number of decentralized communications and (sub)gradient computations required to solve the problem have been established.
We develop the first optimal algorithm that matches these lower bounds and offers substantially improved theoretical performance compared to the existing state of the art.
arXiv Detail & Related papers (2024-05-28T10:28:45Z) - Large-scale global optimization of ultra-high dimensional non-convex
landscapes based on generative neural networks [0.0]
We present an algorithm to manage ultra-high dimensional optimization based on a deep generative network.
We show that our method performs better, with fewer function evaluations, than state-of-the-art algorithms.
arXiv Detail & Related papers (2023-07-09T00:05:59Z) - Optimisation & Generalisation in Networks of Neurons [8.078758339149822]
The goal of this thesis is to develop the optimisation and generalisation theoretic foundations of learning in artificial neural networks.
A new theoretical framework is proposed for deriving architecture-dependent first-order optimisation algorithms.
A new correspondence is proposed between ensembles of networks and individual networks.
arXiv Detail & Related papers (2022-10-18T18:58:40Z) - On the Convergence of Distributed Stochastic Bilevel Optimization
Algorithms over a Network [55.56019538079826]
Bilevel optimization has been applied to a wide variety of machine learning models.
Most existing algorithms are restricted to the single-machine setting, so they are incapable of handling distributed data.
We develop novel decentralized bilevel optimization algorithms based on a gradient tracking communication mechanism and two different gradient estimators (a generic gradient-tracking sketch is given after this list).
arXiv Detail & Related papers (2022-06-30T05:29:52Z) - Neural Combinatorial Optimization: a New Player in the Field [69.23334811890919]
This paper presents a critical analysis on the incorporation of algorithms based on neural networks into the classical optimization framework.
A comprehensive study is carried out to analyse the fundamental aspects of such algorithms, including performance, transferability, computational cost, and generalization to larger-sized instances.
arXiv Detail & Related papers (2022-05-03T07:54:56Z) - DESTRESS: Computation-Optimal and Communication-Efficient Decentralized
Nonconvex Finite-Sum Optimization [43.31016937305845]
Internet-of-things, networked sensing, autonomous systems and federated learning call for decentralized algorithms for finite-sum optimizations.
We develop the DEcentralized STochastic REcurSive gradient method (DESTRESS) for nonconvex finite-sum optimization.
Detailed theoretical and numerical comparisons show that DESTRESS improves upon prior decentralized algorithms.
arXiv Detail & Related papers (2021-10-04T03:17:41Z) - Distributed Optimization, Averaging via ADMM, and Network Topology [0.0]
We study the connection between network topology and convergence rates for different algorithms on a real world problem of sensor localization.
We also show interesting connections between ADMM and lifted Markov chains, and provide an explicit characterization of its convergence.
arXiv Detail & Related papers (2020-09-05T21:44:39Z) - Iterative Algorithm Induced Deep-Unfolding Neural Networks: Precoding
Design for Multiuser MIMO Systems [59.804810122136345]
We propose a framework for deep-unfolding, where a general form of iterative algorithm induced deep-unfolding neural network (IAIDNN) is developed.
An efficient IAIDNN based on the structure of the classic weighted minimum mean-square error (WMMSE) iterative algorithm is developed.
We show that the proposed IAIDNN efficiently achieves the performance of the iterative WMMSE algorithm with reduced computational complexity.
arXiv Detail & Related papers (2020-06-15T02:57:57Z) - Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications.
In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training.
Based on our analysis, we bridge the gap and illustrate the theoretical insights for three popular large-batch training techniques.
arXiv Detail & Related papers (2020-02-04T23:03:12Z)
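The gradient tracking mechanism mentioned in the decentralized bilevel entry above is a standard building block. Below is a minimal single-level sketch of it (not that paper's bilevel algorithms): each node holds a local least-squares objective, mixes its iterate with its ring neighbors through a doubly stochastic matrix, and maintains a running estimate of the global average gradient. The objectives, mixing matrix, step size, and iteration count are all illustrative assumptions.
```python
# Minimal gradient-tracking sketch for decentralized minimization of
# sum_i f_i(x), with f_i(x) = 0.5 * ||A_i x - b_i||^2, over a ring of nodes.
# Generic single-level mechanism only; not the bilevel algorithms above.
import numpy as np

rng = np.random.default_rng(1)
n_nodes, dim = 5, 4
A = [rng.normal(size=(8, dim)) for _ in range(n_nodes)]
b = [rng.normal(size=8) for _ in range(n_nodes)]

def grad(i, x):
    # Local gradient of node i's least-squares objective.
    return A[i].T @ (A[i] @ x - b[i])

# Doubly stochastic mixing matrix for a ring: average with the two neighbors.
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 1 / 3
    W[i, (i - 1) % n_nodes] = 1 / 3
    W[i, (i + 1) % n_nodes] = 1 / 3

alpha = 0.01                                  # illustrative step size
x = np.zeros((n_nodes, dim))                  # one iterate per node
y = np.array([grad(i, x[i]) for i in range(n_nodes)])   # gradient trackers
g_old = y.copy()

for _ in range(2000):
    x = W @ x - alpha * y                     # mix with neighbors, then descend
    g_new = np.array([grad(i, x[i]) for i in range(n_nodes)])
    y = W @ y + g_new - g_old                 # track the average gradient
    g_old = g_new

# All nodes should approximately agree on a minimizer of the global sum.
x_bar = x.mean(axis=0)
print("disagreement across nodes:", np.linalg.norm(x - x_bar))
print("global gradient norm:", np.linalg.norm(sum(grad(i, x_bar) for i in range(n_nodes))))
```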
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided (including all listed content) and is not responsible for any consequences of its use.