Particle Dual Averaging: Optimization of Mean Field Neural Networks with
Global Convergence Rate Analysis
- URL: http://arxiv.org/abs/2012.15477v1
- Date: Thu, 31 Dec 2020 07:07:32 GMT
- Title: Particle Dual Averaging: Optimization of Mean Field Neural Networks with
Global Convergence Rate Analysis
- Authors: Atsushi Nitanda, Denny Wu, Taiji Suzuki
- Abstract summary: We propose the particle dual averaging (PDA) method, which generalizes the dual averaging method in convex optimization.
An important application of the proposed method is the optimization of two-layer neural networks in the mean field regime.
We show that neural networks in the mean field limit can be globally optimized by PDA.
- Score: 40.762447301225926
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose the particle dual averaging (PDA) method, which generalizes the
dual averaging method in convex optimization to optimization over probability
distributions with a quantitative runtime guarantee. The algorithm consists of an
inner loop and an outer loop: the inner loop utilizes the Langevin algorithm to
approximately solve for a stationary distribution, which is then optimized in the
outer loop. The method can thus be interpreted as an extension of the Langevin
algorithm that naturally handles nonlinear functionals on the probability space.
An important application of the proposed method is the optimization of two-layer
neural networks in the mean field regime, which is theoretically attractive due to
the presence of nonlinear feature learning, but for which quantitative convergence
rates can be challenging to establish. We show that neural networks in the mean
field limit can be globally optimized by PDA. Furthermore, we characterize the
convergence rate by leveraging convex optimization theory in finite-dimensional
spaces. Our theoretical results are supported by numerical simulations on neural
networks of reasonable size.
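As a rough illustration of the inner/outer structure described above, the following is a minimal NumPy sketch of a PDA-style update for a two-layer mean-field network with tanh units. The averaging weights, step sizes, and regularization constants are illustrative placeholders, not the exact schedule analyzed in the paper.

```python
import numpy as np

def pda_sketch(X, y, n_particles=200, outer_iters=30, inner_iters=200,
               eta=5e-3, lam_ent=1e-3, lam_l2=1e-3, seed=0):
    """Illustrative PDA-style training of f(x) = mean_j a_j * tanh(w_j . x)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = rng.normal(size=(n_particles, d + 1))   # each particle = (w, a)
    avg_resid = np.zeros(n)                          # running average of residuals
    weight_sum = 0.0

    def drift(theta, resid):
        # Gradient of the averaged linearized loss plus L2 penalty at each particle.
        w, a = theta[:, :d], theta[:, d]
        act = np.tanh(X @ w.T)                       # (n, n_particles)
        grad_a = act.T @ resid / n
        grad_w = ((resid[:, None] * (1 - act ** 2) * a[None, :]).T @ X) / n
        return np.concatenate([grad_w, grad_a[:, None]], axis=1) + lam_l2 * theta

    for t in range(1, outer_iters + 1):
        # Outer loop: linearize the loss at the current particle distribution and
        # fold the residual into a weighted running average (dual averaging).
        w, a = theta[:, :d], theta[:, d]
        pred = np.tanh(X @ w.T) @ a / n_particles
        avg_resid = (weight_sum * avg_resid + t * (pred - y)) / (weight_sum + t)
        weight_sum += t

        # Inner loop: Langevin dynamics approximately samples the Gibbs
        # distribution defined by the averaged potential.
        for _ in range(inner_iters):
            noise = rng.normal(size=theta.shape)
            theta = (theta - eta * drift(theta, avg_resid)
                     + np.sqrt(2 * eta * lam_ent) * noise)
    return theta
```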
Related papers
- Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks [3.680127959836384]
Implicit gradient descent (IGD) outperforms the common gradient descent (GD) in handling certain multi-scale problems.
We show that IGD converges to a globally optimal solution at a linear convergence rate.
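As a rough illustration of the implicit update, here is a minimal sketch of one IGD step, x_{k+1} = x_k - eta * grad f(x_{k+1}), solved by fixed-point iteration; the step size and inner iteration count are illustrative, and the PINN training setup of the paper is not reproduced.

```python
import numpy as np

# Minimal sketch of one implicit gradient descent (IGD) step:
# x_next = x - eta * grad(x_next), solved here by fixed-point iteration.
def igd_step(grad, x, eta=0.1, fp_iters=50):
    x_next = np.array(x, dtype=float)
    for _ in range(fp_iters):
        x_next = x - eta * grad(x_next)   # fixed-point update for the implicit equation
    return x_next

# Example: one step on f(x) = 0.5 * ||x||^2, whose gradient is x;
# the implicit step has the closed form x / (1 + eta).
x1 = igd_step(lambda z: z, np.ones(3), eta=0.5)
```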
arXiv Detail & Related papers (2024-07-03T06:10:41Z)
- The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) models are used in a wide range of applications.
In this paper we examine convex (Lasso-type) models for characterizing neural network training.
We show that the stationary points of the non-convex training objective can be characterized as global optima of subsampled convex programs.
arXiv Detail & Related papers (2023-12-19T23:04:56Z)
- Stochastic Bayesian Optimization with Unknown Continuous Context Distribution via Kernel Density Estimation [28.413085548038932]
We propose two algorithms that employ kernel density estimation to learn the probability density function (PDF) of the continuous context variable online.
Theoretical results demonstrate that both algorithms have sub-linear Bayesian cumulative regret on the expectation objective.
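A minimal sketch of the basic building block, online Gaussian kernel density estimation of a one-dimensional context variable, follows; the Silverman bandwidth rule and the 1-D setting are illustrative simplifications, not necessarily the choices made in the paper.

```python
import numpy as np

# Minimal sketch of online Gaussian KDE for a 1-D context variable.
# Assumes at least two observations before pdf() is called.
class OnlineKDE:
    def __init__(self):
        self.samples = []

    def update(self, c):
        self.samples.append(float(c))   # fold in a newly observed context value

    def pdf(self, grid):
        s = np.asarray(self.samples)
        h = 1.06 * s.std(ddof=1) * len(s) ** (-1 / 5)       # Silverman bandwidth
        z = (np.asarray(grid)[:, None] - s[None, :]) / h    # standardized gaps
        return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(s) * h * np.sqrt(2 * np.pi))
```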
arXiv Detail & Related papers (2023-12-16T11:32:28Z)
- Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
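For reference, a minimal sketch of a linearized-Laplace predictive for a regression model f(x; w) is given below: a Gaussian posterior over weights is fit around the MAP estimate using a generalized Gauss-Newton approximation and propagated through the Jacobian. The noise and prior-precision values are illustrative placeholders, and the Bayesian-optimization loop studied in the paper is not reproduced.

```python
import numpy as np

# Minimal sketch of the linearized-Laplace (LLA) predictive for regression.
def lla_posterior(jac, X_train, w_map, noise_var=0.1, prior_prec=1.0):
    J = np.stack([jac(x, w_map) for x in X_train])          # (n, p) Jacobians at MAP
    H = J.T @ J / noise_var + prior_prec * np.eye(J.shape[1])
    return np.linalg.inv(H)                                  # posterior covariance Sigma

def lla_predict(f, jac, w_map, Sigma, x, noise_var=0.1):
    J = jac(x, w_map)                                        # (p,) Jacobian at test point
    return f(x, w_map), J @ Sigma @ J + noise_var            # predictive mean, variance
```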
arXiv Detail & Related papers (2023-04-17T14:23:43Z)
- Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions is an algorithmic subroutine that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy.
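For context, entropy-regularized optimal transport is commonly approached with Sinkhorn scaling iterations; a minimal sketch of that classical baseline is below. The extragradient method actually developed in the paper is not reproduced here, and the regularization level and iteration count are illustrative.

```python
import numpy as np

# Minimal Sinkhorn sketch for entropy-regularized OT between discrete
# distributions mu and nu with cost matrix C (classical baseline only).
def sinkhorn(mu, nu, C, eps=0.05, iters=500):
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(mu)
    for _ in range(iters):
        v = nu / (K.T @ u)               # alternate scaling updates
        u = mu / (K @ v)
    P = u[:, None] * K * v[None, :]      # transport plan
    return np.sum(P * C)                 # transport cost (entropy term omitted)
```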
arXiv Detail & Related papers (2023-01-30T15:46:39Z)
- High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
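As a rough illustration of the basic mechanism, here is a minimal sketch of stochastic gradient descent with norm clipping; grad_fn is an assumed stochastic (sub)gradient oracle, and the clipping level and constant stepsize are illustrative rather than the stepsize rules proposed in the paper.

```python
import numpy as np

# Minimal sketch of SGD with gradient-norm clipping.
def clipped_sgd(grad_fn, x0, steps=1000, lr=1e-2, clip=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        g = grad_fn(x, rng)                  # stochastic (sub)gradient, user-supplied
        norm = np.linalg.norm(g)
        if norm > clip:
            g = g * (clip / norm)            # clip heavy-tailed noise
        x = x - lr * g
    return x
```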
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
- Sequential Subspace Search for Functional Bayesian Optimization Incorporating Experimenter Intuition [63.011641517977644]
Our algorithm generates a sequence of finite-dimensional random subspaces of functional space spanned by a set of draws from the experimenter's Gaussian Process.
Standard Bayesian optimisation is applied on each subspace, and the best solution found is used as a starting point (origin) for the next subspace.
We test our algorithm in simulated and real-world experiments, namely blind function matching, finding the optimal precipitation-strengthening function for an aluminium alloy, and learning rate schedule optimisation for deep networks.
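A minimal sketch of the subspace-construction idea follows: each round spans a low-dimensional subspace of function space with Gaussian-process draws on a fixed grid, searches over coefficients in that subspace, and recentres the next subspace at the best function found. Plain random search stands in here for the standard Bayesian optimisation inner loop, and the RBF kernel and all constants are illustrative.

```python
import numpy as np

def rbf_kernel(grid, ls=0.2):
    d = grid[:, None] - grid[None, :]
    return np.exp(-0.5 * (d / ls) ** 2) + 1e-8 * np.eye(len(grid))

def sequential_subspace_search(objective, grid, rounds=5, dim=3, trials=200, seed=0):
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(rbf_kernel(grid))
    origin = np.zeros(len(grid))                       # current best function on the grid
    best_val = objective(origin)
    for _ in range(rounds):
        basis = L @ rng.normal(size=(len(grid), dim))  # GP draws spanning the subspace
        for _ in range(trials):                        # random search over coefficients
            f = origin + basis @ rng.normal(size=dim)
            val = objective(f)
            if val < best_val:
                best_val, origin = val, f
    return origin, best_val
```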
arXiv Detail & Related papers (2020-09-08T06:54:11Z)
- An Outer-approximation Guided Optimization Approach for Constrained Neural Network Inverse Problems [0.0]
Constrained neural network inverse problems refer to the optimization problem of finding the best set of input values for a given trained neural network.
This paper analyzes the characteristics of optimal solutions of neural network inverse problems with rectified activation units.
Experiments demonstrate the superiority of the proposed algorithm compared to a projected gradient method.
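For reference, a minimal sketch of the projected-gradient baseline mentioned above is given below: it searches for an input inside a box that drives the trained network's output toward a target. The network and its Jacobian are passed in as callables, and the step size and iteration count are illustrative.

```python
import numpy as np

# Minimal projected-gradient sketch for a neural-network inverse problem:
# minimize 0.5 * ||net(x) - target||^2 subject to lo <= x <= hi.
def projected_gradient_inversion(net, grad_net, target, lo, hi, steps=500, lr=1e-2):
    x = (lo + hi) / 2.0                          # start at the box centre
    for _ in range(steps):
        resid = net(x) - target                  # output mismatch
        x = x - lr * grad_net(x).T @ resid       # gradient step; grad_net(x) is the Jacobian
        x = np.clip(x, lo, hi)                   # project back into the feasible box
    return x
```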
arXiv Detail & Related papers (2020-02-24T17:49:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.