Finite-Sum Optimization: A New Perspective for Convergence to a Global
Solution
- URL: http://arxiv.org/abs/2202.03524v1
- Date: Mon, 7 Feb 2022 21:23:16 GMT
- Title: Finite-Sum Optimization: A New Perspective for Convergence to a Global
Solution
- Authors: Lam M. Nguyen, Trang H. Tran, Marten van Dijk
- Abstract summary: Deep neural networks (DNNs) have shown great success in many machine learning tasks.
Their training is challenging since the loss surface is generally non-convex, or even non-smooth.
We propose a reformulation of the minimization problem allowing for a new recursive algorithmic framework with convergence to an $\varepsilon$-(global) minimum.
- Score: 22.016345507132808
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) have shown great success in many machine learning
tasks. Their training is challenging since the loss surface of the network
architecture is generally non-convex, or even non-smooth. How and under what
assumptions is guaranteed convergence to a \textit{global} minimum possible? We
propose a reformulation of the minimization problem allowing for a new
recursive algorithmic framework. By using bounded style assumptions, we prove
convergence to an $\varepsilon$-(global) minimum using
$\mathcal{\tilde{O}}(1/\varepsilon^3)$ gradient computations. Our theoretical
foundation motivates further study, implementation, and optimization of the new
algorithmic framework and further investigation of its non-standard bounded
style assumptions. This new direction broadens our understanding of why and
under what circumstances training of a DNN converges to a global minimum.
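For concreteness, the finite-sum problem in question is $\min_{w} F(w) = \frac{1}{n}\sum_{i=1}^{n} f_i(w)$, where each $f_i$ is the loss on one training example. The paper's recursive framework is not detailed in this abstract, so the sketch below only illustrates the baseline setting, plain SGD on a toy least-squares instance; the data, step size, and function names are illustrative, not the authors' algorithm.

```python
import numpy as np

# Finite-sum objective: F(w) = (1/n) * sum_i f_i(w), here with
# least-squares components f_i(w) = 0.5 * (x_i . w - y_i)^2.
# Baseline SGD sketch of the problem setting only, NOT the paper's
# recursive framework (which is not specified in this summary).

rng = np.random.default_rng(0)
n, d = 1000, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)

def component_grad(w, i):
    """Gradient of the i-th component f_i at w."""
    return (X[i] @ w - y[i]) * X[i]

w = np.zeros(d)
step = 0.01
for t in range(20000):
    i = rng.integers(n)              # sample one component uniformly
    w -= step * component_grad(w, i)

print("final objective:", 0.5 * np.mean((X @ w - y) ** 2))
```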
Related papers
- Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes [29.466981306355066]
We show that gradient descent with a fixed learning rate $\eta$ can only find local minima that represent smooth functions.
We also prove a nearly-optimal MSE bound of $\widetilde{O}(n^{-4/5})$ within the strict interior of the support of the $n$ data points.
arXiv Detail & Related papers (2024-06-10T22:57:27Z) - Decentralized Riemannian Algorithm for Nonconvex Minimax Problems [82.50374560598493]
Minimax algorithms for neural networks have been developed to solve many problems.
In this paper, we propose two types of decentralized minimax algorithms.
For the stochastic setting, we propose DRSGDA and prove a bound on its gradient complexity.
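DRSGDA itself is decentralized and Riemannian, which is beyond a short sketch; the snippet below shows only the centralized, Euclidean gradient descent-ascent template that such minimax methods build on, applied to a toy saddle problem. The objective and step size are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Plain gradient descent-ascent (GDA) on f(x, y) = 0.5*x^2 + x*y - 0.5*y^2,
# whose unique saddle point is (0, 0). Decentralized Riemannian methods like
# DRSGDA add network communication and manifold geometry on top of this
# basic descent/ascent pattern; only the single-node template is shown.

def grad_x(x, y):  # df/dx
    return x + y

def grad_y(x, y):  # df/dy
    return x - y

x, y = 1.0, -1.0
eta = 0.1
for _ in range(500):
    gx, gy = grad_x(x, y), grad_y(x, y)
    x -= eta * gx   # descent step on the min variable
    y += eta * gy   # ascent step on the max variable

print(f"approximate saddle point: ({x:.4f}, {y:.4f})")
```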
arXiv Detail & Related papers (2023-02-08T01:42:45Z) - When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work [59.29606307518154]
We show that as long as the width $m \geq 2n/d$ (where $d$ is the input dimension), its expressivity is strong, i.e., there exists at least one global minimizer with zero training loss.
We also consider a constrained optimization formulation where the feasible region is the nice local region, and prove that every KKT point is a nearly global minimizer.
arXiv Detail & Related papers (2022-10-21T14:41:26Z) - An Inexact Augmented Lagrangian Algorithm for Training Leaky ReLU Neural
Network with Group Sparsity [13.27709100571336]
A leaky ReLU network with a group regularization term has been widely used in recent years.
However, there is a lack of approaches to compute a stationary point of such a model deterministically.
We propose an inexact augmented Lagrangian algorithm for solving the new model.
arXiv Detail & Related papers (2022-05-11T11:53:15Z) - DASHA: Distributed Nonconvex Optimization with Communication
Compression, Optimal Oracle Complexity, and No Client Synchronization [77.34726150561087]
We develop and analyze DASHA: a new family of methods for nonconvex distributed optimization problems.
Unlike MARINA, the new methods DASHA and DASHA-MVR send compressed vectors only and never synchronize the nodes, which makes them more practical for federated learning.
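The defining feature in this summary is that nodes send compressed vectors. The sketch below shows a standard rand-$k$ unbiased sparsifying compressor of the kind used in this line of work; it is a generic illustration of compressed communication, not DASHA's full variance-reduced update.

```python
import numpy as np

# rand-k: a standard unbiased sparsifying compressor of the kind used in
# compressed distributed methods such as MARINA/DASHA. Each node would
# apply this to the vectors it sends, cutting communication from d to k
# coordinates per round. Generic illustration only, not DASHA itself.

def rand_k(v, k, rng):
    """Keep k random coordinates of v, rescaled by d/k for unbiasedness."""
    d = v.size
    out = np.zeros_like(v)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = v[idx] * (d / k)
    return out

rng = np.random.default_rng(0)
g = rng.normal(size=8)
compressed = rand_k(g, k=2, rng=rng)
print(g)
print(compressed)  # sparse; E[rand_k(g)] = g over the random index choice
```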
arXiv Detail & Related papers (2022-02-02T20:10:40Z) - Path Regularization: A Convexity and Sparsity Inducing Regularization
for Parallel ReLU Networks [75.33431791218302]
We study the training problem of deep neural networks and introduce an analytic approach to unveil hidden convexity in the optimization landscape.
We consider a deep parallel ReLU network architecture, which also includes standard deep networks and ResNets as its special cases.
arXiv Detail & Related papers (2021-10-18T18:00:36Z) - A global convergence theory for deep ReLU implicit networks via
over-parameterization [26.19122384935622]
Implicit deep learning has received increasing attention recently.
This paper analyzes the gradient flow of Rectified Linear Unit (ReLU) activated implicit neural networks.
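An implicit (equilibrium) ReLU network defines its hidden state as a fixed point $z^* = \mathrm{ReLU}(Az^* + Bx)$ rather than by stacking layers. The sketch below computes such a fixed point by naive iteration under an assumed spectral-norm condition on $A$ that makes the map contractive; the shapes and the 0.5 norm bound are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

# Forward pass of a toy implicit ReLU network: the hidden state is the
# fixed point z* = relu(A @ z* + B @ x), found by naive iteration. A is
# scaled to spectral norm 0.5 so the map is a contraction and iteration
# provably converges, mirroring the well-posedness conditions typically
# assumed in analyses of implicit networks.

rng = np.random.default_rng(0)
d_hidden, d_in = 16, 4
A = rng.normal(size=(d_hidden, d_hidden))
A *= 0.5 / np.linalg.norm(A, 2)        # ||A||_2 = 0.5 < 1 => contraction
B = rng.normal(size=(d_hidden, d_in))
x = rng.normal(size=d_in)

z = np.zeros(d_hidden)
for _ in range(100):
    z_next = np.maximum(A @ z + B @ x, 0.0)   # ReLU
    if np.linalg.norm(z_next - z) < 1e-10:
        break
    z = z_next

print("fixed-point residual:", np.linalg.norm(z - np.maximum(A @ z + B @ x, 0.0)))
```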
arXiv Detail & Related papers (2021-10-11T23:22:50Z) - Efficient Methods for Structured Nonconvex-Nonconcave Min-Max
Optimization [98.0595480384208]
We propose a generalization of the extragradient method which provably converges to a stationary point.
The algorithm applies not only to Euclidean spaces, but also to general $\ell_p$-normed finite-dimensional vector spaces.
arXiv Detail & Related papers (2020-10-31T21:35:42Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we study the nonconvex-strongly-concave setting of gradient-free minimax optimization.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
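Zeroth-order methods replace gradients with finite-difference estimates built from function queries. The sketch below shows the standard two-point Gaussian-smoothing estimator as a generic illustration of that query model; the variance-reduced algorithm in the paper builds on estimates of this kind, but this is not its specific construction.

```python
import numpy as np

# Two-point zeroth-order gradient estimator with Gaussian smoothing:
#   g = (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u,   u ~ N(0, I).
# Gradient-free minimax methods form their descent/ascent steps from
# queries like this instead of true gradients; demonstrated here on a
# simple quadratic whose true gradient is known for comparison.

def zo_grad(f, x, mu, rng):
    u = rng.normal(size=x.size)
    return (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u

f = lambda x: 0.5 * np.dot(x, x)      # gradient of f is x itself
rng = np.random.default_rng(0)
x = np.array([1.0, -2.0, 3.0])
est = np.mean([zo_grad(f, x, 1e-4, rng) for _ in range(5000)], axis=0)
print("estimate:", est)   # averages toward the true gradient
print("true:    ", x)
```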
arXiv Detail & Related papers (2020-06-16T17:55:46Z) - Why Learning of Large-Scale Neural Networks Behaves Like Convex
Optimization [6.852561400929072]
We present some theoretical work to explain why simple gradient descent methods are so successful in solving the non-convex optimization problems arising in learning large-scale neural networks.
We prove that the objective functions in learning NNs are convex in canonical model space.
arXiv Detail & Related papers (2019-03-06T02:21:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.