Convex Geometry and Duality of Over-parameterized Neural Networks
- URL: http://arxiv.org/abs/2002.11219v4
- Date: Tue, 31 Aug 2021 02:13:00 GMT
- Title: Convex Geometry and Duality of Over-parameterized Neural Networks
- Authors: Tolga Ergen, Mert Pilanci
- Abstract summary: We develop a convex analytic approach to analyze finite width two-layer ReLU networks.
We show that optimal solutions to the regularized training problem can be characterized as extreme points of a convex set.
In higher dimensions, we show that the training problem can be cast as a finite-dimensional convex problem with infinitely many constraints.
- Score: 70.15611146583068
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop a convex analytic approach to analyze finite width two-layer ReLU networks. We first prove that optimal solutions to the regularized training problem can be characterized as extreme points of a convex set, whose convex geometric properties encourage simple solutions. We then leverage this characterization to show that an optimal set of parameters yields linear spline interpolation for regression problems involving one-dimensional or rank-one data. We also characterize the classification decision regions in terms of a kernel matrix and minimum $\ell_1$-norm solutions. This is in contrast to the Neural Tangent Kernel, which is unable to explain predictions of finite width networks. Our convex geometric characterization also provides intuitive explanations of hidden neurons as auto-encoders. In higher dimensions, we show that the training problem can be cast as a finite-dimensional convex problem with infinitely many constraints. We then apply certain convex relaxations and introduce a cutting-plane algorithm to globally optimize the network. We further analyze the exactness of the relaxations to provide conditions for convergence to a global optimum. Our analysis also shows that optimal network parameters can be characterized by interpretable closed-form formulas in some practically relevant special cases.
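The cutting-plane scheme mentioned in the abstract lends itself to a compact illustration. Below is a minimal, hypothetical sketch (not the authors' implementation): it alternates between an $\ell_1$-regularized convex master problem over a finite set of ReLU neurons and a separation oracle that searches for a violated dual constraint, i.e. a direction $u$ with $|r^\top (Xu)_+| > \beta$. The random-search oracle stands in for the convex relaxations used in the paper.

```python
# Hypothetical sketch of the cutting-plane idea, not the authors' code.
# Master problem: l1-regularized least squares over ReLU features (X u_j)_+.
# Oracle: crude random search for a violated dual constraint.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d, beta = 50, 3, 0.1
X = rng.standard_normal((n, d))
y = np.sin(X @ rng.standard_normal(d))            # toy regression targets

U = [np.ones(d) / np.sqrt(d)]                     # initial neuron direction
for _ in range(20):
    Phi = np.stack([np.maximum(X @ u, 0.0) for u in U], axis=1)
    c = cp.Variable(len(U))
    cp.Problem(cp.Minimize(0.5 * cp.sum_squares(Phi @ c - y)
                           + beta * cp.norm1(c))).solve()
    r = y - Phi @ c.value                         # residual (dual variable)

    # Separation oracle: random search over the unit sphere.
    cands = rng.standard_normal((1000, d))
    cands /= np.linalg.norm(cands, axis=1, keepdims=True)
    scores = np.abs(np.maximum(X @ cands.T, 0.0).T @ r)
    if scores.max() <= beta + 1e-6:               # no violated constraint:
        break                                     # certificate of optimality
    U.append(cands[np.argmax(scores)])            # add the new neuron (cut)

print(Phi.shape[1], "neurons, train MSE:",
      float(np.mean((Phi @ c.value - y) ** 2)))
```

When the oracle finds no violated direction, the current finite neuron set certifies (approximate) global optimality, which is the essence of the cutting-plane convergence analysis.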
Related papers
- Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks [3.680127959836384]
Implicit gradient descent (IGD) outperforms common gradient descent (GD) in handling certain multi-scale problems.
We show that IGD converges to a globally optimal solution at a linear convergence rate.
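For intuition, here is a minimal sketch (my illustration, not from the paper) of implicit gradient descent on a stiff quadratic loss, the kind of multi-scale setting where explicit GD must take tiny steps:

```python
# Implicit (backward-Euler) GD on L(t) = 0.5 * t^T A t: the implicit step
# t_next = t - eta * grad L(t_next) reduces to a linear solve here, and it
# stays stable at step sizes where explicit GD (eta > 2/100) diverges.
import numpy as np

A = np.diag([1.0, 100.0])        # two scales: a toy "multi-scale" problem
eta = 0.05                       # too large for explicit GD on this loss
theta = np.array([1.0, 1.0])
for _ in range(100):
    theta = np.linalg.solve(np.eye(2) + eta * A, theta)  # implicit step
print(theta)                     # contracts toward 0 for any eta > 0
```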
arXiv Detail & Related papers (2024-07-03T06:10:41Z)
- The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models [75.33431791218302]
Training Deep Neural Network (DNN) models is a non-convex optimization problem.
In this paper, we examine convex reformulations of the training problem based on Lasso models.
We show that all stationary points of the non-convex training objective can be characterized as global optima of subsampled convex programs.
arXiv Detail & Related papers (2023-12-19T23:04:56Z)
- From Complexity to Clarity: Analytical Expressions of Deep Neural Network Weights via Clifford's Geometric Algebra and Convexity [54.01594785269913]
We show that optimal weights of deep ReLU neural networks are given by the wedge product of training samples when trained with standard regularized loss.
The training problem reduces to convex optimization over wedge product features, which encode the geometric structure of the training dataset.
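As a toy illustration (an assumption of mine, not the authors' code), the wedge product of $d-1$ samples in $R^d$ can be computed as a generalized cross product via cofactor expansion; the result is orthogonal to every input sample:

```python
# Wedge product of (d-1) vectors in R^d via cofactor expansion; the output
# vector is orthogonal to all inputs, the geometric building block behind
# the closed-form weight expressions.
import numpy as np

def wedge(vectors):
    """Generalized cross product of (d-1) vectors in R^d."""
    V = np.asarray(vectors)                       # shape (d-1, d)
    return np.array([(-1) ** j * np.linalg.det(np.delete(V, j, axis=1))
                     for j in range(V.shape[1])])

x1, x2 = np.array([1.0, 0.0, 2.0]), np.array([0.0, 1.0, 1.0])
w = wedge([x1, x2])
print(w, x1 @ w, x2 @ w)                          # inner products are 0
```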
arXiv Detail & Related papers (2023-09-28T15:19:30Z)
- Path Regularization: A Convexity and Sparsity Inducing Regularization for Parallel ReLU Networks [75.33431791218302]
We study the training problem of deep neural networks and introduce an analytic approach to unveil hidden convexity in the optimization landscape.
We consider a deep parallel ReLU network architecture, which includes standard deep networks and ResNets as special cases.
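For reference, a path regularizer for one two-layer ReLU block sums, over all input-to-output paths, the product of absolute weights; a minimal sketch (my notation, not the paper's):

```python
# Path regularizer for a single two-layer block: sum over paths of the
# product of absolute weights, an l1-like, sparsity-inducing penalty.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))                  # hidden x input weights
w2 = rng.standard_normal(4)                       # hidden -> output weights
path_reg = float(np.sum(np.abs(w2)[:, None] * np.abs(W1)))
print(path_reg)
```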
arXiv Detail & Related papers (2021-10-18T18:00:36Z)
- Global Optimality Beyond Two Layers: Training Deep ReLU Networks via Convex Programs [39.799125462526234]
We develop a novel unified framework to reveal a hidden regularization mechanism through the lens of convex optimization.
We numerically validate our theoretical results via experiments involving both synthetic and real datasets.
arXiv Detail & Related papers (2021-10-11T18:00:30Z)
- The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural Networks: an Exact Characterization of the Optimal Solutions [51.60996023961886]
We prove that finding all globally optimal two-layer ReLU neural networks can be performed by solving a convex optimization program with cone constraints.
Our analysis is novel, characterizes all optimal solutions, and does not leverage duality-based analysis which was recently used to lift neural network training into convex spaces.
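A quick way to see where such cone constraints come from (a hedged sketch, with random sampling standing in for the exact enumeration the convex program requires): each ReLU activation pattern on the data defines a diagonal 0/1 matrix $D = \mathrm{diag}(1[Xu \ge 0])$, and the constraint $(2D - I)Xv \ge 0$ confines $v$ to the cone of weights realizing that pattern.

```python
# Count distinct ReLU activation patterns on a toy dataset; each pattern
# contributes one cone-constrained block to the convex reformulation.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2))
patterns = {tuple((X @ u >= 0).astype(int))
            for u in rng.standard_normal((500, 2))}
print(len(patterns), "activation patterns, one cone constraint set each")
```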
arXiv Detail & Related papers (2020-06-10T15:38:30Z)
- Revealing the Structure of Deep Neural Networks via Convex Duality [70.15611146583068]
We study regularized deep neural networks (DNNs) and introduce a convex analytic framework to characterize the structure of hidden layers.
We show that a set of optimal hidden layer weights for a norm regularized training problem can be explicitly found as the extreme points of a convex set.
We apply the same characterization to deep ReLU networks with whitened data and prove that the same weight alignment holds.
arXiv Detail & Related papers (2020-02-22T21:13:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.