Global Optimality Beyond Two Layers: Training Deep ReLU Networks via
Convex Programs
- URL: http://arxiv.org/abs/2110.05518v1
- Date: Mon, 11 Oct 2021 18:00:30 GMT
- Title: Global Optimality Beyond Two Layers: Training Deep ReLU Networks via
Convex Programs
- Authors: Tolga Ergen, Mert Pilanci
- Abstract summary: We develop a novel unified framework to reveal a hidden regularization mechanism through the lens of convex optimization.
We numerically validate our theoretical results via experiments involving both synthetic and real datasets.
- Score: 39.799125462526234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the fundamental mechanism behind the success of deep neural
networks is one of the key challenges in the modern machine learning
literature. Despite numerous attempts, a solid theoretical analysis is yet to
be developed. In this paper, we develop a novel unified framework to reveal a
hidden regularization mechanism through the lens of convex optimization. We
first show that the training of multiple three-layer ReLU sub-networks with
weight decay regularization can be equivalently cast as a convex optimization
problem in a higher dimensional space, where sparsity is enforced via a group
$\ell_1$-norm regularization. Consequently, ReLU networks can be interpreted as
high dimensional feature selection methods. More importantly, we then prove
that the equivalent convex problem can be globally optimized by a standard
convex optimization solver with a polynomial-time complexity with respect to
the number of samples and data dimension when the width of the network is
fixed. Finally, we numerically validate our theoretical results via experiments
involving both synthetic and real datasets.
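For context, the two-layer prototype of this family of convex reformulations (from the authors' earlier work; the three-layer program developed in this paper is more involved) is a cone-constrained program with group $\ell_1$ regularization:
$$
\min_{\{v_i, w_i\}_{i=1}^{P}} \frac{1}{2}\Big\|\sum_{i=1}^{P} D_i X (v_i - w_i) - y\Big\|_2^2 + \beta \sum_{i=1}^{P} \big(\|v_i\|_2 + \|w_i\|_2\big) \quad \text{s.t.} \quad (2D_i - I_n) X v_i \geq 0, \; (2D_i - I_n) X w_i \geq 0,
$$
where $X \in \mathbb{R}^{n \times d}$ is the data matrix, $y \in \mathbb{R}^{n}$ the label vector, and the $D_i = \mathrm{diag}(\mathbb{1}[X u_i \geq 0])$ range over the ReLU activation patterns realizable on the data. The group $\ell_1$ penalty $\sum_i (\|v_i\|_2 + \|w_i\|_2)$ is the sparsity-enforcing term behind the feature-selection interpretation above.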
Related papers
- The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) models are typically trained via non-convex optimization heuristics.
In this paper, we examine convex reformulations of the neural network training problem.
We show that all stationary points of the non-convex objective can be characterized as global optima of a subsampled convex program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z)
- Does a sparse ReLU network training problem always admit an optimum? [0.0]
We show that the existence of an optimal solution is not always guaranteed, especially in the context of sparse ReLU neural networks.
In particular, we first show that optimization problems involving deep networks with certain sparsity patterns do not always have optimal parameters.
arXiv Detail & Related papers (2023-06-05T08:01:50Z)
- Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions [41.337814204665364]
We develop algorithms for convex optimization of two-layer neural networks with ReLU activation functions.
We show that convex gated ReLU models obtain data-dependent approximation bounds for the ReLU training problem.
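As a hedged illustration of the gated ReLU model (assumed details: random fixed gates, squared loss, and a generic cvxpy solve; the function name is hypothetical and the paper's algorithms are more specialized), the training problem reduces to an unconstrained group lasso:

```python
import numpy as np
import cvxpy as cp

def gated_relu_group_lasso(X, y, n_gates=32, beta=1e-3, seed=0):
    """Minimal sketch: two-layer gated ReLU training as a group lasso.

    Because the ReLU gates are fixed in advance (here: random), the
    sign-pattern cone constraints of the full ReLU convex program drop
    out, leaving an unconstrained group-l1-regularized regression.
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((d, n_gates))         # fixed gate vectors g_i
    D = (X @ G >= 0).astype(float)                # n x P gate activations
    W = cp.Variable((d, D.shape[1]))              # one weight vector per gate
    pred = cp.sum(cp.multiply(D, X @ W), axis=1)  # sum_i D_i X w_i
    reg = cp.sum(cp.norm(W, axis=0))              # group l1 over columns
    cp.Problem(cp.Minimize(0.5 * cp.sum_squares(pred - y) + beta * reg)).solve()
    return G, W.value
```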
arXiv Detail & Related papers (2022-02-02T23:50:53Z)
- Path Regularization: A Convexity and Sparsity Inducing Regularization for Parallel ReLU Networks [75.33431791218302]
We study the training problem of deep neural networks and introduce an analytic approach to unveil hidden convexity in the optimization landscape.
We consider a deep parallel ReLU network architecture, which also includes standard deep networks and ResNets as its special cases.
arXiv Detail & Related papers (2021-10-18T18:00:36Z)
- Neural Spectrahedra and Semidefinite Lifts: Global Convex Optimization of Polynomial Activation Neural Networks in Fully Polynomial-Time [31.94590517036704]
We develop exact convex optimization formulations for two-layer neural networks with second degree polynomial activations.
We show that the semidefinite lifting is exact, and therefore global optimization has polynomial complexity in the dimension and sample size for all input data.
The proposed approach is significantly faster than the standard backpropagation procedure and obtains better test accuracy.
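A minimal sketch of the semidefinite-lift idea for degree-two activations (assumed details: squared loss, trace regularization, and a generic cvxpy solve; the function name and exact formulation are illustrative rather than the paper's):

```python
import numpy as np
import cvxpy as cp

def quadratic_activation_sdp(X, y, beta=1e-3):
    """Sketch: lift a two-layer quadratic-activation net to an SDP.

    The network output sum_j a_j (x^T u_j)^2 is lifted to x^T Z x with
    Z = Zp - Zm and Zp, Zm positive semidefinite; trace regularization
    stands in for weight decay. Neurons can be recovered from an
    eigendecomposition of the optimal Z.
    """
    n, d = X.shape
    Zp = cp.Variable((d, d), PSD=True)
    Zm = cp.Variable((d, d), PSD=True)
    # pred_i = x_i^T (Zp - Zm) x_i, i.e. diag(X (Zp - Zm) X^T)
    pred = cp.sum(cp.multiply(X @ (Zp - Zm), X), axis=1)
    obj = 0.5 * cp.sum_squares(pred - y) + beta * (cp.trace(Zp) + cp.trace(Zm))
    cp.Problem(cp.Minimize(obj)).solve()
    return Zp.value - Zm.value
```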
arXiv Detail & Related papers (2021-01-07T08:43:01Z)
- Implicit Convex Regularizers of CNN Architectures: Convex Optimization of Two- and Three-Layer Networks in Polynomial Time [70.15611146583068]
We study training of Convolutional Neural Networks (CNNs) with ReLU activations.
We introduce exact convex optimization formulations with polynomial complexity with respect to the number of data samples, the number of neurons, and the data dimension.
arXiv Detail & Related papers (2020-06-26T04:47:20Z)
- The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural Networks: an Exact Characterization of the Optimal Solutions [51.60996023961886]
We prove that finding all globally optimal two-layer ReLU neural networks can be performed by solving a convex optimization program with cone constraints.
Our analysis is novel, characterizes all optimal solutions, and does not leverage duality-based analysis which was recently used to lift neural network training into convex spaces.
arXiv Detail & Related papers (2020-06-10T15:38:30Z)
- Convex Geometry and Duality of Over-parameterized Neural Networks [70.15611146583068]
We develop a convex analytic approach to analyze finite width two-layer ReLU networks.
We show that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set.
In higher dimensions, we show that the training problem can be cast as a finite dimensional convex problem with infinitely many constraints.
arXiv Detail & Related papers (2020-02-25T23:05:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed and is not responsible for any consequences of its use.