The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural
Networks: an Exact Characterization of the Optimal Solutions
- URL: http://arxiv.org/abs/2006.05900v4
- Date: Sun, 13 Mar 2022 18:23:30 GMT
- Title: The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural
Networks: an Exact Characterization of the Optimal Solutions
- Authors: Yifei Wang, Jonathan Lacotte and Mert Pilanci
- Abstract summary: We prove that finding all globally optimal two-layer ReLU neural networks can be performed by solving a convex optimization program with cone constraints.
Our analysis is novel, characterizes all optimal solutions, and does not leverage duality-based analysis which was recently used to lift neural network training into convex spaces.
- Score: 51.60996023961886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We prove that finding all globally optimal two-layer ReLU neural networks can
be performed by solving a convex optimization program with cone constraints.
Our analysis is novel, characterizes all optimal solutions, and does not
leverage duality-based analysis which was recently used to lift neural network
training into convex spaces. Given the set of solutions of our convex
optimization program, we show how to construct exactly the entire set of
optimal neural networks. We provide a detailed characterization of this optimal
set and its invariant transformations. As additional consequences of our convex
perspective, (i) we establish that Clarke stationary points found by stochastic
gradient descent correspond to the global optimum of a subsampled convex
problem; (ii) we provide a polynomial-time algorithm for checking whether a
neural network is a global minimum of the training loss; (iii) we provide an
explicit construction of a continuous path between any neural network and the
global minimum of its sublevel set; and (iv) we characterize the minimal size
of the hidden layer so that the neural network optimization landscape has no
spurious valleys. Overall, we provide a rich framework for studying the
landscape of
neural network training loss through convexity.
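To make the convex reformulation concrete, here is a minimal sketch, not the authors' implementation: a cone-constrained group-lasso program of the kind described above for a two-layer ReLU network with squared loss and weight decay, solved with cvxpy, followed by the standard mapping from convex solutions back to network weights. The random gate sampling, function names, and hyperparameters are illustrative assumptions.
```python
# A minimal sketch, not the authors' code: a cone-constrained convex program
# for two-layer ReLU training with weight decay. Pattern sampling, names,
# and the squared-loss objective are illustrative assumptions.
import numpy as np
import cvxpy as cp

def convex_relu_fit(X, y, beta=1e-3, n_gates=50, seed=0):
    """Solve a sampled version of the convex program; returns per-pattern
    weights (v_i, w_i) and the activation patterns D."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Sample ReLU activation patterns D_i = diag(1[X g >= 0]) from random gates.
    G = rng.standard_normal((d, n_gates))
    D = np.unique((X @ G >= 0).astype(float), axis=1)  # n x P, deduplicated
    P = D.shape[1]

    V = [cp.Variable(d) for _ in range(P)]  # neurons entering with weight +1
    W = [cp.Variable(d) for _ in range(P)]  # neurons entering with weight -1
    pred = sum(cp.multiply(D[:, i], X @ (V[i] - W[i])) for i in range(P))
    reg = sum(cp.norm(V[i], 2) + cp.norm(W[i], 2) for i in range(P))
    # Cone constraints (2 D_i - I) X v_i >= 0 force v_i (and w_i) to be
    # consistent with the activation pattern D_i.
    constraints = []
    for i in range(P):
        s = 2.0 * D[:, i] - 1.0
        constraints += [cp.multiply(s, X @ V[i]) >= 0,
                        cp.multiply(s, X @ W[i]) >= 0]
    cp.Problem(cp.Minimize(0.5 * cp.sum_squares(pred - y) + beta * reg),
               constraints).solve()
    return [v.value for v in V], [w.value for w in W], D

def recover_network(V, W, tol=1e-8):
    """Map convex solutions to first-layer weights u_j and second-layer
    weights alpha_j of an equivalent ReLU network (zero neurons dropped)."""
    neurons = []
    for vs, sign in ((V, 1.0), (W, -1.0)):
        for v in vs:
            r = np.linalg.norm(v)
            if r > tol:
                neurons.append((v / np.sqrt(r), sign * np.sqrt(r)))
    return neurons  # f(x) = sum_j alpha_j * relu(u_j @ x)
```
For data X of shape (n, d) and responses y of shape (n,), `recover_network(*convex_relu_fit(X, y)[:2])` yields weight pairs whose induced ReLU network attains the optimum of the sampled program; enumerating all activation patterns instead of sampling them would recover the exact convex program.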
Related papers
- Exploring the loss landscape of regularized neural networks via convex duality [42.48510370193192]
We discuss several aspects of the loss landscape of regularized neural networks.
We first characterize the solution set of the convex problem using its dual and further characterize all stationary points.
We show that the solution set characterization and connectivity results can be extended to different architectures.
arXiv Detail & Related papers (2024-11-12T11:41:38Z)
- The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models [75.33431791218302]
Deep neural network (DNN) models are widely used in machine learning.
In this paper we examine convex Lasso models for neural network training.
We show that all stationary points of the non-convex training objective can be characterized as global optima of subsampled convex programs.
arXiv Detail & Related papers (2023-12-19T23:04:56Z)
- Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set (see the masked-kernel sketch after this list).
arXiv Detail & Related papers (2023-09-26T17:42:52Z)
- Does a sparse ReLU network training problem always admit an optimum? [0.0]
We show that the existence of an optimal solution is not always guaranteed, especially in the context of sparse ReLU neural networks.
In particular, we first show that optimization problems involving deep networks with certain sparsity patterns do not always have optimal parameters.
arXiv Detail & Related papers (2023-06-05T08:01:50Z)
- Optimal Sets and Solution Paths of ReLU Networks [56.40911684005949]
We develop an analytical framework to characterize the set of optimal ReLU networks.
We establish conditions for the solution set of ReLU networks to be continuous, and develop sensitivity results for minimal ReLU networks.
arXiv Detail & Related papers (2023-05-31T18:48:16Z)
- Path Regularization: A Convexity and Sparsity Inducing Regularization for Parallel ReLU Networks [75.33431791218302]
We study the training problem of deep neural networks and introduce an analytic approach to unveil hidden convexity in the optimization landscape.
We consider a deep parallel ReLU network architecture, which also includes standard deep networks and ResNets as its special cases.
arXiv Detail & Related papers (2021-10-18T18:00:36Z)
- Neural Spectrahedra and Semidefinite Lifts: Global Convex Optimization of Polynomial Activation Neural Networks in Fully Polynomial-Time [31.94590517036704]
We develop exact convex optimization formulations for two-layer neural networks with second-degree polynomial activations.
We show that the semidefinite lift is exact, so the complexity of global optimization is polynomial in the input dimension and sample size for all input data.
The proposed approach is significantly faster than standard backpropagation and obtains better test accuracy (see the semidefinite sketch after this list).
arXiv Detail & Related papers (2021-01-07T08:43:01Z)
- A Dynamical View on Optimization Algorithms of Overparameterized Neural Networks [23.038631072178735]
We consider a broad class of optimization algorithms that are commonly used in practice.
As a consequence, we can characterize the convergence behavior of these algorithms on overparameterized neural networks.
We believe our approach can also be extended to other optimization algorithms and network architectures.
arXiv Detail & Related papers (2020-10-25T17:10:22Z)
- Convex Geometry and Duality of Over-parameterized Neural Networks [70.15611146583068]
We develop a convex analytic approach to analyze finite width two-layer ReLU networks.
We show that optimal solutions to the regularized training problem can be characterized as extreme points of a convex set.
In higher dimensions, we show that the training problem can be cast as a finite dimensional convex problem with infinitely many constraints.
arXiv Detail & Related papers (2020-02-25T23:05:33Z)
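For the "Fixing the NTK" entry above, here is a small illustrative sketch (not from that paper's code; the gate vectors and the nonnegative weights eta are assumptions for the demo) of the masked linear kernels a gated ReLU network induces: each fixed gate pattern D_i contributes K_i = D_i X X^T D_i, and a multiple kernel learning (MKL) model searches over nonnegative combinations of these kernels.
```python
# Illustrative sketch: masked linear kernels for a gated ReLU network.
import numpy as np

def masked_kernels(X, gates):
    """K_i = D_i X X^T D_i for each gate pattern D_i = diag(1[X g_i >= 0])."""
    masks = (X @ gates >= 0).astype(float)  # n x P; column i is diag(D_i)
    lin = X @ X.T                           # linear kernel X X^T
    return [np.outer(m, m) * lin for m in masks.T]

def mkl_combination(kernels, eta):
    """Nonnegative combination sum_i eta_i K_i searched over by MKL."""
    return sum(e * K for e, K in zip(eta, kernels))
```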
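And for the "Neural Spectrahedra" entry, a hedged sketch of a standard semidefinite lift for a quadratic activation sigma(t) = t^2 (the paper's general second-degree formulation may differ; the loss and the trace penalty here are assumptions): the network output x^T Z x is parameterized by Z = Z_+ - Z_- with both parts positive semidefinite, which makes the training problem convex.
```python
# Hedged sketch of a semidefinite lift for quadratic activations; the exact
# formulation in the paper may differ.
import cvxpy as cp

def sdp_quadratic_fit(X, y, beta=1e-3):
    d = X.shape[1]
    Zp = cp.Variable((d, d), PSD=True)  # positive part of Z
    Zm = cp.Variable((d, d), PSD=True)  # negative part of Z
    # Output on sample x_n is x_n^T (Zp - Zm) x_n, which is linear in (Zp, Zm).
    preds = cp.diag(X @ (Zp - Zm) @ X.T)
    obj = 0.5 * cp.sum_squares(preds - y) + beta * (cp.trace(Zp) + cp.trace(Zm))
    cp.Problem(cp.Minimize(obj)).solve()
    return Zp.value, Zm.value
```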
This list is automatically generated from the titles and abstracts of the papers on this site.