Neural Spectrahedra and Semidefinite Lifts: Global Convex Optimization
of Polynomial Activation Neural Networks in Fully Polynomial-Time
- URL: http://arxiv.org/abs/2101.02429v1
- Date: Thu, 7 Jan 2021 08:43:01 GMT
- Title: Neural Spectrahedra and Semidefinite Lifts: Global Convex Optimization
of Polynomial Activation Neural Networks in Fully Polynomial-Time
- Authors: Burak Bartan, Mert Pilanci
- Abstract summary: We develop exact convex optimization formulations for two-layer neural networks with second-degree polynomial activations.
We show that the semidefinite lift is always exact, and therefore global optimization has complexity polynomial in the input dimension and sample size for all input data.
The proposed approach is significantly faster and obtains better test accuracy than the standard backpropagation procedure.
- Score: 31.94590517036704
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The training of two-layer neural networks with nonlinear activation functions
is an important non-convex optimization problem with numerous applications and
promising performance in layerwise deep learning. In this paper, we develop
exact convex optimization formulations for two-layer neural networks with
second degree polynomial activations based on semidefinite programming.
Remarkably, we show that semidefinite lifting is always exact and therefore
computational complexity for global optimization is polynomial in the input
dimension and sample size for all input data. The developed convex formulations
are proven to achieve the same global optimal solution set as their non-convex
counterparts. More specifically, the globally optimal two-layer neural network
with polynomial activations can be found by solving a semidefinite program
(SDP) and decomposing the solution using a procedure we call Neural
Decomposition. Moreover, the choice of regularizers plays a crucial role in the
computational tractability of neural network training. We show that the
standard weight decay regularization formulation is NP-hard, whereas other
simple convex penalties render the problem tractable in polynomial time via
convex programming. We extend the results beyond the fully connected
architecture to different neural network architectures including networks with
vector outputs and convolutional architectures with pooling. We provide
extensive numerical simulations showing that the standard backpropagation
approach often fails to achieve the global optimum of the training loss. The
proposed approach is significantly faster and obtains better test accuracy
compared to the standard backpropagation procedure.
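To make the lifting idea concrete, the sketch below treats the special case of a purely quadratic activation sigma(t) = t^2 with scalar output, where the network output f(x) = sum_j a_j (w_j^T x)^2 = x^T Z x is linear in a lifted symmetric matrix Z. This is a minimal illustrative sketch, not the paper's exact formulation: the trace penalty, the regularization weight beta, the planted toy data, and the eigendecomposition step are assumptions standing in for the paper's regularizers and its Neural Decomposition procedure.

```python
import numpy as np
import cvxpy as cp

# Toy data: targets from a planted two-neuron quadratic-activation network
# (this data-generating setup is an assumption for illustration only).
rng = np.random.default_rng(0)
n, d = 60, 5
X = rng.standard_normal((n, d))
w1, w2 = rng.standard_normal(d), rng.standard_normal(d)
y = (X @ w1) ** 2 - 0.5 * (X @ w2) ** 2

# Lift: with sigma(t) = t^2, predictions x^T Z x are linear in Z, so
# training becomes a semidefinite program once Z is split into PSD parts.
Zp = cp.Variable((d, d), PSD=True)  # positive part of the lifted matrix
Zm = cp.Variable((d, d), PSD=True)  # negative part of the lifted matrix
preds = cp.sum(cp.multiply(X @ (Zp - Zm), X), axis=1)  # x_i^T Z x_i per sample
beta = 1e-3  # regularization strength (hypothetical choice)
objective = cp.sum_squares(preds - y) + beta * (cp.trace(Zp) + cp.trace(Zm))
cp.Problem(cp.Minimize(objective)).solve()

# Decomposition (illustrative stand-in for Neural Decomposition):
# eigenvectors of Z give hidden-layer weights, eigenvalues give output weights.
evals, evecs = np.linalg.eigh(Zp.value - Zm.value)
keep = np.abs(evals) > 1e-6
neurons, out_weights = evecs[:, keep], evals[keep]
print(f"recovered {neurons.shape[1]} neurons")
```

In this toy version the trace terms act like a nuclear-norm penalty on Z, which encourages low-rank (few-neuron) solutions; the paper itself establishes exactness of the lift for general second-degree activations and analyzes which regularizers keep the problem tractable.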
Related papers
- The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) models are used in a wide range of applications.
In this paper, we examine convex reformulations of neural network training.
We show that all stationary points of the non-convex training objective can be characterized as global optima of subsampled convex programs.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Optimal Sets and Solution Paths of ReLU Networks [56.40911684005949]
We develop an analytical framework to characterize the set of optimal ReLU networks.
We establish conditions for the solution path of ReLU networks to be continuous, and develop sensitivity results for ReLU networks.
arXiv Detail & Related papers (2023-05-31T18:48:16Z) - Acceleration techniques for optimization over trained neural network
ensembles [1.0323063834827415]
We study optimization problems where the objective function is modeled through feedforward neural networks with rectified linear unit activation.
We present a mixed-integer linear program based on existing popular big-$M$ formulations for optimizing over a single neural network.
arXiv Detail & Related papers (2021-12-13T20:50:54Z) - Path Regularization: A Convexity and Sparsity Inducing Regularization
for Parallel ReLU Networks [75.33431791218302]
We study the training problem of deep neural networks and introduce an analytic approach to unveil hidden convexity in the optimization landscape.
We consider a deep parallel ReLU network architecture, which also includes standard deep networks and ResNets as its special cases.
arXiv Detail & Related papers (2021-10-18T18:00:36Z) - Global Optimality Beyond Two Layers: Training Deep ReLU Networks via
Convex Programs [39.799125462526234]
We develop a novel unified framework to reveal a hidden regularization mechanism through the lens of convex optimization.
We numerically validate our theoretical results via experiments involving both synthetic and real datasets.
arXiv Detail & Related papers (2021-10-11T18:00:30Z) - Vector-output ReLU Neural Network Problems are Copositive Programs:
Convex Analysis of Two Layer Networks and Polynomial-time Algorithms [29.975118690758126]
We describe the convex semi-infinite dual of the two-layer vector-output ReLU neural network training problem.
We provide a solution which is guaranteed to be exact for certain classes of problems.
arXiv Detail & Related papers (2020-12-24T17:03:30Z) - Implicit Convex Regularizers of CNN Architectures: Convex Optimization
of Two- and Three-Layer Networks in Polynomial Time [70.15611146583068]
We study training of Convolutional Neural Networks (CNNs) with ReLU activations.
We introduce exact convex optimization formulations with polynomial complexity with respect to the number of data samples, the number of neurons, and the data dimension.
arXiv Detail & Related papers (2020-06-26T04:47:20Z) - The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural
Networks: an Exact Characterization of the Optimal Solutions [51.60996023961886]
We prove that finding all globally optimal two-layer ReLU neural networks can be performed by solving a convex optimization program with cone constraints.
Our analysis is novel, characterizes all optimal solutions, and does not leverage duality-based analysis which was recently used to lift neural network training into convex spaces.
arXiv Detail & Related papers (2020-06-10T15:38:30Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed stochastic algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires a much smaller number of communication rounds in theory.
Experiments on several datasets demonstrate the effectiveness of our method and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)