The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models
- URL: http://arxiv.org/abs/2312.12657v1
- Date: Tue, 19 Dec 2023 23:04:56 GMT
- Title: The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models
- Authors: Tolga Ergen, Mert Pilanci
- Abstract summary: Training Deep Neural Network (DNN) models is non-convex, so practitioners rely on non-convex optimization heuristics.
In this paper, we examine the use of convex optimization and sparse recovery models for training neural networks.
We show that all stationary points of the non-convex training objective can be characterized as the global optimum of a subsampled convex program that can be solved with standard convex solvers.
- Score: 75.33431791218302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the non-convex nature of training Deep Neural Network (DNN) models,
their effectiveness relies on the use of non-convex optimization heuristics.
Traditional methods for training DNNs often require costly empirical methods to
produce successful models and do not have a clear theoretical foundation. In
this study, we examine the use of convex optimization theory and sparse
recovery models to refine the training process of neural networks and provide a
better interpretation of their optimal weights. We focus on training two-layer
neural networks with piecewise linear activations and demonstrate that they can
be formulated as a finite-dimensional convex program. These programs include a
regularization term that promotes sparsity, which constitutes a variant of
group Lasso. We first utilize semi-infinite programming theory to prove strong
duality for finite width neural networks and then we express these
architectures equivalently as high dimensional convex sparse recovery models.
Remarkably, the worst-case complexity to solve the convex program is polynomial
in the number of samples and number of neurons when the rank of the data matrix
is bounded, which is the case in convolutional networks. To extend our method
to training data of arbitrary rank, we develop a novel polynomial-time
approximation scheme based on zonotope subsampling that comes with a guaranteed
approximation ratio. We also show that all the stationary points of the nonconvex
training objective can be characterized as the global optimum of a subsampled
convex program. Our convex models can be trained using standard convex solvers
without resorting to heuristics or extensive hyper-parameter tuning, unlike
non-convex methods. Through extensive numerical experiments, we show that
convex models can outperform traditional non-convex methods and are not
sensitive to optimizer hyperparameters.
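For concreteness, in the two-layer ReLU case the convex program referenced above is a cone-constrained group-Lasso model over (subsampled) ReLU activation patterns. A hedged sketch of this formulation follows; the notation here is illustrative and not taken verbatim from the paper:

```latex
\min_{\{v_i, u_i\}_{i=1}^{P}}\;
  \frac{1}{2}\Big\|\sum_{i=1}^{P} D_i X (v_i - u_i) - y\Big\|_2^2
  + \beta \sum_{i=1}^{P}\big(\|v_i\|_2 + \|u_i\|_2\big)
\quad \text{s.t.}\quad
  (2D_i - I_n) X v_i \ge 0,\;\; (2D_i - I_n) X u_i \ge 0,\;\; i = 1,\dots,P,
```

where X is the n x d data matrix, y the labels, beta > 0 the weight-decay parameter, and each diagonal matrix D_i = diag(1[X h_i >= 0]) encodes one ReLU activation pattern; the subsampled scheme keeps only a subset of the possible patterns. Below is a minimal Python sketch of how such a subsampled program could be handed to an off-the-shelf convex solver (cvxpy here; the sampling scheme and function names are illustrative assumptions, not the authors' code):

```python
# Sketch: subsampled group-Lasso-type convex program for a two-layer ReLU
# network, solved with a standard convex solver via cvxpy. Illustrative only.
import numpy as np
import cvxpy as cp

def sample_relu_patterns(X, num_samples, seed=0):
    """Subsample diagonal activation-pattern matrices D = diag(1[X h >= 0])."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    patterns = {tuple((X @ rng.standard_normal(d) >= 0).astype(int))
                for _ in range(num_samples)}          # deduplicate patterns
    return [np.diag(np.array(p, dtype=float)) for p in patterns]

def train_convex_relu(X, y, beta=1e-3, num_samples=50):
    n, d = X.shape
    Ds = sample_relu_patterns(X, num_samples)
    V = [cp.Variable(d) for _ in Ds]                  # "positive" neuron blocks v_i
    U = [cp.Variable(d) for _ in Ds]                  # "negative" neuron blocks u_i
    fit = sum((D @ X) @ (v - u) for D, v, u in zip(Ds, V, U)) - y
    group_lasso = sum(cp.norm(v, 2) + cp.norm(u, 2) for v, u in zip(V, U))
    constraints = []                                  # cone constraints per pattern
    for D, v, u in zip(Ds, V, U):
        A = (2 * D - np.eye(n)) @ X
        constraints += [A @ v >= 0, A @ u >= 0]
    prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(fit) + beta * group_lasso),
                      constraints)
    prob.solve()
    return [v.value for v in V], [u.value for u in U]
```

Blocks (v_i, u_i) that come back with nonzero norm can be read off as hidden neurons of an equivalent finite-width two-layer ReLU network; the group-Lasso penalty is what drives most blocks to zero.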
Related papers
- A Library of Mirrors: Deep Neural Nets in Low Dimensions are Convex Lasso Models with Reflection Features [54.83898311047626]
We consider neural networks with piecewise linear activations, ranging from two layers to an arbitrary but finite number of layers.
We first show that two-layer networks with piecewise linear activations are Lasso models using a discrete dictionary of ramp depths.
arXiv Detail & Related papers (2024-03-02T00:33:45Z) - GloptiNets: Scalable Non-Convex Optimization with Certificates [61.50835040805378]
We present a novel approach to non-convex optimization with certificates, which handles smooth functions on the hypercube or on the torus.
By exploiting the regularity of the target function, intrinsic in the decay of its spectrum, we obtain precise certificates while leveraging the advanced and powerful computational techniques developed for neural networks.
arXiv Detail & Related papers (2023-06-26T09:42:59Z) - An alternative approach to train neural networks using monotone
variational inequality [22.320632565424745]
We propose an alternative approach to neural network training based on monotone variational inequalities.
Our approach can be used for more efficient fine-tuning of a pre-trained neural network.
arXiv Detail & Related papers (2022-02-17T19:24:20Z) - Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model
Classes and Cone Decompositions [41.337814204665364]
We develop algorithms for convex optimization of two-layer neural networks with ReLU activation functions.
We show that convex gated ReLU models obtain data-dependent approximation bounds for the ReLU training problem.
arXiv Detail & Related papers (2022-02-02T23:50:53Z) - Global Optimality Beyond Two Layers: Training Deep ReLU Networks via
Convex Programs [39.799125462526234]
We develop a novel unified framework to reveal a hidden regularization mechanism through the lens of convex optimization.
We numerically validate our theoretical results via experiments involving both synthetic and real datasets.
arXiv Detail & Related papers (2021-10-11T18:00:30Z) - Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve
Optimism, Embrace Virtual Curvature [61.22680308681648]
We show that global convergence is statistically intractable even for a one-layer neural net bandit with a deterministic reward.
For both nonlinear bandits and RL, the paper presents a model-based algorithm, Virtual Ascent with Online Model Learner (ViOL).
arXiv Detail & Related papers (2021-02-08T12:41:56Z) - Neural Spectrahedra and Semidefinite Lifts: Global Convex Optimization
of Polynomial Activation Neural Networks in Fully Polynomial-Time [31.94590517036704]
We develop exact convex optimization formulations for two-layer neural networks with second-degree polynomial activations.
We show that the semidefinite lift is exact, and therefore global optimization is polynomial in the input dimension and sample size for all input data.
The proposed approach is significantly faster and obtains better test accuracy than the standard backpropagation procedure.
arXiv Detail & Related papers (2021-01-07T08:43:01Z) - The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural
Networks: an Exact Characterization of the Optimal Solutions [51.60996023961886]
We prove that finding all globally optimal two-layer ReLU neural networks can be performed by solving a convex optimization program with cone constraints.
Our analysis is novel, characterizes all optimal solutions, and does not leverage duality-based analysis which was recently used to lift neural network training into convex spaces.
arXiv Detail & Related papers (2020-06-10T15:38:30Z) - Convex Geometry and Duality of Over-parameterized Neural Networks [70.15611146583068]
We develop a convex analytic approach to analyze finite width two-layer ReLU networks.
We show that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set.
In higher dimensions, we show that the training problem can be cast as a finite dimensional convex problem with infinitely many constraints.
arXiv Detail & Related papers (2020-02-25T23:05:33Z)