Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model
Classes and Cone Decompositions
- URL: http://arxiv.org/abs/2202.01331v1
- Date: Wed, 2 Feb 2022 23:50:53 GMT
- Title: Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model
Classes and Cone Decompositions
- Authors: Aaron Mishkin, Arda Sahiner, Mert Pilanci
- Abstract summary: We develop algorithms for convex optimization of two-layer neural networks with ReLU activation functions.
We show that convex gated ReLU models obtain data-dependent approximation bounds for the ReLU training problem.
- Score: 41.337814204665364
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop fast algorithms and robust software for convex optimization of
two-layer neural networks with ReLU activation functions. Our work leverages a
convex reformulation of the standard weight-decay penalized training problem as
a set of group-$\ell_1$-regularized data-local models, where locality is
enforced by polyhedral cone constraints. In the special case of
zero-regularization, we show that this problem is exactly equivalent to
unconstrained optimization of a convex "gated ReLU" network. For problems with
non-zero regularization, we show that convex gated ReLU models obtain
data-dependent approximation bounds for the ReLU training problem. To optimize
the convex reformulations, we develop an accelerated proximal gradient method
and a practical augmented Lagrangian solver. We show that these approaches are
faster than standard training heuristics for the non-convex problem, such as
SGD, and outperform commercial interior-point solvers. Experimentally, we
verify our theoretical results, explore the group-$\ell_1$ regularization path,
and scale convex optimization for neural networks to image classification on
MNIST and CIFAR-10.
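The convex reformulation described in the abstract can be written out explicitly. The notation below is a sketch following the standard convex formulation of two-layer ReLU training (data matrix $X \in \mathbb{R}^{n \times d}$, targets $y$, sampled gate vectors $g_i$ inducing activation patterns $D_i$); it illustrates the problem class rather than restating the paper verbatim.

```latex
% Cone-constrained, group-l1-regularized reformulation with data-local models.
% D_i = diag(1[X g_i >= 0]) are fixed activation-pattern masks.
\min_{\{v_i, w_i\}} \; \frac{1}{2} \Big\| \sum_{i=1}^{p} D_i X (v_i - w_i) - y \Big\|_2^2
  + \lambda \sum_{i=1}^{p} \big( \|v_i\|_2 + \|w_i\|_2 \big)
\quad \text{s.t.} \quad (2 D_i - I) X v_i \ge 0, \;\; (2 D_i - I) X w_i \ge 0 .

% Dropping the polyhedral cone constraints yields the unconstrained "gated ReLU" model:
\min_{\{u_i\}} \; \frac{1}{2} \Big\| \sum_{i=1}^{p} D_i X u_i - y \Big\|_2^2
  + \lambda \sum_{i=1}^{p} \|u_i\|_2 .
```

The proximal gradient approach mentioned in the abstract hinges on the closed-form prox of the group-$\ell_1$ penalty (block soft-thresholding). The sketch below is a minimal, non-accelerated proximal gradient loop for the gated ReLU objective above; the function names, fixed step size, and plain ISTA-style iteration are simplifying assumptions, not the paper's solver.

```python
import numpy as np

def group_soft_threshold(U, tau):
    """Prox of tau * sum_i ||u_i||_2: shrink each row (block) of U toward zero."""
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * U

def prox_grad_gated_relu(X, y, gates, lam, step_size, iters=1000):
    """Minimize 0.5 * || sum_i D_i X u_i - y ||^2 + lam * sum_i ||u_i||_2,
    where D_i = diag(1[X g_i >= 0]) is fixed by the sampled gate vector g_i.
    Plain proximal gradient (no acceleration or restarts)."""
    p, d = gates.shape
    D = (X @ gates.T > 0).astype(X.dtype)      # (n, p); column i is the mask of D_i
    U = np.zeros((p, d))                       # row i holds u_i
    for _ in range(iters):
        preds = np.sum(D * (X @ U.T), axis=1)  # sum_i D_i X u_i
        resid = preds - y
        grad = (D * resid[:, None]).T @ X      # row i: X^T D_i (preds - y)
        U = group_soft_threshold(U - step_size * grad, step_size * lam)
    return U

# Toy usage on random data and gates (illustrative only; conservative step size).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = rng.standard_normal(200)
gates = rng.standard_normal((20, 10))
U = prox_grad_gated_relu(X, y, gates, lam=0.1, step_size=1e-4, iters=500)
preds = np.sum((X @ gates.T > 0) * (X @ U.T), axis=1)
print("train MSE:", float(np.mean((preds - y) ** 2)))
```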
Related papers
- The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) models are used in a wide range of applications.
In this paper, we examine convex reformulations of DNN training via Lasso models.
We show that all stationary points of the non-convex objective can be characterized as global optima of a subsampled convex program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Fixing the NTK: From Neural Network Linearizations to Exact Convex
Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
arXiv Detail & Related papers (2023-09-26T17:42:52Z) - Optimal Sets and Solution Paths of ReLU Networks [56.40911684005949]
We develop an analytical framework to characterize the set of optimal ReLU networks.
We establish conditions under which the set of optimal ReLU networks is continuous, and develop sensitivity results for ReLU networks.
arXiv Detail & Related papers (2023-05-31T18:48:16Z) - An Inexact Augmented Lagrangian Algorithm for Training Leaky ReLU Neural
Network with Group Sparsity [13.27709100571336]
Leaky ReLU networks with a group regularization term have been widely used in recent years.
However, there is a lack of approaches for computing a stationary point deterministically.
We propose an inexact augmented Lagrangian algorithm for solving the new model.
arXiv Detail & Related papers (2022-05-11T11:53:15Z) - Efficient Global Optimization of Two-layer ReLU Networks: Quadratic-time
Algorithms and Adversarial Training [12.354076490479516]
We develop two efficient algorithms that train ANNs with global convergence guarantees.
The first algorithm is based on the alternating direction method of multipliers (ADMM).
The second algorithm, based on the "sampled convex programs" theory, is simpler to implement.
arXiv Detail & Related papers (2022-01-06T08:24:11Z) - Global Optimality Beyond Two Layers: Training Deep ReLU Networks via
Convex Programs [39.799125462526234]
We develop a novel unified framework to reveal a hidden regularization mechanism through the lens of convex optimization.
We numerically validate our theoretical results via experiments involving both synthetic and real datasets.
arXiv Detail & Related papers (2021-10-11T18:00:30Z) - The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural
Networks: an Exact Characterization of the Optimal Solutions [51.60996023961886]
We prove that finding all globally optimal two-layer ReLU neural networks can be performed by solving a convex optimization program with cone constraints.
Our analysis is novel, characterizes all optimal solutions, and does not leverage duality-based analysis which was recently used to lift neural network training into convex spaces.
arXiv Detail & Related papers (2020-06-10T15:38:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.