Does a sparse ReLU network training problem always admit an optimum?
- URL: http://arxiv.org/abs/2306.02666v2
- Date: Tue, 5 Dec 2023 08:53:25 GMT
- Title: Does a sparse ReLU network training problem always admit an optimum?
- Authors: Quoc-Tung Le (LIP, OCKHAM), Elisa Riccietti (OCKHAM), R\'emi Gribonval
(OCKHAM)
- Abstract summary: We show that the existence of an optimal solution is not always guaranteed, especially in the context of sparse ReLU neural networks.
In particular, we first show that optimization problems involving deep networks with certain sparsity patterns do not always have optimal parameters.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given a training set, a loss function, and a neural network architecture, it
is often taken for granted that optimal network parameters exist, and a common
practice is to apply available optimization algorithms to search for them. In
this work, we show that the existence of an optimal solution is not always
guaranteed, especially in the context of {\em sparse} ReLU neural networks. In
particular, we first show that optimization problems involving deep networks
with certain sparsity patterns do not always have optimal parameters, and that
optimization algorithms may then diverge. Via a new topological relation
between sparse ReLU neural networks and their linear counterparts, we derive --
using existing tools from real algebraic geometry -- an algorithm to verify
that a given sparsity pattern suffers from this issue. Then, the existence of a
global optimum is proved for every concrete optimization problem involving a
shallow sparse ReLU neural network of output dimension one. Overall, the
analysis is based on the investigation of two topological properties of the
space of functions implementable as sparse ReLU neural networks: a best
approximation property, and a closedness property, both in the uniform norm.
This is studied both for (finite) domains corresponding to practical training
on finite training sets, and for more general domains such as the unit cube.
This allows us to provide conditions for the guaranteed existence of an optimum
given a sparsity pattern. The results apply not only to several sparsity
patterns proposed in recent works on network pruning/sparsification, but also
to classical dense neural networks, including architectures not covered by
existing results.
Related papers
- The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network Network (DNN) models are used for programming purposes.
In this paper we examine the use of convex neural recovery models.
We show that all the stationary non-dimensional objective objective can be characterized as the standard a global subsampled convex solvers program.
We also show that all the stationary non-dimensional objective objective can be characterized as the standard a global subsampled convex solvers program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Optimal Sets and Solution Paths of ReLU Networks [56.40911684005949]
We develop an analytical framework to characterize the set of optimal ReLU networks.
We establish conditions for the neuralization of ReLU networks to be continuous, and develop sensitivity results for ReLU networks.
arXiv Detail & Related papers (2023-05-31T18:48:16Z) - AskewSGD : An Annealed interval-constrained Optimisation method to train
Quantized Neural Networks [12.229154524476405]
We develop a new algorithm, Annealed Skewed SGD - AskewSGD - for training deep neural networks (DNNs) with quantized weights.
Unlike algorithms with active sets and feasible directions, AskewSGD avoids projections or optimization under the entire feasible set.
Experimental results show that the AskewSGD algorithm performs better than or on par with state of the art methods in classical benchmarks.
arXiv Detail & Related papers (2022-11-07T18:13:44Z) - Acceleration techniques for optimization over trained neural network
ensembles [1.0323063834827415]
We study optimization problems where the objective function is modeled through feedforward neural networks with rectified linear unit activation.
We present a mixed-integer linear program based on existing popular big-$M$ formulations for optimizing over a single neural network.
arXiv Detail & Related papers (2021-12-13T20:50:54Z) - Path Regularization: A Convexity and Sparsity Inducing Regularization
for Parallel ReLU Networks [75.33431791218302]
We study the training problem of deep neural networks and introduce an analytic approach to unveil hidden convexity in the optimization landscape.
We consider a deep parallel ReLU network architecture, which also includes standard deep networks and ResNets as its special cases.
arXiv Detail & Related papers (2021-10-18T18:00:36Z) - Neural Spectrahedra and Semidefinite Lifts: Global Convex Optimization
of Polynomial Activation Neural Networks in Fully Polynomial-Time [31.94590517036704]
We develop exact convex optimization formulations for two-layer numerical networks with second degree activations.
We show that semidefinite neural and therefore global optimization is in complexity dimension and sample size for all input data.
The proposed approach is significantly faster to obtain better test accuracy compared to the standard backpropagation procedure.
arXiv Detail & Related papers (2021-01-07T08:43:01Z) - Efficient Methods for Structured Nonconvex-Nonconcave Min-Max
Optimization [98.0595480384208]
We propose a generalization extraient spaces which converges to a stationary point.
The algorithm applies not only to general $p$-normed spaces, but also to general $p$-dimensional vector spaces.
arXiv Detail & Related papers (2020-10-31T21:35:42Z) - The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural
Networks: an Exact Characterization of the Optimal Solutions [51.60996023961886]
We prove that finding all globally optimal two-layer ReLU neural networks can be performed by solving a convex optimization program with cone constraints.
Our analysis is novel, characterizes all optimal solutions, and does not leverage duality-based analysis which was recently used to lift neural network training into convex spaces.
arXiv Detail & Related papers (2020-06-10T15:38:30Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study a distributed variable for large-scale AUC for a neural network as with a deep neural network.
Our model requires a much less number of communication rounds and still a number of communication rounds in theory.
Our experiments on several datasets show the effectiveness of our theory and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.