A multiobjective continuation method to compute the regularization path of deep neural networks
- URL: http://arxiv.org/abs/2308.12044v5
- Date: Fri, 29 Mar 2024 14:25:29 GMT
- Title: A multiobjective continuation method to compute the regularization path of deep neural networks
- Authors: Augustina C. Amakor, Konstantin Sonntag, Sebastian Peitz
- Abstract summary: Sparsity is a highly desired feature in deep neural networks (DNNs) since it ensures numerical efficiency, improves the interpretability of models, and increases robustness.
We present an algorithm that approximates the entire Pareto front for the above-mentioned objectives in a very efficient manner for high-dimensional DNNs with millions of parameters.
We demonstrate that knowledge of the regularization path allows for a well-generalizing network parametrization.
- Score: 1.3654846342364308
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Sparsity is a highly desired feature in deep neural networks (DNNs) since it ensures numerical efficiency, improves the interpretability of models (due to the smaller number of relevant features), and increases robustness. For linear models, it is well known that there exists a \emph{regularization path} connecting the sparsest solution in terms of the $\ell^1$ norm, i.e., zero weights, and the non-regularized solution. Very recently, there was a first attempt to extend the concept of regularization paths to DNNs by means of treating the empirical loss and sparsity ($\ell^1$ norm) as two conflicting criteria and solving the resulting multiobjective optimization problem for low-dimensional DNNs. However, due to the non-smoothness of the $\ell^1$ norm and the high number of parameters, this approach is not very efficient from a computational perspective for high-dimensional DNNs. To overcome this limitation, we present an algorithm that allows for the approximation of the entire Pareto front for the above-mentioned objectives in a very efficient manner for high-dimensional DNNs with millions of parameters. We present numerical examples using both deterministic and stochastic gradients. We furthermore demonstrate that knowledge of the regularization path allows for a well-generalizing network parametrization. To the best of our knowledge, this is the first algorithm to compute the regularization path for non-convex multiobjective optimization problems (MOPs) with millions of degrees of freedom.
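The following is a minimal sketch, in PyTorch, of the continuation idea described above: the empirical loss and the $\ell^1$ norm are treated as conflicting objectives, the non-smooth $\ell^1$ term is handled with proximal (soft-thresholding) steps, and an approximate trade-off front is traced by warm-starting while the regularization weight is decreased from the sparsest regime towards the unregularized one. This is not the authors' algorithm; the toy data, the network architecture, the learning rate, and the weight schedule are illustrative assumptions.

```python
# Minimal sketch (NOT the paper's exact algorithm): trace an approximate
# (loss, l1-norm) trade-off front for a small DNN by continuation over the
# regularization weight, using proximal gradient steps for the non-smooth l1 term.
import torch

torch.manual_seed(0)
X = torch.randn(256, 20)                                  # toy regression data (assumed)
y = X @ torch.randn(20, 1) + 0.1 * torch.randn(256, 1)

model = torch.nn.Sequential(
    torch.nn.Linear(20, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1)
)
loss_fn = torch.nn.MSELoss()
lr = 1e-2

def l1_norm(m):
    return sum(p.abs().sum() for p in m.parameters())

def prox_l1(m, thresh):
    # Soft-thresholding: proximal operator of thresh * ||.||_1, applied to all parameters.
    with torch.no_grad():
        for p in m.parameters():
            p.copy_(p.sign() * torch.clamp(p.abs() - thresh, min=0.0))

path = []                                                  # (lambda, loss, ||theta||_1) points
for lam in torch.logspace(0, -4, steps=20):                # continuation: decrease the l1 weight
    for _ in range(200):                                   # warm-started inner iterations
        model.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        with torch.no_grad():
            for p in model.parameters():
                p -= lr * p.grad                           # gradient step on the smooth loss
        prox_l1(model, lr * lam.item())                    # proximal step on the l1 term
    with torch.no_grad():
        path.append((lam.item(), loss_fn(model(X), y).item(), l1_norm(model).item()))

for lam, loss, l1 in path:
    print(f"lambda={lam:.1e}  loss={loss:.4f}  ||theta||_1={l1:.2f}")
```

Each recorded triple is one point on an approximate regularization path; the paper's continuation method computes such a path far more efficiently and for networks with millions of parameters.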
Related papers
- The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) models are used for a wide range of learning tasks.
In this paper we examine the use of convex Lasso models to characterize neural network training.
We show that all stationary points of the non-convex objective can be characterized as the global optima of subsampled convex programs.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Pseudonorm Approachability and Applications to Regret Minimization [73.54127663296906]
We convert high-dimensional $\ell_\infty$-approachability problems to low-dimensional pseudonorm approachability problems.
We develop an algorithmic theory of pseudonorm approachability, analogous to previous work on approachability for $\ell_2$ and other norms.
arXiv Detail & Related papers (2023-02-03T03:19:14Z) - AskewSGD : An Annealed interval-constrained Optimisation method to train
Quantized Neural Networks [12.229154524476405]
We develop a new algorithm, Annealed Skewed SGD - AskewSGD - for training deep neural networks (DNNs) with quantized weights.
Unlike algorithms with active sets and feasible directions, AskewSGD avoids projections or optimization under the entire feasible set.
Experimental results show that the AskewSGD algorithm performs better than or on par with state-of-the-art methods on classical benchmarks.
arXiv Detail & Related papers (2022-11-07T18:13:44Z) - Algorithms for Efficiently Learning Low-Rank Neural Networks [12.916132936159713]
We study algorithms for learning low-rank neural networks.
We present a provably efficient algorithm which learns an optimal low-rank approximation to a single-hidden-layer ReLU network.
We propose a novel low-rank framework for training low-rank $\textit{deep}$ networks.
arXiv Detail & Related papers (2022-02-02T01:08:29Z) - Learning from Images: Proactive Caching with Parallel Convolutional
Neural Networks [94.85780721466816]
A novel framework for proactive caching is proposed in this paper.
It combines model-based optimization with data-driven techniques by transforming an optimization problem into a grayscale image.
Numerical results show that the proposed scheme can reduce computation time by 71.6% with only 0.8% additional performance cost.
arXiv Detail & Related papers (2021-08-15T21:32:47Z) - dNNsolve: an efficient NN-based PDE solver [62.997667081978825]
We introduce dNNsolve, which makes use of dual Neural Networks to solve ODEs/PDEs.
We show that dNNsolve is capable of solving a broad range of ODEs/PDEs in 1, 2 and 3 spacetime dimensions.
arXiv Detail & Related papers (2021-03-15T19:14:41Z) - Consistent Sparse Deep Learning: Theory and Computation [11.24471623055182]
We propose a frequentist-like method for learning sparse deep neural networks (DNNs).
The proposed method can perform very well for large-scale network compression and high-dimensional nonlinear variable selection.
arXiv Detail & Related papers (2021-02-25T23:31:24Z) - Offline Model-Based Optimization via Normalized Maximum Likelihood
Estimation [101.22379613810881]
We consider data-driven optimization problems where one must maximize a function given only queries at a fixed set of points.
This problem setting emerges in many domains where function evaluation is a complex and expensive process.
We propose a tractable approximation that allows us to scale our method to high-capacity neural network models.
arXiv Detail & Related papers (2021-02-16T06:04:27Z) - On the Treatment of Optimization Problems with L1 Penalty Terms via
Multiobjective Continuation [0.0]
We present a novel algorithm that allows us to gain detailed insight into the effects of sparsity in linear and nonlinear optimization.
Our method can be seen as a generalization of well-known homotopy methods for linear regression problems to the nonlinear case (a minimal sketch of the linear case is given after this list).
arXiv Detail & Related papers (2020-12-14T13:00:50Z) - Effective Dimension Adaptive Sketching Methods for Faster Regularized
Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT).
arXiv Detail & Related papers (2020-06-10T15:00:09Z) - PFNN: A Penalty-Free Neural Network Method for Solving a Class of
Second-Order Boundary-Value Problems on Complex Geometries [4.620110353542715]
We present PFNN, a penalty-free neural network method, to solve a class of second-order boundary-value problems.
PFNN is superior to several existing approaches in terms of accuracy, flexibility and robustness.
arXiv Detail & Related papers (2020-04-14T13:36:14Z)
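For comparison with the homotopy methods for linear regression mentioned above, the classical regularization path of a linear model can be computed directly with scikit-learn's lasso_path. The sketch below is purely illustrative; the random toy data and the grid of 50 regularization values are assumptions.

```python
# Classical l1 regularization path for a *linear* model (Lasso), from strong to
# weak regularization; the multiobjective continuation above generalizes this idea
# to nonlinear models such as DNNs.
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.5, 1.0]                  # only three relevant features (assumed)
y = X @ w_true + 0.1 * rng.standard_normal(100)

alphas, coefs, _ = lasso_path(X, y, n_alphas=50)
for alpha, w in zip(alphas, coefs.T):          # alphas are returned in decreasing order
    print(f"alpha={alpha:.4f}  nonzero coefficients={int(np.count_nonzero(w))}")
```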