The role of optimization geometry in single neuron learning
- URL: http://arxiv.org/abs/2006.08575v4
- Date: Fri, 22 Apr 2022 00:53:38 GMT
- Title: The role of optimization geometry in single neuron learning
- Authors: Nicholas M. Boffi, Stephen Tu, and Jean-Jacques E. Slotine
- Abstract summary: Recent experiments have demonstrated that the choice of optimization geometry can impact generalization performance when learning expressive nonlinear model classes such as deep neural networks.
We show how the interplay between the optimization geometry and the feature space geometry sets the out-of-sample performance of the learned model.
- Score: 12.891722496444036
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent numerical experiments have demonstrated that the choice of
optimization geometry used during training can impact generalization
performance when learning expressive nonlinear model classes such as deep
neural networks. These observations have important implications for modern deep
learning but remain poorly understood due to the difficulty of the associated
nonconvex optimization problem. Towards an understanding of this phenomenon, we
analyze a family of pseudogradient methods for learning generalized linear
models under the square loss - a simplified problem containing both
nonlinearity in the model parameters and nonconvexity of the optimization which
admits a single neuron as a special case. We prove non-asymptotic bounds on the
generalization error that sharply characterize how the interplay between the
optimization geometry and the feature space geometry sets the out-of-sample
performance of the learned model. Experimentally, selecting the optimization
geometry as suggested by our theory leads to improved performance in
generalized linear model estimation problems such as nonlinear and nonconvex
variants of sparse vector recovery and low-rank matrix sensing.
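The pseudogradient family analyzed in the abstract can be illustrated with a minimal sketch. The snippet below is a hypothetical illustration (not the authors' code) of a GLMtron-style pseudogradient update for a single sigmoid neuron under the square loss: the update discards the activation derivative from the true gradient, which is the simplification that sidesteps the nonconvexity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pseudogradient_step(w, X, y, lr):
    """One full-batch pseudogradient step for a single neuron
    y ~ sigmoid(X @ w) under the square loss. Unlike the true gradient,
    the update omits the factor sigmoid'(X @ w)."""
    residual = sigmoid(X @ w) - y
    return w - lr * (residual @ X) / len(y)

# Toy realizable problem: recover a planted weight vector.
rng = np.random.default_rng(0)
d = 5
w_star = rng.standard_normal(d)
X = rng.standard_normal((200, d))
y = sigmoid(X @ w_star)

w = np.zeros(d)
for _ in range(2000):
    w = pseudogradient_step(w, X, y, lr=0.5)

print(np.linalg.norm(w - w_star))  # error shrinks toward zero
```

Changing the optimization geometry corresponds to preconditioning this update (as in mirror descent), which is the knob whose interaction with the feature geometry the paper's bounds characterize.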
Related papers
- The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) models are widely used in machine learning applications, but their training objectives are nonconvex.
In this paper we examine convex reformulations of neural network training.
We show that the stationary points of the nonconvex objective can be characterized via the global optima of subsampled convex (Lasso-type) programs.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Optimal Nonlinearities Improve Generalization Performance of Random
Features [0.9790236766474201]
Random feature models with a nonlinear activation function have been shown to perform equivalently to a Gaussian model in terms of training and generalization errors.
We show that acquired parameters from the Gaussian model enable us to define a set of optimal nonlinearities.
Our numerical results validate that the optimized nonlinearities achieve better generalization performance than widely-used nonlinear functions such as ReLU.
arXiv Detail & Related papers (2023-09-28T20:55:21Z) - The Power of Learned Locally Linear Models for Nonlinear Policy
Optimization [26.45568696453259]
This paper conducts a rigorous analysis of a simplified variant of this strategy for general nonlinear systems.
We analyze an algorithm which iterates between estimating local linear models of nonlinear system dynamics and performing $\mathtt{iLQR}$-like policy updates.
arXiv Detail & Related papers (2023-05-16T17:13:00Z) - Hessian Eigenspectra of More Realistic Nonlinear Models [73.31363313577941]
We make a precise characterization of the Hessian eigenspectra for a broad family of nonlinear models.
Our analysis takes a step forward to identify the origin of many striking features observed in more complex machine learning models.
arXiv Detail & Related papers (2021-03-02T06:59:52Z) - LQF: Linear Quadratic Fine-Tuning [114.3840147070712]
We present the first method for linearizing a pre-trained model that achieves comparable performance to non-linear fine-tuning.
LQF consists of simple modifications to the architecture, loss function and optimization typically used for classification.
arXiv Detail & Related papers (2020-12-21T06:40:20Z) - Learning Fast Approximations of Sparse Nonlinear Regression [50.00693981886832]
In this work, we bridge the gap by introducing the Nonlinear Learned Iterative Shrinkage-Thresholding Algorithm (NLISTA).
Experiments on synthetic data corroborate our theoretical results and show our method outperforms state-of-the-art methods.
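For context, NLISTA-style methods unroll iterations of the classical ISTA algorithm into a trainable network. The sketch below implements plain linear ISTA for sparse regression as a baseline (a standard algorithm, not the NLISTA method itself).

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, b, lam, n_iters=500):
    """Classic ISTA for min_x 0.5*||A @ x - b||^2 + lam*||x||_1.
    NLISTA unrolls iterations of this form with learned parameters and
    a nonlinear measurement model; this linear version is only a sketch."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz const. of gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - step * grad, lam * step)
    return x

# Sparse recovery toy: 5 nonzeros in dimension 100 from 50 measurements.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 100)) / np.sqrt(50)
x_true = np.zeros(100)
x_true[:5] = np.array([3.0, -2.0, 4.0, -3.0, 2.0])
b = A @ x_true
x_hat = ista(A, b, lam=0.01)
print(sorted(np.argsort(np.abs(x_hat))[-5:]))  # indices of largest entries
```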
arXiv Detail & Related papers (2020-10-26T11:31:08Z) - Non-parametric Models for Non-negative Functions [48.7576911714538]
We provide the first model for non-negative functions that enjoys the same good properties as linear models.
We prove that it admits a representer theorem and provide an efficient dual formulation for convex problems.
arXiv Detail & Related papers (2020-07-08T07:17:28Z) - Automatically Learning Compact Quality-aware Surrogates for Optimization
Problems [55.94450542785096]
Solving optimization problems with unknown parameters requires first learning a predictive model for those parameters and then solving the problem using the predicted values.
Recent work has shown that including the optimization problem as a layer in the model training pipeline yields predictions that improve downstream decision quality.
We show that we can improve solution quality by learning a low-dimensional surrogate model of a large optimization problem.
arXiv Detail & Related papers (2020-06-18T19:11:54Z) - Loss landscapes and optimization in over-parameterized non-linear
systems and neural networks [20.44438519046223]
We show that wide neural networks satisfy the PL$^*$ condition, which explains the convergence of (S)GD to a global minimum.
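For reference, the PL$^*$ (Polyak-Lojasiewicz) condition invoked above is typically stated as follows (standard formulation, not quoted from the paper):

```latex
% PL* condition on a loss \mathcal{L} whose global minimum value is 0:
\|\nabla \mathcal{L}(w)\|^2 \ge \mu\, \mathcal{L}(w) \qquad \text{for all } w.
% For an L-smooth loss, gradient descent with step size \eta \le 1/L then
% converges linearly: \mathcal{L}(w_t) \le (1 - \eta\mu)^t\, \mathcal{L}(w_0).
```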
arXiv Detail & Related papers (2020-02-29T17:18:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.