Hessian Eigenspectra of More Realistic Nonlinear Models
- URL: http://arxiv.org/abs/2103.01519v1
- Date: Tue, 2 Mar 2021 06:59:52 GMT
- Title: Hessian Eigenspectra of More Realistic Nonlinear Models
- Authors: Zhenyu Liao and Michael W. Mahoney
- Abstract summary: We make a precise characterization of the Hessian eigenspectra for a broad family of nonlinear models.
Our analysis takes a step forward to identify the origin of many striking features observed in more complex machine learning models.
- Score: 73.31363313577941
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Given an optimization problem, the Hessian matrix and its eigenspectrum can
be used in many ways, ranging from designing more efficient second-order
algorithms to performing model analysis and regression diagnostics. When
nonlinear models and non-convex problems are considered, strong simplifying
assumptions are often made to make Hessian spectral analysis more tractable.
This leads to the question of how relevant the conclusions of such analyses are
for more realistic nonlinear models. In this paper, we exploit deterministic
equivalent techniques from random matrix theory to make a \emph{precise}
characterization of the Hessian eigenspectra for a broad family of nonlinear
models, including models that generalize the classical generalized linear
models, without relying on strong simplifying assumptions used previously. We
show that, depending on the data properties, the nonlinear response model, and
the loss function, the Hessian can have \emph{qualitatively} different spectral
behaviors: of bounded or unbounded support, with single- or multi-bulk, and
with isolated eigenvalues on the left- or right-hand side of the bulk. By
focusing on such a simple but nontrivial nonlinear model, our analysis takes a
step forward to unveil the theoretical origin of many visually striking
features observed in more complex machine learning models.
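As a concrete illustration of the object under study, the sketch below simulates the empirical Hessian of a logistic (generalized linear) model on Gaussian data and inspects the edges of its eigenvalue spectrum. The dimensions, the logistic response, and the choice to evaluate the Hessian at the ground-truth parameter are illustrative assumptions; the snippet only produces an empirical spectrum and does not implement the paper's deterministic-equivalent analysis.

```python
# Minimal numerical sketch (assumptions: logistic response, i.i.d. Gaussian
# data, Hessian evaluated at the ground-truth parameter). It only simulates
# the empirical Hessian spectrum of one generalized linear model.
import numpy as np

rng = np.random.default_rng(0)
n, p = 4000, 800                               # samples and features, both large
X = rng.standard_normal((n, p)) / np.sqrt(p)   # rows x_i ~ N(0, I_p / p)
beta_star = rng.standard_normal(p)             # ground-truth parameter
# Nonlinear (logistic) response; the logistic-loss Hessian happens not to use y.
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_star))))

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def hessian(beta):
    """Hessian of the empirical logistic loss: (1/n) X^T diag(w) X."""
    s = sigmoid(X @ beta)
    w = s * (1.0 - s)          # per-sample curvature of the logistic loss
    return (X.T * w) @ X / n

eigs = np.linalg.eigvalsh(hessian(beta_star))
print("left edge :", eigs[:3])
print("right edge:", eigs[-3:])
# A clear gap at either edge indicates an isolated eigenvalue outside the bulk;
# whether such spikes appear, and on which side, depends on the data, the
# response model, and the loss, which is what the paper characterizes precisely.
```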
Related papers
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - An Analysis of Linear Time Series Forecasting Models [0.0]
We show that several popular variants of linear models for time series forecasting are equivalent and functionally indistinguishable from standard, unconstrained linear regression.
We provide experimental evidence that the models under inspection learn nearly identical solutions, and finally demonstrate that the simpler closed form solutions are superior forecasters across 72% of test settings.
arXiv Detail & Related papers (2024-03-21T17:42:45Z) - Identifiability of latent-variable and structural-equation models: from
linear to nonlinear [2.159277717031637]
In factor analysis, non-Gaussianity of the (latent) variables has been shown to provide identifiability.
More recently, we have shown how even general nonlinear versions of such models can be estimated.
arXiv Detail & Related papers (2023-02-06T10:21:21Z) - Gradient flow in the gaussian covariate model: exact solution of
learning curves and multiple descent structures [14.578025146641806]
We provide a full and unified analysis of the whole time-evolution of the generalization curve.
We show that our theoretical predictions adequately match the learning curves obtained by gradient descent over realistic datasets.
arXiv Detail & Related papers (2022-12-13T17:39:18Z) - Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z) - Sparse Quantized Spectral Clustering [85.77233010209368]
We exploit tools from random matrix theory to make precise statements about how the eigenspectrum of a matrix changes under such nonlinear transformations.
We show that very little change occurs in the informative eigenstructure even under drastic sparsification/quantization (a minimal numerical sketch of this effect appears at the end of this list).
arXiv Detail & Related papers (2020-10-03T15:58:07Z) - Non-parametric Models for Non-negative Functions [48.7576911714538]
We provide the first model for non-negative functions that retains the favorable properties of linear models.
We prove that it admits a representer theorem and provide an efficient dual formulation for convex problems.
arXiv Detail & Related papers (2020-07-08T07:17:28Z) - The role of optimization geometry in single neuron learning [12.891722496444036]
Recent experiments have demonstrated that the choice of optimization geometry can impact generalization performance when learning expressive nonlinear models such as neural networks.
We show how the interplay between the optimization geometry and the feature space geometry determines the out-of-sample performance.
arXiv Detail & Related papers (2020-06-15T17:39:44Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the precise role of the stochasticity in its success remains unclear.
We show that heavy-tailed behavior commonly arises in the parameters as a result of multiplicative noise.
A detailed analysis describes how key factors, including step size and data properties, contribute to this behavior, with similar results observed on state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z)
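The sketch below, referenced from the Sparse Quantized Spectral Clustering entry above, empirically checks that the top (informative) eigenvector of a Gram matrix survives 1-bit quantization and heavy sparsification. The rank-one signal-plus-noise data model, the sign() quantizer, and the 20% keep-probability are illustrative assumptions rather than that paper's exact setting.

```python
# Minimal sketch: informative eigenstructure under quantization/sparsification.
# Assumptions: rank-one signal-plus-noise data, sign() quantization, and a
# symmetric mask that keeps ~20% of the Gram-matrix entries.
import numpy as np

rng = np.random.default_rng(1)
n, p = 1500, 1000
labels = np.where(np.arange(n) < n // 2, -1.0, 1.0)      # two balanced classes
mu = 3.0 * rng.standard_normal(p) / np.sqrt(p)           # class-mean direction
X = np.outer(labels, mu) + rng.standard_normal((n, p))   # signal plus noise
K = X @ X.T / p                                          # Gram matrix

def top_eigvec(M):
    """Unit eigenvector of the largest eigenvalue of a symmetric matrix."""
    return np.linalg.eigh(M)[1][:, -1]

K_quant = np.sign(K)                                     # drastic 1-bit quantization
keep = np.triu(rng.random((n, n)) < 0.2)
keep = keep | keep.T                                     # symmetric sparsity mask
K_sparse = K * keep / 0.2                                # zero out ~80% of entries

v = top_eigvec(K)
for name, M in [("quantized", K_quant), ("sparsified", K_sparse)]:
    print(name, "alignment with original top eigenvector:",
          round(abs(top_eigvec(M) @ v), 3))
# Alignments well above the ~1/sqrt(n) level of an uninformative direction
# illustrate that the informative eigenstructure changes little even under
# such drastic entrywise transformations.
```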