Kernel Methods and Multi-layer Perceptrons Learn Linear Models in High
Dimensions
- URL: http://arxiv.org/abs/2201.08082v1
- Date: Thu, 20 Jan 2022 09:35:46 GMT
- Title: Kernel Methods and Multi-layer Perceptrons Learn Linear Models in High
Dimensions
- Authors: Mojtaba Sahraee-Ardakan, Melikasadat Emami, Parthe Pandit, Sundeep
Rangan, Alyson K. Fletcher
- Abstract summary: We show that for a large class of kernels, including the neural tangent kernel of fully connected networks, kernel methods can only perform as well as linear models in a certain high-dimensional regime.
Data models more complex than independent features are needed for high-dimensional analysis.
- Score: 25.635225717360466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Empirical observation of high dimensional phenomena, such as the double
descent behaviour, has attracted a lot of interest in understanding classical
techniques such as kernel methods, and their implications for explaining the
generalization properties of neural networks. Many recent works analyze such
models in a certain high-dimensional regime where the covariates are
independent and the number of samples and the number of covariates grow at a
fixed ratio (i.e. proportional asymptotics). In this work we show that for a
large class of kernels, including the neural tangent kernel of fully connected
networks, kernel methods can only perform as well as linear models in this
regime. More surprisingly, when the data is generated by a kernel model where
the relationship between input and the response could be very nonlinear, we
show that linear models are in fact optimal, i.e. linear models achieve the
minimum risk among all models, linear or nonlinear. These results suggest that
more complex models for the data other than independent features are needed for
high-dimensional analysis.
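As a quick illustration of the abstract's claim (this is a toy sketch, not an experiment from the paper), the snippet below compares kernel ridge regression with an inner-product kernel against plain linear ridge regression on i.i.d. Gaussian covariates, holding the sample-to-dimension ratio n/d fixed. The kernel, the nonlinear target, and the ridge parameter are arbitrary illustrative choices.
```python
# Toy sketch only: i.i.d. Gaussian covariates, n/d fixed, kernel ridge vs.
# linear ridge. Kernel, target, and ridge parameter are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

def kernel(X, Z):
    # Inner-product kernel k(x, z) = exp(<x, z> / d); any smooth function of
    # the normalized inner product would serve the same illustrative purpose.
    return np.exp(X @ Z.T / X.shape[1])

def kernel_ridge_risk(Xtr, ytr, Xte, yte, lam=0.1):
    alpha = np.linalg.solve(kernel(Xtr, Xtr) + lam * np.eye(len(ytr)), ytr)
    return np.mean((kernel(Xte, Xtr) @ alpha - yte) ** 2)

def linear_ridge_risk(Xtr, ytr, Xte, yte, lam=0.1):
    d = Xtr.shape[1]
    w = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(d), Xtr.T @ ytr)
    return np.mean((Xte @ w - yte) ** 2)

for d in (50, 200, 800):                  # grow d with n/d held fixed at 2
    n = 2 * d
    Xtr, Xte = rng.standard_normal((n, d)), rng.standard_normal((2000, d))
    w_star = rng.standard_normal(d) / np.sqrt(d)
    target = lambda X: np.tanh(X @ w_star)    # a nonlinear single-index target
    ytr, yte = target(Xtr) + 0.1 * rng.standard_normal(n), target(Xte)
    print(f"d={d:4d}  kernel ridge risk {kernel_ridge_risk(Xtr, ytr, Xte, yte):.3f}  "
          f"linear ridge risk {linear_ridge_risk(Xtr, ytr, Xte, yte):.3f}")
```
In line with the paper's message, the two test risks should be comparable as d grows with n/d fixed; this is only a sanity-check sketch, not a reproduction of the paper's theory.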
Related papers
- Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework that combines a neural network, trained to mimic simulated data from a model Hamiltonian, with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which can then be applied in real time to multi-dimensional scattering data.
arXiv Detail & Related papers (2023-04-08T07:55:36Z)
- Gradient flow in the gaussian covariate model: exact solution of learning curves and multiple descent structures [14.578025146641806]
We provide a full and unified analysis of the whole time-evolution of the generalization curve.
We show that our theoretical predictions adequately match the learning curves obtained by gradient descent over realistic datasets.
arXiv Detail & Related papers (2022-12-13T17:39:18Z)
- Hessian Eigenspectra of More Realistic Nonlinear Models [73.31363313577941]
We give a precise characterization of the Hessian eigenspectra for a broad family of nonlinear models.
Our analysis takes a step toward identifying the origin of many striking features observed in more complex machine learning models.
arXiv Detail & Related papers (2021-03-02T06:59:52Z)
- The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization [34.235007566913396]
Modern deep learning models employ considerably more parameters than required to fit the training data. Whereas conventional statistical wisdom suggests such models should drastically overfit, in practice these models generalize remarkably well.
An emerging paradigm for describing this unexpected behavior is in terms of a double descent curve.
We provide a precise high-dimensional analysis of generalization with the Neural Tangent Kernel, which characterizes the behavior of wide neural networks trained with gradient descent (a toy computation of the empirical NTK is sketched after this list).
arXiv Detail & Related papers (2020-08-15T20:55:40Z)
- Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interactions at all ranges with only linear complexity.
Experiments confirm that our multi-graph network learns discretization-invariant solution operators for PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)
- Bayesian Sparse Factor Analysis with Kernelized Observations [67.60224656603823]
Multi-view problems can be addressed with latent variable models.
High dimensionality and non-linearity are traditionally handled by kernel methods.
We propose merging both approaches into a single model.
arXiv Detail & Related papers (2020-06-01T14:25:38Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this kernel-to-rich transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
- Learning Bijective Feature Maps for Linear ICA [73.85904548374575]
We show that existing probabilistic deep generative models (DGMs), which are tailor-made for image data, underperform on non-linear ICA tasks.
To address this, we propose a DGM which combines bijective feature maps with a linear ICA model to learn interpretable latent structures for high-dimensional data.
We create models that converge quickly, are easy to train, and achieve better unsupervised latent factor discovery than flow-based models, linear ICA, and Variational Autoencoders on images.
arXiv Detail & Related papers (2020-02-18T17:58:07Z)
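Both the main paper and the triple-descent entry above rest on the neural tangent kernel (NTK) of fully connected networks. The following is a minimal sketch (not code from either paper) of the empirical NTK of a one-hidden-layer ReLU network, computed directly from parameter gradients; the width m, the 1/sqrt(m) output scaling, and the Gaussian initialization are illustrative assumptions.
```python
# Minimal sketch (illustrative assumptions throughout): empirical NTK of a
# one-hidden-layer ReLU network f(x) = a . relu(W x) / sqrt(m), computed as
# Theta(x, x') = <grad_theta f(x), grad_theta f(x')>.
import numpy as np

rng = np.random.default_rng(0)

def empirical_ntk(X1, X2, W, a):
    m = len(a)
    H1, H2 = np.maximum(W @ X1.T, 0), np.maximum(W @ X2.T, 0)  # hidden activations, (m, n)
    D1, D2 = (H1 > 0).astype(float), (H2 > 0).astype(float)    # ReLU derivatives
    grad_a = H1.T @ H2 / m                                      # gradients w.r.t. a
    grad_W = ((D1 * a[:, None]).T @ (D2 * a[:, None])) / m * (X1 @ X2.T)  # gradients w.r.t. W
    return grad_a + grad_W

d, m, n = 20, 5000, 8                      # input dimension, width, number of inputs
W = rng.standard_normal((m, d))            # assumed Gaussian initialization
a = rng.standard_normal(m)
X = rng.standard_normal((n, d)) / np.sqrt(d)
print(np.round(empirical_ntk(X, X, W, a), 3))   # an n x n positive semidefinite Gram matrix
```
Plugging this Gram matrix into the kernel ridge solve from the first sketch gives the kind of kernel method whose high-dimensional risk the main paper compares with linear models.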
This list is automatically generated from the titles and abstracts of the papers on this site.