Optimal Nonlinearities Improve Generalization Performance of Random
Features
- URL: http://arxiv.org/abs/2309.16846v1
- Date: Thu, 28 Sep 2023 20:55:21 GMT
- Title: Optimal Nonlinearities Improve Generalization Performance of Random
Features
- Authors: Samet Demir and Zafer Doğan
- Abstract summary: The random feature model with a nonlinear activation function has been shown to be asymptotically equivalent to a Gaussian model in terms of training and generalization errors.
We show that acquired parameters from the Gaussian model enable us to define a set of optimal nonlinearities.
Our numerical results validate that the optimized nonlinearities achieve better generalization performance than widely-used nonlinear functions such as ReLU.
- Score: 0.9790236766474201
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The random feature model with a nonlinear activation function has
been shown to be asymptotically equivalent to a Gaussian model in terms of
training and
generalization errors. Analysis of the equivalent model reveals an important
yet not fully understood role played by the activation function. To address
this issue, we study the "parameters" of the equivalent model to achieve
improved generalization performance for a given supervised learning problem. We
show that acquired parameters from the Gaussian model enable us to define a set
of optimal nonlinearities. We provide two example classes from this set, namely,
second-order polynomial and piecewise linear functions. These functions are
optimized to improve generalization performance regardless of the actual form.
We experiment with regression and classification problems, including synthetic
and real (e.g., CIFAR10) data. Our numerical results validate that the
optimized nonlinearities achieve better generalization performance than
widely-used nonlinear functions such as ReLU. Furthermore, we illustrate that
the proposed nonlinearities also mitigate the so-called double descent
phenomenon, i.e., the non-monotonic behavior of the generalization error with
respect to the sample size and the model size.
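For intuition, here is a minimal sketch (not the authors' code) of the setup: random-feature ridge regression on synthetic data, comparing ReLU against a second-order polynomial activation sigma(z) = c0 + c1*z + c2*(z^2 - 1). The polynomial coefficients below are illustrative placeholders, not the optimized values derived in the paper.

```python
# Minimal sketch: random-feature ridge regression comparing ReLU with a
# hypothetical second-order polynomial activation. Coefficients are
# placeholders, not the paper's optimized values.
import numpy as np

rng = np.random.default_rng(0)
d, p, n_train, n_test, lam = 50, 200, 400, 2000, 1e-2

beta = rng.normal(size=d) / np.sqrt(d)           # teacher for a linear target
def sample(n):
    X = rng.normal(size=(n, d)) / np.sqrt(d)     # inputs with unit-variance projections
    return X, X @ beta + 0.1 * rng.normal(size=n)

X_tr, y_tr = sample(n_train)
X_te, y_te = sample(n_test)
W = rng.normal(size=(d, p))                      # fixed random first-layer weights

def test_mse(sigma):
    """Ridge regression on top of random features Phi = sigma(XW)."""
    Phi_tr, Phi_te = sigma(X_tr @ W), sigma(X_te @ W)
    a = np.linalg.solve(Phi_tr.T @ Phi_tr + lam * np.eye(p), Phi_tr.T @ y_tr)
    return np.mean((Phi_te @ a - y_te) ** 2)

relu = lambda z: np.maximum(z, 0.0)
poly = lambda z: 1.0 * z + 0.5 * (z ** 2 - 1.0)  # hypothetical (c0, c1, c2) = (0, 1, 0.5)

print("ReLU test MSE:", test_mse(relu))
print("poly test MSE:", test_mse(poly))
```

Sweeping the number of features p (or the sample size) with this harness, especially with a small ridge penalty, also makes the double descent curve visible near the interpolation threshold.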
Related papers
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
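As a toy companion to the ridge regression review above (my construction, not the paper's scaling analysis), the following sweep shows how the test error of overparameterized ridge regression depends on the penalty:

```python
# Toy high-dimensional ridge regression: sweep the ridge penalty and report
# test error. Illustrative only; not the renormalization analysis of the paper.
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 300                                  # overparameterized: d > n
beta = rng.normal(size=d) / np.sqrt(d)
X_tr = rng.normal(size=(n, d)); y_tr = X_tr @ beta + 0.5 * rng.normal(size=n)
X_te = rng.normal(size=(2000, d)); y_te = X_te @ beta + 0.5 * rng.normal(size=2000)

for lam in [1e-4, 1e-2, 1.0, 100.0]:
    w = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ y_tr)
    print(f"lambda={lam:g}  test MSE={np.mean((X_te @ w - y_te) ** 2):.3f}")
```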
- Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning [53.97335841137496]
We propose an oracle-efficient algorithm, dubbed Pessimistic Nonlinear Least-Squares Value Iteration (PNLSVI), for offline RL with non-linear function approximation.
Our algorithm enjoys a regret bound that has a tight dependency on the function class complexity and achieves minimax optimal instance-dependent regret when specialized to linear function approximation.
arXiv Detail & Related papers (2023-10-02T17:42:01Z)
- Linear Stability Hypothesis and Rank Stratification for Nonlinear Models [3.0041514772139166]
We propose a rank stratification for general nonlinear models to uncover the model rank as an "effective size of parameters."
Based on these results, the model rank of a target function predicts the minimal training data size needed for its successful recovery.
arXiv Detail & Related papers (2022-11-21T16:27:25Z)
- A generalization gap estimation for overparameterized models via the Langevin functional variance [6.231304401179968]
We show that a functional variance characterizes the generalization gap even in overparameterized settings.
We propose an efficient approximation of the functional variance, the Langevin approximation of the functional variance (Langevin FV).
arXiv Detail & Related papers (2021-12-07T12:43:05Z)
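A hedged sketch of the idea above, assuming the functional variance is the WAIC-style sum over data of the posterior variance of the per-example log-likelihood: sample parameters with unadjusted Langevin dynamics on a Bayesian linear regression and sum the per-example variances. All model choices here are mine, for illustration only.

```python
# Sketch: Langevin estimate of the functional variance for Bayesian linear
# regression (hyperparameters illustrative; not the paper's experiments).
import numpy as np

rng = np.random.default_rng(2)
n, d, sigma2, lam, eta, T, burn = 100, 20, 0.25, 1.0, 1e-3, 6000, 1000
X = rng.normal(size=(n, d)) / np.sqrt(d)
w_true = rng.normal(size=d)
y = X @ w_true + np.sqrt(sigma2) * rng.normal(size=n)

def grad_log_post(w):                       # Gaussian likelihood + Gaussian prior
    return X.T @ (y - X @ w) / sigma2 - lam * w

w, samples = np.zeros(d), []
for t in range(T):                          # unadjusted Langevin algorithm
    w = w + eta * grad_log_post(w) + np.sqrt(2 * eta) * rng.normal(size=d)
    if t >= burn:
        samples.append(w.copy())
S = np.stack(samples)                       # posterior samples, (T - burn, d)

loglik = -0.5 * (y[None, :] - S @ X.T) ** 2 / sigma2  # per sample, per datum (up to a constant)
fv = loglik.var(axis=0).sum()               # functional variance
print("functional-variance estimate of the generalization gap:", fv)
```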
- Hessian Eigenspectra of More Realistic Nonlinear Models [73.31363313577941]
We give a precise characterization of the Hessian eigenspectra for a broad family of nonlinear models.
Our analysis takes a step toward identifying the origin of many striking features observed in more complex machine learning models.
arXiv Detail & Related papers (2021-03-02T06:59:52Z)
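To make the object of study concrete, here is a small numerical example (mine, not one of the paper's model families): the exact Hessian eigenspectrum of the squared loss of a single tanh neuron, using the identity H = (1/n) sum_i [sigma'(z_i)^2 + (f_i - y_i) sigma''(z_i)] x_i x_i^T.

```python
# Exact Hessian eigenspectrum for the mean squared error of one tanh neuron.
import numpy as np

rng = np.random.default_rng(3)
n, d = 2000, 200
X = rng.normal(size=(n, d)) / np.sqrt(d)
w = rng.normal(size=d)
y = rng.normal(size=n)

z = X @ w
f = np.tanh(z)
s1 = 1.0 - f ** 2                  # tanh'
s2 = -2.0 * f * s1                 # tanh''
c = (s1 ** 2 + (f - y) * s2) / n   # per-example curvature weights
H = X.T @ (c[:, None] * X)         # Hessian of L(w) = (1/2n) sum (f_i - y_i)^2

eigs = np.linalg.eigvalsh(H)
print("min/median/max eigenvalue:", eigs.min(), np.median(eigs), eigs.max())
```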
- LQF: Linear Quadratic Fine-Tuning [114.3840147070712]
We present the first method for linearizing a pre-trained model that achieves comparable performance to non-linear fine-tuning.
LQF consists of simple modifications to the architecture, loss function and optimization typically used for classification.
arXiv Detail & Related papers (2020-12-21T06:40:20Z)
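The core idea can be sketched in a few lines (a toy of linearized fine-tuning in general, not of LQF's specific architecture, loss, and optimizer changes): train the first-order Taylor expansion f_lin(w) = f(w0) + J(w0) (w - w0) of a small "pre-trained" network with a quadratic loss, which reduces to ridge regression in Jacobian features.

```python
# Toy linearized fine-tuning via ridge regression in Jacobian features.
import numpy as np

rng = np.random.default_rng(6)
d, h, n = 10, 32, 200
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0])                                   # downstream target

W1 = rng.normal(size=(d, h)) / np.sqrt(d)             # frozen "pre-trained" w0
w2 = rng.normal(size=h) / np.sqrt(h)
A = np.tanh(X @ W1)
f0 = A @ w2                                           # f(w0)

dZ = (1.0 - A ** 2) * w2                              # df/dZ, shape (n, h)
J_W1 = np.einsum('ni,nh->nih', X, dZ).reshape(n, -1)  # df/dW1, flattened
J = np.hstack([J_W1, A])                              # [df/dW1, df/dw2]

lam = 1e-2                                            # ridge on the parameter update
delta = np.linalg.solve(J.T @ J + lam * np.eye(J.shape[1]), J.T @ (y - f0))
pred = f0 + J @ delta                                 # linearized fine-tuned predictions
print("train MSE:", np.mean((pred - y) ** 2))
```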
- Non-parametric Models for Non-negative Functions [48.7576911714538]
We provide the first model for non-negative functions that enjoys the same good properties as linear models.
We prove that it admits a representer theorem and provide an efficient dual formulation for convex problems.
arXiv Detail & Related papers (2020-07-08T07:17:28Z)
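One known construction of this kind, which I believe matches the paper but label here as an assumption, represents f(x) = phi(x)^T B phi(x) with B positive semidefinite, so f is non-negative by construction while staying linear-model-like in B. A minimal projected-gradient sketch:

```python
# Non-negative model as a PSD quadratic form in RBF features, fitted by
# projected gradient descent (eigenvalue clipping). Illustrative assumption.
import numpy as np

rng = np.random.default_rng(7)
n, p = 200, 20
x = rng.uniform(-3, 3, size=n)
y = np.abs(np.sin(x)) + 0.05 * rng.uniform(size=n)         # non-negative target

centers = np.linspace(-3, 3, p)
Phi = np.exp(-0.5 * (x[:, None] - centers[None, :]) ** 2)  # RBF features

def project_psd(B):
    vals, vecs = np.linalg.eigh(B)
    return (vecs * np.clip(vals, 0, None)) @ vecs.T        # clip negative eigenvalues

B = 0.01 * np.eye(p)
lr = 0.5 / n
for _ in range(2000):
    f = np.einsum('np,pq,nq->n', Phi, B, Phi)              # f_i = phi_i^T B phi_i
    G = Phi.T @ ((f - y)[:, None] * Phi) * (2 / n)         # gradient of mean squared error
    B = project_psd(B - lr * G)

f = np.einsum('np,pq,nq->n', Phi, B, Phi)
print("train MSE:", np.mean((f - y) ** 2), "min prediction:", f.min())  # min >= 0
```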
- Slice Sampling for General Completely Random Measures [74.24975039689893]
We present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables.
The efficacy of the proposed algorithm is evaluated on several popular nonparametric models.
arXiv Detail & Related papers (2020-06-24T17:53:53Z)
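For readers unfamiliar with the auxiliary-variable trick, here is a generic 1D slice sampler (Neal-style step-out and shrinkage); this is not the paper's sampler for completely random measures:

```python
# Generic slice sampler for a 1D unnormalized density (bimodal Gaussian mixture).
import numpy as np

rng = np.random.default_rng(4)

def logp(x):                                   # unnormalized log-density
    return np.logaddexp(-0.5 * (x + 2.0) ** 2, -0.5 * (x - 2.0) ** 2)

def slice_sample(x, w=1.0):
    logu = logp(x) + np.log(rng.uniform())     # auxiliary slice variable
    left = x - w * rng.uniform()               # randomly positioned interval
    right = left + w
    while logp(left) > logu:                   # step out to cover the slice
        left -= w
    while logp(right) > logu:
        right += w
    while True:                                # shrink until a point is accepted
        xp = rng.uniform(left, right)
        if logp(xp) > logu:
            return xp
        if xp < x:
            left = xp
        else:
            right = xp

chain = [0.0]
for _ in range(5000):
    chain.append(slice_sample(chain[-1]))
```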
- The role of optimization geometry in single neuron learning [12.891722496444036]
Recent experiments have demonstrated that the choice of optimization geometry can impact generalization performance when learning expressive neural models.
We show how the interplay between the optimization geometry and the feature geometry determines out-of-sample performance.
arXiv Detail & Related papers (2020-06-15T17:39:44Z)
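A classical toy fact adjacent to this theme (my example, not the paper's experiment): with underdetermined data, plain gradient descent initialized at zero converges to the minimum-l2-norm interpolant, i.e., the l2 geometry picks which of the many interpolating solutions is learned.

```python
# Gradient descent from zero on an underdetermined least-squares problem
# recovers the minimum-l2-norm interpolant (implicit bias of the l2 geometry).
import numpy as np

rng = np.random.default_rng(5)
n, d = 20, 100                                   # more parameters than data
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

w = np.zeros(d)
for _ in range(50000):
    w -= 1e-3 * X.T @ (X @ w - y) / n            # plain (l2) gradient descent

w_min = X.T @ np.linalg.solve(X @ X.T, y)        # closed-form min-norm interpolant
print("distance to min-norm solution:", np.linalg.norm(w - w_min))  # ~ 0
```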