A generalization gap estimation for overparameterized models via the
Langevin functional variance
- URL: http://arxiv.org/abs/2112.03660v3
- Date: Mon, 20 Mar 2023 00:22:11 GMT
- Title: A generalization gap estimation for overparameterized models via the
Langevin functional variance
- Authors: Akifumi Okuno, Keisuke Yano
- Abstract summary: We show that a functional variance characterizes the generalization gap even in overparameterized settings.
We propose an efficient approximation of the function variance, the Langevin approximation of the functional variance (Langevin FV)
- Score: 6.231304401179968
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper discusses the estimation of the generalization gap, the difference
between generalization performance and training performance, for
overparameterized models including neural networks. We first show that a
functional variance, a key concept in defining a widely-applicable information
criterion, characterizes the generalization gap even in overparameterized
settings where a conventional theory cannot be applied. As the computational
cost of the functional variance is expensive for the overparameterized models,
we propose an efficient approximation of the function variance, the Langevin
approximation of the functional variance (Langevin FV). This method leverages
only the $1$st-order gradient of the squared loss function, without referencing
the $2$nd-order gradient; this ensures that the computation is efficient and
the implementation is consistent with gradient-based optimization algorithms.
We demonstrate the Langevin FV numerically by estimating the generalization
gaps of overparameterized linear regression and non-linear neural network
models, containing more than a thousand of parameters therein.
Related papers
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - Neural Parameter Regression for Explicit Representations of PDE Solution Operators [22.355460388065964]
We introduce Neural Regression (NPR), a novel framework specifically developed for learning solution operators in Partial Differential Equations (PDEs)
NPR employs Physics-Informed Neural Network (PINN, Raissi et al., 2021) techniques to regress Neural Network (NN) parameters.
The framework shows remarkable adaptability to new initial and boundary conditions, allowing for rapid fine-tuning and inference.
arXiv Detail & Related papers (2024-03-19T14:30:56Z) - Optimal Nonlinearities Improve Generalization Performance of Random
Features [0.9790236766474201]
Random feature model with a nonlinear activation function has been shown to performally equivalent to a Gaussian model in terms of training and generalization errors.
We show that acquired parameters from the Gaussian model enable us to define a set of optimal nonlinearities.
Our numerical results validate that the optimized nonlinearities achieve better generalization performance than widely-used nonlinear functions such as ReLU.
arXiv Detail & Related papers (2023-09-28T20:55:21Z) - Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $mathcalF$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Adaptive LASSO estimation for functional hidden dynamic geostatistical
model [69.10717733870575]
We propose a novel model selection algorithm based on a penalized maximum likelihood estimator (PMLE) for functional hiddenstatistical models (f-HD)
The algorithm is based on iterative optimisation and uses an adaptive least absolute shrinkage and selector operator (GMSOLAS) penalty function, wherein the weights are obtained by the unpenalised f-HD maximum-likelihood estimators.
arXiv Detail & Related papers (2022-08-10T19:17:45Z) - Support estimation in high-dimensional heteroscedastic mean regression [2.28438857884398]
We consider a linear mean regression model with random design and potentially heteroscedastic, heavy-tailed errors.
We use a strictly convex, smooth variant of the Huber loss function with tuning parameter depending on the parameters of the problem.
For the resulting estimator we show sign-consistency and optimal rates of convergence in the $ell_infty$ norm.
arXiv Detail & Related papers (2020-11-03T09:46:31Z) - Understanding Implicit Regularization in Over-Parameterized Single Index
Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z) - On the Estimation of Derivatives Using Plug-in Kernel Ridge Regression
Estimators [4.392844455327199]
We propose a simple plug-in kernel ridge regression (KRR) estimator in nonparametric regression.
We provide a non-asymotic analysis to study the behavior of the proposed estimator in a unified manner.
The proposed estimator achieves the optimal rate of convergence with the same choice of tuning parameter for any order of derivatives.
arXiv Detail & Related papers (2020-06-02T02:32:39Z) - SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for
Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z) - Implicit differentiation of Lasso-type models for hyperparameter
optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.