The Eigenlearning Framework: A Conservation Law Perspective on Kernel
Regression and Wide Neural Networks
- URL: http://arxiv.org/abs/2110.03922v6
- Date: Thu, 26 Oct 2023 23:22:26 GMT
- Title: The Eigenlearning Framework: A Conservation Law Perspective on Kernel
Regression and Wide Neural Networks
- Authors: James B. Simon, Madeline Dickens, Dhruva Karkada, Michael R. DeWeese
- Abstract summary: We derive simple closed-form estimates for the test risk and other generalization metrics of kernel ridge regression.
We identify a sharp conservation law which limits the ability of KRR to learn any orthonormal basis of functions.
- Score: 1.6519302768772166
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We derive simple closed-form estimates for the test risk and other
generalization metrics of kernel ridge regression (KRR). Relative to prior
work, our derivations are greatly simplified and our final expressions are more
readily interpreted. These improvements are enabled by our identification of a
sharp conservation law which limits the ability of KRR to learn any orthonormal
basis of functions. Test risk and other objects of interest are expressed
transparently in terms of our conserved quantity evaluated in the kernel
eigenbasis. We use our improved framework to: i) provide a theoretical
explanation for the "deep bootstrap" of Nakkiran et al. (2020), ii) generalize a
previous result regarding the hardness of the classic parity problem, iii)
fashion a theoretical tool for the study of adversarial robustness, and iv)
draw a tight analogy between KRR and a well-studied system in statistical
physics.
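As a rough numerical illustration of the framework described above, the sketch below evaluates per-mode "learnabilities" and the resulting test-risk estimate for a kernel with a power-law eigenspectrum. The spectrum, target coefficients, sample size, and ridge value are illustrative assumptions chosen for the example, and the equations follow the standard eigenlearning form (a self-consistent constant kappa, learnabilities lambda_i / (lambda_i + kappa), and an overfitting coefficient) rather than a verbatim transcription of the paper.

```python
import numpy as np

def solve_kappa(eigvals, n, ridge):
    """Find kappa > 0 with sum_i lam_i/(lam_i + kappa) + ridge/kappa = n (log-space bisection)."""
    def excess(kappa):
        return np.sum(eigvals / (eigvals + kappa)) + ridge / kappa - n
    lo, hi = 1e-15, 1e15
    for _ in range(200):
        mid = np.sqrt(lo * hi)
        if excess(mid) > 0:
            lo = mid  # kappa too small: effective number of learned modes still exceeds n
        else:
            hi = mid
    return np.sqrt(lo * hi)

def eigenlearning_estimates(eigvals, target_coeffs, n, ridge, noise_var=0.0):
    """Closed-form KRR generalization estimates expressed in the kernel eigenbasis."""
    kappa = solve_kappa(eigvals, n, ridge)
    learnability = eigvals / (eigvals + kappa)        # L_i in [0, 1)
    budget = learnability.sum() + ridge / kappa       # conserved quantity: equals n by construction
    e0 = n / (n - np.sum(learnability ** 2))          # overfitting coefficient
    test_mse = e0 * (np.sum((1.0 - learnability) ** 2 * target_coeffs ** 2) + noise_var)
    return learnability, budget, test_mse

# Illustrative setup: power-law eigenvalue decay and a target supported on the top 5 modes.
eigvals = 1.0 / np.arange(1, 501) ** 2
target = np.zeros(500)
target[:5] = 1.0
L, budget, mse = eigenlearning_estimates(eigvals, target, n=100, ridge=1e-4)
print(f"sum of learnabilities + ridge/kappa = {budget:.2f}  (conservation law: should equal n = 100)")
print(f"predicted test MSE = {mse:.4f}")
```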
Related papers
- A Comprehensive Analysis on the Learning Curve in Kernel Ridge Regression [6.749750044497731]
This paper conducts a comprehensive study of the learning curves of kernel ridge regression (KRR) under minimal assumptions.
We analyze the role of key properties of the kernel, such as its spectral eigen-decay, the characteristics of the eigenfunctions, and the smoothness of the kernel.
arXiv Detail & Related papers (2024-10-23T11:52:52Z)
- Learning Analysis of Kernel Ridgeless Regression with Asymmetric Kernel Learning [33.34053480377887]
This paper enhances kernel ridgeless regression with Locally-Adaptive-Bandwidths (LAB) RBF kernels.
For the first time, we demonstrate that functions learned from LAB RBF kernels belong to an integral space of Reproducing Kernel Hilbert Spaces (RKHSs).
arXiv Detail & Related papers (2024-06-03T15:28:12Z)
- Provably Efficient Partially Observable Risk-Sensitive Reinforcement Learning with Hindsight Observation [35.278669159850146]
We introduce a novel formulation that integrates hindsight observations into a Partially Observable Markov Decision Process (POMDP) framework.
We develop the first provably efficient RL algorithm tailored for this setting.
These techniques are of particular interest to the theoretical study of reinforcement learning.
arXiv Detail & Related papers (2024-02-28T08:24:06Z)
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- Efficient kernel surrogates for neural network-based regression [0.8030359871216615]
We study the performance of the Conjugate Kernel (CK), an efficient approximation to the Neural Tangent Kernel (NTK).
We show that the CK performance is only marginally worse than that of the NTK and, in certain cases, is even superior.
In addition to providing a theoretical grounding for using CKs instead of NTKs, our framework suggests a recipe for improving DNN accuracy inexpensively.
arXiv Detail & Related papers (2023-10-28T06:41:47Z)
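As an illustrative aside (not the experimental setup of the paper summarized above), the sketch below compares the two kernels in the simplest setting where both have well-known closed forms: a one-hidden-layer ReLU network at infinite width, where the CK is the arc-cosine (NNGP) kernel and the NTK adds a derivative term.

```python
import numpy as np

def relu_ck_ntk(x, xp):
    """Closed-form CK (NNGP) and NTK of a one-hidden-layer ReLU network at infinite width."""
    nx, nxp = np.linalg.norm(x), np.linalg.norm(xp)
    cos_t = np.clip(x @ xp / (nx * nxp), -1.0, 1.0)
    theta = np.arccos(cos_t)
    # Arc-cosine kernel of degree 1: covariance of ReLU features under Gaussian weights.
    ck = nx * nxp * (np.sin(theta) + (np.pi - theta) * cos_t) / (2.0 * np.pi)
    # NTK adds the input inner product weighted by the ReLU-derivative covariance.
    ntk = ck + (x @ xp) * (np.pi - theta) / (2.0 * np.pi)
    return ck, ntk

rng = np.random.default_rng(0)
x, xp = rng.normal(size=8), rng.normal(size=8)
ck, ntk = relu_ck_ntk(x, xp)
print(f"CK = {ck:.4f}, NTK = {ntk:.4f}")
```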
- Learning Expressive Priors for Generalization and Uncertainty Estimation in Neural Networks [77.89179552509887]
We propose a novel prior learning method for advancing generalization and uncertainty estimation in deep neural networks.
The key idea is to exploit scalable and structured posteriors of neural networks as informative priors with generalization guarantees.
We exhaustively show the effectiveness of this method for uncertainty estimation and generalization.
arXiv Detail & Related papers (2023-07-15T09:24:33Z)
- Fine-grained analysis of non-parametric estimation for pairwise learning [9.676007573960383]
We are concerned with the generalization performance of non-parametric estimation for pairwise learning.
Our results can be used to handle a wide range of pairwise learning problems including ranking, AUC maximization, pairwise regression, and metric and similarity learning.
arXiv Detail & Related papers (2023-05-31T08:13:14Z)
- Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z)
- The Unreasonable Effectiveness of Deep Evidential Regression [72.30888739450343]
A new approach with uncertainty-aware regression-based neural networks (NNs) shows promise over traditional deterministic methods and typical Bayesian NNs.
We detail the theoretical shortcomings and analyze the performance on synthetic and real-world data sets, showing that Deep Evidential Regression is a heuristic rather than an exact uncertainty quantification.
arXiv Detail & Related papers (2022-05-20T10:10:32Z)
- Domain-Adjusted Regression or: ERM May Already Learn Features Sufficient for Out-of-Distribution Generalization [52.7137956951533]
We argue that devising simpler methods for learning predictors on existing features is a promising direction for future research.
We introduce Domain-Adjusted Regression (DARE), a convex objective for learning a linear predictor that is provably robust under a new model of distribution shift.
Under a natural model, we prove that the DARE solution is the minimax-optimal predictor for a constrained set of test distributions.
arXiv Detail & Related papers (2022-02-14T16:42:16Z)
- Meta-Learning Hypothesis Spaces for Sequential Decision-making [79.73213540203389]
We propose to meta-learn a kernel from offline data (Meta-KeL).
Under mild conditions, we guarantee that our estimated RKHS yields valid confidence sets.
We also empirically evaluate the effectiveness of our approach on a Bayesian optimization task.
arXiv Detail & Related papers (2022-02-01T17:46:51Z)