Predicting kernel regression learning curves from only raw data statistics
- URL: http://arxiv.org/abs/2510.14878v1
- Date: Thu, 16 Oct 2025 16:57:59 GMT
- Title: Predicting kernel regression learning curves from only raw data statistics
- Authors: Dhruva Karkada, Joseph Turnbull, Yuxi Liu, James B. Simon,
- Abstract summary: We study kernel regression with common rotation-invariant kernels on real datasets including CIFAR-5m, SVHN, and ImageNet. We give a theoretical framework that predicts learning curves from only two measurements: the empirical data covariance matrix and an empirical polynomial decomposition of the target function $f_*$. We prove the Hermite eigenstructure ansatz (HEA) for Gaussian data, but we find that real image data is often "Gaussian enough" for the HEA to hold well in practice.
- Score: 8.853323771088883
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study kernel regression with common rotation-invariant kernels on real datasets including CIFAR-5m, SVHN, and ImageNet. We give a theoretical framework that predicts learning curves (test risk vs. sample size) from only two measurements: the empirical data covariance matrix and an empirical polynomial decomposition of the target function $f_*$. The key new idea is an analytical approximation of a kernel's eigenvalues and eigenfunctions with respect to an anisotropic data distribution. The eigenfunctions resemble Hermite polynomials of the data, so we call this approximation the Hermite eigenstructure ansatz (HEA). We prove the HEA for Gaussian data, but we find that real image data is often "Gaussian enough" for the HEA to hold well in practice, enabling us to predict learning curves by applying prior results relating kernel eigenstructure to test risk. Extending beyond kernel regression, we empirically find that MLPs in the feature-learning regime learn Hermite polynomials in the order predicted by the HEA. Our HEA framework is a proof of concept that an end-to-end theory of learning which maps dataset structure all the way to model performance is possible for nontrivial learning algorithms on real datasets.
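The pipeline described in the abstract has two stages: the HEA supplies approximate kernel eigenvalues $\lambda_i$ and target coefficients $v_i$, and prior results from the KRR eigenframework literature map those to a learning curve. Below is a minimal sketch of one common form of that second stage (the "omniscient risk estimate"); the function name, constants, and power-law spectra in the example are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.optimize import brentq

def krr_test_risk(eigvals, v_sq, n, ridge=1e-6, noise_var=0.0):
    """Omniscient risk estimate for kernel ridge regression (hedged sketch).

    eigvals: kernel eigenvalues lambda_i (e.g., from the HEA)
    v_sq:    squared coefficients v_i^2 of the target f* in the eigenbasis
    n:       training-set size
    """
    # Effective regularization kappa solves n = sum_i lam_i/(lam_i + kappa) + ridge/kappa.
    f = lambda kappa: np.sum(eigvals / (eigvals + kappa)) + ridge / kappa - n
    kappa = brentq(f, 1e-15, 1e15)
    # Overfitting coefficient, which amplifies both bias and noise.
    overfit = n / (n - np.sum((eigvals / (eigvals + kappa)) ** 2))
    bias = np.sum(v_sq * (kappa / (eigvals + kappa)) ** 2)
    return overfit * (bias + noise_var)

# Illustrative power-law spectra (not measured from any dataset):
i = np.arange(1, 10_001, dtype=float)
lam, v_sq = i ** -2.0, i ** -2.0
for n in (10, 100, 1000):
    print(n, krr_test_risk(lam, v_sq, n))  # test risk falls as n grows
```

Under the HEA, `eigvals` would come from the kernel's level coefficients together with the covariance spectrum, and `v_sq` from the polynomial decomposition of $f_*$; these correspond to the "two measurements" the abstract refers to.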
Related papers
- Empirical Gaussian Processes [18.40952262882312]
Empirical GPs are a principled framework for constructing flexible, data-driven GP priors.
We show that Empirical GPs achieve competitive performance on learning curve extrapolation and time series forecasting benchmarks.
arXiv Detail & Related papers (2026-02-12T15:39:08Z)
- Kernel-Based Learning of Chest X-ray Images for Predicting ICU Escalation among COVID-19 Patients [5.030572036126222]
Multiple kernel learning (MKL) addresses these limitations by constructing composite kernels from simpler ones and integrating information from heterogeneous sources.
We extend MKL to accommodate outcome variables belonging to the exponential family, representing a broader variety of data types, and refer to our proposed method as generalized linear models with integrated multiple additive regression with kernels (GLIMARK).
We have applied GLIMARK to a COVID-19 chest X-ray dataset, predicting binary outcomes of ICU escalation and extracting clinically meaningful features, underscoring the practical utility of this approach in real-world scenarios.
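As a quick illustration of the composite-kernel construction MKL builds on (a generic sketch, not the paper's GLIMARK estimator; the RBF bandwidths and weights are arbitrary):

```python
import numpy as np

def rbf(X, Z, gamma):
    # Gaussian (RBF) base kernel matrix between rows of X and Z
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def composite_kernel(X, Z, gammas, betas):
    # MKL-style composite kernel: a convex combination of base kernels,
    # K = sum_m beta_m K_m, letting the model mix heterogeneous sources/scales.
    betas = np.asarray(betas, dtype=float)
    betas /= betas.sum()
    return sum(b * rbf(X, Z, g) for b, g in zip(betas, gammas))
```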
arXiv Detail & Related papers (2026-02-10T20:11:43Z) - A general technique for approximating high-dimensional empirical kernel matrices [16.583173656638806]
We present user-friendly bounds, stated in terms of the kernel function $k(\cdot,\cdot)$, for the expected operator norm of a random kernel matrix.
We then apply our method to provide new, tighter approximations for inner-product kernel matrices on general high-dimensional data.
arXiv Detail & Related papers (2025-11-05T22:36:52Z) - Highly Adaptive Ridge [84.38107748875144]
We propose a regression method that achieves a dimension-free $n^{-2/3}$ $L^2$ convergence rate in the class of right-continuous functions with square-integrable sectional derivatives.
HAR is exactly kernel ridge regression with a specific data-adaptive kernel based on a saturated zero-order tensor-product spline basis expansion.
We demonstrate empirical performance better than state-of-the-art algorithms, particularly for small datasets.
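Reading "saturated zero-order tensor-product spline basis" as indicator features anchored at the training points, the induced data-adaptive kernel has a closed form, since summing indicator products over all non-empty coordinate subsets factorizes. A speculative sketch under that reading; `har_kernel` and the ridge value are illustrative, not the authors' code:

```python
import numpy as np

def har_kernel(X, Z, knots):
    # k(x, z) = sum_j [ prod_l (1 + 1[x_l >= k_jl] * 1[z_l >= k_jl]) - 1 ],
    # i.e. the inner product of zero-order spline (indicator) features over
    # all non-empty coordinate subsets, with knots at the training points.
    K = np.zeros((len(X), len(Z)))
    for knot in knots:
        a = (X >= knot).astype(float)  # (n, d) indicator features
        b = (Z >= knot).astype(float)  # (m, d)
        K += np.prod(1.0 + a[:, None, :] * b[None, :, :], axis=-1) - 1.0
    return K

def har_fit_predict(X_tr, y_tr, X_te, lam=1e-3):
    # Ordinary kernel ridge regression with the data-adaptive kernel above
    K = har_kernel(X_tr, X_tr, X_tr)
    alpha = np.linalg.solve(K + lam * len(X_tr) * np.eye(len(X_tr)), y_tr)
    return har_kernel(X_te, X_tr, X_tr) @ alpha
```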
arXiv Detail & Related papers (2024-10-03T17:06:06Z) - On the Consistency of Kernel Methods with Dependent Observations [5.467140383171385]
We propose a new notion of empirical weak convergence (EWC) explaining such phenomena for kernel methods.
EWC assumes the existence of a random data distribution and is a strict weakening of previous assumptions in the field.
Our results open new classes of processes to statistical learning and can serve as a foundation for a theory of learning beyond i.i.d. and mixing.
arXiv Detail & Related papers (2024-06-10T08:35:01Z) - Demystifying Spectral Bias on Real-World Data [2.3020018305241337]
Kernel ridge regression (KRR) and Gaussian processes (GPs) are fundamental tools in statistics and machine learning.
We consider cross-dataset learnability and show that one may use eigenvalues and eigenfunctions associated with highly idealized data measures to reveal spectral bias on complex datasets.
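The basic object in spectral-bias analyses of this kind is the kernel's eigensystem with respect to a data measure, which can be estimated from a finite sample by eigendecomposing the kernel matrix. A generic sketch of that standard step, not this paper's specific idealized-measure construction:

```python
import numpy as np

def empirical_eigensystem(K):
    # Approximate the kernel's eigenvalues/eigenfunctions w.r.t. the sampling
    # measure: eigendecompose K/n. Columns of `funcs` are eigenfunctions
    # evaluated at the sample, scaled so their mean square is ~1.
    n = K.shape[0]
    evals, evecs = np.linalg.eigh(K / n)
    order = np.argsort(evals)[::-1]
    return evals[order], np.sqrt(n) * evecs[:, order]

# Projecting a target y onto `funcs` then gives its coefficients in the
# eigenbasis, revealing which modes KRR/GPs learn first (spectral bias):
#   coeffs = funcs.T @ y / n
```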
arXiv Detail & Related papers (2024-06-04T18:00:00Z)
- Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions.
Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation.
In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
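The core trick, swapping explicit matrix inversion for an iterative linear solver, can be shown in isolation. A minimal sketch of the inversion-free primitive only; the Monte Carlo sampling and unrolled backpropagation of the paper's method are not reproduced, and the matrix here is synthetic:

```python
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 50))
Sigma = A @ A.T / 50 + 1e-2 * np.eye(500)  # SPD covariance-like matrix
b = rng.standard_normal(500)

# Conjugate gradients solves Sigma x = b using only matrix-vector products,
# never forming Sigma^{-1} (the step probabilistic unrolling differentiates through).
x, info = cg(Sigma, b, maxiter=500)
print(info, np.linalg.norm(Sigma @ x - b))  # info == 0 means converged
```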
arXiv Detail & Related papers (2023-06-05T21:08:34Z)
- Local Random Feature Approximations of the Gaussian Kernel [14.230653042112834]
We focus on the popular Gaussian kernel and on techniques to linearize kernel-based models by means of random feature approximations.
We show that such approaches yield poor results when modelling high-frequency data, and we propose a novel localization scheme that improves kernel approximations and downstream performance significantly.
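For reference, the vanilla random-feature approximation of the Gaussian kernel that the paper improves on looks like this (a standard Bochner-based sketch; the paper's localization scheme is not shown, and all constants are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, ls = 5, 2000, 1.0               # input dim, num features, lengthscale
W = rng.standard_normal((d, D)) / ls  # frequencies W ~ N(0, ls^-2 I)
b = rng.uniform(0.0, 2 * np.pi, D)    # random phases

def z(X):
    # Random Fourier features: z(x) . z(y) ~= exp(-||x - y||^2 / (2 ls^2))
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

x, y = rng.standard_normal((2, d))
print(z(x) @ z(y), np.exp(-np.sum((x - y) ** 2) / (2 * ls ** 2)))
```

The O(1/sqrt(D)) Monte Carlo error of this construction is what hurts on high-frequency data, motivating localized or deterministic feature schemes (compare the SLEIPNIR entry below).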
arXiv Detail & Related papers (2022-04-12T09:52:36Z)
- Test Set Sizing Via Random Matrix Theory [91.3755431537592]
This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression.
It defines "ideal" as satisfying the integrity metric, i.e. the empirical model error is the actual measurement noise.
This paper is the first to solve for the training and test size for any model in a way that is truly optimal.
arXiv Detail & Related papers (2021-12-11T13:18:33Z)
- Incremental Ensemble Gaussian Processes [53.3291389385672]
We propose an incremental ensemble (IE-) GP framework, where an EGP meta-learner employs an ensemble of GP learners, each having a unique kernel belonging to a prescribed kernel dictionary.
With each GP expert leveraging the random feature-based approximation to perform online prediction and model updates with scalability, the EGP meta-learner capitalizes on data-adaptive weights to synthesize the per-expert predictions.
The novel IE-GP is generalized to accommodate time-varying functions by modeling structured dynamics at the EGP meta-learner and within each GP learner.
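A generic sketch of the synthesize-with-adaptive-weights step, assuming each GP expert emits a Gaussian predictive mean and variance (a plain online-Bayes mixture for illustration; IE-GP's exact weighting rule and dynamics model are not reproduced):

```python
import numpy as np

def mix_predictions(means, variances, log_w):
    # Moment-match the weighted mixture of per-expert Gaussian predictions.
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    mean = np.sum(w * means)
    var = np.sum(w * (variances + means ** 2)) - mean ** 2
    return mean, var

def update_log_weights(log_w, means, variances, y):
    # Reweight experts by their predictive log-likelihood of the new y
    # (multiplicative-weights / online Bayes style update).
    ll = -0.5 * (np.log(2 * np.pi * variances) + (y - means) ** 2 / variances)
    return log_w + ll
```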
arXiv Detail & Related papers (2021-10-13T15:11:25Z)
- Out-of-Distribution Generalization in Kernel Regression [21.958028127426196]
We study generalization in kernel regression when the training and test distributions are different.
We identify an overlap matrix that quantifies the mismatch between distributions for a given kernel.
We develop procedures for optimizing training and test distributions for a given data budget to find best and worst case generalizations under the shift.
arXiv Detail & Related papers (2021-06-04T04:54:25Z)
- Learning Compositional Sparse Gaussian Processes with a Shrinkage Prior [26.52863547394537]
We present a novel probabilistic algorithm that learns a kernel composition by handling sparsity in the kernel selection with a horseshoe prior.
Our model can capture characteristics of time series with significant reductions in computational time and has competitive regression performance on real-world datasets.
arXiv Detail & Related papers (2020-12-21T13:41:15Z)
- SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic, exponentially fast decaying error bounds which apply to both the approximated kernel and the approximated posterior.
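Quadrature Fourier features replace the random frequencies of RFF with deterministic Gauss-Hermite nodes, which is what buys the exponentially decaying error. A one-dimensional sketch of the plain (derivative-free) construction, assuming a Gaussian kernel; the multi-dimensional and derivative extensions of the paper are omitted:

```python
import numpy as np

m, ls = 16, 1.0
t, w = np.polynomial.hermite.hermgauss(m)  # nodes/weights for weight e^{-t^2}
omega = np.sqrt(2.0) * t / ls              # frequencies of the N(0, ls^-2) spectral measure
c = np.sqrt(w / np.sqrt(np.pi))            # per-frequency feature scaling

def phi(x):
    # Deterministic quadrature Fourier features:
    # phi(x) . phi(y) ~= exp(-(x - y)^2 / (2 ls^2)), with error decaying
    # exponentially in m for smooth integrands (vs O(1/sqrt(D)) for RFF).
    return np.concatenate([c * np.cos(omega * x), c * np.sin(omega * x)])

x, y = 0.3, -0.5
print(phi(x) @ phi(y), np.exp(-(x - y) ** 2 / (2 * ls ** 2)))
```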
arXiv Detail & Related papers (2020-03-05T14:33:20Z)
- Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nyström method [76.73096213472897]
We develop techniques which exploit spectral properties of the data matrix to obtain improved approximation guarantees.
Our approach leads to significantly better bounds for datasets with known rates of singular value decay.
We show that both our improved bounds and the multiple-descent curve can be observed on real datasets simply by varying the RBF parameter.
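The Nyström method in question approximates a full kernel matrix from a column subset. A minimal sketch with uniform landmark sampling and an arbitrary RBF parameter (illustration only); varying `gamma` changes the singular-value decay, which is how the multiple-descent behavior shows up:

```python
import numpy as np

def rbf_kernel(X, Z, gamma):
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 4))
idx = rng.choice(300, size=40, replace=False)  # column subset (landmarks)
gamma = 0.5

C = rbf_kernel(X, X[idx], gamma)               # n x m sampled columns
W = C[idx]                                     # m x m landmark block
K_hat = C @ np.linalg.pinv(W) @ C.T            # Nystrom: K ~= C W^+ C^T

K = rbf_kernel(X, X, gamma)
print(np.linalg.norm(K - K_hat) / np.linalg.norm(K))  # relative error
```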
arXiv Detail & Related papers (2020-02-21T00:43:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.