Precise asymptotic analysis of Sobolev training for random feature models
- URL: http://arxiv.org/abs/2511.03050v1
- Date: Tue, 04 Nov 2025 22:49:33 GMT
- Title: Precise asymptotic analysis of Sobolev training for random feature models
- Authors: Katharine E Fisher, Matthew TC Li, Youssef Marzouk, Timo Schorlepp
- Abstract summary: We study the impact of Sobolev training -- regression with both function and gradient data -- on the generalization error of predictive models in high dimensions. Our results identify settings where models perform optimally by interpolating noisy function and gradient data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Gradient information is widely useful and available in applications, and is therefore natural to include in the training of neural networks. Yet little is known theoretically about the impact of Sobolev training -- regression with both function and gradient data -- on the generalization error of highly overparameterized predictive models in high dimensions. In this paper, we obtain a precise characterization of this training modality for random feature (RF) models in the limit where the number of trainable parameters, input dimensions, and training data tend proportionally to infinity. Our model for Sobolev training reflects practical implementations by sketching gradient data onto finite dimensional subspaces. By combining the replica method from statistical physics with linearizations in operator-valued free probability theory, we derive a closed-form description for the generalization errors of the trained RF models. For target functions described by single-index models, we demonstrate that supplementing function data with additional gradient data does not universally improve predictive performance. Rather, the degree of overparameterization should inform the choice of training method. More broadly, our results identify settings where models perform optimally by interpolating noisy function and gradient data.
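To make the training modality concrete, here is a minimal numpy sketch of ridge-regularized Sobolev training for a random feature model, with gradient data sketched onto a low-dimensional subspace as the abstract describes. This is an illustration under assumptions, not the paper's code: the single-index target g = tanh, the noise levels, and all dimensions are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, p, m = 50, 200, 400, 5        # input dim, samples, features, sketch dim
lam, gam = 1e-3, 1.0                # ridge strength, gradient-loss weight

# Single-index target y = g(<w*, x>/sqrt(d)) and its gradient (assumed toy setup).
w_star = rng.standard_normal(d)
g, gp = np.tanh, lambda t: 1.0 / np.cosh(t) ** 2

X = rng.standard_normal((n, d))
z = X @ w_star / np.sqrt(d)
y = g(z) + 0.1 * rng.standard_normal(n)                    # noisy function data
G = gp(z)[:, None] * w_star[None, :] / np.sqrt(d)          # gradients dy/dx
G += 0.1 * rng.standard_normal((n, d))                     # noisy gradient data

# Random feature model f(x) = a^T tanh(Wx/sqrt(d)); only `a` is trained.
W = rng.standard_normal((p, d))
Z = X @ W.T / np.sqrt(d)
Phi = np.tanh(Z)                                           # n x p feature matrix

# Sketch gradient data onto an m-dimensional subspace, as in the abstract.
S = rng.standard_normal((m, d)) / np.sqrt(d)
# d/dx_k f(x_i) = sum_j a_j * (1 - tanh(z_ij)^2) * W_jk / sqrt(d), linear in a.
Dphi = 1.0 - np.tanh(Z) ** 2
# J[i, r, j]: coefficient of a_j in the sketched model gradient, sample i, dir r.
J = Dphi[:, None, :] * (S @ W.T)[None, :, :] / np.sqrt(d)  # n x m x p

# Stack function and sketched-gradient equations into one ridge regression for a.
A = np.vstack([Phi, np.sqrt(gam) * J.reshape(n * m, p)])
b = np.concatenate([y, np.sqrt(gam) * (G @ S.T).reshape(n * m)])
a = np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ b)

# Test error of the trained model on fresh data.
Xt = rng.standard_normal((1000, d))
yt = g(Xt @ w_star / np.sqrt(d))
pred = np.tanh(Xt @ W.T / np.sqrt(d)) @ a
print("test MSE:", np.mean((pred - yt) ** 2))
```

Because the random feature model is linear in its trainable weights `a`, both the function residuals and the sketched gradient residuals are linear in `a`, so Sobolev training reduces to one taller ridge regression; the weight `gam` trades off the two data sources.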
Related papers
- Tensor Network Based Feature Learning Model [6.101839518775971]
The Feature Learning (FL) model represents tensor-product features as a learnable Canonical Polyadic Decomposition (CPD). We demonstrate the effectiveness of the FL model through experiments on real data of varying dimensionality and scale (a minimal sketch of the CPD parameterization follows this entry).
arXiv Detail & Related papers (2025-12-02T09:17:21Z)
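As a rough illustration of the parameterization described above (not the paper's implementation), the following numpy sketch evaluates a tensor-product feature model whose weight tensor is a rank-R CPD, so the q**d-dimensional feature vector is never materialized; the local feature map and all sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, R, q = 8, 4, 2            # input dim, CPD rank, local feature dim (illustrative)

def local_features(x):
    """Local feature map phi(x_k) = [1, x_k]; the tensor-product feature vector
    is the outer product of these over all d coordinates (q**d entries)."""
    return np.stack([np.ones_like(x), x], axis=-1)          # (..., d, q)

# CPD factors A[k] in R^{R x q}: the weight tensor is sum_r a_1^(r) x ... x a_d^(r),
# so evaluation costs O(n d R q) instead of O(n q**d).
A = rng.standard_normal((d, R, q)) * 0.3

def model(X, A):
    Phi = local_features(X)                                 # (n, d, q)
    # inner products <a_k^(r), phi(x_k)> for all samples, coordinates, ranks
    P = np.einsum('ndq,drq->ndr', Phi, A)                   # (n, d, R)
    return P.prod(axis=1).sum(axis=1)                       # (n,)

X = rng.standard_normal((16, d))
print(model(X, A).shape)                                    # (16,)
```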
- Nonparametric Data Attribution for Diffusion Models [57.820618036556084]
Data attribution for generative models seeks to quantify the influence of individual training examples on model outputs. We propose a nonparametric attribution method that operates entirely on data, measuring influence via patch-level similarity between generated and training images.
arXiv Detail & Related papers (2025-10-16T03:37:16Z)
- Deep Partially Linear Transformation Model for Right-Censored Survival Data [6.315323176162257]
This paper introduces the deep partially linear transformation model (DPLTM) as a general and flexible regression framework. The proposed method avoids the curse of dimensionality while retaining the interpretability of the covariates of interest. Comprehensive simulation studies demonstrate the strong performance of the proposed procedure in terms of both estimation accuracy and predictive power.
arXiv Detail & Related papers (2024-12-10T15:50:43Z)
- Computation-Aware Gaussian Processes: Model Selection And Linear-Time Inference [55.150117654242706]
We show that model selection for computation-aware GPs trained on 1.8 million data points can be done within a few hours on a single GPU. As a result of this work, Gaussian processes can be trained on large-scale datasets without significantly compromising their ability to quantify uncertainty.
arXiv Detail & Related papers (2024-11-01T21:11:48Z)
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find the solutions reachable via our training procedure, including the choice of optimizer and regularizers, which limits the model's effective flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
We present a unifying perspective on recent results on ridge regression (the basic estimator and its renormalized high-dimensional description are recalled below). We use the basic tools of random matrix theory and free probability, aimed at readers with backgrounds in physics and deep learning. Our results extend and unify earlier models of scaling laws.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
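For reference, the central object in such analyses is the ridge estimator; the display below is standard background recalled from the literature, not a result specific to the paper above. The second equation is one common form of the self-consistent "renormalized" ridge parameter for feature covariance $\Sigma$, stated here under the usual proportional-limit assumptions.

```latex
% Ridge estimator in closed form:
\hat{\beta}_{\lambda}
  = \arg\min_{\beta \in \mathbb{R}^{p}}
    \frac{1}{n}\lVert y - X\beta \rVert_2^{2} + \lambda \lVert \beta \rVert_2^{2}
  = \Bigl(\tfrac{1}{n} X^{\top} X + \lambda I_{p}\Bigr)^{-1} \tfrac{1}{n} X^{\top} y .
% Renormalization: in the proportional limit n, p -> infinity, test-error
% formulas depend on \lambda only through an effective ridge \kappa solving
\kappa = \lambda + \frac{\kappa}{n}
  \operatorname{tr}\!\bigl[\Sigma \,(\Sigma + \kappa I_{p})^{-1}\bigr].
```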
- Generalizing Backpropagation for Gradient-Based Interpretability [103.2998254573497]
We show that the gradient of a model is a special case of a more general formulation using semirings.
This observation allows us to generalize the backpropagation algorithm to efficiently compute other interpretable statistics (a toy semiring example follows this entry).
arXiv Detail & Related papers (2023-07-06T15:19:53Z)
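To illustrate the semiring idea (a toy illustration, not the paper's algorithm): backpropagation aggregates products of local derivatives over all paths through the computation graph with the (+, x) semiring, and swapping in (max, x) instead extracts the single most influential path. The graph, weights, and function names below are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A semiring is (plus, times, zero, one). With (+, *) the path aggregation
# below reproduces the ordinary gradient; with (max, *) it reports the
# highest-weight path, an interpretability statistic in the semiring view.
@dataclass
class Semiring:
    plus: Callable[[float, float], float]
    times: Callable[[float, float], float]
    zero: float
    one: float

sum_prod = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
max_prod = Semiring(max, lambda a, b: a * b, float('-inf'), 1.0)

# Toy computation graph: edges[v] = [(u, w)] means local derivative dv/du = w.
edges: Dict[str, List[Tuple[str, float]]] = {
    'h1': [('x', 2.0)],
    'h2': [('x', 0.5)],
    'y':  [('h1', 3.0), ('h2', 4.0)],
}

def accumulate(target: str, source: str, sr: Semiring) -> float:
    """Aggregate over all paths source -> target, combining edge weights
    with sr.times and alternative paths with sr.plus."""
    if target == source:
        return sr.one
    total = sr.zero
    for parent, w in edges.get(target, []):
        total = sr.plus(total, sr.times(w, accumulate(parent, source, sr)))
    return total

print(accumulate('y', 'x', sum_prod))  # 2*3 + 0.5*4 = 8.0  (the gradient dy/dx)
print(accumulate('y', 'x', max_prod))  # max(6.0, 2.0) = 6.0 (strongest path)
```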
- Linear Stability Hypothesis and Rank Stratification for Nonlinear Models [3.0041514772139166]
We propose a rank stratification for general nonlinear models to uncover the model rank as an "effective size of parameters".
By these results, the model rank of a target function predicts the minimal training data size needed for its successful recovery.
arXiv Detail & Related papers (2022-11-21T16:27:25Z)
- Sobolev Acceleration and Statistical Optimality for Learning Elliptic Equations via Gradient Descent [11.483919798541393]
We study the statistical limits, in terms of Sobolev norms, of gradient descent for solving inverse problems from randomly sampled noisy observations.
Our class of objective functions includes Sobolev training for kernel regression, Deep Ritz Methods (DRM), and Physics-Informed Neural Networks (PINNs).
arXiv Detail & Related papers (2022-05-15T17:01:53Z)
- Extension of Dynamic Mode Decomposition for dynamic systems with incomplete information based on t-model of optimal prediction [69.81996031777717]
Dynamic Mode Decomposition (DMD) has proved to be a very efficient technique for studying dynamic data.
Applying this approach becomes problematic when the available data are incomplete, because some smaller-scale dimensions are either missing or unmeasured.
We consider a first-order approximation of the Mori-Zwanzig decomposition, state the corresponding optimization problem, and solve it with a gradient-based optimization method (a minimal sketch of standard DMD follows this entry).
arXiv Detail & Related papers (2022-02-23T11:23:59Z)
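For context on the base method being extended above, here is a minimal numpy sketch of standard rank-truncated (exact) DMD; the toy linear dynamics and all sizes are illustrative assumptions, and the paper's Mori-Zwanzig t-model correction is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(2)
d, T, r = 10, 200, 3                     # state dim, snapshots, truncation rank

# Toy stable linear dynamics z_{t+1} = A_true z_t (an assumption for the demo).
A_true = np.linalg.qr(rng.standard_normal((d, d)))[0] * 0.95
Z = np.empty((d, T))
Z[:, 0] = rng.standard_normal(d)
for t in range(T - 1):
    Z[:, t + 1] = A_true @ Z[:, t]

X, Y = Z[:, :-1], Z[:, 1:]               # paired snapshots: Y ~ A X

# Rank-r DMD: project onto leading POD modes, then eigendecompose.
U, s, Vh = np.linalg.svd(X, full_matrices=False)
U, s, Vh = U[:, :r], s[:r], Vh[:r]
A_tilde = U.T @ Y @ Vh.T @ np.diag(1.0 / s)          # reduced operator
eigvals, W = np.linalg.eig(A_tilde)
modes = Y @ Vh.T @ np.diag(1.0 / s) @ W              # DMD modes (d x r)
print("leading DMD eigenvalues:", np.round(eigvals, 3))
print("mode matrix shape:", modes.shape)
```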