A duality framework for generalization analysis of random feature models and two-layer neural networks
- URL: http://arxiv.org/abs/2305.05642v1
- Date: Tue, 9 May 2023 17:41:50 GMT
- Title: A duality framework for generalization analysis of random feature models and two-layer neural networks
- Authors: Hongrui Chen, Jihao Long, Lei Wu
- Abstract summary: We consider the problem of learning functions in the $\mathcal{F}_{p,\pi}$ and Barron spaces.
Through a duality analysis, we reveal that the approximation and estimation of these spaces can be considered equivalent in a certain sense.
- Score: 3.2931415075553576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of learning functions in the $\mathcal{F}_{p,\pi}$
and Barron spaces, which are natural function spaces that arise in the
high-dimensional analysis of random feature models (RFMs) and two-layer neural
networks. Through a duality analysis, we reveal that the approximation and
estimation of these spaces can be considered equivalent in a certain sense.
This enables us to focus on the easier problem of approximation and estimation
when studying the generalization of both models. The dual equivalence is
established by defining an information-based complexity that can effectively
control estimation errors. Additionally, we demonstrate the flexibility of our
duality framework through comprehensive analyses of two concrete applications.
The first application is to study learning functions in $\mathcal{F}_{p,\pi}$
with RFMs. We prove that the learning does not suffer from the curse of
dimensionality as long as $p>1$, implying RFMs can work beyond the kernel
regime. Our analysis extends existing results [CMM21] to the noisy case and
removes the requirement of overparameterization.
The second application is to investigate the learnability of reproducing
kernel Hilbert space (RKHS) under the $L^\infty$ metric. We derive both lower
and upper bounds of the minimax estimation error by using the spectrum of the
associated kernel. We then apply these bounds to dot-product kernels and
analyze how they scale with the input dimension. Our results suggest that
learning with ReLU (random) features is generally intractable in terms of
reaching high uniform accuracy.
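
As a concrete illustration of the first application, the sketch below fits a ReLU random feature model (RFM): the inner weights are drawn once from a distribution $\pi$ and frozen, and only the outer coefficients are fit by ridge regression. This is a minimal sketch, not the paper's code; the choice of $\pi$ (uniform on the sphere), the toy single-neuron target, and names like `fit_rfm` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_rfm(X, y, num_features=512, reg=1e-3):
    """Ridge-regress the outer coefficients of a ReLU random-feature model."""
    n, d = X.shape
    # Inner weights w_j are sampled once from pi (here: uniform on the unit
    # sphere, an illustrative choice) and kept frozen.
    W = rng.standard_normal((num_features, d))
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    Phi = np.maximum(X @ W.T, 0.0)  # (n, num_features) ReLU feature matrix
    # Closed-form ridge solution: a = (Phi'Phi + reg*n*I)^{-1} Phi'y
    A = Phi.T @ Phi + reg * n * np.eye(num_features)
    a = np.linalg.solve(A, Phi.T @ y)
    return W, a

def predict(W, a, X):
    return np.maximum(X @ W.T, 0.0) @ a

# Toy usage: a noisy single-neuron target in d = 20 dimensions.
d, n = 20, 1000
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = np.maximum(X[:, 0], 0.0) + 0.1 * rng.standard_normal(n)
W, a = fit_rfm(X, y)
print("train MSE:", np.mean((predict(W, a, X) - y) ** 2))
```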
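For the second application, the spectrum-based bounds are phrased in terms of the eigenvalue decay of the associated kernel. The sketch below, again an illustrative assumption rather than the paper's code, estimates the spectrum of the dot-product kernel induced by ReLU random features (the order-1 arc-cosine kernel of Cho & Saul) by eigendecomposing a Gram matrix on points drawn uniformly from the sphere.

```python
import numpy as np

rng = np.random.default_rng(1)

def arccos_kernel(X):
    """Order-1 arc-cosine kernel: E_w[relu(w.x) relu(w.y)] for w ~ N(0, I_d)
    and unit vectors x, y."""
    G = np.clip(X @ X.T, -1.0, 1.0)
    theta = np.arccos(G)
    return (np.sin(theta) + (np.pi - theta) * G) / (2.0 * np.pi)

d, n = 10, 2000
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # uniform on the sphere S^{d-1}

K = arccos_kernel(X)
# Eigenvalues of K/n are Monte Carlo estimates of the kernel operator's
# spectrum under the uniform measure on the sphere; their decay rate is the
# quantity that spectrum-based minimax bounds are expressed in.
eigvals = np.linalg.eigvalsh(K / n)[::-1]
print("top 10 estimated eigenvalues:", np.round(eigvals[:10], 5))
```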
Related papers
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametrized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks may deviate from its initialization by a magnitude of $O(\alpha^{-1})$, measured in the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) frameworks, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - Random features and polynomial rules [0.0]
We present an analysis of the generalization performance of random features models for generic supervised learning problems with Gaussian data.
We find good agreement far from the limits where $D \to \infty$ and at least one of $P/D^K$, $N/D^L$ remains finite.
arXiv Detail & Related papers (2024-02-15T18:09:41Z) - Distribution learning via neural differential equations: a nonparametric statistical perspective [1.4436965372953483]
This work establishes the first general statistical convergence analysis for distribution learning via ODE models trained through likelihood transformations.
We show that the latter can be quantified via the $C^1$-metric entropy of the class $\mathcal{F}$.
We then apply this general framework to the setting of $C^k$-smooth target densities, and establish nearly minimax-optimal convergence rates for two relevant velocity field classes $\mathcal{F}$: $C^k$ functions and neural networks.
arXiv Detail & Related papers (2023-09-03T00:21:37Z) - On the Identifiability and Estimation of Causal Location-Scale Noise Models [122.65417012597754]
We study the class of location-scale or heteroscedastic noise models (LSNMs).
We show the causal direction is identifiable up to some pathological cases.
We propose two estimators for LSNMs: an estimator based on (non-linear) feature maps, and one based on neural networks.
arXiv Detail & Related papers (2022-10-13T17:18:59Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve the separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the learning problem.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z) - Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network [20.132432350255087]
Overparametrized neural networks trained by gradient descent (GD) can provably overfit any training data.
This paper studies how well overparametrized neural networks can recover the true target function in the presence of random noises.
arXiv Detail & Related papers (2020-07-06T01:02:23Z) - A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-$\ell_1$-Norm Interpolated Classifiers [3.167685495996986]
This paper establishes a precise high-dimensional theory for boosting on separable data.
Under a class of statistical models, we provide an exact analysis of the generalization error of boosting.
We also explicitly pin down the relation between the boosting test error and the optimal Bayes error.
arXiv Detail & Related papers (2020-02-05T00:24:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.