$L^p$ sampling numbers for the Fourier-analytic Barron space
- URL: http://arxiv.org/abs/2208.07605v1
- Date: Tue, 16 Aug 2022 08:41:48 GMT
- Title: $L^p$ sampling numbers for the Fourier-analytic Barron space
- Authors: Felix Voigtlaender
- Abstract summary: We study functions that can be written as \[ f(x) = \int_{\mathbb{R}^d} F(\xi) \, e^{2 \pi i \langle x, \xi \rangle} \, d\xi \quad \text{with} \quad \int_{\mathbb{R}^d} |F(\xi)| \cdot (1 + |\xi|)^{\sigma} \, d\xi < \infty. \]
For $\sigma = 1$, these functions play a prominent role in machine learning.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we consider Barron functions $f : [0,1]^d \to \mathbb{R}$ of
smoothness $\sigma > 0$, which are functions that can be written as \[
f(x) = \int_{\mathbb{R}^d} F(\xi) \, e^{2 \pi i \langle x, \xi \rangle} \, d
\xi
\quad \text{with} \quad
\int_{\mathbb{R}^d} |F(\xi)| \cdot (1 + |\xi|)^{\sigma} \, d \xi < \infty. \]
For $\sigma = 1$, these functions play a prominent role in machine learning,
since they can be efficiently approximated by (shallow) neural networks without
suffering from the curse of dimensionality.
For these functions, we study the following question: Given $m$ point samples
$f(x_1),\dots,f(x_m)$ of an unknown Barron function $f : [0,1]^d \to
\mathbb{R}$ of smoothness $\sigma$, how well can $f$ be recovered from these
samples, for an optimal choice of the sampling points and the reconstruction
procedure? Denoting the optimal reconstruction error measured in $L^p$ by $s_m
(\sigma; L^p)$, we show that \[
m^{- \frac{1}{\max \{ p,2 \}} - \frac{\sigma}{d}}
\lesssim s_m(\sigma;L^p)
\lesssim (\ln (e + m))^{\alpha(\sigma,d) / p}
\cdot m^{- \frac{1}{\max \{ p,2 \}} - \frac{\sigma}{d}}
, \] where the implied constants only depend on $\sigma$ and $d$ and where
$\alpha(\sigma,d)$ stays bounded as $d \to \infty$.
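For example, in the machine-learning case $\sigma = 1$ with the error measured in $L^2$ (so that $\max \{ p,2 \} = 2$), the bounds specialize to \[
m^{- \frac{1}{2} - \frac{1}{d}}
\lesssim s_m(1; L^2)
\lesssim (\ln (e + m))^{\alpha(1,d) / 2}
\cdot m^{- \frac{1}{2} - \frac{1}{d}}
, \] so that, up to the logarithmic factor, optimally chosen samples improve on the Monte Carlo rate $m^{-1/2}$ by exactly the dimension-dependent factor $m^{-1/d}$.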
Related papers
- Efficient Continual Finite-Sum Minimization [52.5238287567572]
We propose a new twist on finite-sum minimization, dubbed continual finite-sum minimization.
Our approach significantly improves upon the $\mathcal{O}(n/\epsilon)$ first-order oracle (FO) calls that $\mathrm{StochasticGradientDescent}$ requires.
We also prove that there is no natural first-order method with $\mathcal{O}(n/\epsilon^{\alpha})$ gradient complexity for $\alpha < 1/4$, establishing that the first-order complexity of our method is nearly tight.
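As a rough illustration of the setting (not the paper's method): in continual finite-sum minimization, after the $i$-th component $f_i$ arrives, one must output an approximate minimizer of the prefix average $\frac{1}{i} \sum_{j \leq i} f_j$. The naive baseline below simply reruns SGD over each prefix, which is the kind of FO cost the paper improves on; the quadratic components and all constants are hypothetical.

```python
import numpy as np

def continual_prefix_sgd(grads, x0, lr=0.1, passes=50):
    """Naive baseline for continual finite-sum minimization: after the i-th
    component arrives, rerun SGD over the prefix f_1, ..., f_i and output the
    final iterate. Each gradient evaluation is one first-order oracle (FO) call."""
    x, outputs = x0.copy(), []
    for i in range(1, len(grads) + 1):
        rng = np.random.default_rng(i)
        for _ in range(passes):
            for j in rng.permutation(i):
                x -= lr * grads[j](x)  # one FO call
        outputs.append(x.copy())
    return outputs

# Toy quadratics f_j(x) = 0.5 * ||x - c_j||^2, so grad f_j(x) = x - c_j, and the
# minimizer of the i-th prefix average is the mean of c_1, ..., c_i.
cs = [np.array([float(j), -float(j)]) for j in range(5)]
grads = [lambda x, c=c: x - c for c in cs]
outs = continual_prefix_sgd(grads, np.zeros(2))
print(outs[-1], "vs prefix mean", np.mean(cs, axis=0))
```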
arXiv Detail & Related papers (2024-06-07T08:26:31Z) - Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components that set the transformer architecture apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
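For concreteness, here is a minimal NumPy sketch of the function class being learned, i.e. a single multi-head attention layer; the dimensions, initialization, and names are illustrative, and this is neither the authors' construction nor their learning algorithm.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, WQ, WK, WV, WO):
    """One multi-head attention layer: X is (n, d); WQ/WK/WV are lists of
    (d, d_head) projections, one per head; WO is (h * d_head, d)."""
    heads = []
    for Wq, Wk, Wv in zip(WQ, WK, WV):
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        A = softmax(Q @ K.T / np.sqrt(Q.shape[1]))  # (n, n) attention weights
        heads.append(A @ V)
    return np.concatenate(heads, axis=1) @ WO

# Toy usage: n = 4 tokens, d = 8, h = 2 heads of size 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
WQ = [rng.normal(size=(8, 4)) for _ in range(2)]
WK = [rng.normal(size=(8, 4)) for _ in range(2)]
WV = [rng.normal(size=(8, 4)) for _ in range(2)]
WO = rng.normal(size=(8, 8))
print(multi_head_attention(X, WQ, WK, WV, WO).shape)  # (4, 8)
```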
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix
Factorization [54.29685789885059]
We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem.
The goal is to approximate $\mathbf{A}$ as a product of low-rank factors.
Our techniques generalize to other common variants of the BMF problem.
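As a rough sketch of the problem under the Boolean product (not the paper's $(1+\varepsilon)$-approximation algorithm), the toy heuristic below alternately refits binary factors; the rank, sizes, and local-search scheme are our own choices.

```python
import numpy as np
from itertools import product

def boolean_product(U, V):
    """Boolean matrix product: clip(U @ V, 0, 1), i.e. OR of AND terms."""
    return np.clip(U @ V, 0, 1)

def best_V_given_U(A, U, k):
    """For fixed U, pick each column of V from all 2^k binary vectors
    (exactly optimal per column, feasible only for small rank k)."""
    candidates = np.array(list(product([0, 1], repeat=k)))  # (2^k, k)
    V = np.zeros((k, A.shape[1]), dtype=int)
    for j in range(A.shape[1]):
        errs = [np.sum(np.clip(U @ c, 0, 1) != A[:, j]) for c in candidates]
        V[:, j] = candidates[int(np.argmin(errs))]
    return V

# Toy rank-2 instance; alternate a few rounds from a random start.
rng = np.random.default_rng(0)
A = boolean_product(rng.integers(0, 2, (8, 2)), rng.integers(0, 2, (2, 6)))
U = rng.integers(0, 2, (8, 2))
for _ in range(5):
    V = best_V_given_U(A, U, 2)
    U = best_V_given_U(A.T, V.T, 2).T  # the same step on the transpose updates U
print(np.sum(boolean_product(U, V) != A), "mismatched entries")
```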
arXiv Detail & Related papers (2023-06-02T18:55:27Z) - An Over-parameterized Exponential Regression [18.57735939471469]
Recent developments in the field of Large Language Models (LLMs) have sparked interest in the use of exponential activation functions.
We define the neural function $F : \mathbb{R}^{d \times m} \times \mathbb{R}^d \to \mathbb{R}$ using an exponential activation function.
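A minimal sketch of such a neural function with exponential activation, under our own (possibly different) parameterization $F(W, a, x) = \sum_{r=1}^{m} a_r \exp(\langle w_r, x \rangle)$; all names and scales here are hypothetical.

```python
import numpy as np

def exp_network(W, a, x):
    """Shallow network with exponential activation:
    F(W, a, x) = sum_r a_r * exp(<w_r, x>). Purely illustrative; the paper's
    exact parameterization and normalization may differ."""
    return float(a @ np.exp(W @ x))

rng = np.random.default_rng(0)
d, m = 4, 8
W = 0.1 * rng.normal(size=(m, d))  # small weights keep exp() well-scaled
a = rng.normal(size=m) / m
x = rng.normal(size=d)
print(exp_network(W, a, x))
```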
arXiv Detail & Related papers (2023-03-29T07:29:07Z) - Statistically Optimal Robust Mean and Covariance Estimation for
Anisotropic Gaussians [3.5788754401889014]
In the strong $\varepsilon$-contamination model, we assume that the adversary has replaced an $\varepsilon$ fraction of the vectors in the original Gaussian sample with arbitrary other vectors.
We construct an estimator $\widehat{\Sigma}$ of the covariance matrix $\Sigma$ that satisfies an optimal error bound with probability at least $1 - \delta$.
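To illustrate the strong $\varepsilon$-contamination model itself (the adversary and shift below are hypothetical, and the paper's estimator is not sketched here), note how even a small corrupted fraction ruins the naive covariance estimate:

```python
import numpy as np

def contaminate(X, eps, adversary, rng):
    """Strong eps-contamination: replace an eps fraction of the sample
    (here chosen at random and modified by a given adversary function)."""
    n = X.shape[0]
    idx = rng.choice(n, size=int(eps * n), replace=False)
    X = X.copy()
    X[idx] = adversary(X[idx])
    return X

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(2), np.eye(2), size=1000)
X_corrupt = contaminate(X, 0.05, lambda pts: pts + 50.0, rng)  # crude outlier shift
# The naive sample covariance blows up under contamination.
print(np.cov(X.T)[0, 0], np.cov(X_corrupt.T)[0, 0])
```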
arXiv Detail & Related papers (2023-01-21T23:28:55Z) - Online Learning of Smooth Functions [0.35534933448684125]
We study the online learning of real-valued functions where the hidden function is known to have certain smoothness properties.
We find new bounds for $\text{opt}_p(\mathcal{F}_q)$ that are sharp up to a constant factor.
In the multi-variable setup, we establish inequalities relating the quantities $\text{opt}_p(\mathcal{F}_{q,d})$ across different parameter values and prove bounds on $\text{opt}_p(\mathcal{F}_{q,d})$.
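As a sketch of the general protocol behind these quantities (the nearest-neighbor baseline and target function are our own illustrative choices, not the paper's algorithms): the learner predicts $f(x_t)$, then observes the true value, and $\text{opt}_p$ bounds the best achievable $\ell_p$ norm of the error sequence in the worst case.

```python
import numpy as np

def online_protocol(learner, truth, xs, p=2):
    """Online function learning: predict f(x_t), then observe the true value.
    Performance is the l_p norm of the error sequence, the quantity that
    opt_p bounds in the worst case over smooth hidden functions."""
    errs = []
    for x in xs:
        errs.append(abs(learner.predict(x) - truth(x)))
        learner.update(x, truth(x))
    return float(np.linalg.norm(errs, ord=p))

class NearestNeighborLearner:
    """Illustrative baseline: predict the value seen at the closest past query."""
    def __init__(self):
        self.pts = []
    def predict(self, x):
        return min(self.pts, key=lambda pt: abs(pt[0] - x))[1] if self.pts else 0.0
    def update(self, x, y):
        self.pts.append((x, y))

xs = np.random.default_rng(0).uniform(0, 1, 200)
print(online_protocol(NearestNeighborLearner(), lambda t: np.sin(3 * t), xs))
```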
arXiv Detail & Related papers (2023-01-04T04:05:58Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study functions of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations $\sigma$.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{OPT} + \epsilon$ with high probability.
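Below is a minimal sketch of gradient descent on the squared loss for a single ReLU neuron with benign i.i.d. noise (not the adversarial label noise analyzed in the paper); the warm start, step size, and sizes are our own choices.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def gd_single_neuron(X, y, steps=500, lr=0.5):
    """Gradient descent on F(w) = (1/n) * sum_i (relu(w . x_i) - y_i)^2.
    Illustrative only: Gaussian noise here, not adversarial labels."""
    n, _ = X.shape
    w = np.linalg.lstsq(X, y, rcond=None)[0]  # warm start, avoiding the dead region at w = 0
    for _ in range(steps):
        z = X @ w
        grad = 2.0 * X.T @ ((relu(z) - y) * (z > 0)) / n  # (sub)gradient of the loss
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
w_star = rng.normal(size=5)
y = relu(X @ w_star) + 0.1 * rng.normal(size=2000)
print(np.linalg.norm(gd_single_neuron(X, y) - w_star))  # small parameter error
```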
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - On Outer Bi-Lipschitz Extensions of Linear Johnson-Lindenstrauss
Embeddings of Low-Dimensional Submanifolds of $\mathbb{R}^N$ [0.24366811507669117]
Let $\mathcal{M}$ be a compact $d$-dimensional submanifold of $\mathbb{R}^N$ with reach $\tau$ and volume $V_{\mathcal{M}}$.
We prove that a nonlinear function $f : \mathbb{R}^N \rightarrow \mathbb{R}^m$ exists with $m \leq C \left( d / \epsilon^2 \right) \log \left( \sqrt[d]{V_{\mathcal{M}}} / (\epsilon \tau) \right)$.
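A quick numerical illustration of the linear Johnson-Lindenstrauss maps that the paper extends; the toy circle, dimensions, and distortion check are our own choices, and the outer bi-Lipschitz extension itself (the paper's contribution) is not constructed here.

```python
import numpy as np

rng = np.random.default_rng(0)
N, m, n = 1000, 40, 200

# A 1-dimensional toy submanifold of R^N: a circle in the first two coordinates.
t = rng.uniform(0.0, 2.0 * np.pi, n)
P = np.zeros((n, N))
P[:, 0], P[:, 1] = np.cos(t), np.sin(t)

A = rng.normal(size=(N, m)) / np.sqrt(m)  # linear JL map R^N -> R^m
Q = P @ A

# Pairwise distances on the manifold are nearly preserved by the linear map.
d_orig = np.linalg.norm(P[:n // 2] - P[n // 2:], axis=1)
d_proj = np.linalg.norm(Q[:n // 2] - Q[n // 2:], axis=1)
print("max multiplicative distortion:", np.max(np.abs(d_proj / d_orig - 1.0)))
```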
arXiv Detail & Related papers (2022-06-07T15:10:46Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Feature (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits of storage compared to classical random features.
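A minimal sketch of ternary random features; the sparsity level, nonlinearity, and kernel estimate below are our own illustrative choices and not necessarily the paper's exact TRF construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, n = 64, 512, 100
X = rng.normal(size=(n, d))

# Ternary random weights in {-1, 0, +1}: multiplying by W reduces to
# additions/subtractions, and each weight needs under 2 bits of storage.
p_zero = 0.5
W = rng.choice([-1, 0, 1], size=(d, D),
               p=[(1 - p_zero) / 2, p_zero, (1 - p_zero) / 2])

Phi = np.sign(X @ W)        # illustrative choice of nonlinearity
K_approx = Phi @ Phi.T / D  # random-feature estimate of a kernel matrix
print(K_approx.shape, K_approx[0, 0])
```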
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Optimal Mean Estimation without a Variance [103.26777953032537]
We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist.
We design an estimator which attains the smallest possible confidence interval as a function of $n$, $d$, and $\delta$.
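For context, here is the classical median-of-means baseline for heavy-tailed mean estimation; it only illustrates the setting and is not the statistically optimal, variance-free estimator designed in the paper.

```python
import numpy as np

def median_of_means(x, k, seed=0):
    """Coordinate-wise median of k group means: a classical estimator that is
    robust to heavy tails. Shown only to illustrate the problem setting."""
    perm = np.random.default_rng(seed).permutation(len(x))
    groups = np.array_split(x[perm], k)
    return np.median([g.mean(axis=0) for g in groups], axis=0)

rng = np.random.default_rng(1)
# Student-t with 1.5 degrees of freedom: the mean exists, the variance does not.
x = rng.standard_t(df=1.5, size=(10000, 3))
print("empirical mean error:", np.abs(x.mean(axis=0)).max())
print("median-of-means error:", np.abs(median_of_means(x, 50)).max())
```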
arXiv Detail & Related papers (2020-11-24T22:39:21Z) - Efficient Statistics for Sparse Graphical Models from Truncated Samples [19.205541380535397]
We focus on two fundamental and classical problems: (i) inference of sparse Gaussian graphical models and (ii) support recovery of sparse linear models.
For sparse linear regression, suppose samples $(\mathbf{x}, y)$ are generated where $y = \mathbf{x}^{\top} \Omega^* + \mathcal{N}(0,1)$ and $(\mathbf{x}, y)$ is seen only if $y$ belongs to a truncation set $S \subseteq \mathbb{R}^d$.
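A quick simulation of the truncated-sample model (the truncation set $S = [0, \infty)$ and all sizes are hypothetical), showing why naive least squares is biased on truncated data and a correction is needed:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 50000
Omega = np.zeros(d)
Omega[:3] = [2.0, -1.0, 0.5]  # sparse true parameter

# Generate (x, y) with y = x^T Omega + N(0, 1), but keep each pair only when y
# falls in the truncation set S (here, hypothetically, S = [0, inf)).
X = rng.normal(size=(n, d))
y = X @ Omega + rng.normal(size=n)
mask = y >= 0.0
X_seen, y_seen = X[mask], y[mask]

# Naive least squares on the truncated sample is biased toward the truncation.
beta_naive = np.linalg.lstsq(X_seen, y_seen, rcond=None)[0]
print(np.linalg.norm(beta_naive - Omega))  # noticeably off despite many samples
```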
arXiv Detail & Related papers (2020-06-17T09:21:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.