Effective Minkowski Dimension of Deep Nonparametric Regression: Function
Approximation and Statistical Theories
- URL: http://arxiv.org/abs/2306.14859v1
- Date: Mon, 26 Jun 2023 17:13:31 GMT
- Title: Effective Minkowski Dimension of Deep Nonparametric Regression: Function
Approximation and Statistical Theories
- Authors: Zixuan Zhang, Minshuo Chen, Mengdi Wang, Wenjing Liao, Tuo Zhao
- Abstract summary: Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to intrinsic data structures.
This paper introduces a relaxed assumption that input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notation -- effective Minkowski dimension.
- Score: 70.90012822736988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing theories on deep nonparametric regression have shown that when the
input data lie on a low-dimensional manifold, deep neural networks can adapt to
the intrinsic data structures. In real-world applications, such an assumption
of data lying exactly on a low-dimensional manifold is stringent. This paper
introduces a relaxed assumption that the input data are concentrated around a
subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and the intrinsic dimension
of $\mathcal{S}$ can be characterized by a new complexity notation -- effective
Minkowski dimension. We prove that the sample complexity of deep nonparametric
regression only depends on the effective Minkowski dimension of $\mathcal{S}$
denoted by $p$. We further illustrate our theoretical findings by considering
nonparametric regression with an anisotropic Gaussian random design
$N(0,\Sigma)$, where $\Sigma$ is full rank. When the eigenvalues of $\Sigma$
have an exponential or polynomial decay, the effective Minkowski dimension of
such a Gaussian random design is $p=\mathcal{O}(\sqrt{\log n})$ or
$p=\mathcal{O}(n^\gamma)$, respectively, where $n$ is the sample size and
$\gamma\in(0,1)$ is a small constant depending on the polynomial decay rate.
Our theory shows that, when the manifold assumption does not hold, deep neural
networks can still adapt to the effective Minkowski dimension of the data, and
circumvent the curse of the ambient dimensionality for moderate sample sizes.
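The anisotropic Gaussian example above can be checked numerically. The snippet below is a minimal illustration, not the paper's construction: the exponential spectrum, the 99% variance cutoff, and all variable names are assumptions chosen only to show why a full-rank $N(0,\Sigma)$ with fast eigenvalue decay behaves as if it had a small effective dimension $p$.

```python
import numpy as np

# Illustration (assumed setup): a full-rank anisotropic Gaussian N(0, Sigma)
# whose eigenvalues decay exponentially, as in the abstract's example.
rng = np.random.default_rng(0)
d, n = 200, 5000                                    # ambient dimension, sample size
eigvals = np.exp(-0.5 * np.arange(d))               # exponentially decaying spectrum (all > 0)
X = rng.standard_normal((n, d)) * np.sqrt(eigvals)  # n samples from N(0, diag(eigvals))

# Count how many coordinates are needed to capture 99% of the empirical variance.
# This cutoff is an arbitrary proxy, not the paper's definition of effective
# Minkowski dimension, but it conveys the same intuition: the data concentrate
# around a subset whose dimension is far smaller than d.
var = X.var(axis=0)
cum = np.cumsum(var) / var.sum()
p = int(np.searchsorted(cum, 0.99)) + 1
print(f"{p} of {d} coordinates capture 99% of the variance")
```

With this spectrum the count grows only logarithmically as the cutoff is tightened, so the effective dimension stays small even though $\Sigma$ is full rank; this is the regime in which the abstract's rates $p=\mathcal{O}(\sqrt{\log n})$ or $p=\mathcal{O}(n^\gamma)$ replace the ambient dimension $d$.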
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z)
- Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*\left(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle\right)$ under isotropic Gaussian data.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with an arbitrary link function at sample and runtime complexity $n \asymp T \asymp C(q) \cdot d$.
arXiv Detail & Related papers (2024-06-03T17:56:58Z)
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
- Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds [75.51968172401394]
We study the sample complexity of the neural policy mirror descent (NPMD) algorithm with deep convolutional neural networks (CNNs).
In each iteration of NPMD, both the value function and the policy can be well approximated by CNNs.
We show that NPMD can leverage the low-dimensional structure of state space to escape from the curse of dimensionality.
arXiv Detail & Related papers (2023-09-25T07:31:22Z)
- High-Dimensional Smoothed Entropy Estimation via Dimensionality Reduction [14.53979700025531]
We consider the estimation of the differential entropy $h(X+Z)$ via $n$ independently and identically distributed samples of $X$.
Under the absolute-error loss, the above problem has a parametric estimation rate of $\frac{c^D}{\sqrt{n}}$.
We overcome this exponential sample complexity by projecting $X$ to a low-dimensional space via principal component analysis (PCA) before the entropy estimation (see the generic sketch after this related-papers list).
arXiv Detail & Related papers (2023-05-08T13:51:48Z)
- Neural Implicit Manifold Learning for Topology-Aware Density Estimation [15.878635603835063]
Current generative models learn $\mathcal{M}$ by mapping an $m$-dimensional latent variable through a neural network.
We show that our model can learn manifold-supported distributions with complex topologies more accurately than pushforward models.
arXiv Detail & Related papers (2022-06-22T18:00:00Z)
- Besov Function Approximation and Binary Classification on Low-Dimensional Manifolds Using Convolutional Residual Networks [42.43493635899849]
We establish theoretical guarantees of convolutional residual networks (ConvResNet) in terms of function approximation and statistical estimation for binary classification.
Our results demonstrate that ConvResNets are adaptive to low-dimensional structures of data sets.
arXiv Detail & Related papers (2021-09-07T02:58:11Z)
- Fundamental tradeoffs between memorization and robustness in random features and neural tangent regimes [15.76663241036412]
We prove for a large class of activation functions that, if the model memorizes even a fraction of the training data, then its Sobolev seminorm is lower bounded.
Experiments reveal, for the first time, a multiple-descent phenomenon in the robustness of the min-norm interpolator.
arXiv Detail & Related papers (2021-06-04T17:52:50Z)
- Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2}) + \epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
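As a companion to the smoothed entropy estimation entry above, here is a generic sketch of the projection-then-estimate idea: reduce dimension with PCA, then estimate differential entropy in the projected space. It is not that paper's estimator; the Kozachenko-Leonenko k-NN estimator, the choice of $m$ projected components, the noise level, and all names below are assumptions used only for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def knn_entropy(x, k=3):
    """Kozachenko-Leonenko k-NN estimate of differential entropy (in nats)."""
    n, m = x.shape
    r = cKDTree(x).query(x, k=k + 1)[0][:, -1]  # distance to k-th neighbor, self excluded
    log_unit_ball = (m / 2) * np.log(np.pi) - gammaln(m / 2 + 1)  # log volume of unit m-ball
    return digamma(n) - digamma(k) + log_unit_ball + m * np.mean(np.log(r))

rng = np.random.default_rng(0)
d, n, m = 50, 2000, 5
X = rng.standard_normal((n, d)) * np.exp(-0.3 * np.arange(d))  # data with low-dimensional structure
Z = X + 0.1 * rng.standard_normal((n, d))                      # smoothed observations X + noise

# Project onto the top-m principal directions, then estimate entropy there,
# instead of estimating it directly in the d-dimensional ambient space.
Vt = np.linalg.svd(Z - Z.mean(axis=0), full_matrices=False)[2]
print("entropy estimate after PCA projection:", knn_entropy(Z @ Vt[:m].T, k=3))
```

The point of the projection is to avoid the exponential-in-$D$ constant in the ambient-space rate quoted in that entry; quantifying what the projection loses is the question the cited paper addresses.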