Eluder Dimension and Generalized Rank
- URL: http://arxiv.org/abs/2104.06970v1
- Date: Wed, 14 Apr 2021 16:53:13 GMT
- Title: Eluder Dimension and Generalized Rank
- Authors: Gene Li, Pritish Kamath, Dylan J. Foster, Nathan Srebro
- Abstract summary: When $\sigma$ has derivatives bounded away from $0$, we show that eluder dimension can be exponentially smaller than $\sigma$-rank; when $\sigma$ is the $\mathrm{relu}$ activation, we show that eluder dimension can be exponentially larger than $\sigma$-rank.
- Score: 48.27338656415236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the relationship between the eluder dimension for a function class
and a generalized notion of rank, defined for any monotone "activation" $\sigma
: \mathbb{R} \to \mathbb{R}$, which corresponds to the minimal dimension
required to represent the class as a generalized linear model. When $\sigma$
has derivatives bounded away from $0$, it is known that $\sigma$-rank gives
rise to an upper bound on eluder dimension for any function class; we show
however that eluder dimension can be exponentially smaller than $\sigma$-rank.
We also show that the condition on the derivative is necessary; namely, when
$\sigma$ is the $\mathrm{relu}$ activation, we show that eluder dimension can
be exponentially larger than $\sigma$-rank.
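For orientation, here is a minimal sketch of the two quantities being compared, following the standard conventions (eluder dimension in the sense of Russo and Van Roy); the paper's precise definition of $\sigma$-rank may add boundedness constraints on the embeddings.

```latex
% Sketch under the assumptions above; normalization may differ from the paper.
% sigma-rank: minimal dimension of a generalized linear representation of F.
\[
  \mathrm{rank}_\sigma(\mathcal{F}) \;=\; \min\Bigl\{ d \;:\; \exists\, \phi:\mathcal{X}\to\mathbb{R}^d,\ \{w_f\}_{f\in\mathcal{F}}\subset\mathbb{R}^d
  \ \text{ s.t. }\ f(x)=\sigma\bigl(\langle w_f,\phi(x)\rangle\bigr)\ \ \forall f\in\mathcal{F},\ x\in\mathcal{X} \Bigr\}.
\]
% Eluder dimension: x is eps-independent of x_1,...,x_n w.r.t. F if some
% f, f' in F satisfy
\[
  \sqrt{\textstyle\sum_{i=1}^{n}\bigl(f(x_i)-f'(x_i)\bigr)^2} \;\le\; \varepsilon
  \qquad\text{while}\qquad |f(x)-f'(x)| > \varepsilon,
\]
% and dim_E(F, eps) is the length of the longest sequence in X in which each
% element is eps'-independent of its predecessors for some eps' >= eps.
```

The toy sketch below makes the eluder dimension concrete by brute force at a single fixed scale $\varepsilon$ (the supremum over scales $\varepsilon' \ge \varepsilon$ in the full definition is omitted); all names and the threshold-class example are illustrative, not from the paper.

```python
# Brute-force, fixed-scale eluder dimension for a tiny finite class.
import itertools
import math

def eps_independent(x, prefix, F, eps):
    # x is eps-independent of the prefix w.r.t. F if some pair f, g in F
    # agrees on the prefix (within eps in l2) yet differs by more than eps at x.
    for f, g in itertools.combinations(F, 2):
        prefix_gap = math.sqrt(sum((f[z] - g[z]) ** 2 for z in prefix))
        if prefix_gap <= eps and abs(f[x] - g[x]) > eps:
            return True
    return False

def eluder_dimension(X, F, eps):
    # Length of the longest sequence in X whose every element is
    # eps-independent of its predecessors (exhaustive search; toy sizes only).
    best = 0
    def extend(prefix):
        nonlocal best
        best = max(best, len(prefix))
        for x in X:
            if eps_independent(x, prefix, F, eps):
                extend(prefix + [x])
    extend([])
    return best

# Threshold functions on {0,...,3}: f_t(x) = 1[x >= t]. Prints 4 here.
X = list(range(4))
F = [{x: float(x >= t) for x in X} for t in range(5)]
print(eluder_dimension(X, F, eps=0.5))
```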
Related papers
- Renormalization group for Anderson localization on high-dimensional lattices [0.0]
We show how in the delocalized region, including the transition point, the $\beta$-function for the fractal dimension $D_1$ evolves smoothly.
We put forward a conjecture about a lower bound for the fractal dimension.
arXiv Detail & Related papers (2024-03-04T12:16:35Z) - Dimension-free Remez Inequalities and norm designs [48.5897526636987]
A class of domains $X$ and test sets $Y$ -- termed \emph{norm designs} -- enjoy dimension-free Remez-type estimates.
We show that the supremum of $f$ does not increase by more than $\mathcal{O}(\log K)^{2d}$ when $f$ is extended to the polytorus.
arXiv Detail & Related papers (2023-10-11T22:46:09Z) - Universality of max-margin classifiers [10.797131009370219]
We study the role of featurization maps and the high-dimensional universality of the misclassification error for non-Gaussian features.
In particular, the overparametrization threshold and generalization error can be computed within a simpler model.
arXiv Detail & Related papers (2023-09-29T22:45:56Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Higher rank antipodality [0.0]
Motivated by general probability theory, we say that the set $X$ in $\mathbb{R}^d$ is \emph{antipodal} of rank $k$.
For $k=1$, it coincides with the well-studied notion of (pairwise) antipodality introduced by Klee.
arXiv Detail & Related papers (2023-07-31T17:15:46Z) - Effective Minkowski Dimension of Deep Nonparametric Regression: Function
Approximation and Statistical Theories [70.90012822736988]
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to intrinsic data structures.
This paper introduces a relaxed assumption that input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notion -- effective Minkowski dimension.
arXiv Detail & Related papers (2023-06-26T17:13:31Z) - Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff [12.351756386062291]
We formalize a balance between learning low-dimensional representations and minimizing complexity/irregularity in the feature maps.
For large depths, almost all hidden representations are approximately $R^{(0)}(f)$-dimensional, and almost all weight matrices $W_\ell$ have $R^{(0)}(f)$ singular values close to 1.
Interestingly, the use of large learning rates is required to guarantee an order $O(L)$ NTK, which in turn guarantees infinite-depth convergence of the representations of almost all layers.
arXiv Detail & Related papers (2023-05-30T13:06:26Z) - On minimal representations of shallow ReLU networks [0.0]
We show that the minimal representation for $f$ uses either $n$, $n+1$ or $n+2$ neurons.
In particular, when the input layer is one-dimensional, minimal representations always use at most $n+1$ neurons, but in all higher-dimensional settings there are functions for which $n+2$ neurons are needed.
arXiv Detail & Related papers (2021-08-12T10:22:24Z) - A deep network construction that adapts to intrinsic dimensionality
beyond the domain [79.23797234241471]
We study the approximation of two-layer compositions $f(x) = g(\phi(x))$ via deep networks with ReLU activation.
We focus on two intuitive and practically relevant choices for $\phi$: the projection onto a low-dimensional embedded submanifold and a distance to a collection of low-dimensional sets.
arXiv Detail & Related papers (2020-08-06T09:50:29Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$ (a toy gradient-descent sketch of this setting follows the list below).
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
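As referenced in the single-neuron entry above, here is a minimal sketch of that setting: gradient descent on the empirical square loss of a single ReLU neuron. The planted data model, initialization, step size, and iteration count are illustrative assumptions, not the paper's algorithm or guarantees.

```python
# Toy single-ReLU-neuron fit by gradient descent on the square loss.
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

# Synthetic data from a planted neuron plus noise, so the best achievable
# square loss (OPT) is roughly the noise variance.
n, d = 2000, 10
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d) / np.sqrt(d)
y = relu(X @ w_star) + 0.1 * rng.normal(size=n)

w = 0.01 * rng.normal(size=d)  # small random init; w = 0 is a stationary point
lr = 0.1
for _ in range(1000):
    z = X @ w
    grad = X.T @ ((relu(z) - y) * (z > 0)) / n  # (sub)gradient of 0.5 * MSE
    w -= lr * grad

print("final square loss:", np.mean((relu(X @ w) - y) ** 2))
```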
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.