Expressivity of expand-and-sparsify representations
- URL: http://arxiv.org/abs/2006.03741v1
- Date: Fri, 5 Jun 2020 23:36:59 GMT
- Title: Expressivity of expand-and-sparsify representations
- Authors: Sanjoy Dasgupta and Christopher Tosh
- Abstract summary: A simple sparse coding mechanism appears in the sensory systems of several organisms.
We show that $z$ unpacks the information in $x$ and makes it more readily accessible.
We consider whether the representation is adaptive to manifold structure in the input space.
- Score: 15.016047591601094
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A simple sparse coding mechanism appears in the sensory systems of several
organisms: to a coarse approximation, an input $x \in \mathbb{R}^d$ is mapped to a much
higher dimension $m \gg d$ by a random linear transformation, and is then
sparsified by a winner-take-all process in which only the positions of the top
$k$ values are retained, yielding a $k$-sparse vector $z \in \{0,1\}^m$. We
study the benefits of this representation for subsequent learning.
We first show a universal approximation property, that arbitrary continuous
functions of $x$ are well approximated by linear functions of $z$, provided $m$
is large enough. This can be interpreted as saying that $z$ unpacks the
information in $x$ and makes it more readily accessible. The linear functions
can be specified explicitly and are easy to learn, and we give bounds on how
large $m$ needs to be as a function of the input dimension $d$ and the
smoothness of the target function. Next, we consider whether the representation
is adaptive to manifold structure in the input space. This is highly dependent
on the specific method of sparsification: we show that adaptivity is not
obtained under the winner-take-all mechanism, but does hold under a slight
variant. Finally we consider mappings to the representation space that are
random but are attuned to the data distribution, and we give favorable
approximation bounds in this setting.
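As a rough illustration of the mechanism described in the abstract, the sketch below expands inputs with a random linear map, keeps only the positions of the top-$k$ projections as a binary vector $z$, and then fits a plain linear function of $z$ to a smooth target. The dimensions, the target function, and the least-squares readout are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d, m, k = 10, 2000, 50            # input dim, expanded dim, sparsity level (illustrative)
W = rng.standard_normal((m, d))   # random linear expansion x -> Wx

def expand_and_sparsify(X):
    """Map each row x in R^d to a k-sparse binary z in {0,1}^m by keeping
    only the positions of the top-k entries of Wx (winner-take-all)."""
    proj = X @ W.T                                     # (n, m) random projections
    top_k = np.argpartition(-proj, k, axis=1)[:, :k]   # indices of the k largest entries
    Z = np.zeros_like(proj)
    np.put_along_axis(Z, top_k, 1.0, axis=1)
    return Z

# Illustrative smooth target function of x.
def target(X):
    return np.sin(X[:, 0]) + 0.5 * np.cos(X[:, 1] * X[:, 2])

X_train = rng.standard_normal((5000, d))
Z_train = expand_and_sparsify(X_train)

# Learn a *linear* function of z by ordinary least squares.
w, *_ = np.linalg.lstsq(Z_train, target(X_train), rcond=None)

X_test = rng.standard_normal((1000, d))
y_pred = expand_and_sparsify(X_test) @ w
print("test MSE:", np.mean((y_pred - target(X_test)) ** 2))
```

Intuitively, each active coordinate of $z$ flags a small region of input space, so a smooth function of $x$ can be approximated by assigning a weight to each such region; the paper's bounds make this precise in terms of $m$, $d$, and the smoothness of the target.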
Related papers
- Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error [25.777423855881878]
We give an elimination-based algorithm showing that one can obtain an $O\left(H\epsilon\right)$-optimal policy.
We complement our upper bound with an $\widetilde{\Omega}\left(H\epsilon\right)$-optimality lower bound, giving a complete picture of this problem.
arXiv Detail & Related papers (2024-07-18T15:58:04Z) - Universality of max-margin classifiers [10.797131009370219]
We study the role of featurization maps and the high-dimensional universality of the misclassification error for non-Gaussian features.
In particular, the overparametrization threshold and generalization error can be computed within a simpler model.
arXiv Detail & Related papers (2023-09-29T22:45:56Z) - Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point $\boldsymbol{\mu} \in \mathbb{R}^n$, where each component $\mu_i$ can be sampled via a noisy oracle.
We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-\delta$ returns an $\epsilon$-accurate estimate of $f(\boldsymbol{\mu})$.
arXiv Detail & Related papers (2022-03-18T18:50:52Z) - TURF: A Two-factor, Universal, Robust, Fast Distribution Learning
Algorithm [64.13217062232874]
One of its most powerful and successful modalities approximates every distribution to an $\ell_1$ distance essentially at most a constant times larger than that of its closest $t$-piece degree-$d$ approximation.
We provide a method that estimates this number near-optimally, hence helps approach the best possible approximation.
arXiv Detail & Related papers (2022-02-15T03:49:28Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of ${\bf K}$ is independent of the distribution of the i.i.d. entries of ${\bf w}$.
We propose a novel random features technique, called Ternary Random Features (TRF).
The computation of the proposed random features requires no multiplication, and a factor of $b$ fewer bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - A deep network construction that adapts to intrinsic dimensionality
beyond the domain [79.23797234241471]
We study the approximation of two-layer compositions $f(x) = g(\phi(x))$ via deep networks with ReLU activation.
We focus on two intuitive and practically relevant choices for $\phi$: the projection onto a low-dimensional embedded submanifold and a distance to a collection of low-dimensional sets.
arXiv Detail & Related papers (2020-08-06T09:50:29Z) - Truncated Linear Regression in High Dimensions [26.41623833920794]
In truncated linear regression, we observe pairs $(A_i, y_i)_i$ whose dependent variable equals $y_i = A_i^{\rm T} \cdot x^* + \eta_i$, where $x^*$ is some fixed unknown vector of interest.
The goal is to recover $x^*$ under some favorable conditions on the $A_i$'s and the noise distribution.
We prove that there exists a computationally and statistically efficient method for recovering $k$-sparse $n$-dimensional vectors $x^*$ from $m$ truncated samples.
arXiv Detail & Related papers (2020-07-29T00:31:34Z) - Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing an $n \times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y) = -\log \langle \varphi(x), \varphi(y) \rangle$ where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r \ll n$ (a rough sketch of this idea appears after this list).
arXiv Detail & Related papers (2020-06-12T10:21:40Z) - On the Modularity of Hypernetworks [103.1147622394852]
We show that for a structured target function, the overall number of trainable parameters in a hypernetwork is smaller by orders of magnitude than the number of trainable parameters of a standard neural network and an embedding method.
arXiv Detail & Related papers (2020-02-23T22:51:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.