Subspace Embeddings Under Nonlinear Transformations
- URL: http://arxiv.org/abs/2010.02264v1
- Date: Mon, 5 Oct 2020 18:18:04 GMT
- Title: Subspace Embeddings Under Nonlinear Transformations
- Authors: Aarshvi Gajjar, Cameron Musco
- Abstract summary: We consider embeddings that preserve the norm of all vectors in a space $S = \{y: y = f(x)\text{ for }x \in Z\}$, where $Z$ is a $k$-dimensional subspace of $\mathbb{R}^n$ and $f(x)$ is a nonlinear activation function applied entrywise to $x$.
In particular, we give additive $\epsilon$ error embeddings into $O(\frac{k\log (n/\epsilon)}{\epsilon^2})$ dimensions for a class of nonlinearities that includes the popular Sigmoid, SoftPlus, and Gaussian functions.
- Score: 19.28531602450729
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider low-distortion embeddings for subspaces under \emph{entrywise
nonlinear transformations}. In particular we seek embeddings that preserve the
norm of all vectors in a space $S = \{y: y = f(x)\text{ for }x \in Z\}$, where
$Z$ is a $k$-dimensional subspace of $\mathbb{R}^n$ and $f(x)$ is a nonlinear
activation function applied entrywise to $x$. When $f$ is the identity, and so
$S$ is just a $k$-dimensional subspace, it is known that, with high
probability, a random embedding into $O(k/\epsilon^2)$ dimensions preserves the
norm of all $y \in S$ up to $(1\pm \epsilon)$ relative error. Such embeddings
are known as \emph{subspace embeddings}, and have found widespread use in
compressed sensing and approximation algorithms. We give the first
low-distortion embeddings for a wide class of nonlinear functions $f$. In
particular, we give additive $\epsilon$ error embeddings into $O(\frac{k\log
(n/\epsilon)}{\epsilon^2})$ dimensions for a class of nonlinearities that
includes the popular Sigmoid, SoftPlus, and Gaussian functions. We strengthen
this result to give relative error embeddings under some further restrictions,
which are satisfied e.g., by the Tanh, SoftSign, Exponential Linear Unit, and
many other `soft' step functions and rectifying units. Understanding embeddings
for subspaces under nonlinear transformations is a key step towards extending
random sketching and compressed sensing techniques for linear problems to
nonlinear ones. We discuss example applications of our results to improved
bounds for compressed sensing via generative neural networks.
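Below is a minimal numerical sketch of the setting above, assuming a dense Gaussian sketching matrix (the classical Johnson-Lindenstrauss construction); the dimensions $n$, $k$, $m$, the trial count, and the Sigmoid nonlinearity are illustrative choices, not the construction or dimension bound from the paper.

```python
# Minimal sketch: a dense Gaussian embedding applied to the nonlinear image of a
# k-dimensional subspace. Illustrative dimensions only; not the paper's construction.
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 2048, 8, 400                                # ambient dim, subspace dim, sketch dim

Z = np.linalg.qr(rng.standard_normal((n, k)))[0]      # orthonormal basis of a k-dim subspace Z
S = rng.standard_normal((m, n)) / np.sqrt(m)          # Gaussian sketch: E[||S y||^2] = ||y||^2

def max_relative_distortion(f, trials=200):
    """Largest | ||S f(x)|| / ||f(x)|| - 1 | observed over random x drawn from Z."""
    worst = 0.0
    for _ in range(trials):
        x = Z @ rng.standard_normal(k)                # a random vector in the subspace Z
        y = f(x)                                      # entrywise nonlinear transformation
        worst = max(worst, abs(np.linalg.norm(S @ y) / np.linalg.norm(y) - 1.0))
    return worst

identity = lambda x: x
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

print("identity (plain subspace) :", max_relative_distortion(identity))
print("sigmoid  (nonlinear image):", max_relative_distortion(sigmoid))
```

For the identity map the subspace-embedding guarantee holds simultaneously for every vector in $Z$; the loop above is only an empirical spot check over random vectors, and for the nonlinear image the paper's guarantee is additive rather than relative error.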
Related papers
- Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z) - Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Learned Nonlinear Predictor for Critically Sampled 3D Point Cloud
Attribute Compression [24.001318485207207]
We study 3D point cloud compression via a decoder approach.
In this paper, we study predicting $f_{l+1}^*$ at level $l+1$ given $f_l^*$ at level $l$ and the encoding of $G_l^*$, for the $p=1$ case.
arXiv Detail & Related papers (2023-11-22T17:26:54Z) - Optimal Embedding Dimension for Sparse Subspace Embeddings [4.042707434058959]
A random $m \times n$ matrix $S$ is an oblivious subspace embedding (OSE) with parameters $\epsilon, \delta$ if, for any $d$-dimensional subspace $W \subseteq \mathbb{R}^n$, with probability at least $1-\delta$ it holds that $(1-\epsilon)\|x\| \leq \|Sx\| \leq (1+\epsilon)\|x\|$ for all $x \in W$.
We show that an $m \times n$ random matrix $S$ with $m \geq (1+\theta)d$ is an oblivious subspace embedding with $\epsilon = O_\theta(1)$.
We use this to construct the first oblivious subspace embedding with $O(d)$ embedding dimension that can be applied faster than current matrix multiplication time. (A minimal CountSketch-style illustration of sparse subspace embeddings appears in the sketch after this list.)
arXiv Detail & Related papers (2023-11-17T18:01:58Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Optimal Sketching Bounds for Sparse Linear Regression [116.30196615349226]
We study oblivious sketching for $k$-sparse linear regression under various loss functions such as an $\ell_p$ norm, or from a broad class of hinge-like loss functions.
We show that for sparse $\ell_2$ regression, there is an oblivious distribution over sketches with $\Theta(k\log(d/k)/\varepsilon^2)$ rows, which is tight up to a constant factor.
We also show that sketching with $O(\mu^2 k\log(\mu n d/\varepsilon)/\varepsilon^2)$ rows suffices.
arXiv Detail & Related papers (2023-04-05T07:24:19Z) - Learning linear dynamical systems under convex constraints [4.4351901934764975]
We consider the problem of identification of linear dynamical systems from $T$ samples of a single trajectory.
$A^*$ can be reliably estimated for values of $T$ smaller than what is needed in the unconstrained setting.
arXiv Detail & Related papers (2023-03-27T11:49:40Z) - Neural Networks Efficiently Learn Low-Dimensional Representations with
SGD [22.703825902761405]
We show that SGD-trained ReLU NNs can learn a single-index target of the form $y = f(\langle \boldsymbol{u}, \boldsymbol{x}\rangle) + \epsilon$ by recovering the principal direction.
We also provide compression guarantees for NNs using the approximate low-rank structure produced by SGD.
arXiv Detail & Related papers (2022-09-29T15:29:10Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{OPT} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - On Fast Johnson-Lindenstrauss Embeddings of Compact Submanifolds of
$\mathbb{R}^N$ with Boundary [0.4125187280299246]
We consider the probability that a random matrix $A \in \mathbb{R}^{m \times N}$ will serve as a bi-Lipschitz function $A: \mathcal{M} \rightarrow \mathbb{R}^m$ with bi-Lipschitz constants close to one.
We present a new class of highly structured distributions for embedding sufficiently low-dimensional submanifolds of $\mathbb{R}^N$.
arXiv Detail & Related papers (2021-10-08T15:27:52Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2}) + \epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
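As referenced in the sparse subspace embedding entry above, the following is a minimal sketch of a CountSketch-style sparse embedding (one random $\pm 1$ entry per column); the dimensions are illustrative choices and this is not the $O(d)$-dimension construction from that paper.

```python
# Minimal CountSketch-style sparse subspace embedding: each column of S has a single
# +/-1 entry in a uniformly random row, so S @ A can be applied in O(nnz(A)) time.
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 4096, 16, 512                               # ambient dim, subspace dim, sketch dim

rows = rng.integers(0, m, size=n)                     # target row for each column
signs = rng.choice([-1.0, 1.0], size=n)               # random sign for each column
S = np.zeros((m, n))
S[rows, np.arange(n)] = signs

U = np.linalg.qr(rng.standard_normal((n, d)))[0]      # orthonormal basis of a d-dim subspace
sv = np.linalg.svd(S @ U, compute_uv=False)
print("singular values of S @ U lie in [%.3f, %.3f]" % (sv.min(), sv.max()))
# S is an eps-subspace embedding for span(U) exactly when every singular value of
# S @ U lies in [1 - eps, 1 + eps], i.e. ||S x|| approximates ||x|| for all x in span(U).
```

Checking the extreme singular values of $S U$ is the standard way to verify the subspace-embedding property, since it bounds the distortion simultaneously over the whole subspace rather than for sampled vectors only.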