Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time
- URL: http://arxiv.org/abs/2202.04515v1
- Date: Wed, 9 Feb 2022 15:26:03 GMT
- Title: Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time
- Authors: David P. Woodruff, Amir Zandieh
- Abstract summary: We give an input sparsity time sampling algorithm for approximating the Gram matrix corresponding to the $q$-fold column-wise tensor product of $q$ matrices.
Our sampling technique relies on a collection of $q$ partially correlated random projections which can be simultaneously applied to a dataset $X$ in total time that depends only on the size of $X$.
- Score: 54.65688986250061
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We give an input sparsity time sampling algorithm for spectrally
approximating the Gram matrix corresponding to the $q$-fold column-wise tensor
product of $q$ matrices using a nearly optimal number of samples, improving
upon all previously known methods by poly$(q)$ factors. Furthermore, for the
important special case of the $q$-fold self-tensoring of a dataset, which is
the feature matrix of the degree-$q$ polynomial kernel, the leading term of our
method's runtime is proportional to the size of the dataset and has no
dependence on $q$. Previous techniques either incur a poly$(q)$ factor slowdown
in their runtime or remove the dependence on $q$ at the expense of having
sub-optimal target dimension, and depend quadratically on the number of
data-points in their runtime. Our sampling technique relies on a collection of
$q$ partially correlated random projections which can be simultaneously applied
to a dataset $X$ in total time that only depends on the size of $X$, and at the
same time their $q$-fold Kronecker product acts as a near-isometry for any
fixed vector in the column span of $X^{\otimes q}$. We show that our sampling
methods generalize to other classes of kernels beyond polynomial, such as
Gaussian and Neural Tangent kernels.
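As a rough illustration of what the sampling guarantee means, the sketch below forms the degree-$q$ self-tensored feature matrix explicitly (which the paper's algorithm deliberately avoids), computes exact row leverage scores, and samples rescaled rows to approximate the Gram matrix. All dimensions and the sampling budget are illustrative; this is the naive baseline, not the input-sparsity-time method.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, q = 200, 4, 2
X = rng.standard_normal((n, d))

# Degree-q self-tensoring: row i of Phi is x_i^{otimes q} (size d^q).
# The paper's algorithm never materializes Phi; this naive version does,
# purely to illustrate the sampling step.
Phi = X.copy()
for _ in range(q - 1):
    Phi = np.einsum('ij,ik->ijk', Phi, X).reshape(n, -1)

# Row leverage scores tau_i = phi_i^T (Phi^T Phi)^+ phi_i, via the SVD.
U, _, _ = np.linalg.svd(Phi, full_matrices=False)
tau = (U ** 2).sum(axis=1)

# Sample m rows with probability proportional to tau; rescaling each sampled
# row by 1/sqrt(m * p_i) makes the sampled Gram matrix unbiased for Phi^T Phi.
m = 400
p = tau / tau.sum()
idx = rng.choice(n, size=m, p=p)
S = Phi[idx] / np.sqrt(m * p[idx])[:, None]

G = Phi.T @ Phi   # exact Gram matrix of the tensored features
G_hat = S.T @ S   # leverage-score-sampled approximation
err = np.linalg.norm(G_hat - G, 2) / np.linalg.norm(G, 2)
```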
Related papers
- Data-Driven Linear Complexity Low-Rank Approximation of General Kernel Matrices: A Geometric Approach [0.9453554184019107]
A kernel matrix may be defined as $K_{ij} = \kappa(x_i, y_j)$, where $\kappa(x,y)$ is a kernel function.
We seek a low-rank approximation to a kernel matrix where the sets of points $X$ and $Y$ are large.
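The cited paper selects its low-rank factors with a geometric approach; as a generic stand-in, the sketch below uses a plain Nyström approximation with uniformly chosen landmark columns, which illustrates the kind of $K \approx C W^+ C^T$ factorization being sought. The kernel, bandwidth, and landmark count here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def gaussian_kernel(X, Y, gamma=0.05):
    # K[i, j] = exp(-gamma * ||x_i - y_j||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

n, d, r = 300, 3, 40
X = rng.standard_normal((n, d))
K = gaussian_kernel(X, X)

# Nystrom approximation from r landmark points:
# K ~= C W^+ C^T with C = K[:, S] and W = K[S, S].
S = rng.choice(n, size=r, replace=False)
C = K[:, S]
W = K[np.ix_(S, S)]
K_low = C @ np.linalg.pinv(W) @ C.T

rel_err = np.linalg.norm(K - K_low, 'fro') / np.linalg.norm(K, 'fro')
```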
arXiv Detail & Related papers (2022-12-24T07:15:00Z)
- An Equivalence Principle for the Spectrum of Random Inner-Product Kernel Matrices with Polynomial Scalings [21.727073594338297]
This study is motivated by applications in machine learning and statistics.
We establish the weak limit of the empirical distribution of these random matrices in a scaling regime.
The weak limit can be characterized as the free additive convolution of a Marchenko-Pastur law and a semicircle law.
arXiv Detail & Related papers (2022-05-12T18:50:21Z)
- Random matrices in service of ML footprint: ternary random features with no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Feature (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ less bits for storage compared to classical random features.
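A minimal sketch of the ternary-random-feature idea: project with a matrix whose entries lie in $\{-1, 0, +1\}$, so the projection needs only additions and subtractions, then apply a cheap nonlinearity. The sparsity level, the sign nonlinearity, and the entry distribution below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(2)

n, d, D = 100, 20, 512
X = rng.standard_normal((n, d))

# Ternary random projection matrix: entries in {-1, 0, +1}.
# p_zero controls sparsity; +-1 with equal probability otherwise.
p_zero = 0.5
W = rng.choice([-1, 0, 1], size=(d, D),
               p=[(1 - p_zero) / 2, p_zero, (1 - p_zero) / 2])

# X @ W needs no real multiplications: each column of W just adds and
# subtracts a subset of input coordinates. (numpy still multiplies here,
# but a specialized kernel would not.)
Z = np.sign(X @ W)  # ternary (here: sign-quantized) random features

# Implicit kernel estimate from the features; entries lie in [-1, 1].
K_hat = (Z @ Z.T) / D
```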
arXiv Detail & Related papers (2021-10-05T09:33:49Z)
- Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z)
- Two-way kernel matrix puncturing: towards resource-efficient PCA and spectral clustering [43.50783459690612]
The method consists in randomly "puncturing" both the data matrix $X \in \mathbb{C}^{p\times n}$ and its corresponding kernel (Gram) matrix $K$ through Bernoulli masks.
We empirically confirm on GAN-generated image databases, that it is possible to drastically puncture the data, thereby providing possibly huge computational and storage gains.
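A minimal sketch of the two-way puncturing idea under illustrative keep-probabilities: both the data matrix and the resulting Gram matrix are masked entrywise by Bernoulli variables and rescaled so that each stays unbiased for its unpunctured counterpart.

```python
import numpy as np

rng = np.random.default_rng(3)

p, n = 50, 400
X = rng.standard_normal((p, n))

# Puncture the data: keep each entry independently with probability eps,
# rescale by 1/eps so E[X_punct] = X.
eps = 0.3
mask_X = rng.random((p, n)) < eps
X_punct = np.where(mask_X, X, 0.0) / eps

# Gram matrix of the punctured data, then puncture K itself with a
# symmetric Bernoulli mask (keep probability eps_K), again rescaled.
K = X_punct.T @ X_punct / p
eps_K = 0.5
B = rng.random((n, n)) < eps_K
B = np.triu(B) | np.triu(B, 1).T        # symmetrize the mask
K_punct = np.where(B, K, 0.0) / eps_K
np.fill_diagonal(K_punct, np.diag(K))   # keep the diagonal intact

# The point of the method: the top eigenvectors of K_punct track those of K
# far better than individual entries do, enabling cheap PCA / clustering.
```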
arXiv Detail & Related papers (2021-02-24T14:01:58Z)
- Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $k\times k$, rank-$r$ matrices to normalized $L_1$ distance requires $\Omega(\frac{kr}{\epsilon^2})$ samples.
We propose an algorithm that uses $\mathcal{O}(\frac{kr}{\epsilon^2})$ samples up to logarithmic factors, a number linear in the high dimension $k$ and nearly linear in the typically low rank $r$.
arXiv Detail & Related papers (2020-09-30T19:10:32Z)
- Sampling from a $k$-DPP without looking at all items [58.30573872035083]
Given a kernel function and a subset size $k$, our goal is to sample $k$ out of $n$ items with probability proportional to the determinant of the kernel matrix induced by the subset (a.k.a. $k$-DPP).
Existing $k$-DPP sampling algorithms require an expensive preprocessing step which involves multiple passes over all $n$ items, making it infeasible for large datasets.
We develop an algorithm which adaptively builds a sufficiently large uniform sample of data that is then used to efficiently generate a smaller set of $k$ items.
arXiv Detail & Related papers (2020-06-30T16:40:44Z)
- Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing a $n\times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y) = -\log\langle\varphi(x), \varphi(y)\rangle$, where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r \ll n$.
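With such a cost, the kernel factors as $K = \Phi_x \Phi_y^T$ with nonnegative features, so each Sinkhorn iteration needs only matrix-vector products through the factors, at cost $O((n+m)r)$ instead of $O(nm)$. The sketch below uses random positive matrices as illustrative stand-ins for the feature maps $\varphi$.

```python
import numpy as np

rng = np.random.default_rng(4)

n, m, r = 500, 400, 16

# Stand-ins for positive feature maps: row i of Phi_x plays the role of
# phi(x_i) in R^r_+, giving the kernel k(x, y) = <phi(x), phi(y)> > 0
# without ever forming the n x m matrix K.
Phi_x = rng.random((n, r)) + 1e-3
Phi_y = rng.random((m, r)) + 1e-3

a = np.full(n, 1.0 / n)   # source marginal
b = np.full(m, 1.0 / m)   # target marginal

# Sinkhorn iterations through the factors: each K @ v is computed as
# Phi_x @ (Phi_y.T @ v), costing O((n + m) r).
u, v = np.ones(n), np.ones(m)
for _ in range(200):
    u = a / (Phi_x @ (Phi_y.T @ v))
    v = b / (Phi_y @ (Phi_x.T @ u))

# Marginals of the implicit transport plan P = diag(u) K diag(v).
row_marg = u * (Phi_x @ (Phi_y.T @ v))
col_marg = v * (Phi_y @ (Phi_x.T @ u))
```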
arXiv Detail & Related papers (2020-06-12T10:21:40Z)
- Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing [22.337756118270217]
We develop two methods for a fundamental statistical task: given an $\epsilon$-corrupted set of $n$ samples from a $d$-dimensional sub-Gaussian distribution, robustly approximate its top principal component.
Our first robust algorithm runs iterative filtering, returns an approximate eigenvector, and is based on a simple filtering approach.
Our second, which attains a slightly worse approximation factor, runs in nearly-trivial time and iterations under a mild spectral gap assumption.
arXiv Detail & Related papers (2020-06-12T07:45:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.