Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time
- URL: http://arxiv.org/abs/2202.04515v1
- Date: Wed, 9 Feb 2022 15:26:03 GMT
- Title: Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time
- Authors: David P. Woodruff, Amir Zandieh
- Abstract summary: We give an input sparsity time sampling algorithm for approximating the Gram matrix corresponding to the $q$-fold column-wise tensor product of $q$ matrices.
Our sampling technique relies on a collection of $q$ partially correlated random projections which can be simultaneously applied to a dataset $X$ in total time that depends only on the size of $X$.
- Score: 54.65688986250061
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We give an input sparsity time sampling algorithm for spectrally
approximating the Gram matrix corresponding to the $q$-fold column-wise tensor
product of $q$ matrices using a nearly optimal number of samples, improving
upon all previously known methods by poly$(q)$ factors. Furthermore, for the
important special case of the $q$-fold self-tensoring of a dataset, which is
the feature matrix of the degree-$q$ polynomial kernel, the leading term of our
method's runtime is proportional to the size of the dataset and has no
dependence on $q$. Previous techniques either incur a poly$(q)$ factor slowdown
in their runtime or remove the dependence on $q$ at the expense of having
sub-optimal target dimension, and depend quadratically on the number of
data-points in their runtime. Our sampling technique relies on a collection of
$q$ partially correlated random projections which can be simultaneously applied
to a dataset $X$ in total time that only depends on the size of $X$, and at the
same time their $q$-fold Kronecker product acts as a near-isometry for any
fixed vector in the column span of $X^{\otimes q}$. We show that our sampling
methods generalize to other classes of kernels beyond polynomial, such as
Gaussian and Neural Tangent kernels.
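As a rough illustration of what the sampling guarantee means, the sketch below forms the degree-$q$ self-tensored feature matrix explicitly (which the paper's algorithm deliberately avoids), computes exact row leverage scores, and samples rescaled rows to approximate the Gram matrix. All dimensions and the sampling budget are illustrative; this is the naive baseline, not the input-sparsity-time method.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, q = 200, 4, 2
X = rng.standard_normal((n, d))

# Degree-q self-tensoring: row i of Phi is x_i^{otimes q} (size d^q).
# The paper's algorithm never materializes Phi; this naive version does,
# purely to illustrate the sampling step.
Phi = X.copy()
for _ in range(q - 1):
    Phi = np.einsum('ij,ik->ijk', Phi, X).reshape(n, -1)

# Row leverage scores tau_i = phi_i^T (Phi^T Phi)^+ phi_i, via the SVD.
U, _, _ = np.linalg.svd(Phi, full_matrices=False)
tau = (U ** 2).sum(axis=1)

# Sample m rows with probability proportional to tau; rescaling each sampled
# row by 1/sqrt(m * p_i) makes the sampled Gram matrix unbiased for Phi^T Phi.
m = 400
p = tau / tau.sum()
idx = rng.choice(n, size=m, p=p)
S = Phi[idx] / np.sqrt(m * p[idx])[:, None]

G = Phi.T @ Phi   # exact Gram matrix of the tensored features
G_hat = S.T @ S   # leverage-score-sampled approximation
err = np.linalg.norm(G_hat - G, 2) / np.linalg.norm(G, 2)
```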
Related papers
- Data-Driven Linear Complexity Low-Rank Approximation of General Kernel Matrices: A Geometric Approach [0.9453554184019107]
A kernel matrix may be defined as $K_{ij} = \kappa(x_i, y_j)$, where $\kappa(x,y)$ is a kernel function.
We seek a low-rank approximation to a kernel matrix where the sets of points $X$ and $Y$ are large.
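The cited paper selects its low-rank factors with a geometric approach; as a generic stand-in, the sketch below uses a plain Nyström approximation with uniformly chosen landmark columns, which illustrates the kind of $K \approx C W^+ C^T$ factorization being sought. The kernel, bandwidth, and landmark count here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def gaussian_kernel(X, Y, gamma=0.05):
    # K[i, j] = exp(-gamma * ||x_i - y_j||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

n, d, r = 300, 3, 40
X = rng.standard_normal((n, d))
K = gaussian_kernel(X, X)

# Nystrom approximation from r landmark points:
# K ~= C W^+ C^T with C = K[:, S] and W = K[S, S].
S = rng.choice(n, size=r, replace=False)
C = K[:, S]
W = K[np.ix_(S, S)]
K_low = C @ np.linalg.pinv(W) @ C.T

rel_err = np.linalg.norm(K - K_low, 'fro') / np.linalg.norm(K, 'fro')
```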
arXiv Detail & Related papers (2022-12-24T07:15:00Z)
- An Equivalence Principle for the Spectrum of Random Inner-Product Kernel Matrices with Polynomial Scalings [21.727073594338297]
This study is motivated by applications in machine learning and statistics.
We establish the weak limit of the empirical distribution of these random matrices in a scaling regime.
The weak limit can be characterized as the free additive convolution of a Marchenko-Pastur law and a semicircle law.
arXiv Detail & Related papers (2022-05-12T18:50:21Z)
- Random matrices in service of ML footprint: ternary random features with no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Feature (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ less bits for storage compared to classical random features.
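A minimal sketch of the ternary-random-feature idea: project with a matrix whose entries lie in $\{-1, 0, +1\}$, so the projection needs only additions and subtractions, then apply a cheap nonlinearity. The sparsity level, the sign nonlinearity, and the entry distribution below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(2)

n, d, D = 100, 20, 512
X = rng.standard_normal((n, d))

# Ternary random projection matrix: entries in {-1, 0, +1}.
# p_zero controls sparsity; +-1 with equal probability otherwise.
p_zero = 0.5
W = rng.choice([-1, 0, 1], size=(d, D),
               p=[(1 - p_zero) / 2, p_zero, (1 - p_zero) / 2])

# X @ W needs no real multiplications: each column of W just adds and
# subtracts a subset of input coordinates. (numpy still multiplies here,
# but a specialized kernel would not.)
Z = np.sign(X @ W)  # ternary (here: sign-quantized) random features

# Implicit kernel estimate from the features; entries lie in [-1, 1].
K_hat = (Z @ Z.T) / D
```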
arXiv Detail & Related papers (2021-10-05T09:33:49Z)
- Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z)
- Two-way kernel matrix puncturing: towards resource-efficient PCA and spectral clustering [43.50783459690612]
The method consists in randomly "puncturing" both the data matrix $X \in \mathbb{C}^{p\times n}$ and its corresponding kernel (Gram) matrix $K$ through Bernoulli masks.
We empirically confirm on GAN-generated image databases, that it is possible to drastically puncture the data, thereby providing possibly huge computational and storage gains.
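A minimal sketch of the two-way puncturing idea under illustrative keep-probabilities: both the data matrix and the resulting Gram matrix are masked entrywise by Bernoulli variables and rescaled so that each stays unbiased for its unpunctured counterpart.

```python
import numpy as np

rng = np.random.default_rng(3)

p, n = 50, 400
X = rng.standard_normal((p, n))

# Puncture the data: keep each entry independently with probability eps,
# rescale by 1/eps so E[X_punct] = X.
eps = 0.3
mask_X = rng.random((p, n)) < eps
X_punct = np.where(mask_X, X, 0.0) / eps

# Gram matrix of the punctured data, then puncture K itself with a
# symmetric Bernoulli mask (keep probability eps_K), again rescaled.
K = X_punct.T @ X_punct / p
eps_K = 0.5
B = rng.random((n, n)) < eps_K
B = np.triu(B) | np.triu(B, 1).T        # symmetrize the mask
K_punct = np.where(B, K, 0.0) / eps_K
np.fill_diagonal(K_punct, np.diag(K))   # keep the diagonal intact

# The point of the method: the top eigenvectors of K_punct track those of K
# far better than individual entries do, enabling cheap PCA / clustering.
```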
arXiv Detail & Related papers (2021-02-24T14:01:58Z)
- Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $k\times k$, rank-$r$ matrices to normalized $L_1$ distance requires $\Omega(\frac{kr}{\epsilon^2})$ samples.
We propose an algorithm that uses $\mathcal{O}(\frac{kr}{\epsilon^2})$ samples up to logarithmic factors, a number linear in the high dimension $k$ and nearly linear in the typically low rank $r$.
arXiv Detail & Related papers (2020-09-30T19:10:32Z)
- Sampling from a $k$-DPP without looking at all items [58.30573872035083]
Given a kernel function and a subset size $k$, our goal is to sample $k$ out of $n$ items with probability proportional to the determinant of the kernel matrix induced by the subset (a.k.a. $k$-DPP).
Existing $k$-DPP sampling algorithms require an expensive preprocessing step which involves multiple passes over all $n$ items, making it infeasible for large datasets.
We develop an algorithm which adaptively builds a sufficiently large uniform sample of data that is then used to efficiently generate a smaller set of $k$ items.
arXiv Detail & Related papers (2020-06-30T16:40:44Z)
- Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing a $n\times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y) = -\log\langle\varphi(x), \varphi(y)\rangle$, where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r \ll n$.
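With such a cost, the kernel factors as $K = \Phi_x \Phi_y^T$ with nonnegative features, so each Sinkhorn iteration needs only matrix-vector products through the factors, at cost $O((n+m)r)$ instead of $O(nm)$. The sketch below uses random positive matrices as illustrative stand-ins for the feature maps $\varphi$.

```python
import numpy as np

rng = np.random.default_rng(4)

n, m, r = 500, 400, 16

# Stand-ins for positive feature maps: row i of Phi_x plays the role of
# phi(x_i) in R^r_+, giving the kernel k(x, y) = <phi(x), phi(y)> > 0
# without ever forming the n x m matrix K.
Phi_x = rng.random((n, r)) + 1e-3
Phi_y = rng.random((m, r)) + 1e-3

a = np.full(n, 1.0 / n)   # source marginal
b = np.full(m, 1.0 / m)   # target marginal

# Sinkhorn iterations through the factors: each K @ v is computed as
# Phi_x @ (Phi_y.T @ v), costing O((n + m) r).
u, v = np.ones(n), np.ones(m)
for _ in range(200):
    u = a / (Phi_x @ (Phi_y.T @ v))
    v = b / (Phi_y @ (Phi_x.T @ u))

# Marginals of the implicit transport plan P = diag(u) K diag(v).
row_marg = u * (Phi_x @ (Phi_y.T @ v))
col_marg = v * (Phi_y @ (Phi_x.T @ u))
```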
arXiv Detail & Related papers (2020-06-12T10:21:40Z)
- Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing [22.337756118270217]
We develop two methods for a fundamental statistical task: given an $\epsilon$-corrupted set of $n$ samples from a $d$-dimensional sub-Gaussian distribution, robustly approximate its top principal component.
Our first robust algorithm runs iterative filtering, returns an approximate eigenvector, and is based on a simple filtering approach.
Our second, which attains a slightly worse approximation factor, runs in nearly-trivial time and iterations under a mild spectral gap assumption.
arXiv Detail & Related papers (2020-06-12T07:45:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.