Uniform Approximations for Randomized Hadamard Transforms with
Applications
- URL: http://arxiv.org/abs/2203.01599v1
- Date: Thu, 3 Mar 2022 09:56:39 GMT
- Title: Uniform Approximations for Randomized Hadamard Transforms with
Applications
- Authors: Yeshwanth Cherapanamjeri, Jelani Nelson
- Abstract summary: Randomized Hadamard Transforms (RHTs) have emerged as a computationally efficient alternative to the use of dense unstructured random matrices.
We use our inequality to derive improved guarantees for two applications in the high-dimensional regime: kernel approximation and distance estimation.
- Score: 8.985261743452986
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Randomized Hadamard Transforms (RHTs) have emerged as a computationally
efficient alternative to the use of dense unstructured random matrices across a
range of domains in computer science and machine learning. For several
applications such as dimensionality reduction and compressed sensing, the
theoretical guarantees for methods based on RHTs are comparable to approaches
using dense random matrices with i.i.d. entries. However, several such
applications are in the low-dimensional regime where the number of rows sampled
from the matrix is rather small. Prior arguments are not applicable to the
high-dimensional regime often found in machine learning applications like
kernel approximation. Given an ensemble of RHTs with Gaussian diagonals,
$\{M^i\}_{i = 1}^m$, and any $1$-Lipschitz function, $f: \mathbb{R} \to
\mathbb{R}$, we prove that the average of $f$ over the entries of $\{M^i v\}_{i
= 1}^m$ converges to its expectation uniformly over $\| v \| \leq 1$ at a rate
comparable to that obtained from using truly Gaussian matrices. We then use our
inequality to derive improved guarantees for two applications in the
high-dimensional regime: 1) kernel approximation and 2) distance estimation.
For kernel approximation, we prove the first \emph{uniform} approximation
guarantees for random features constructed through RHTs, lending theoretical
justification to their empirical success, while for distance estimation, our
convergence result implies data structures with improved runtime guarantees
over previous work by the authors. We believe our general inequality is likely
to find use in other applications.
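As a rough illustration of the quantity the main inequality controls, the following Python sketch (our own construction, not code from the paper; the exact RHT normalization is an assumption) averages a 1-Lipschitz function over the entries of an ensemble $\{M^i v\}$ with Gaussian diagonals and compares the result to the corresponding Gaussian expectation.

```python
# Minimal sketch: for M^i = H diag(g^i) with H the +/-1 Hadamard matrix and
# g^i ~ N(0, I_d), the empirical average of a 1-Lipschitz f over the entries
# of {M^i v} should be close to E[f(Z)], Z ~ N(0, ||v||^2), for ||v|| <= 1.
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)
d, m = 1024, 8                      # dimension (power of two) and ensemble size
H = hadamard(d)                     # unnormalized +/-1 Hadamard matrix
f = np.abs                          # a 1-Lipschitz test function

v = rng.standard_normal(d)
v /= np.linalg.norm(v)              # restrict to the unit sphere, ||v|| = 1

# Average of f over the entries of {M^i v}_{i=1}^m.
vals = []
for _ in range(m):
    g = rng.standard_normal(d)      # Gaussian diagonal
    vals.append(f(H @ (g * v)))     # (H diag(g)) v; each entry ~ N(0, ||v||^2)
rht_avg = np.mean(vals)

gauss_expect = np.sqrt(2 / np.pi)   # E|Z| for Z ~ N(0, 1), since ||v|| = 1
print(f"RHT ensemble average: {rht_avg:.4f}  vs  Gaussian expectation: {gauss_expect:.4f}")
```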
Related papers
- SoS Certifiability of Subgaussian Distributions and its Algorithmic Applications [37.208622097149714]
We prove that there is a universal constant $C>0$ so that for every $d \in \mathbb{N}$, every centered subgaussian distribution $\mathcal{D}$ on $\mathbb{R}^d$, and every even $p \in \mathbb{N}$, there is a degree-$p$ sum-of-squares certificate of its subgaussian moment bound.
This establishes that every subgaussian distribution is \emph{SoS}-certifiably subgaussian -- a condition that yields efficient learning algorithms for a wide variety of high-dimensional statistical tasks.
arXiv Detail & Related papers (2024-10-28T16:36:58Z) - Universality of kernel random matrices and kernel regression in the quadratic regime [18.51014786894174]
In this work, we extend the study of kernel ridge regression to the quadratic regime.
We establish an operator norm approximation bound for the difference between the original kernel random matrix and a quadratic kernel random matrix.
We characterize the precise training and generalization errors for KRR in the quadratic regime when $n/d^2$ converges to a nonzero constant.
arXiv Detail & Related papers (2024-08-02T07:29:49Z) - Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics (ARC) methods have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously allow inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Low-complexity subspace-descent over symmetric positive definite
manifold [9.346050098365648]
We develop low-complexity algorithms for the minimization of functions over the symmetric positive definite (SPD) manifold.
The proposed approach utilizes carefully chosen subspaces that allow the update to be written as a product of the Cholesky factor of the iterate and a sparse matrix.
arXiv Detail & Related papers (2023-05-03T11:11:46Z) - Replicable Clustering [57.19013971737493]
We propose algorithms for the statistical $k$-medians, statistical $k$-means, and statistical $k$-centers problems by utilizing approximation routines for their counterparts in a black-box manner.
We also provide experiments on synthetic distributions in 2D using the $k$-means++ implementation from sklearn as a black-box that validate our theoretical results.
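To give a concrete sense of the experimental setup described above (synthetic 2D data, sklearn's k-means++ solver used as a black box), here is a minimal sketch; the replicability wrapper from the paper itself is not reproduced.

```python
# Hedged illustration: treat sklearn's k-means++ solver as a black-box
# approximation oracle on synthetic 2D data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic 2D mixture: three well-separated Gaussian blobs.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
X = np.vstack([rng.normal(c, 0.5, size=(200, 2)) for c in centers])

# Black-box call: k-means++ initialization, fixed seed for determinism.
kmeans = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)
```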
arXiv Detail & Related papers (2023-02-20T23:29:43Z) - Optimal Randomized Approximations for Matrix based Renyi's Entropy [16.651155375441796]
We develop random approximations for integer order $\alpha$ cases and series approximations for non-integer $\alpha$ cases.
Large-scale simulations and real-world applications validate the effectiveness of the developed approximations.
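One standard randomized route for the integer-order case is a Hutchinson-style trace estimator built from matrix-vector products; the sketch below only illustrates this idea and need not match the paper's exact estimators or error bounds.

```python
# Estimate tr(A^alpha) with Rademacher probes and matvecs, then plug into the
# matrix-based Renyi entropy S_alpha(A) = log(tr(A^alpha)) / (1 - alpha).
import numpy as np

def renyi_entropy_hutchinson(A, alpha=2, num_probes=64, seed=None):
    """A: trace-normalized PSD matrix; alpha: integer order >= 2."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    estimates = []
    for _ in range(num_probes):
        g = rng.choice([-1.0, 1.0], size=n)   # Rademacher probe
        w = g.copy()
        for _ in range(alpha):                # w = A^alpha g via matvecs
            w = A @ w
        estimates.append(g @ w)               # unbiased estimate of tr(A^alpha)
    trace_est = np.mean(estimates)
    return np.log(trace_est) / (1.0 - alpha)

# Usage: build a trace-one PSD matrix and compare to the exact value.
rng = np.random.default_rng(1)
B = rng.standard_normal((200, 50))
A = B @ B.T
A /= np.trace(A)
exact = np.log(np.trace(np.linalg.matrix_power(A, 2))) / (1.0 - 2)
print(renyi_entropy_hutchinson(A, alpha=2), exact)
```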
arXiv Detail & Related papers (2022-05-16T02:24:52Z) - Fast Differentiable Matrix Square Root and Inverse Square Root [65.67315418971688]
We propose two more efficient variants to compute the differentiable matrix square root and the inverse square root.
For the forward propagation, one method is to use Matrix Taylor Polynomial (MTP), and the other method is to use Matrix Padé Approximants (MPA).
A series of numerical tests show that both methods yield considerable speed-up compared with the SVD or the NS iteration.
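For reference, the Newton-Schulz (NS) iteration mentioned above as a baseline can be sketched in a few lines; the paper's MTP/MPA variants are not reproduced here.

```python
# Coupled Newton-Schulz iteration for the square root of an SPD matrix.
import numpy as np

def newton_schulz_sqrt(A, num_iters=20):
    norm = np.linalg.norm(A)              # Frobenius norm
    Y = A / norm                          # normalized so the iteration converges
    Z = np.eye(A.shape[0])
    for _ in range(num_iters):
        T = 0.5 * (3.0 * np.eye(A.shape[0]) - Z @ Y)
        Y = Y @ T                         # Y -> (A / norm)^{1/2}
        Z = T @ Z                         # Z -> (A / norm)^{-1/2}
    return np.sqrt(norm) * Y

rng = np.random.default_rng(0)
B = rng.standard_normal((64, 64))
A = B @ B.T + 64 * np.eye(64)             # well-conditioned SPD matrix
S = newton_schulz_sqrt(A)
print(np.linalg.norm(S @ S - A) / np.linalg.norm(A))   # relative residual
```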
arXiv Detail & Related papers (2022-01-29T10:00:35Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Feature (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
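A hedged sketch of the general idea follows: with ternary projection weights, the feature computation needs no multiplications and the weights can be stored in roughly two bits each. The sparsity level, scaling, and activation below are illustrative assumptions, not necessarily the paper's choices.

```python
# Ternary random features: projection entries in {-1, 0, +1}.
import numpy as np

rng = np.random.default_rng(0)
d, num_features = 128, 512
# Ternary weights: P(+1) = P(-1) = 1/4, P(0) = 1/2 (an illustrative choice).
W = rng.choice([-1, 0, 1], size=(num_features, d), p=[0.25, 0.5, 0.25])

def ternary_features(x, activation=np.cos):
    # In integer arithmetic, W @ x is just signed accumulation (no multiplies).
    return activation(W @ x) / np.sqrt(num_features)

x = rng.standard_normal(d)
phi = ternary_features(x)
print(phi.shape, W.nbytes)   # dense int64 here; a real implementation packs bits
```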
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Fourier Sparse Leverage Scores and Approximate Kernel Learning [29.104055676527913]
We prove new explicit upper bounds on the leverage scores of Fourier functions under both the Gaussian and Laplace measures.
Our bounds are motivated by two important applications in machine learning.
arXiv Detail & Related papers (2020-06-12T17:25:39Z) - Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing an $n \times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y) = -\log \langle \varphi(x), \varphi(y) \rangle$ where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r \ll n$.
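The computational point is that with such a factorized cost the Gibbs kernel is low rank, so each Sinkhorn update costs $O(nr)$ rather than $O(n^2)$. A minimal sketch follows, with an illustrative positive feature map that is not necessarily the one used in the paper.

```python
# Sinkhorn iterations where the kernel K = Phi_x @ Phi_y.T is never materialized.
import numpy as np

rng = np.random.default_rng(0)
n, d, r = 2000, 10, 64
X, Y = rng.standard_normal((n, d)), rng.standard_normal((n, d))

W = rng.standard_normal((d, r))
def phi(Z):
    # Maps into the positive orthant R^r_+; E[phi(x) . phi(y)] = exp(x . y).
    return np.exp(Z @ W - 0.5 * np.sum(Z**2, axis=1, keepdims=True)) / np.sqrt(r)

Phi_x, Phi_y = phi(X), phi(Y)                 # n x r factors of the kernel
a = b = np.full(n, 1.0 / n)                   # uniform marginals
u = v = np.ones(n)

for _ in range(100):                          # Sinkhorn updates in O(n r) each
    u = a / (Phi_x @ (Phi_y.T @ v))           # K @ v without forming K
    v = b / (Phi_y @ (Phi_x.T @ u))           # K.T @ u without forming K

transport_mass = u @ (Phi_x @ (Phi_y.T @ v))  # should be ~1 after convergence
print(transport_mass)
```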
arXiv Detail & Related papers (2020-06-12T10:21:40Z) - Optimal Iterative Sketching with the Subsampled Randomized Hadamard
Transform [64.90148466525754]
We study the performance of iterative sketching for least-squares problems.
We show that the convergence rates for Haar and randomized Hadamard matrices are identical, and asymptotically improve upon random projections.
These techniques may be applied to other algorithms that employ randomized dimension reduction.
arXiv Detail & Related papers (2020-02-03T16:17:50Z)
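A minimal sketch of iterative (Hessian) sketching for least squares with an SRHT sketch is given below; the precise recursion, step sizes, and tuning analyzed in the paper may differ.

```python
# Iterative Hessian sketching for min ||Ax - b||^2 with an SRHT sketch
# (random sign diagonal, Hadamard transform, row subsampling).
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)
n, d, m = 1024, 20, 200                       # n must be a power of two here
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

# SRHT sketch: sqrt(n/m) * (subsample m rows) * (H / sqrt(n)) * diag(signs).
signs = rng.choice([-1.0, 1.0], size=n)
HDA = hadamard(n) @ (signs[:, None] * A) / np.sqrt(n)
rows = rng.choice(n, size=m, replace=False)
A_sketch = np.sqrt(n / m) * HDA[rows]

# Iterate: refine x using the sketched Hessian in place of A.T @ A.
H_sketch = A_sketch.T @ A_sketch
x = np.zeros(d)
for _ in range(10):
    grad = A.T @ (b - A @ x)
    x = x + np.linalg.solve(H_sketch, grad)

print(np.linalg.norm(x - np.linalg.lstsq(A, b, rcond=None)[0]))
```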