When big data actually are low-rank, or entrywise approximation of certain function-generated matrices
- URL: http://arxiv.org/abs/2407.03250v3
- Date: Fri, 6 Sep 2024 11:56:16 GMT
- Title: When big data actually are low-rank, or entrywise approximation of certain function-generated matrices
- Authors: Stanislav Budzinskiy
- Abstract summary: We refute an argument made in the literature to prove that, for a specific class of analytic functions, such matrices admit accurate entrywise approximation of rank that is independent of $m$.
We describe three narrower classes of functions for which $n \times n$ function-generated matrices can be approximated within an entrywise error of order $\varepsilon$ with rank $\mathcal{O}(\log(n) \varepsilon^{-2} \mathrm{polylog}(\varepsilon^{-1}))$ that is independent of the dimension $m$.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The article concerns low-rank approximation of matrices generated by sampling a smooth function of two $m$-dimensional variables. We refute an argument made in the literature to prove that, for a specific class of analytic functions, such matrices admit accurate entrywise approximation of rank that is independent of $m$ -- a claim known as "big-data matrices are approximately low-rank". We provide a theoretical explanation of the numerical results presented in support of this claim, describing three narrower classes of functions for which $n \times n$ function-generated matrices can be approximated within an entrywise error of order $\varepsilon$ with rank $\mathcal{O}(\log(n) \varepsilon^{-2} \mathrm{polylog}(\varepsilon^{-1}))$ that is independent of the dimension $m$: (i) functions of the inner product of the two variables, (ii) functions of the Euclidean distance between the variables, and (iii) shift-invariant positive-definite kernels. We extend our argument to tensor-train approximation of tensors generated with functions of the multi-linear product of their $m$-dimensional variables. We discuss our results in the context of low-rank approximation of (a) growing datasets and (b) attention in transformer neural networks.
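To make the rank bound concrete, the following minimal sketch (illustrative only: the Gaussian kernel, unit-norm data model, and sizes are assumptions, not the paper's experiments) samples an $n \times n$ matrix generated by a function of the Euclidean distance between $m$-dimensional points and records the rank needed for a fixed entrywise error as $m$ grows:

```python
# Entrywise low-rank approximation of a function-generated matrix
# K_ij = f(||x_i - y_j||) with f(t) = exp(-t^2): classes (ii)/(iii) above.
# n, eps, and the unit-norm data model are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, eps = 300, 1e-3

for m in (10, 100, 1000):
    X = rng.standard_normal((n, m))
    X /= np.linalg.norm(X, axis=1, keepdims=True)      # unit-norm variables
    Y = rng.standard_normal((n, m))
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)
    D2 = np.maximum(2.0 - 2.0 * X @ Y.T, 0.0)          # ||x_i - y_j||^2
    K = np.exp(-D2)                                    # function-generated matrix
    U, s, Vt = np.linalg.svd(K)
    for r in range(1, n + 1):                          # smallest rank meeting eps
        if np.abs(K - (U[:, :r] * s[:r]) @ Vt[:r]).max() <= eps:
            print(f"m = {m}: entrywise error {eps} reached at rank {r}")
            break
```

The printed rank should stay essentially flat in $m$, which is the behavior the paper's bounds explain.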
Related papers
- An approximation of the $S$ matrix for solving the Marchenko equation [0.0]
I present a new approximation of the $S$-matrix dependence on momentum $q$, formulated as a sum of a rational function and a truncated Sinc series.
This approach enables pointwise determination of the $S$ matrix with specified resolution, capturing essential features such as resonance behavior with high accuracy.
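A minimal sketch of the truncated Sinc-series ingredient (the rational part of the construction is omitted; the step size $h$ and truncation $K$ are assumptions, not the paper's choices):

```python
# Truncated Sinc (Whittaker cardinal) series: f(q) ~ sum_k f(k h) sinc((q - k h)/h).
import numpy as np

def sinc_series(f, q, h=0.5, K=40):
    k = np.arange(-K, K + 1)
    return np.sum(f(k * h) * np.sinc((q - k * h) / h))  # np.sinc is sin(pi x)/(pi x)

f = lambda x: np.exp(-x ** 2)        # smooth, decaying test function
q = 1.234
print(sinc_series(f, q), f(q))       # the two values should agree closely
```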
arXiv Detail & Related papers (2024-10-27T11:06:28Z)
- Understanding Matrix Function Normalizations in Covariance Pooling through the Lens of Riemannian Geometry [63.694184882697435]
Global Covariance Pooling (GCP) has been demonstrated to improve the performance of Deep Neural Networks (DNNs) by exploiting second-order statistics of high-level representations.
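As background on the mechanism (a generic sketch with assumed feature sizes, not the paper's Riemannian analysis), GCP replaces average pooling by the covariance of the spatial features, usually followed by a matrix function such as the matrix square root:

```python
# Generic global covariance pooling with matrix square-root normalization.
# Feature shape (C channels, H*W positions) is an assumption.
import numpy as np

def gcp_sqrt(feat):
    feat = feat - feat.mean(axis=1, keepdims=True)
    cov = feat @ feat.T / (feat.shape[1] - 1)          # second-order statistics
    w, V = np.linalg.eigh(cov + 1e-6 * np.eye(len(cov)))
    return (V * np.sqrt(np.maximum(w, 0.0))) @ V.T     # matrix square root

pooled = gcp_sqrt(np.random.default_rng(1).standard_normal((64, 49)))
print(pooled.shape)                                    # (64, 64) pooled descriptor
```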
arXiv Detail & Related papers (2024-07-15T07:11:44Z)
- Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*\left(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle\right)$ under isotropic Gaussian data.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ of arbitrary link function with a sample and runtime complexity of $n \asymp T \asymp C(q) \cdot d\,\mathrm{polylog}(d)$.
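A toy sketch of the setting (the width, step size, and link function $\sigma_*$ are assumptions; this is not the paper's algorithm):

```python
# Online SGD on a two-layer network for a single-index target
# f_*(x) = sigma_*(<x, theta>) with isotropic Gaussian inputs.
import numpy as np

rng = np.random.default_rng(0)
d, width, lr, steps = 32, 128, 0.01, 50000
theta = rng.standard_normal(d); theta /= np.linalg.norm(theta)
sigma_star = lambda z: z ** 2 - 1.0                 # assumed link function

W = rng.standard_normal((width, d)) / np.sqrt(d)
a = rng.standard_normal(width) / np.sqrt(width)

for _ in range(steps):
    x = rng.standard_normal(d)                      # fresh Gaussian sample
    h = np.tanh(W @ x)
    err = a @ h - sigma_star(x @ theta)
    grad_a = err * h
    grad_W = err * np.outer(a * (1.0 - h ** 2), x)  # chain rule through tanh
    a -= lr * grad_a
    W -= lr * grad_W

x_test = rng.standard_normal((1000, d))
pred = np.tanh(x_test @ W.T) @ a
print("test MSE:", np.mean((pred - sigma_star(x_test @ theta)) ** 2))
```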
arXiv Detail & Related papers (2024-06-03T17:56:58Z)
- An Equivalence Principle for the Spectrum of Random Inner-Product Kernel Matrices with Polynomial Scalings [21.727073594338297]
This study is motivated by applications in machine learning and statistics.
We establish the weak limit of the empirical distribution of these random matrices in a scaling regime.
This limit can be characterized as the free additive convolution of a Marchenko-Pastur law and a semicircle law.
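To see such a spectrum empirically (the kernel $f$, normalization, and the polynomial scaling $n \asymp d^{3/2}$ below are assumptions for illustration):

```python
# Empirical eigenvalue distribution of a random inner-product kernel matrix
# K_ij = f(<x_i, x_j>) in a polynomial scaling regime n ~ d^{3/2}.
import numpy as np

rng = np.random.default_rng(0)
d = 100
n = int(d ** 1.5)
X = rng.standard_normal((n, d)) / np.sqrt(d)       # <x_i, x_j> ~ d^{-1/2} off-diagonal
f = lambda t: t + 0.5 * (t ** 2 - 1.0 / d)         # assumed nonlinear kernel function
K = f(X @ X.T)
np.fill_diagonal(K, 0.0)                           # keep the off-diagonal part
eigs = np.linalg.eigvalsh(K * np.sqrt(d / n))      # illustrative Wigner-type scaling
print("empirical support ~ [", eigs.min(), ",", eigs.max(), "]")
```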
arXiv Detail & Related papers (2022-05-12T18:50:21Z)
- Perturbation Analysis of Randomized SVD and its Applications to High-dimensional Statistics [8.90202564665576]
We study the statistical properties of RSVD under a general "signal-plus-noise" framework.
We derive nearly-optimal performance guarantees for RSVD when applied to three statistical inference problems.
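RSVD itself is compact enough to sketch (a generic Halko-style implementation; the oversampling and power-iteration counts are common defaults, not the paper's choices):

```python
# Randomized SVD: project onto a random low-dimensional range, then take
# an exact SVD of the small projected matrix.
import numpy as np

def rsvd(A, rank, oversample=10, power_iters=2, seed=0):
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((A.shape[1], rank + oversample))
    Y = A @ G                                       # sample the range of A
    for _ in range(power_iters):                    # sharpen the spectral decay
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)                          # orthonormal basis for the range
    U, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ U)[:, :rank], s[:rank], Vt[:rank]

A = np.random.default_rng(1).standard_normal((500, 300))
U, s, Vt = rsvd(A, rank=20)
print(np.linalg.norm(A - (U * s) @ Vt, 2))          # spectral-norm approximation error
```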
arXiv Detail & Related papers (2022-03-19T07:26:45Z)
- A Law of Robustness beyond Isoperimetry [84.33752026418045]
We prove a Lipschitzness lower bound $\Omega(\sqrt{n/p})$ on the robustness of interpolating neural network parameters on arbitrary distributions.
We then show the potential benefit of overparametrization for smooth data when $n=\mathrm{poly}(d)$.
We disprove the potential existence of an $O(1)$-Lipschitz robust interpolating function when $n=\exp(\omega(d))$.
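Read quantitatively (a sketch of the logic; constants and norm conventions suppressed):

```latex
\[
  \mathrm{Lip}(f) \;\gtrsim\; \sqrt{\frac{n}{p}}
  \quad\Longrightarrow\quad
  \mathrm{Lip}(f) = O(1) \ \text{requires}\ p \gtrsim n,
\]
% hence with n = exp(omega(d)) samples, no network of poly(d) size
% can interpolate with an O(1) Lipschitz constant.
```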
arXiv Detail & Related papers (2022-02-23T16:10:23Z) - Computationally Efficient Approximations for Matrix-based Renyi's
Entropy [33.72108955447222]
The recently developed matrix-based Renyi's entropy enables measurement of information in data.
Computing this quantity involves the trace of a positive semi-definite (PSD) matrix $G$ raised to the power $\alpha$, i.e., $\mathrm{tr}(G^{\alpha})$.
We present computationally efficient approximations to this new entropy functional that can reduce its complexity to significantly less than $O(n^2)$.
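For orientation, here is the quantity alongside one generic cheap estimate (a Hutchinson-type trace estimator for integer $\alpha$; this is a stand-in, not necessarily the paper's scheme):

```python
# Matrix-based Renyi entropy S_alpha(G) = log2(tr(G^alpha)) / (1 - alpha):
# exact eigenvalue computation vs. a generic Hutchinson trace estimate.
import numpy as np

rng = np.random.default_rng(0)
n, alpha = 200, 2
B = rng.standard_normal((n, n))
G = B @ B.T
G /= np.trace(G)                                   # trace-normalized PSD matrix

w = np.linalg.eigvalsh(G)                          # exact: O(n^3)
exact = np.log2(np.sum(np.maximum(w, 0.0) ** alpha)) / (1 - alpha)

V = rng.choice([-1.0, 1.0], size=(n, 64))          # Rademacher probe vectors
Z = V
for _ in range(alpha):                             # Z = G^alpha V via matvecs
    Z = G @ Z
tr_est = np.mean(np.sum(V * Z, axis=0))            # E[v^T G^alpha v] = tr(G^alpha)
approx = np.log2(tr_est) / (1 - alpha)
print(exact, approx)
```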
arXiv Detail & Related papers (2021-12-27T14:59:52Z)
- High-Dimensional Gaussian Process Inference with Derivatives [90.8033626920884]
We show that in the low-data regime $N < D$, the Gram matrix can be decomposed in a manner that reduces the cost of inference to $\mathcal{O}(N^2D + (N^2)^3)$.
We demonstrate this potential in a variety of tasks relevant for machine learning, such as optimization and Hamiltonian Monte Carlo with predictive gradients.
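For scale, a worked cost comparison (illustrative sizes, assuming $N = 10^2$ observations in $D = 10^4$ dimensions; a naive treatment of gradients inverts an $ND \times ND$ Gram matrix):

```latex
\[
  \mathcal{O}\big((ND)^3\big) = \mathcal{O}(10^{18})
  \qquad \text{vs.} \qquad
  \mathcal{O}\big(N^2 D + (N^2)^3\big) = \mathcal{O}(10^{8} + 10^{12}),
\]
% the cubic term in the decomposed form no longer depends on D.
```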
arXiv Detail & Related papers (2021-02-15T13:24:41Z)
- Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing an $n \times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y) = -\log \langle \varphi(x), \varphi(y) \rangle$ where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r \ll n$.
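With such a cost, the Gibbs kernel factorizes entrywise as $K = \Phi_x \Phi_y^{\top}$ with $\Phi \in \mathbb{R}^{n \times r}$, so each Sinkhorn iteration needs only $O(nr)$ matrix-vector products. A minimal sketch (the positive feature map below is a toy softplus construction, not the paper's):

```python
# Sinkhorn iterations with a factorized kernel K = Px @ Py.T, never formed densely.
# The feature map phi (softplus of random projections) is an assumption.
import numpy as np

rng = np.random.default_rng(0)
n, r = 2000, 32
phi = lambda X, W: np.logaddexp(0.0, X @ W)        # softplus: entrywise positive
W = rng.standard_normal((5, r))
Px = phi(rng.standard_normal((n, 5)), W)           # Phi_x in R^{n x r}
Py = phi(rng.standard_normal((n, 5)), W)           # Phi_y in R^{n x r}

a = np.full(n, 1.0 / n)                            # source marginal
b = np.full(n, 1.0 / n)                            # target marginal
u, v = np.ones(n), np.ones(n)
for _ in range(200):                               # each update costs O(n r)
    u = a / (Px @ (Py.T @ v))
    v = b / (Py @ (Px.T @ u))
print("marginal error:", np.abs(u * (Px @ (Py.T @ v)) - a).max())
```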
arXiv Detail & Related papers (2020-06-12T10:21:40Z)
- A Random Matrix Analysis of Random Fourier Features: Beyond the Gaussian Kernel, a Precise Phase Transition, and the Corresponding Double Descent [85.77233010209368]
This article characterizes the exact asymptotics of random Fourier feature (RFF) regression, in the realistic setting where the number of data samples $n$, their dimension $p$, and the dimension of feature space $N$ are all large and comparable.
This analysis also provides accurate estimates of training and test regression errors for large $n,p,N$.
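A minimal RFF ridge-regression sketch in that $n, p, N$ regime (the sizes, target, and ridge parameter are illustrative assumptions):

```python
# Random Fourier features for the Gaussian kernel, plugged into ridge regression.
# n (samples), p (input dim), N (feature dim), lam are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n, p, N, lam = 1000, 50, 300, 1e-2
X = rng.standard_normal((n, p)) / np.sqrt(p)
y = np.sin(X.sum(axis=1))                           # toy target

W = rng.standard_normal((p, N // 2))                # frequencies ~ N(0, I)
Z = np.hstack([np.cos(X @ W), np.sin(X @ W)]) / np.sqrt(N // 2)
beta = np.linalg.solve(Z.T @ Z + lam * np.eye(N), Z.T @ y)
print("train MSE:", np.mean((Z @ beta - y) ** 2))
```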
arXiv Detail & Related papers (2020-06-09T02:05:40Z)