Related papers: When big data actually are low-rank, or entrywise approximation of certain function-generated matrices

URL: http://arxiv.org/abs/2407.03250v2
Date: Thu, 4 Jul 2024 10:56:45 GMT
Title: When big data actually are low-rank, or entrywise approximation of certain function-generated matrices
Authors: Stanislav Budzinskiy
Abstract summary: We refute an argument made in the literature that, for a specific class of analytic functions, function-generated matrices admit accurate entrywise approximation of rank that is independent of the dimension $m$. We describe three narrower classes of functions for which $n \times n$ function-generated matrices can be approximated within an entrywise error of order $\varepsilon$ with rank $\mathcal{O}(\log(n) \varepsilon^{-2} \mathrm{polylog}(\varepsilon^{-1}))$ that is independent of the dimension $m$.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The article concerns low-rank approximation of matrices generated by sampling a smooth function of two $m$-dimensional variables. We refute an argument made in the literature that, for a specific class of analytic functions, such matrices admit accurate entrywise approximation of rank that is independent of $m$. We provide a theoretical explanation of the numerical results presented in support of this argument, describing three narrower classes of functions for which $n \times n$ function-generated matrices can be approximated within an entrywise error of order $\varepsilon$ with rank $\mathcal{O}(\log(n) \varepsilon^{-2} \mathrm{polylog}(\varepsilon^{-1}))$ that is independent of the dimension $m$: (i) functions of the inner product of the two variables, (ii) functions of the squared Euclidean distance between the variables, and (iii) shift-invariant positive-definite kernels. We extend our argument to low-rank tensor-train approximation of tensors generated with functions of the multi-linear product of their $m$-dimensional variables. We discuss our results in the context of low-rank approximation of attention in transformer neural networks.
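As a rough illustration of class (iii), the sketch below uses the classical random Fourier features construction for the Gaussian kernel, which is one well-known way to obtain a rank-$r$ factorization whose entrywise error does not depend on $m$. It is not claimed to be the construction used in the paper. The function names, the bandwidth `sigma`, and the scaling of the sample points are all assumptions made for this example.

```python
import numpy as np

def gaussian_kernel_matrix(X, Y, sigma=1.0):
    """Exact n x n matrix with entries exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    sq = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def random_fourier_factors(X, Y, rank, sigma=1.0, seed=0):
    """Return U, V (each n x rank) such that U @ V.T approximates the kernel matrix.

    Frequencies are drawn from the Gaussian spectral density of the kernel;
    the random phase b makes each product of cosines an unbiased estimate.
    """
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(m, rank))   # random frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=rank)        # random phases
    scale = np.sqrt(2.0 / rank)
    U = scale * np.cos(X @ W + b)
    V = scale * np.cos(Y @ W + b)
    return U, V

def max_entrywise_error(n, m, rank, seed=0):
    """Largest absolute entry of (exact kernel matrix - rank-`rank` approximation)."""
    rng = np.random.default_rng(seed)
    # Hypothetical data: points scaled so pairwise distances stay O(1) as m grows.
    X = rng.normal(size=(n, m)) / np.sqrt(m)
    Y = rng.normal(size=(n, m)) / np.sqrt(m)
    K = gaussian_kernel_matrix(X, Y)
    U, V = random_fourier_factors(X, Y, rank, seed=seed + 1)
    return np.abs(K - U @ V.T).max()

if __name__ == "__main__":
    # Same n and rank, very different dimensions m.
    for m in (10, 1000):
        print(m, max_entrywise_error(n=200, m=m, rank=2000))
```

With the rank held fixed, the maximum entrywise error should be of comparable size for both values of $m$, and it shrinks roughly like $\sqrt{\log(n)/r}$ as the rank $r$ grows, which is consistent with the $\log(n)\,\varepsilon^{-2}$ scaling quoted in the abstract.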

Related papers

Private Low-Rank Approximation for Covariance Matrices, Dyson Brownian Motion, and Eigenvalue-Gap Bounds for Gaussian Perturbations [29.212403229351253]
We analyze a complex variant of the Gaussian mechanism and obtain upper bounds on the Frobenius norm of the difference between the matrix output by this mechanism and the best rank-$k$ approximation to $M$. We show that the eigenvalues of the matrix $M$ perturbed by Gaussian noise have large gaps with high probability.
arXiv Detail & Related papers (2025-02-11T15:46:03Z)
A Statistical Analysis for Supervised Deep Learning with Exponential Families for Intrinsically Low-dimensional Data [32.98264375121064]
We consider supervised deep learning when the given explanatory variable is distributed according to an exponential family. Under the assumption of an upper-bounded density of the explanatory variables, we characterize the rate of convergence as $\tilde{\mathcal{O}}\left( d^{\frac{2\lfloor\beta\rfloor(\beta + d)}{2\beta + d}} n^{-\frac{2}{2\beta + d}} \right)$.
arXiv Detail & Related papers (2024-12-13T01:15:17Z)
An approximation of the $S$ matrix for solving the Marchenko equation [0.0]
I present a new approximation of the $S$-matrix dependence on momentum $q$, formulated as a sum of a rational function and a truncated Sinc series. This approach enables pointwise determination of the $S$ matrix with specified resolution, capturing essential features such as resonance behavior with high accuracy.
arXiv Detail & Related papers (2024-10-27T11:06:28Z)
Understanding Matrix Function Normalizations in Covariance Pooling through the Lens of Riemannian Geometry [63.694184882697435]
Global Covariance Pooling (GCP) has been demonstrated to improve the performance of Deep Neural Networks (DNNs) by exploiting second-order statistics of high-level representations.
arXiv Detail & Related papers (2024-07-15T07:11:44Z)
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \textstyle\sigma_*\left(\langle\boldsymbol{x},\boldsymbol{\theta}\rangle\right)$ under isotropic Gaussian data. We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ of arbitrary link function with a sample and runtime complexity of $n \asymp T \asymp C(q) \cdot d \dots$
arXiv Detail & Related papers (2024-06-03T17:56:58Z)
Average-Case Complexity of Tensor Decomposition for Low-Degree Polynomials [93.59919600451487]
"Statistical-computational gaps" occur in many statistical inference tasks. We consider a model for random order-3 tensor decomposition where one component is slightly larger in norm than the rest. We show that tensor entries can accurately estimate the largest component when $r \ll n^{3/2}$ but fail to do so when $r \gg n^{3/2}$.
arXiv Detail & Related papers (2022-11-10T00:40:37Z)
Scalable First-Order Bayesian Optimization via Structured Automatic Differentiation [4.061135251278187]
We show that a wide range of kernels gives rise to structured matrices, enabling an exact $\mathcal{O}(n^2 d)$ matrix-vector multiply for gradient observations and $\mathcal{O}(n^2 d^2)$ for Hessian observations. Our methods apply to virtually all canonical kernels and automatically extend to complex kernels, like the neural network, radial basis function network, and spectral mixture kernels.
arXiv Detail & Related papers (2022-06-16T17:59:48Z)
An Equivalence Principle for the Spectrum of Random Inner-Product Kernel Matrices with Polynomial Scalings [21.727073594338297]
This study is motivated by applications in machine learning and statistics. We establish the weak limit of the empirical distribution of these random matrices in a scaling regime. Our results can be characterized as the free additive convolution between a Marchenko-Pastur law and a semicircle law.
arXiv Detail & Related papers (2022-05-12T18:50:21Z)
Perturbation Analysis of Randomized SVD and its Applications to High-dimensional Statistics [8.90202564665576]
We study the statistical properties of RSVD under a general "signal-plus-noise" framework. We derive nearly-optimal performance guarantees for RSVD when applied to three statistical inference problems.
arXiv Detail & Related papers (2022-03-19T07:26:45Z)
A Law of Robustness beyond Isoperimetry [84.33752026418045]
We prove a Lipschitzness lower bound $\Omega(\sqrt{n/p})$ of robustness of interpolating neural network parameters on arbitrary distributions. We then show the potential benefit of overparametrization for smooth data when $n=\mathrm{poly}(d)$. We disprove the potential existence of an $O(1)$-Lipschitz robust interpolating function when $n=\exp(\omega(d))$.
arXiv Detail & Related papers (2022-02-23T16:10:23Z)
Computationally Efficient Approximations for Matrix-based Renyi's Entropy [33.72108955447222]
The recently developed matrix-based Renyi's entropy enables measurement of information in data. Computing this quantity involves the trace operator on a PSD matrix $G$ raised to the power $\alpha$ (i.e., $\mathrm{tr}(G^\alpha)$). We present computationally efficient approximations to this new entropy functional that can reduce its complexity to even significantly less than $O(n^2)$.
arXiv Detail & Related papers (2021-12-27T14:59:52Z)
Random matrices in service of ML footprint: ternary random features with no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$. We propose a novel random technique, called Ternary Random Feature (TRF). The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z)
High-Dimensional Gaussian Process Inference with Derivatives [90.8033626920884]
We show that in the low-data regime $N < D$, the Gram matrix can be decomposed in a manner that reduces the cost of inference to $\mathcal{O}(N^2 D + (N^2)^3)$. We demonstrate this potential in a variety of tasks relevant for machine learning, such as optimization and Hamiltonian Monte Carlo with predictive gradients.
arXiv Detail & Related papers (2021-02-15T13:24:41Z)
Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing an $n \times n$ kernel matrix that is repeatedly applied to a vector. We propose to use instead ground costs of the form $c(x,y) = -\log\langle\varphi(x), \varphi(y)\rangle$ where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r \ll n$.
arXiv Detail & Related papers (2020-06-12T10:21:40Z)
A Random Matrix Analysis of Random Fourier Features: Beyond the Gaussian Kernel, a Precise Phase Transition, and the Corresponding Double Descent [85.77233010209368]
This article characterizes the exact asymptotics of random Fourier feature (RFF) regression, in the realistic setting where the number of data samples $n$, their dimension $p$, and the dimension of feature space $N$ are all large and comparable. This analysis also provides accurate estimates of training and test regression errors for large $n,p,N$.
arXiv Detail & Related papers (2020-06-09T02:05:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.