Concentration of Random Feature Matrices in High-Dimensions
- URL: http://arxiv.org/abs/2204.06935v1
- Date: Thu, 14 Apr 2022 13:01:27 GMT
- Title: Concentration of Random Feature Matrices in High-Dimensions
- Authors: Zhijun Chen, Hayden Schaeffer, Rachel Ward
- Abstract summary: The spectra of random feature matrices provide information on the conditioning of the linear system used in random feature regression problems.
We consider two settings for the two input variables: either both are random variables, or one is a random variable and the other is well-separated.
- Score: 7.1171757928258135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The spectra of random feature matrices provide essential information on the
conditioning of the linear system used in random feature regression problems
and are thus connected to the consistency and generalization of random feature
models. Random feature matrices are asymmetric rectangular nonlinear matrices
depending on two input variables, the data and the weights, which can make
their characterization challenging. We consider two settings for the two input
variables: either both are random variables, or one is a random variable and the
other is well-separated, i.e., there is a minimum distance between points. With
conditions on the dimension, the complexity ratio, and the sampling variance,
we show that the singular values of these matrices concentrate near their full
expectation and near one with high probability. In particular, since the
dimension depends only on the logarithm of the number of random weights or the
number of data points, our complexity bounds can be achieved even in moderate
dimensions for many practical settings. The theoretical results are verified
with numerical experiments.
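The concentration claim is easy to probe numerically. Below is a minimal sketch, assuming complex exponential (random Fourier) features and Gaussian data and weights; the parameters and the 1/sqrt(m) scaling are illustrative choices, not taken from the paper. When the weight variance is large enough that distinct weights are well-separated, the Gram matrix (1/m) A*A is close to the identity, which is what singular-value concentration near one expresses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (not from the paper): Gaussian data and weights,
# complex exponential (random Fourier) features A_{jk} = exp(i <x_j, w_k>).
d, m, N = 64, 8000, 200
X = rng.standard_normal((m, d))                     # data points
W = rng.standard_normal((N, d)) * np.sqrt(9.0 / d)  # random weights

A = np.exp(1j * (X @ W.T)) / np.sqrt(m)             # scaled random feature matrix

s = np.linalg.svd(A, compute_uv=False)
print(f"singular values lie in [{s.min():.3f}, {s.max():.3f}] (expected near 1)")
```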
Related papers
- Automatic feature selection and weighting using Differentiable Information Imbalance [41.452380773977154]
We introduce the Differentiable Information Imbalance (DII), an automatic data analysis method to rank information content between sets of features.
Based on the nearest neighbors according to distances in the ground truth feature space, the method finds a low-dimensional subset of the input features.
By employing the Differentiable Information Imbalance as a loss function, the relative feature weights of the inputs are optimized, simultaneously performing unit alignment and relative importance scaling.
arXiv Detail & Related papers (2024-10-30T11:19:10Z) - A class of 2 × 2 correlated random-matrix models with Brody spacing distribution [0.0]
A class of 2 × 2 random-matrix models is introduced for which the Brody distribution is the eigenvalue spacing distribution.
The random matrices introduced here differ from those of the Gaussian Orthogonal Ensemble (GOE) in three important ways.
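The paper's specific correlated 2 × 2 models are not reproduced here, but the Brody density itself is standard and can be sanity-checked at its GOE endpoint. A minimal sketch, assuming the usual normalization to unit mean spacing: beta = 1 recovers the Wigner surmise, which 2 × 2 GOE spacings follow.

```python
import numpy as np
from scipy.special import gamma as Gamma

rng = np.random.default_rng(6)

def brody_pdf(s, beta):
    # Brody spacing density, normalized to unit mean spacing
    b = Gamma((beta + 2) / (beta + 1)) ** (beta + 1)
    return (beta + 1) * b * s**beta * np.exp(-b * s**(beta + 1))

# Monte Carlo spacings of 2x2 GOE matrices (beta = 1 is the Wigner surmise)
n = 200_000
a, c = rng.standard_normal(n), rng.standard_normal(n)
off = rng.standard_normal(n) / np.sqrt(2)      # GOE off-diagonal: N(0, 1/2)
s = np.sqrt((a - c) ** 2 + 4 * off**2)         # eigenvalue spacing of [[a,b],[b,c]]
s /= s.mean()                                  # rescale to unit mean spacing

hist, edges = np.histogram(s, bins=60, range=(0, 4), density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
print("max |empirical - Brody(beta=1)|:", np.abs(hist - brody_pdf(mid, 1.0)).max())
```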
arXiv Detail & Related papers (2023-08-03T03:11:54Z) - Quantitative deterministic equivalent of sample covariance matrices with
a general dependence structure [0.0]
We prove quantitative bounds involving both the dimensions and the spectral parameter, in particular allowing it to get closer to the real positive semi-line.
As applications, we obtain a new bound for the convergence in Kolmogorov distance of the empirical spectral distributions of these general models.
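A hedged sketch of the kind of quantity being bounded: the Kolmogorov distance between the empirical spectral distribution of a sample covariance matrix and a deterministic limit, here the Marchenko-Pastur law. Identity covariance and i.i.d. Gaussian entries are simplifying assumptions; the paper itself handles a general dependence structure.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 400, 1600
c = p / n

X = rng.standard_normal((p, n))
S = X @ X.T / n                      # sample covariance (true covariance = I_p)
eigs = np.sort(np.linalg.eigvalsh(S))

# Marchenko-Pastur CDF on [lam_m, lam_p], computed by numerical quadrature
lam_m, lam_p = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
grid = np.linspace(lam_m, lam_p, 20001)
dens = np.sqrt(np.maximum((lam_p - grid) * (grid - lam_m), 0)) / (2 * np.pi * c * grid)
cdf = np.cumsum(dens) * (grid[1] - grid[0])
cdf /= cdf[-1]                       # absorb quadrature error in the normalization

# Kolmogorov distance between the empirical and limiting spectral CDFs
emp = np.searchsorted(eigs, grid, side="right") / p
print("Kolmogorov distance:", np.abs(emp - cdf).max())
```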
arXiv Detail & Related papers (2022-11-23T15:50:31Z) - Equivariance Discovery by Learned Parameter-Sharing [153.41877129746223]
We study how to discover interpretable equivariances from data.
Specifically, we formulate this discovery process as an optimization problem over a model's parameter-sharing schemes.
Also, we theoretically analyze the method for Gaussian data and provide a bound on the mean squared gap between the studied discovery scheme and the oracle scheme.
arXiv Detail & Related papers (2022-04-07T17:59:19Z) - Test Set Sizing Via Random Matrix Theory [91.3755431537592]
This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression.
It defines "ideal" as satisfying the integrity metric, i.e. the empirical model error is the actual measurement noise.
This paper is the first to solve for the training and test size for any model in a way that is truly optimal.
arXiv Detail & Related papers (2021-12-11T13:18:33Z) - Numerical Solution of Stiff Ordinary Differential Equations with Random
Projection Neural Networks [0.0]
We propose a numerical scheme based on Random Projection Neural Networks (RPNN) for the solution of Ordinary Differential Equations (ODEs).
We show that our proposed scheme yields good numerical approximation accuracy without being affected by the stiffness, thus in some cases outperforming the MATLAB ode45 and ode15s functions.
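A minimal random-projection collocation sketch for a linear stiff test problem y' = -lam*y; this is not the authors' exact scheme, and the feature count, weight ranges, and test problem are illustrative assumptions. Only the output weights are trained, by a single least-squares solve.

```python
import numpy as np

rng = np.random.default_rng(2)

lam, y0, T = 50.0, 1.0, 1.0        # moderately stiff test problem y' = -lam*y
N, M = 200, 400                    # random features, collocation points

w = rng.uniform(-40, 40, N)        # random (frozen) inner weights and biases
b = rng.uniform(-1, 1, N)
t = np.linspace(0, T, M)[:, None]  # collocation grid

phi  = np.tanh(w * t + b)          # M x N feature matrix
dphi = w * (1 - phi**2)            # its time derivative

# Ansatz y(t) = y0 + sum_k c_k (phi_k(t) - phi_k(0)) enforces y(0) = y0 exactly;
# impose y' + lam*y = 0 at the collocation points and solve for c by least squares.
G = dphi + lam * (phi - np.tanh(b))
c, *_ = np.linalg.lstsq(G, -lam * y0 * np.ones(M), rcond=None)

y = y0 + (phi - np.tanh(b)) @ c
print("max abs error vs exact exp(-lam*t):",
      np.abs(y - y0 * np.exp(-lam * t[:, 0])).max())
```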
arXiv Detail & Related papers (2021-08-03T15:49:17Z) - Spectral statistics for the difference of two Wishart matrices [1.2225709246035374]
We derive the joint probability density function of the corresponding eigenvalues in a finite-dimensional scenario using two distinct approaches.
We point out the relationship of these results with the corresponding results for the difference of two random density matrices and obtain some explicit and closed-form expressions for the spectral density and absolute mean.
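The paper's closed-form expressions are not reproduced here, but the objects involved are simple to simulate. A Monte Carlo sketch of the spectrum of the difference of two Wishart matrices, with illustrative dimensions and degrees of freedom:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m1, m2, trials = 8, 12, 16, 2000   # matrix dimension, Wishart degrees of freedom

eigs = []
for _ in range(trials):
    G1 = rng.standard_normal((n, m1))
    G2 = rng.standard_normal((n, m2))
    W = G1 @ G1.T - G2 @ G2.T         # difference of two Wishart matrices
    eigs.append(np.linalg.eigvalsh(W))
eigs = np.concatenate(eigs)

# Empirical counterpart of the spectral statistics studied in the paper
print("empirical absolute mean of the eigenvalues:", np.abs(eigs).mean())
```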
arXiv Detail & Related papers (2020-11-14T18:43:34Z) - Understanding Implicit Regularization in Over-Parameterized Single Index
Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z) - A Random Matrix Analysis of Random Fourier Features: Beyond the Gaussian
Kernel, a Precise Phase Transition, and the Corresponding Double Descent [85.77233010209368]
This article characterizes the exact asymptotics of random Fourier feature (RFF) regression, in the realistic setting where the number of data samples $n$, their dimension $p$, and the dimension of the feature space $N$ are all large and comparable.
This analysis also provides accurate estimates of training and test regression errors for large $n,p,N$.
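A minimal ridgeless sketch of the phenomenon the article characterizes: the test error of RFF regression as the feature count N sweeps through the interpolation threshold N ≈ n, where the error typically spikes before descending again (double descent). The linear data model and all parameters below are illustrative assumptions, not the article's setting.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, n_test = 300, 20, 1000
beta = rng.standard_normal(p) / np.sqrt(p)

X, Xt = rng.standard_normal((n, p)), rng.standard_normal((n_test, p))
y  = X  @ beta + 0.1 * rng.standard_normal(n)
yt = Xt @ beta + 0.1 * rng.standard_normal(n_test)

for N in [50, 150, 290, 300, 310, 600, 3000]:   # sweep through N ~ n
    W = rng.standard_normal((p, N))
    b = rng.uniform(0, 2 * np.pi, N)
    # random Fourier features; ridgeless (min-norm) least squares
    Z  = np.sqrt(2.0 / N) * np.cos(X  @ W + b)
    Zt = np.sqrt(2.0 / N) * np.cos(Xt @ W + b)
    theta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    print(f"N = {N:4d}   test MSE = {np.mean((Zt @ theta - yt) ** 2):.4f}")
```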
arXiv Detail & Related papers (2020-06-09T02:05:40Z) - Semiparametric Nonlinear Bipartite Graph Representation Learning with
Provable Guarantees [106.91654068632882]
We consider the bipartite graph and formalize its representation learning problem as a statistical estimation problem of parameters in a semiparametric exponential family distribution.
We show that the proposed objective is strongly convex in a neighborhood around the ground truth, so that a gradient descent-based method achieves a linear convergence rate.
Our estimator is robust to any model misspecification within the exponential family, which is validated in extensive experiments.
arXiv Detail & Related papers (2020-03-02T16:40:36Z) - Optimal Iterative Sketching with the Subsampled Randomized Hadamard
Transform [64.90148466525754]
We study the performance of iterative sketching for least-squares problems.
We show that the convergence rates for Haar and randomized Hadamard matrices are identical, and asymptotically improve upon random projections.
These techniques may be applied to other algorithms that employ randomized dimension reduction.
arXiv Detail & Related papers (2020-02-03T16:17:50Z)
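A minimal iterative Hessian sketch with an SRHT sketching matrix, one concrete instance of iterative sketching for least squares; the fixed sketch, sketch size, and parameters below are illustrative assumptions rather than the paper's exact algorithm.

```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(5)
n, d, m, iters = 1024, 20, 200, 8         # n must be a power of 2 for Hadamard

A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

# Subsampled randomized Hadamard transform of A (computed once, then reused)
D = rng.choice([-1.0, 1.0], n)            # random sign flips
HDA = hadamard(n) @ (D[:, None] * A) / np.sqrt(n)
rows = rng.choice(n, m, replace=False)
SA = HDA[rows] * np.sqrt(n / m)           # subsample rows and rescale

x = np.zeros(d)
for _ in range(iters):
    # Iterative Hessian sketch step: exact gradient, sketched Gram matrix
    g = A.T @ (b - A @ x)
    x = x + np.linalg.solve(SA.T @ SA, g)
    print("residual norm:", np.linalg.norm(A @ x - b))
```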
This list is automatically generated from the titles and abstracts of the papers on this site.