Node Similarities under Random Projections: Limits and Pathological Cases
- URL: http://arxiv.org/abs/2404.10148v2
- Date: Mon, 29 Jul 2024 16:51:26 GMT
- Title: Node Similarities under Random Projections: Limits and Pathological Cases
- Authors: Tvrtko Tadić, Cassiano Becker, Jennifer Neville
- Abstract summary: We investigate how well dot product and cosine similarity are preserved by random projections.
We specialize our fundamental results to a ranking application by computing the probability of random projections flipping the node ordering induced by their embeddings.
With respect to the statistical noise introduced by random projections, we show that cosine similarity produces remarkably more precise approximations.
- Score: 9.452274776651494
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Random Projections have been widely used to generate embeddings for various graph learning tasks due to their computational efficiency. The majority of applications have been justified through the Johnson-Lindenstrauss Lemma. In this paper, we take a step further and investigate how well dot product and cosine similarity are preserved by random projections when these are applied over the rows of the graph matrix. Our analysis provides new asymptotic and finite-sample results, identifies pathological cases, and tests them with numerical experiments. We specialize our fundamental results to a ranking application by computing the probability of random projections flipping the node ordering induced by their embeddings. We find that, depending on the degree distribution, the method produces especially unreliable embeddings for the dot product, regardless of whether the adjacency or the normalized transition matrix is used. With respect to the statistical noise introduced by random projections, we show that cosine similarity produces remarkably more precise approximations.
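As a rough illustration of the setting (a minimal sketch on an arbitrary toy graph, not the paper's experimental setup), the following snippet projects the rows of an adjacency matrix with a Gaussian Johnson-Lindenstrauss matrix and compares how dot products and cosine similarities between node rows survive the projection:
```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: symmetric adjacency matrix of n nodes (Erdos-Renyi, p chosen arbitrarily).
n, p = 500, 0.05
A = (rng.random((n, n)) < p).astype(float)
A = np.triu(A, 1)
A = A + A.T

# Gaussian random projection of the rows, scaled so dot products are preserved in expectation.
k = 64
R = rng.standard_normal((n, k)) / np.sqrt(k)
E = A @ R  # k-dimensional embeddings of the n nodes

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

# Compare exact vs. projected similarities on a sample of node pairs.
pairs = rng.integers(0, n, size=(2000, 2))
dot_err, cos_err = [], []
for i, j in pairs:
    if i == j:
        continue
    dot_err.append(A[i] @ A[j] - E[i] @ E[j])
    cos_err.append(cosine(A[i], A[j]) - cosine(E[i], E[j]))

print("dot-product error: mean %.3f  std %.3f" % (np.mean(dot_err), np.std(dot_err)))
print("cosine error:      mean %.4f  std %.4f" % (np.mean(cos_err), np.std(cos_err)))
```
On such toy graphs the cosine errors are typically far smaller than the dot-product errors, in line with the abstract's claim; how this degrades with the degree distribution, and for the normalized transition matrix, is what the paper quantifies.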
Related papers
- Gaussian Processes Sampling with Sparse Grids under Additive Schwarz Preconditioner [6.408773096179187]
We propose a scalable algorithm for sampling random realizations of the prior and posterior of GP models.
The proposed algorithm leverages inducing points approximation with sparse grids, as well as additive Schwarz preconditioners.
arXiv Detail & Related papers (2024-08-01T00:19:36Z) - On randomized estimators of the Hafnian of a nonnegative matrix [0.0]
Gaussian Boson samplers aim to demonstrate quantum advantage by performing a sampling task believed to be classically hard.
For nonnegative matrices, there is a family of randomized estimators of the Hafnian based on generating a particular random matrix and calculating its determinant.
Here we investigate the performance of two such estimators, which we call the Barvinok and Godsil-Gutman estimators.
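A hedged sketch of that determinant-based construction (the standard Barvinok form with Gaussian weights; the exact variants analyzed in the paper may differ in details): for a symmetric nonnegative matrix of even size, fill a skew-symmetric matrix with entries g_ij * sqrt(A_ij) above the diagonal and average its determinant over draws.
```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(1)

def hafnian_bruteforce(A):
    """Sum over perfect matchings of a (2n x 2n) symmetric matrix; only for tiny matrices."""
    m = A.shape[0]
    total, seen = 0.0, set()
    for perm in permutations(range(m)):
        pairs = tuple(sorted(tuple(sorted(perm[2*i:2*i+2])) for i in range(m // 2)))
        if pairs in seen:
            continue
        seen.add(pairs)
        prod = 1.0
        for i, j in pairs:
            prod *= A[i, j]
        total += prod
    return total

def barvinok_estimate(A, samples=20000):
    """Average det(W) over random skew-symmetric W with W_ij = g_ij * sqrt(A_ij), i < j."""
    m = A.shape[0]
    sqrtA = np.sqrt(A)
    ests = np.empty(samples)
    for s in range(samples):
        G = rng.standard_normal((m, m))
        W = np.triu(G * sqrtA, 1)
        W = W - W.T
        ests[s] = np.linalg.det(W)  # det of skew-symmetric W equals Pf(W)^2
    return ests.mean()

A = rng.random((4, 4))
A = (A + A.T) / 2          # symmetric, nonnegative
np.fill_diagonal(A, 0.0)   # diagonal entries do not enter the hafnian

print("exact hafnian:     %.4f" % hafnian_bruteforce(A))
print("Barvinok estimate: %.4f" % barvinok_estimate(A))
```
The estimate is unbiased but has nonzero variance, which is exactly the quantity whose behavior such analyses study.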
arXiv Detail & Related papers (2023-12-15T19:00:07Z) - A Heavy-Tailed Algebra for Probabilistic Programming [53.32246823168763]
We propose a systematic approach for analyzing the tails of random variables.
We show how this approach can be used during the static analysis (before drawing samples) pass of a probabilistic programming language compiler.
Our empirical results confirm that inference algorithms that leverage our heavy-tailed algebra attain superior performance across a number of density modeling and variational inference tasks.
arXiv Detail & Related papers (2023-06-15T16:37:36Z) - Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated.
We formalize these results both in the finite-sample regime and in the asymptotic regime.
arXiv Detail & Related papers (2022-10-03T06:09:01Z) - Confidence-Optimal Random Embeddings [0.0]
This paper develops Johnson-Lindenstrauss distributions with optimal, data-oblivious, statistical confidence bounds.
The bounds are numerically best possible, for any given data dimension, embedding, and distortion tolerance.
They improve upon prior works in terms of statistical accuracy, as well as exactly determine the no-go regimes for data-oblivious approaches.
arXiv Detail & Related papers (2021-04-06T18:00:02Z) - Pathwise Conditioning of Gaussian Processes [72.61885354624604]
Conventional approaches for simulating Gaussian process posteriors view samples as draws from marginal distributions of process values at finite sets of input locations.
This distribution-centric characterization leads to generative strategies that scale cubically in the size of the desired random vector.
We show how this pathwise interpretation of conditioning gives rise to a general family of approximations that lend themselves to efficiently sampling Gaussian process posteriors.
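A minimal numpy sketch of the pathwise idea (plain Matheron's rule on finite Gaussian vectors, not the paper's decoupled approximation): draw one joint prior sample over the training and test inputs, then shift the test part by the usual kernel-regression correction applied to the residual.
```python
import numpy as np

rng = np.random.default_rng(2)

def rbf(x, y, ls=0.3):
    return np.exp(-0.5 * (x[:, None] - y[None, :])**2 / ls**2)

# Training data and test grid (1-D for the sketch).
X = rng.uniform(-1, 1, 10)
y = np.sin(3 * X) + 0.05 * rng.standard_normal(10)
Xs = np.linspace(-1, 1, 200)
noise = 0.05**2

# Joint prior sample over [X, Xs] via a Cholesky factor of the joint kernel matrix.
Z = np.concatenate([X, Xs])
K = rbf(Z, Z) + 1e-9 * np.eye(Z.size)
f = np.linalg.cholesky(K) @ rng.standard_normal(Z.size)
fX, fXs = f[:X.size], f[X.size:]

# Matheron's rule: posterior sample = prior sample + K(Xs, X) (K(X, X) + s^2 I)^{-1} (y - fX - eps).
eps = np.sqrt(noise) * rng.standard_normal(X.size)
Kxx = rbf(X, X) + noise * np.eye(X.size)
posterior_sample = fXs + rbf(Xs, X) @ np.linalg.solve(Kxx, y - fX - eps)

print(posterior_sample[:5])
```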
arXiv Detail & Related papers (2020-11-08T17:09:37Z) - Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
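A small, self-contained illustration of such an ensemble (a plain two-class Fisher discriminant in each randomly projected subspace with majority voting over made-up synthetic data; the paper's consistent error estimator is not reproduced here):
```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic two-class data in d dimensions.
d, n = 50, 400
X = np.vstack([rng.standard_normal((n, d)), rng.standard_normal((n, d)) + 0.4])
y = np.r_[np.zeros(n, dtype=int), np.ones(n, dtype=int)]

def fit_lda(X, y):
    """Two-class Fisher discriminant: w = S^{-1} (mu1 - mu0), threshold at the class midpoint."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(0), X1.mean(0)
    S = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)  # within-class scatter, up to scaling
    w = np.linalg.solve(S + 1e-6 * np.eye(X.shape[1]), m1 - m0)
    b = -0.5 * w @ (m0 + m1)
    return w, b

# Ensemble: project to k dimensions with independent Gaussian matrices, fit a discriminant in each.
k, members = 10, 25
votes = np.zeros(X.shape[0])
for _ in range(members):
    R = rng.standard_normal((d, k)) / np.sqrt(k)
    w, b = fit_lda(X @ R, y)
    votes += ((X @ R) @ w + b > 0)

pred = (votes > members / 2).astype(int)
print("training accuracy of the ensemble: %.3f" % (pred == y).mean())
```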
arXiv Detail & Related papers (2020-04-17T12:47:04Z) - Efficiently Sampling Functions from Gaussian Process Posteriors [76.94808614373609]
We propose an easy-to-use and general-purpose approach for fast posterior sampling.
We demonstrate how decoupled sample paths accurately represent Gaussian process posteriors at a fraction of the usual cost.
arXiv Detail & Related papers (2020-02-21T14:03:16Z) - Optimal Iterative Sketching with the Subsampled Randomized Hadamard Transform [64.90148466525754]
We study the performance of iterative sketching for least-squares problems.
We show that the convergence rates for Haar and randomized Hadamard matrices are identical, and asymptotically improve upon random projections.
These techniques may be applied to other algorithms that employ randomized dimension reduction.
arXiv Detail & Related papers (2020-02-03T16:17:50Z)
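To make the iterative-sketching idea concrete, here is a hedged sketch of an iterative Hessian sketch for least squares with an explicitly formed subsampled randomized Hadamard matrix (built via scipy.linalg.hadamard for clarity, which forgoes the fast transform that makes the method attractive in practice):
```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(4)

# Least-squares problem A x = b, with n a power of two so the Hadamard matrix exists.
n, d = 1024, 20
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]  # reference solution

def srht(n, m, rng):
    """Subsampled randomized Hadamard transform as an explicit m x n matrix (slow but clear)."""
    D = np.diag(rng.choice([-1.0, 1.0], size=n))
    H = hadamard(n) / np.sqrt(n)
    rows = rng.choice(n, size=m, replace=False)
    return np.sqrt(n / m) * (H @ D)[rows]

# Iterative Hessian sketch: sketched Hessian (S A)^T (S A), exact gradient at each step.
m = 8 * d
x = np.zeros(d)
for t in range(10):
    SA = srht(n, m, rng) @ A
    x = x + np.linalg.solve(SA.T @ SA, A.T @ (b - A @ x))
    print("iter %2d  distance to exact solution: %.2e" % (t, np.linalg.norm(x - x_star)))
```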
This list is automatically generated from the titles and abstracts of the papers on this site.