Related papers: Quantitative Bounds for Sorting-Based Permutation-Invariant Embeddings

URL: http://arxiv.org/abs/2510.22186v1
Date: Sat, 25 Oct 2025 06:44:08 GMT
Title: Quantitative Bounds for Sorting-Based Permutation-Invariant Embeddings
Authors: Nadav Dym, Matthias Wellershoff, Efstratios Tsoukanis, Daniel Levy, Radu Balan
Abstract summary: We study the sorting-based embedding $\beta_{\mathbf A} : \mathbb R^{n \times d} \to \mathbb R^{n \times D}$, $\mathbf X \mapsto {\downarrow}(\mathbf X \mathbf A)$, where $\downarrow$ denotes column-wise sorting of matrices.
Score: 13.307069556587471
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study the sorting-based embedding $\beta_{\mathbf A} : \mathbb R^{n \times d} \to \mathbb R^{n \times D}$, $\mathbf X \mapsto {\downarrow}(\mathbf X \mathbf A)$, where $\downarrow$ denotes column-wise sorting of matrices. Such embeddings arise in graph deep learning where outputs should be invariant to permutations of graph nodes. Previous work showed that for large enough $D$ and appropriate $\mathbf A$, the mapping $\beta_{\mathbf A}$ is injective, and moreover satisfies a bi-Lipschitz condition. However, two gaps remain: firstly, the optimal size $D$ required for injectivity is not yet known, and secondly, no estimates of the bi-Lipschitz constants of the mapping are known. In this paper, we make substantial progress in addressing both of these gaps. Regarding the first gap, we improve upon the best known upper bounds for the embedding dimension $D$ necessary for injectivity, and also provide a lower bound on the minimal injectivity dimension. Regarding the second gap, we construct matrices $\mathbf A$ so that the bi-Lipschitz distortion of $\beta_{\mathbf A}$ depends quadratically on $n$ and is completely independent of $d$. We also show that the distortion of $\beta_{\mathbf A}$ is necessarily in $\Omega(\sqrt{n})$. Finally, we provide similar results for variants of $\beta_{\mathbf A}$ obtained by applying linear projections to reduce the output dimension of $\beta_{\mathbf A}$.
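
The embedding defined in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the helper names (`matmul`, `sort_columns`, `beta`), the Gaussian choice of $\mathbf A$, the sizes, and the descending sort direction are all assumptions made for illustration. The sketch multiplies $\mathbf X$ by $\mathbf A$, sorts each column independently, and checks that permuting the rows of $\mathbf X$ leaves the output unchanged.

```python
import random

def matmul(X, A):
    """Multiply an n x d matrix X by a d x D matrix A (both lists of rows)."""
    return [[sum(x * a for x, a in zip(row, col)) for col in zip(*A)] for row in X]

def sort_columns(M):
    """Sort each column of M independently (descending order is an assumption)."""
    cols = [sorted(col, reverse=True) for col in zip(*M)]
    return [list(row) for row in zip(*cols)]

def beta(X, A):
    """Sorting-based embedding: column-wise sort of the product X A."""
    return sort_columns(matmul(X, A))

rng = random.Random(0)
n, d, D = 4, 3, 5  # illustrative sizes, not taken from the paper

# Hypothetical inputs: a random point cloud X and a random Gaussian matrix A.
X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
A = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(d)]

# Permute the rows of X (i.e. relabel the n points / graph nodes).
X_perm = X[:]
rng.shuffle(X_perm)

E = beta(X, A)
E_perm = beta(X_perm, A)

# Row permutations only reorder entries within each column of X A,
# so the column-wise sorted outputs coincide.
print(E == E_perm)
```

The sketch only demonstrates permutation invariance. Whether `beta` is injective or bi-Lipschitz depends on $D$ and on the choice of $\mathbf A$, which is the subject of the paper and is not tested here.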

Related papers

Sparse Linear Regression is Easy on Random Supports [20.128442161507582]
We are given as input a design matrix $X \in \mathbb{R}^{N \times d}$ and measurements or labels $y \in \mathbb{R}^N$. We find that if the support of $w^*$ is chosen at random, we can get prediction error $\epsilon$ with roughly $N = O(k \log d / \epsilon)$ samples.
arXiv Detail & Related papers (2025-11-09T03:48:21Z)
Sublinear-Time Algorithms for Diagonally Dominant Systems and Applications to the Friedkin-Johnsen Model [5.101318208537081]
We study sublinear-time algorithms for solving linear systems $Sz = b$, where $S$ is a diagonally dominant matrix. We present randomized algorithms that, for any $u \in [n]$, return an estimate $z_u$ of $z^*_u$ with additive error. We also prove a matching lower bound, showing that the linear dependence on $S_{\max}$ is optimal.
arXiv Detail & Related papers (2025-09-16T14:13:31Z)
Beyond Worst-Case Dimensionality Reduction for Sparse Vectors [47.927989749887864]
We study beyond worst-case dimensionality reduction for $s$-sparse vectors. For any collection $X$ of $s$-sparse vectors in $\mathbb{R}^{O(s^2)}$, there exists a linear map to $\mathbb{R}^{O(s^2)}$ which exactly preserves the norm of $99\%$ of the vectors in $X$ in any $\ell_p$ norm. We show that both the non-linearity of $f$ and the non-negativity of $
arXiv Detail & Related papers (2025-02-27T08:17:47Z)
The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
We show that this problem has randomized communication complexity $\Omega(\frac{1}{k} \cdot n^2 \log |\mathbb{F}|)$. As an application, we obtain an $\Omega(\frac{1}{k} \cdot n^2 \log |\mathbb{F}|)$ space lower bound for any streaming algorithm with $k$ passes.
arXiv Detail & Related papers (2024-10-26T06:21:42Z)
Efficient $1$-bit tensor approximations [1.104960878651584]
Our algorithm yields efficient signed cut decompositions in $20$ lines of pseudocode. We approximate the weight matrices in the open Mistral-7B-v0.1 Large Language Model to a $50\%$ spatial compression.
arXiv Detail & Related papers (2024-10-02T17:56:32Z)
Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch. We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work. We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z)
Provably learning a multi-head attention layer [55.2904547651831]
Multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models. In this work, we initiate the study of provably learning a multi-head attention layer from random examples. We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z)
Randomized and Deterministic Attention Sparsification Algorithms for Over-parameterized Feature Dimension [18.57735939471469]
We consider the sparsification of the attention problem. For any super large feature dimension, we can reduce it down to the size nearly linear in length of sentence.
arXiv Detail & Related papers (2023-04-10T05:52:38Z)
Sketching Algorithms and Lower Bounds for Ridge Regression [65.0720777731368]
We give a sketching-based iterative algorithm that computes $1+\varepsilon$ approximate solutions for the ridge regression problem. We also show that this algorithm can be used to give faster algorithms for kernel ridge regression.
arXiv Detail & Related papers (2022-04-13T22:18:47Z)
Fast Graph Sampling for Short Video Summarization using Gershgorin Disc Alignment [52.577757919003844]
We study the problem of efficiently summarizing a short video into several paragraphs, leveraging recent progress in fast graph sampling. Experimental results show that our algorithm achieves comparable video summarization as state-of-the-art methods, at a substantially reduced complexity.
arXiv Detail & Related papers (2021-10-21T18:43:00Z)
Spectral properties of sample covariance matrices arising from random matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\text{tr}(AR(z))$, for $R(z) = (\frac{1}{n} XX^T - zI_p)^{-1}$ and $A \in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\|A\|_* / \sqrt{n})$. Here, we show that $\|\mathbb{E}[R(z)] - \tilde{R}(z)\|_F
arXiv Detail & Related papers (2021-09-06T14:21:43Z)
Linear Bandits on Uniformly Convex Sets [88.3673525964507]
Linear bandit algorithms yield $\tilde{\mathcal{O}}(n\sqrt{T})$ pseudo-regret bounds on compact convex action sets. Two types of structural assumptions lead to better pseudo-regret bounds.
arXiv Detail & Related papers (2021-03-10T07:33:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.