Related papers: Multi-level projection with exponential parallel speedup; Application to sparse auto-encoders neural networks

URL: http://arxiv.org/abs/2405.02086v2
Date: Thu, 4 Jul 2024 07:58:17 GMT
Title: Multi-level projection with exponential parallel speedup; Application to sparse auto-encoders neural networks
Authors: Guillaume Perez, Michel Barlaud
Abstract summary: We show that the time complexity for the $\ell_{1,\infty}$ norm is only $\mathcal{O}\big(n m\big)$ for a matrix in $\mathbb{R}^{n\times m}$. Experiments show that our projection is $2$ times faster than the current fastest Euclidean algorithms.
Score: 2.264332709661011
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The $\ell_{1,\infty}$ norm is an efficient structured projection, but the complexity of the best algorithm is unfortunately $\mathcal{O}\big(n m \log(n m)\big)$ for a matrix in $\mathbb{R}^{n\times m}$. In this paper, we propose a new bi-level projection method for which we show that the time complexity for the $\ell_{1,\infty}$ norm is only $\mathcal{O}\big(n m\big)$ for a matrix in $\mathbb{R}^{n\times m}$, and $\mathcal{O}\big(n + m\big)$ with full parallel power. We generalize our method to tensors and propose a new multi-level projection whose induced decomposition yields a linear parallel speedup up to an exponential speedup factor, resulting in a time complexity lower-bounded by the sum of the dimensions instead of the product of the dimensions. We provide a large base of implementations of our framework for bi-level and tri-level projections (matrices and tensors) for various norms, and we also provide the parallel implementation. Experiments show that our projection is $2$ times faster than the current fastest Euclidean algorithms while providing the same accuracy and better sparsity in neural network applications.
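
The abstract does not spell out the bi-level scheme, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the $\ell_{1,\infty}$ norm sums the largest absolute entry of each column, that the outer level projects the vector of column maxima onto an $\ell_1$ ball, and that the inner level clips each column to the resulting cap. The function names `project_l1_ball` and `bilevel_l1inf_projection` are hypothetical. For brevity the $\ell_1$ step uses the classic sort-based projection, which costs $\mathcal{O}(m \log m)$ and is not the linear-time routine the complexity claim would require.

```python
def project_l1_ball(v, radius):
    """Project a non-negative vector v onto the l1 ball of the given radius.

    Sort-based variant (O(m log m)); swapped in here for simplicity.
    Assumes radius > 0 and all entries of v are >= 0.
    """
    if sum(v) <= radius:
        return list(v)  # already inside the ball
    sorted_desc = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, value in enumerate(sorted_desc, start=1):
        cumulative += value
        candidate = (cumulative - radius) / k
        if value - candidate > 0:
            theta = candidate  # threshold still valid for the k largest entries
        else:
            break
    # Soft-threshold every entry by theta.
    return [max(x - theta, 0.0) for x in v]


def bilevel_l1inf_projection(matrix, radius):
    """Sketch of a bi-level l1,inf projection (assumed column-wise convention).

    1. Aggregate: take the max absolute value of each column.
    2. Outer level: project that vector onto the l1 ball.
    3. Inner level: clip each column to [-cap_j, cap_j].
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    col_max = [max(abs(matrix[i][j]) for i in range(n_rows)) for j in range(n_cols)]
    caps = project_l1_ball(col_max, radius)
    return [
        [max(-caps[j], min(caps[j], matrix[i][j])) for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Hypothetical 2x2 input: column maxima are 3 and 1, radius is 2.
example = bilevel_l1inf_projection([[3.0, 1.0], [-2.0, 0.5]], 2.0)
```

On this toy input the outer projection caps the first column at 2 and the second at 0, so the second column is zeroed entirely. That column-level sparsity is the structured effect the abstract refers to. The aggregation and clipping steps touch each entry once and are independent across columns, which is where a parallel implementation would gain.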

Related papers

Approaching Optimality for Solving Dense Linear Systems with Low-Rank Structure [16.324043075920564]
We provide new high-accuracy randomized algorithms for solving linear systems and regression problems. Our algorithms nearly-match a natural complexity limit under dense inputs for these problems. We show how to obtain these running times even under the weaker assumption that all but $k$ of the singular values have a bounded generalized mean.
arXiv Detail & Related papers (2025-07-15T20:48:30Z)
The Generative Leap: Sharp Sample Complexity for Efficiently Learning Gaussian Multi-Index Models [71.5283441529015]
In this work we consider generic Gaussian Multi-index models, in which the labels only depend on the (Gaussian) $d$-dimensional inputs through their projection onto a low-dimensional $r = O_d(1)$ subspace. We introduce the generative leap exponent $k^\star$, a natural extension of the generative exponent from [Damian et al.'24] to the multi-index setting.
arXiv Detail & Related papers (2025-06-05T18:34:56Z)
Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch. We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work. We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z)
A new Linear Time Bi-level $\ell_{1,\infty}$ projection ; Application to the sparsification of auto-encoders neural networks [2.014710510332479]
We show that the time complexity for the $\ell_{1,\infty}$ norm is only $\mathcal{O}\big(n m\big)$ for an $n\times m$ matrix. Experiments show that our bi-level $\ell_{1,\infty}$ projection is $2.5$ times faster than the current fastest algorithm.
arXiv Detail & Related papers (2024-07-23T08:51:29Z)
Fine-grained Analysis and Faster Algorithms for Iteratively Solving Linear Systems [9.30306458153248]
We consider the spectral tail condition number, $\kappa_\ell$, defined as the ratio between the $\ell$th largest and the smallest singular value of the matrix representing the system. Some of the implications of our result, and of the use of $\kappa_\ell$, include direct improvement over a fine-grained analysis of the Conjugate Gradient method.
arXiv Detail & Related papers (2024-05-09T14:56:49Z)
Provably learning a multi-head attention layer [55.2904547651831]
Multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models. In this work, we initiate the study of provably learning a multi-head attention layer from random examples. We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z)
Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss. Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z)
Fast and Practical Quantum-Inspired Classical Algorithms for Solving Linear Systems [11.929584800629673]
We propose fast and practical quantum-inspired classical algorithms for solving linear systems. Our main contribution is the application of the heavy ball momentum method to quantum-inspired classical algorithms for solving linear systems.
arXiv Detail & Related papers (2023-07-13T08:46:19Z)
Pseudonorm Approachability and Applications to Regret Minimization [73.54127663296906]
We convert high-dimensional $\ell_\infty$-approachability problems to low-dimensional pseudonorm approachability problems. We develop an algorithmic theory of pseudonorm approachability, analogous to previous work on approachability for $\ell_2$ and other norms.
arXiv Detail & Related papers (2023-02-03T03:19:14Z)
Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications. We propose a QR-based ED method dedicated to the application scenarios of computer vision.
arXiv Detail & Related papers (2022-07-09T09:14:12Z)
Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements. We derive an information-theoretic lower bound for the minimax risk under this setting. We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z)
Faster Binary Embeddings for Preserving Euclidean Distances [9.340611077939828]
We propose a fast, distance-preserving, binary embedding algorithm to transform a dataset $\mathcal{T}\subseteq\mathbb{R}^n$ into binary sequences in the cube $\{\pm 1\}^m$. Our method is both fast and memory efficient, with time complexity $O(m)$ and space complexity $O(m)$.
arXiv Detail & Related papers (2020-10-01T22:41:41Z)
Training (Overparametrized) Neural Networks in Near-Linear Time [21.616949485102342]
We show how to speed up the algorithm of [CGH+1] for training (mildly overparametrized) ReLU networks. The centerpiece of our algorithm is to reformulate the Gauss-Newton iteration as an $\ell_2$-regression problem.
arXiv Detail & Related papers (2020-06-20T20:26:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.