Multi-level projection with exponential parallel speedup; Application to sparse auto-encoders neural networks
- URL: http://arxiv.org/abs/2405.02086v2
- Date: Thu, 4 Jul 2024 07:58:17 GMT
- Title: Multi-level projection with exponential parallel speedup; Application to sparse auto-encoders neural networks
- Authors: Guillaume Perez, Michel Barlaud,
- Abstract summary: We show that the time complexity for the $ell_1,infty$ norm is only $mathcalObig(n m big)$ for a matrix in $mathbbRntimes m$.
Experiments show that our projection is $2$ times faster than the actual fastest Euclidean algorithms.
- Score: 2.264332709661011
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The $\ell_{1,\infty}$ norm is an efficient structured projection but the complexity of the best algorithm is unfortunately $\mathcal{O}\big(n m \log(n m)\big)$ for a matrix in $\mathbb{R}^{n\times m}$. In this paper, we propose a new bi-level projection method for which we show that the time complexity for the $\ell_{1,\infty}$ norm is only $\mathcal{O}\big(n m \big)$ for a matrix in $\mathbb{R}^{n\times m}$, and $\mathcal{O}\big(n + m \big)$ with full parallel power. We generalize our method to tensors and we propose a new multi-level projection, having an induced decomposition that yields a linear parallel speedup up to an exponential speedup factor, resulting in a time complexity lower-bounded by the sum of the dimensions, instead of the product of the dimensions. we provide a large base of implementation of our framework for bi-level and tri-level (matrices and tensors) for various norms and provides also the parallel implementation. Experiments show that our projection is $2$ times faster than the actual fastest Euclidean algorithms while providing same accuracy and better sparsity in neural networks applications.
Related papers
- Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $Omega(k2/pn1-2/p)$ lower bound for the sparse recovery problem, which is tight up to a $mathrmpoly(log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z) - A new Linear Time Bi-level $\ell_{1,\infty}$ projection ; Application to the sparsification of auto-encoders neural networks [2.014710510332479]
We show that the time complexity for the $ell_1,infty$ norm is only $mathcalObig(n m big)$ for a matrix $ntimes m$.
Experiments show that our bi-level $ell_1,infty$ projection is $2.5$ times faster than the actual fastest algorithm.
arXiv Detail & Related papers (2024-07-23T08:51:29Z) - Fine-grained Analysis and Faster Algorithms for Iteratively Solving Linear Systems [9.30306458153248]
We consider the spectral tail condition number, $kappa_ell$, defined as the ratio between the $ell$th largest and the smallest singular value of the matrix representing the system.
Some of the implications of our result, and of the use of $kappa_ell$, include direct improvement over a fine-grained analysis of the Conjugate method.
arXiv Detail & Related papers (2024-05-09T14:56:49Z) - Provably learning a multi-head attention layer [55.2904547651831]
Multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $mathbbRd$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/epsilon)O(k)$, whereepsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Fast and Practical Quantum-Inspired Classical Algorithms for Solving
Linear Systems [11.929584800629673]
We propose fast and practical quantum-inspired classical algorithms for solving linear systems.
Our main contribution is the application of the heavy ball momentum method to quantum-inspired classical algorithms for solving linear systems.
arXiv Detail & Related papers (2023-07-13T08:46:19Z) - Pseudonorm Approachability and Applications to Regret Minimization [73.54127663296906]
We convert high-dimensional $ell_infty$-approachability problems to low-dimensional pseudonorm approachability problems.
We develop an algorithmic theory of pseudonorm approachability, analogous to previous work on approachability for $ell$ and other norms.
arXiv Detail & Related papers (2023-02-03T03:19:14Z) - Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications.
We propose a QR-based ED method dedicated to the application scenarios of computer vision.
arXiv Detail & Related papers (2022-07-09T09:14:12Z) - Minimax Optimal Quantization of Linear Models: Information-Theoretic
Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z) - Faster Binary Embeddings for Preserving Euclidean Distances [9.340611077939828]
We propose a fast, distance-preserving, binary embedding algorithm to transform a dataset $mathcalTsubseteqmathbbRn$ into binary sequences in the cube $pm 1m$.
Our method is both fast and memory efficient, with time complexity $O(m)$ and space complexity $O(m)$.
arXiv Detail & Related papers (2020-10-01T22:41:41Z) - Training (Overparametrized) Neural Networks in Near-Linear Time [21.616949485102342]
We show how to speed up the algorithm of [CGH+1] for training (mildly overetrized) ReparamLU networks.
The centerpiece of our algorithm is to reformulate the Gauss-Newton as an $ell$-recondition.
arXiv Detail & Related papers (2020-06-20T20:26:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.