Related papers: A new Linear Time Bi-level $\ell_{1,\infty}$ projection ; Application to the sparsification of auto-encoders neural networks

A new Linear Time Bi-level $\ell_{1,\infty}$ projection ; Application to the sparsification of auto-encoders neural networks

URL: http://arxiv.org/abs/2407.16293v1
Date: Tue, 23 Jul 2024 08:51:29 GMT
Title: A new Linear Time Bi-level $\ell_{1,\infty}$ projection ; Application to the sparsification of auto-encoders neural networks
Authors: Michel Barlaud, Guillaume Perez, Jean-Paul Marmorat,
Abstract summary: We show that the time complexity for the $ell_1,infty$ norm is only $mathcalObig(n m big)$ for a matrix $ntimes m$. Experiments show that our bi-level $ell_1,infty$ projection is $2.5$ times faster than the actual fastest algorithm.
Score: 2.014710510332479
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The $\ell_{1,\infty}$ norm is an efficient-structured projection, but the complexity of the best algorithm is, unfortunately, $\mathcal{O}\big(n m \log(n m)\big)$ for a matrix $n\times m$.\\ In this paper, we propose a new bi-level projection method, for which we show that the time complexity for the $\ell_{1,\infty}$ norm is only $\mathcal{O}\big(n m \big)$ for a matrix $n\times m$. Moreover, we provide a new $\ell_{1,\infty}$ identity with mathematical proof and experimental validation. Experiments show that our bi-level $\ell_{1,\infty}$ projection is $2.5$ times faster than the actual fastest algorithm and provides the best sparsity while keeping the same accuracy in classification applications.

Related papers

Learning and Computation of $Φ$-Equilibria at the Frontier of Tractability [85.07238533644636]
$Phi$-equilibria is a powerful and flexible framework at the heart of online learning and game theory. We show that an efficient online algorithm incurs average $Phi$-regret at most $epsilon$ using $textpoly(d, k)/epsilon2$ rounds. We also show nearly matching lower bounds in the online setting, thereby obtaining for the first time a family of deviations that captures the learnability of $Phi$-regret.
arXiv Detail & Related papers (2025-02-25T19:08:26Z)
Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch. We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work. We also show an $Omega(k2/pn1-2/p)$ lower bound for the sparse recovery problem, which is tight up to a $mathrmpoly(log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z)
Multi-level projection with exponential parallel speedup; Application to sparse auto-encoders neural networks [2.264332709661011]
We show that the time complexity for the $ell_1,infty$ norm is only $mathcalObig(n m big)$ for a matrix in $mathbbRntimes m$. Experiments show that our projection is $2$ times faster than the actual fastest Euclidean algorithms.
arXiv Detail & Related papers (2024-05-03T13:21:49Z)
A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al. A clustering is then called individually fair if it has centers within distance $delta(x)$ of $x$ for each $xin P$. We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
arXiv Detail & Related papers (2024-02-09T19:01:48Z)
Near-Linear Time Projection onto the $\ell_{1,\infty}$ Ball; Application to Sparse Autoencoders [12.010310883787911]
We introduce a new projection algorithm for the $ell_1,infty$ norm ball. We show that both in the biological case and in the general case of sparsity that our method is the fastest.
arXiv Detail & Related papers (2023-07-19T08:47:41Z)
Fast and Practical Quantum-Inspired Classical Algorithms for Solving Linear Systems [11.929584800629673]
We propose fast and practical quantum-inspired classical algorithms for solving linear systems. Our main contribution is the application of the heavy ball momentum method to quantum-inspired classical algorithms for solving linear systems.
arXiv Detail & Related papers (2023-07-13T08:46:19Z)
Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization [54.29685789885059]
We introduce efficient $(1+varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem. The goal is to approximate $mathbfA$ as a product of low-rank factors. Our techniques generalize to other common variants of the BMF problem.
arXiv Detail & Related papers (2023-06-02T18:55:27Z)
Randomized and Deterministic Attention Sparsification Algorithms for Over-parameterized Feature Dimension [18.57735939471469]
We consider the sparsification of the attention problem. For any super large feature dimension, we can reduce it down to the size nearly linear in length of sentence.
arXiv Detail & Related papers (2023-04-10T05:52:38Z)
Fast Attention Requires Bounded Entries [19.17278873525312]
inner product attention computation is a fundamental task for training large language models such as Transformer, GPT-1, BERT, GPT-2, GPT-3 and ChatGPT. We investigate whether faster algorithms are possible by implicitly making use of the matrix $A$. This gives a theoretical explanation for the phenomenon observed in practice that attention computation is much more efficient when the input matrices have smaller entries.
arXiv Detail & Related papers (2023-02-26T02:42:39Z)
Sketching Algorithms and Lower Bounds for Ridge Regression [65.0720777731368]
We give a sketching-based iterative algorithm that computes $1+varepsilon$ approximate solutions for the ridge regression problem. We also show that this algorithm can be used to give faster algorithms for kernel ridge regression.
arXiv Detail & Related papers (2022-04-13T22:18:47Z)
Contextual Recommendations and Low-Regret Cutting-Plane Algorithms [49.91214213074933]
We consider the following variant of contextual linear bandits motivated by routing applications in navigational engines and recommendation systems. We design novel cutting-plane algorithms with low "regret" -- the total distance between the true point $w*$ and the hyperplanes the separation oracle returns.
arXiv Detail & Related papers (2021-06-09T05:39:05Z)
Learning a Latent Simplex in Input-Sparsity Time [58.30321592603066]
We consider the problem of learning a latent $k$-vertex simplex $KsubsetmathbbRdtimes n$, given access to $AinmathbbRdtimes n$. We show that the dependence on $k$ in the running time is unnecessary given a natural assumption about the mass of the top $k$ singular values of $A$.
arXiv Detail & Related papers (2021-05-17T16:40:48Z)
On Efficient Low Distortion Ultrametric Embedding [18.227854382422112]
A widely-used method to preserve the underlying hierarchical structure of the data is to find an embedding of the data into a tree or an ultrametric. In this paper, we provide a new algorithm which takes as input a set of points isometric in $mathbbRd2 (for universal constant $rho>1$) to output an ultrametric $Delta. We show that the output of our algorithm is comparable to the output of the linkage algorithms while achieving a much faster running time.
arXiv Detail & Related papers (2020-08-15T11:06:45Z)
The Average-Case Time Complexity of Certifying the Restricted Isometry Property [66.65353643599899]
In compressed sensing, the restricted isometry property (RIP) on $M times N$ sensing matrices guarantees efficient reconstruction of sparse vectors. We investigate the exact average-case time complexity of certifying the RIP property for $Mtimes N$ matrices with i.i.d. $mathcalN(0,1/M)$ entries.
arXiv Detail & Related papers (2020-05-22T16:55:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.