Near-Linear Time Projection onto the $\ell_{1,\infty}$ Ball; Application
to Sparse Autoencoders
- URL: http://arxiv.org/abs/2307.09836v1
- Date: Wed, 19 Jul 2023 08:47:41 GMT
- Title: Near-Linear Time Projection onto the $\ell_{1,\infty}$ Ball; Application
to Sparse Autoencoders
- Authors: Guillaume Perez and Laurent Condat and Michel Barlaud
- Abstract summary: We introduce a new projection algorithm for the $\ell_{1,\infty}$ norm ball.
We show that, both in the biological case and in the general sparsity setting, our method is the fastest.
- Score: 12.010310883787911
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Enforcing sparsity is nowadays crucial to speed up the training of
large-scale neural networks. Projections onto the $\ell_{1,2}$ and
$\ell_{1,\infty}$ balls are among the most efficient techniques to sparsify and
reduce the overall cost of neural networks. In this paper, we introduce a new
projection algorithm for the $\ell_{1,\infty}$ norm ball. The worst-case time
complexity of this algorithm is $\mathcal{O}\big(nm+J\log(nm)\big)$ for a
matrix in $\mathbb{R}^{n\times m}$. $J$ is a term that tends to 0 when the
sparsity is high, and to $nm$ when the sparsity is low. Its implementation is
easy and it is guaranteed to converge to the exact solution in a finite time.
Moreover, we propose to incorporate the $\ell_{1,\infty}$ ball projection while
training an autoencoder to enforce feature selection and sparsity of the
weights. Sparsification is applied to the encoder primarily to perform feature
selection, motivated by our application in biology, where only a very small
fraction ($<2\%$) of the data is relevant. We show that, both in the biological
case and in the general sparsity setting, our method is the fastest.
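The key operation in the abstract is the Euclidean projection of a weight matrix onto the $\ell_{1,\infty}$ ball, i.e. onto $\{X : \sum_i \max_j |X_{ij}| \le C\}$ under the row-wise convention (swap the axes for the column-wise one). The paper's near-linear algorithm is not reproduced here; the sketch below is a minimal NumPy illustration of the same projection via a standard bisection on the Lagrange multiplier of the norm constraint, using the Moreau identity $\mathrm{prox}_{\mu\|\cdot\|_\infty}(y) = y - \Pi_{\mu B_1}(y)$ on each row. The function names, tolerance, and the training-loop comment at the end are illustrative assumptions, not the authors' code.

```python
import numpy as np

def project_l1_ball(v, z):
    """Euclidean projection of a nonnegative vector v onto the l1 ball of radius z > 0."""
    if v.sum() <= z:
        return v.copy()
    u = np.sort(v)[::-1]                      # sort in decreasing order
    cssv = np.cumsum(u) - z
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > cssv)[0][-1]
    theta = cssv[rho] / (rho + 1.0)           # soft-threshold level
    return np.maximum(v - theta, 0.0)

def prox_linf(y, mu):
    """Proximal operator of mu * ||.||_inf, via the Moreau decomposition:
    prox(y) = y - projection of y onto the l1 ball of radius mu."""
    a = np.abs(y)
    return np.sign(y) * (a - project_l1_ball(a, mu))

def project_l1inf_ball(Y, C, tol=1e-8, max_iter=100):
    """Project Y onto {X : sum_i max_j |X_ij| <= C} (row-wise convention)
    by bisecting on the Lagrange multiplier mu of the norm constraint."""
    row_inf = np.abs(Y).max(axis=1)
    if row_inf.sum() <= C:                    # already inside the ball
        return Y.copy()
    lo, hi = 0.0, np.abs(Y).sum(axis=1).max() # at mu = hi every row shrinks to zero
    X = Y.copy()
    for _ in range(max_iter):
        mu = 0.5 * (lo + hi)
        X = np.apply_along_axis(prox_linf, 1, Y, mu)  # independent row-wise prox
        s = np.abs(X).max(axis=1).sum()               # current l_{1,inf} norm
        if abs(s - C) <= tol:
            break
        if s > C:
            lo = mu                           # norm still too large: raise the multiplier
        else:
            hi = mu
    return X

# Illustrative use during training (hypothetical names): after each optimizer
# step, the encoder weight matrix W would be re-projected so that
# ||W||_{1,inf} <= C, which drives entire input features to zero over time.
# W = project_l1inf_ball(W, C=1.0)
```

Each bisection step costs $\mathcal{O}(nm\log m)$ because of the sorts inside the $\ell_1$-ball projections, so this sketch only illustrates the operation that the paper's $\mathcal{O}\big(nm+J\log(nm)\big)$ algorithm accelerates.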
Related papers
- Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z) - A new Linear Time Bi-level $\ell_{1,\infty}$ projection; Application to the sparsification of auto-encoders neural networks [2.014710510332479]
We show that the time complexity for the $\ell_{1,\infty}$ norm is only $\mathcal{O}\big(nm\big)$ for an $n\times m$ matrix.
Experiments show that our bi-level $\ell_{1,\infty}$ projection is $2.5$ times faster than the current fastest algorithm.
arXiv Detail & Related papers (2024-07-23T08:51:29Z) - A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.
A clustering is called individually fair if every point $x\in P$ has a center within distance $\delta(x)$, where $\delta(x)$ is the radius of the smallest ball around $x$ containing at least $n/k$ points.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
arXiv Detail & Related papers (2024-02-09T19:01:48Z) - A Smoothing Algorithm for l1 Support Vector Machines [0.07499722271664144]
A smoothing algorithm is presented for solving the soft-margin Support Vector Machine (SVM) optimization problem with an $\ell_1$ penalty.
The algorithm uses smoothing for the hinge-loss function, and an active set approach for the $\ell_1$ penalty.
Experiments show that our algorithm is capable of strong test accuracy without sacrificing training speed.
arXiv Detail & Related papers (2023-12-17T00:54:56Z) - Scaling Up Differentially Private LASSO Regularized Logistic Regression
via Faster Frank-Wolfe Iterations [51.14495595270775]
We adapt the Frank-Wolfe algorithm for $L_1$ penalized linear regression to be aware of sparse inputs and to use them effectively.
Our results demonstrate that this procedure can reduce runtime by a factor of up to $2{,}200\times$, depending on the value of the privacy parameter $\epsilon$ and the sparsity of the dataset.
arXiv Detail & Related papers (2023-10-30T19:52:43Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Optimal Approximation Rates for Deep ReLU Neural Networks on Sobolev and Besov Spaces [2.7195102129095003]
Deep neural networks with the ReLU activation function can approximate functions in the Sobolev spaces $W^s(L_q(\Omega))$ and Besov spaces $B^s_r(L_q(\Omega))$.
This problem is important when studying the application of neural networks in a variety of fields.
arXiv Detail & Related papers (2022-11-25T23:32:26Z) - Learning a Latent Simplex in Input-Sparsity Time [58.30321592603066]
We consider the problem of learning a latent $k$-vertex simplex $K\subset\mathbb{R}^d$, given access to $A\in\mathbb{R}^{d\times n}$.
We show that the dependence on $k$ in the running time is unnecessary given a natural assumption about the mass of the top $k$ singular values of $A$.
arXiv Detail & Related papers (2021-05-17T16:40:48Z) - On Efficient Low Distortion Ultrametric Embedding [18.227854382422112]
A widely-used method to preserve the underlying hierarchical structure of the data is to find an embedding of the data into a tree or an ultrametric.
In this paper, we provide a new algorithm which takes as input a set of points in $\mathbb{R}^d$ and (for a universal constant $\rho>1$) outputs an ultrametric $\Delta$.
We show that the output of our algorithm is comparable to the output of the linkage algorithms while achieving a much faster running time.
arXiv Detail & Related papers (2020-08-15T11:06:45Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that, for both problems, in low dimensions one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - Training (Overparametrized) Neural Networks in Near-Linear Time [21.616949485102342]
We show how to speed up the algorithm of [CGH+19] for training (mildly overparametrized) ReLU networks.
The centerpiece of our algorithm is to reformulate the Gauss-Newton iteration as an $\ell_2$-regression problem.
arXiv Detail & Related papers (2020-06-20T20:26:14Z)