spred: Solving $L_1$ Penalty with SGD
- URL: http://arxiv.org/abs/2210.01212v5
- Date: Wed, 12 Jul 2023 15:09:24 GMT
- Title: spred: Solving $L_1$ Penalty with SGD
- Authors: Liu Ziyin, Zihao Wang
- Abstract summary: We propose to minimize a differentiable objective with an $L_1$ penalty using a simple reparametrization.
We prove that the method is an exact differentiable solver of $L_1$ and that the reparametrization trick is completely ``benign'' for a generic nonconvex function.
- Score: 6.2255027793924285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose to minimize a generic differentiable objective with $L_1$
constraint using a simple reparametrization and straightforward stochastic
gradient descent. Our proposal is the direct generalization of previous ideas
that the $L_1$ penalty may be equivalent to a differentiable reparametrization
with weight decay. We prove that the proposed method, \textit{spred}, is an
exact differentiable solver of $L_1$ and that the reparametrization trick is
completely ``benign'' for a generic nonconvex function. Practically, we
demonstrate the usefulness of the method in (1) training sparse neural networks
to perform gene selection tasks, which involves finding relevant features in a
very high-dimensional space, and (2) the neural network compression task, for which
previous attempts at applying the $L_1$-penalty have been unsuccessful.
Conceptually, our result bridges the gap between the sparsity in deep learning
and conventional statistical learning.
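The abstract rests on a known identity: an $L_1$ penalty can be rewritten as weight decay on a redundant factorization, $\lambda\|w\|_1 = \min_{u \odot v = w} \frac{\lambda}{2}\left(\|u\|_2^2 + \|v\|_2^2\right)$, so an $L_1$-penalized objective can be minimized with straightforward SGD after replacing each penalized weight $w$ by an elementwise product $u \odot v$. The sketch below illustrates this idea in PyTorch; the layer name, initialization, and hyperparameters are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RedundantLinear(nn.Module):
    """Linear layer whose weight is reparametrized as an elementwise
    product W = U * V.  Ordinary L2 weight decay on U and V then acts,
    at a minimizer, like an L1 penalty on W (the idea described in the
    abstract).  This is an illustrative sketch, not the reference code."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.U = nn.Parameter(0.1 * torch.randn(out_features, in_features))
        self.V = nn.Parameter(0.1 * torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The effective weight is the Hadamard product U * V; sparsity in
        # the effective weight emerges as weight decay shrinks U and V.
        return F.linear(x, self.U * self.V, self.bias)

# Plain SGD with weight decay on U and V; the decay strength plays the
# role of the L1 coefficient up to a constant factor.
layer = RedundantLinear(100, 10)
optimizer = torch.optim.SGD(layer.parameters(), lr=1e-2, weight_decay=1e-3)
```

After training, near-zero entries of `layer.U * layer.V` could be pruned, which is how the same trick would apply to the gene-selection and compression use cases mentioned in the abstract.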
Related papers
- Matching the Statistical Query Lower Bound for k-sparse Parity Problems with Stochastic Gradient Descent [83.85536329832722]
We show that stochastic gradient descent (SGD) can efficiently solve the $k$-sparse parity problem on a $d$-dimensional hypercube.
We then demonstrate how a neural network trained with SGD solves the $k$-sparse parity problem with small statistical error.
arXiv Detail & Related papers (2024-04-18T17:57:53Z) - Retire: Robust Expectile Regression in High Dimensions [3.9391041278203978]
Penalized quantile and expectile regression methods offer useful tools to detect heteroscedasticity in high-dimensional data.
We propose and study (penalized) robust expectile regression (retire).
We show that the proposed procedure can be efficiently solved by a semismooth Newton coordinate descent algorithm.
arXiv Detail & Related papers (2022-12-11T18:03:12Z) - Adaptive Stochastic Variance Reduction for Non-convex Finite-Sum
Minimization [52.25843977506935]
We propose an adaptive variance-reduction method, called AdaSpider, for $L$-smooth, non-convex functions with a finite-sum structure.
In doing so, we are able to compute an $\epsilon$-stationary point with $\tilde{O}\left(n + \sqrt{n}/\epsilon^2\right)$ oracle calls.
arXiv Detail & Related papers (2022-11-03T14:41:46Z) - Sharper Rates and Flexible Framework for Nonconvex SGD with Client and
Data Sampling [64.31011847952006]
We revisit the problem of finding an approximately stationary point of the average of $n$ smooth and possibly non-convex functions.
We generalize the PAGE algorithm so that it can provably work with virtually any sampling mechanism.
We provide the most general and most accurate analysis of the optimal bound in the smooth non-convex regime.
arXiv Detail & Related papers (2022-06-05T21:32:33Z) - $\ell_p$ Slack Norm Support Vector Data Description [0.0]
We generalise this modelling formalism to a general $\ell_p$-norm ($p \geq 1$) slack penalty function.
By virtue of an $\ell_p$ slack norm, the proposed approach enables formulating a non-linear cost function with respect to slacks.
arXiv Detail & Related papers (2022-03-16T20:38:37Z) - Target Network and Truncation Overcome The Deadly Triad in $Q$-Learning [7.532013242448151]
We propose a stable design for $Q$-learning with linear function approximation using a target network and truncation.
Our result implies an $\mathcal{O}(\epsilon^{-2})$ sample complexity up to a function approximation error.
This is the first variant of $Q$-learning with linear function approximation that is provably stable without requiring strong assumptions or modifying the problem parameters.
arXiv Detail & Related papers (2022-03-05T00:54:58Z) - Provably Efficient Convergence of Primal-Dual Actor-Critic with
Nonlinear Function Approximation [15.319335698574932]
We show the first efficient convergence result for primal-dual actor-critic with nonlinear function approximation.
We report results on OpenAI Gym continuous control tasks.
arXiv Detail & Related papers (2022-02-28T15:16:23Z) - Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free
Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z) - An efficient projection neural network for $\ell_1$-regularized logistic
regression [10.517079029721257]
This paper presents a simple projection neural network for $\ell_1$-regularized logistic regression.
The proposed neural network does not require any extra auxiliary variable nor any smooth approximation.
We also investigate the convergence of the proposed neural network by using Lyapunov theory and show that it converges to a solution of the problem from any initial value.
arXiv Detail & Related papers (2021-05-12T06:13:44Z) - Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad algorithm for non-concave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.