Orthant Based Proximal Stochastic Gradient Method for
$\ell_1$-Regularized Optimization
- URL: http://arxiv.org/abs/2004.03639v2
- Date: Thu, 23 Jul 2020 04:54:42 GMT
- Title: Orthant Based Proximal Stochastic Gradient Method for
$\ell_1$-Regularized Optimization
- Authors: Tianyi Chen, Tianyu Ding, Bo Ji, Guanyi Wang, Jing Tian, Yixin Shi,
Sheng Yi, Xiao Tu, Zhihui Zhu
- Abstract summary: Sparsity-inducing regularization problems are ubiquitous in machine learning applications.
We present a novel method -- the Orthant Based Proximal Stochastic Gradient Method (OBProx-SG).
- Score: 35.236001071182855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparsity-inducing regularization problems are ubiquitous in machine learning
applications, ranging from feature selection to model compression. In this
paper, we present a novel stochastic method -- Orthant Based Proximal
Stochastic Gradient Method (OBProx-SG) -- to solve perhaps the most popular
instance, i.e., the $\ell_1$-regularized problem. The OBProx-SG method contains two
steps: (i) a proximal stochastic gradient step to predict a support cover of
the solution; and (ii) an orthant step to aggressively enhance the sparsity
level via orthant face projection. Compared to the state-of-the-art methods,
e.g., Prox-SG, RDA and Prox-SVRG, the OBProx-SG not only converges to the
global optimal solutions (in convex scenario) or the stationary points (in
non-convex scenario), but also promotes the sparsity of the solutions
substantially. Particularly, on a large number of convex problems, OBProx-SG
outperforms the existing methods comprehensively in the aspect of sparsity
exploration and objective values. Moreover, the experiments on non-convex deep
neural networks, e.g., MobileNetV1 and ResNet18, further demonstrate its
superiority by achieving the solutions of much higher sparsity without
sacrificing generalization accuracy.
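The two-step scheme described above can be sketched in code. The following is a minimal illustrative sketch on a deterministic problem, not the authors' implementation: the full-batch gradient oracle `grad_f` stands in for the stochastic gradient, and the function names, step counts, and learning rate are assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    # proximal operator of t * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def obprox_sg(grad_f, x0, lam, lr=0.01, n_prox=300, n_orthant=300):
    """Sketch of the two-phase OBProx-SG scheme for min f(x) + lam*||x||_1.

    Phase (i): proximal (stochastic) gradient steps predict a support
    cover of the solution.
    Phase (ii): orthant steps optimize over the orthant face selected by
    the signs of the current iterate, projecting back onto that face.
    """
    x = x0.copy()
    # (i) proximal gradient steps (soft-thresholding after a gradient step)
    for _ in range(n_prox):
        x = soft_threshold(x - lr * grad_f(x), lr * lam)
    # (ii) orthant steps on the face determined by sign(x)
    sign = np.sign(x)
    for _ in range(n_orthant):
        # pseudo-gradient of f(x) + lam*||x||_1 restricted to the orthant;
        # coordinates already at zero stay fixed at zero
        g = grad_f(x) + lam * sign
        g[sign == 0] = 0.0
        x_trial = x - lr * g
        # orthant face projection: zero out entries that left the orthant
        x = np.where(x_trial * sign > 0, x_trial, 0.0)
    return x
```

On a tiny lasso instance with identity design, the iterate matches the closed-form soft-thresholded solution, with the zero coordinate held exactly at zero by the orthant face projection.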
Related papers
- Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust-Region Sequential Quadratic Programming method to find both first- and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration, defined by minimizing an approximation of the objective subject to a trust-region constraint.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature of the reduced Hessian matrix.
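A negative-curvature eigen step of the kind mentioned can be illustrated with a small sketch. The function name, unit step size, and dense `eigh` factorization are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def eigen_step(H, g, x, alpha=1.0, tol=1e-8):
    """Move along the most negative-curvature direction of H, if any.

    H: (reduced) Hessian at x; g: gradient at x.
    """
    w, V = np.linalg.eigh(H)          # eigenvalues in ascending order
    if w[0] >= -tol:
        return x                      # no negative curvature to exploit
    d = V[:, 0]                       # eigenvector of most negative eigenvalue
    if d @ g > 0:                     # pick the sign that is a descent direction
        d = -d
    return x + alpha * d
```

On a quadratic model with one negative eigenvalue, the step strictly decreases the model value; with a positive-definite Hessian it leaves the point unchanged.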
arXiv Detail & Related papers (2024-09-24T04:39:47Z) - Diffusion Stochastic Optimization for Min-Max Problems [33.73046548872663]
The optimistic gradient method is useful in addressing minimax optimization problems.
Motivated by the observation that the conventional version suffers from the need for a large batch size, we introduce and analyze a new formulation termed Samevareps-generativeOGOG.
arXiv Detail & Related papers (2024-01-26T01:16:59Z) - Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation
Constrained Optimization [88.0031283949404]
Many real-world problems have complicated nonconvex functional constraints and use a large number of data points.
Our proposed method outperforms an existing method with the previously best-known result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z) - On the Convergence to a Global Solution of Shuffling-Type Gradient
Algorithms [18.663264755108703]
Stochastic gradient descent (SGD) is the method of choice in many machine learning tasks.
In this paper, we show that shuffling-type SGD achieves the desired computational complexity, matching that of the convex setting.
arXiv Detail & Related papers (2022-06-13T01:25:59Z) - Faster Algorithm and Sharper Analysis for Constrained Markov Decision
Process [56.55075925645864]
The problem of constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new utilities-dual convex approach is proposed, with a novel integration of three ingredients: a regularized policy, a dual regularizer, and Nesterov's gradient descent on the dual.
This is the first demonstration that nonconcave CMDP problems can attain the $\mathcal{O}(1/\epsilon)$ complexity lower bound for optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z) - A general sample complexity analysis of vanilla policy gradient [101.16957584135767]
Policy gradient (PG) is one of the most popular methods for solving reinforcement learning (RL) problems.
We provide a general sample complexity analysis of the "vanilla" PG method.
arXiv Detail & Related papers (2021-07-23T19:38:17Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - A Retrospective Approximation Approach for Smooth Stochastic
Optimization [0.2867517731896504]
Stochastic Gradient (SG) is the de facto iterative technique for solving stochastic optimization (SO) problems with a smooth (non-convex) objective.
arXiv Detail & Related papers (2021-03-07T16:29:36Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work is on zeroth-order (ZO) optimization, which does not require first-order information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of iteration complexity as well as function query cost.
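Coordinate-wise zeroth-order estimation can be sketched as follows; this toy version uses uniform coordinate sampling rather than the paper's importance weighting, and all names and parameters here are assumptions.

```python
import numpy as np

def zo_coord_grad(f, x, idx, mu=1e-4):
    # two-point finite-difference estimate of df/dx_i for sampled coordinates
    g = np.zeros_like(x)
    for i in idx:
        e = np.zeros_like(x)
        e[i] = mu
        g[i] = (f(x + e) - f(x - e)) / (2 * mu)
    return g

def zo_sgd(f, x0, lr=0.1, n_coords=2, steps=500, rng=None):
    rng = np.random.default_rng(rng)
    x = x0.copy()
    d = x.size
    for _ in range(steps):
        # sample a subset of coordinates uniformly (the paper instead
        # weights coordinates by importance)
        idx = rng.choice(d, size=n_coords, replace=False)
        # rescale by d/n_coords so the sparse estimate is unbiased
        x = x - lr * (d / n_coords) * zo_coord_grad(f, x, idx)
    return x
```

On a smooth quadratic, the method recovers the minimizer using only function evaluations, no gradients.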
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Convergence Analysis of Homotopy-SGD for non-convex optimization [43.71213126039448]
We present a first-order algorithm based on a combination of homotopy methods and SGD, called Homotopy Stochastic Gradient Descent (H-SGD).
Under some assumptions, we conduct a theoretical analysis of the proposed algorithm.
Experimental results show that H-SGD can outperform SGD.
arXiv Detail & Related papers (2020-11-20T09:50:40Z) - An adaptive stochastic gradient-free approach for high-dimensional
blackbox optimization [0.0]
We propose an adaptive stochastic gradient-free (ASGF) approach for high-dimensional non-smooth blackbox optimization problems.
We illustrate the performance of this method on benchmark global optimization problems and learning tasks.
arXiv Detail & Related papers (2020-06-18T22:47:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.