Accelerated Gradient Methods for Sparse Statistical Learning with
Nonconvex Penalties
- URL: http://arxiv.org/abs/2009.10629v3
- Date: Fri, 7 Jan 2022 20:44:29 GMT
- Title: Accelerated Gradient Methods for Sparse Statistical Learning with
Nonconvex Penalties
- Authors: Kai Yang, Masoud Asgharian, Sahir Bhatnagar
- Abstract summary: Nesterov's accelerated gradient (AG) is a popular technique to optimize objective functions comprising two components: a convex loss and a penalty function.
A recent proposal generalizes Nesterov's AG method to the nonconvex setting but has never been applied to statistical learning problems.
In this article, we consider the application of this nonconvex AG algorithm to high-dimensional and sparse statistical learning problems.
- Score: 10.913266721195916
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nesterov's accelerated gradient (AG) is a popular technique to optimize
objective functions comprising two components: a convex loss and a penalty
function. While AG methods perform well for convex penalties, such as the
LASSO, convergence issues may arise when they are applied to nonconvex penalties,
such as SCAD. A recent proposal generalizes Nesterov's AG method to the
nonconvex setting but has never been applied to sparse statistical learning
problems. There are several hyperparameters to be set before running the
proposed algorithm. However, there is no explicit rule as to how the
hyperparameters should be selected. In this article, we consider the
application of this nonconvex AG algorithm to high-dimensional linear and
logistic sparse learning problems, and propose a hyperparameter setting based
on the complexity upper bound to accelerate convergence. We further establish
the rate of convergence and present a simple and useful bound for the damping
sequence. Simulation studies show that convergence can be made, on average,
considerably faster than that of the conventional ISTA algorithm. Our
experiments also show that the proposed method generally outperforms the
current state-of-the-art method in terms of signal recovery.
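To make the composite setup concrete, here is a minimal sketch in Python/NumPy of an accelerated proximal gradient loop for least squares with a SCAD penalty, next to a plain ISTA baseline. It is an illustration only: the generalized nonconvex AG scheme studied in the paper involves additional sequences, a damping sequence, and the proposed hyperparameter rule, none of which are reproduced here. A standard FISTA-style momentum step stands in to show the overall structure, and all function names and parameter values (e.g. ag_scad, lam, a = 3.7) are illustrative choices, not the authors'.

    import numpy as np

    def scad_penalty(x, lam, a=3.7):
        # Elementwise SCAD penalty (Fan & Li); requires a > 2, lam > 0.
        ax = np.abs(x)
        return np.where(
            ax <= lam, lam * ax,
            np.where(ax <= a * lam,
                     (2 * a * lam * ax - ax ** 2 - lam ** 2) / (2 * (a - 1)),
                     lam ** 2 * (a + 1) / 2))

    def scad_prox(z, t, lam, a=3.7):
        # prox_{t * SCAD_lam}(z) for 1-D z, assuming t < a - 1: evaluate the three
        # closed-form candidates (soft threshold, middle region, identity) and keep,
        # per coordinate, the one with the smallest proximal objective.
        cand = np.stack([
            np.sign(z) * np.maximum(np.abs(z) - t * lam, 0.0),
            ((a - 1) * z - np.sign(z) * a * t * lam) / (a - 1 - t),
            z,
        ])
        vals = 0.5 * (cand - z) ** 2 + t * scad_penalty(cand, lam, a)
        return cand[np.argmin(vals, axis=0), np.arange(z.size)]

    def ag_scad(X, y, lam, n_iter=500):
        # FISTA-style accelerated proximal gradient for (1/2n)||y - Xb||^2 + SCAD_lam(b).
        # Plain momentum can misbehave for nonconvex penalties, which is exactly
        # the convergence issue the paper addresses with its generalized AG scheme.
        n, p = X.shape
        t = n / np.linalg.norm(X, 2) ** 2        # 1 / Lipschitz constant of the loss gradient
        b = np.zeros(p); b_prev = b.copy(); theta = 1.0
        for _ in range(n_iter):
            theta_next = (1.0 + np.sqrt(1.0 + 4.0 * theta ** 2)) / 2.0
            v = b + (theta - 1.0) / theta_next * (b - b_prev)   # extrapolation (momentum) step
            grad = X.T @ (X @ v - y) / n
            b_prev, b = b, scad_prox(v - t * grad, t, lam)
            theta = theta_next
        return b

    def ista_scad(X, y, lam, n_iter=500):
        # Plain ISTA with the same SCAD prox: the baseline the abstract compares against.
        n, p = X.shape
        t = n / np.linalg.norm(X, 2) ** 2
        b = np.zeros(p)
        for _ in range(n_iter):
            b = scad_prox(b - t * X.T @ (X @ b - y) / n, t, lam)
        return b

    # Illustrative run on synthetic sparse data.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 50))
    beta = np.zeros(50); beta[:5] = 2.0
    y = X @ beta + 0.5 * rng.standard_normal(200)
    print(np.round(ag_scad(X, y, lam=0.2)[:8], 2))
    print(np.round(ista_scad(X, y, lam=0.2)[:8], 2))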
Related papers
- PROMISE: Preconditioned Stochastic Optimization Methods by Incorporating Scalable Curvature Estimates [17.777466668123886]
We introduce PROMISE (Preconditioned Optimization Methods by Incorporating Scalable Curvature Estimates), a suite of sketching-based preconditioned gradient algorithms.
PROMISE includes preconditioned versions of SVRG, SAGA, and Katyusha.
arXiv Detail & Related papers (2023-09-05T07:49:10Z) - Smoothing ADMM for Sparse-Penalized Quantile Regression with Non-Convex
Penalties [8.294148737585543]
This paper investigates quantile regression in the presence of nonconvex and nonsmooth sparse penalties, such as the minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD).
We introduce a novel single-loop ADMM algorithm with an increasing penalty parameter, named SIAD, specifically for sparse-penalized quantile regression.
arXiv Detail & Related papers (2023-09-04T21:48:51Z) - Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation
Constrained Optimization [88.0031283949404]
Many real-world problems have complicated nonconvex functional constraints and use a large number of data points.
Our proposed method outperforms an existing method with the previously best-known result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - Faster Algorithm and Sharper Analysis for Constrained Markov Decision
Process [56.55075925645864]
The problem of the constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new primal-dual approach is proposed with a novel integration of three ingredients: regularized policy optimizer, dual variable regularizer, and Nesterov's accelerated gradient descent dual optimizer.
This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ for convex optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of iteration complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - A Unified Analysis of First-Order Methods for Smooth Games via Integral
Quadratic Constraints [10.578409461429626]
In this work, we adapt the integral quadratic constraints theory to first-order methods for smooth and strongly-monotone games.
We provide, for the first time, a global convergence rate for the negative momentum method (NM) with iteration complexity $\mathcal{O}(\kappa^{1.5})$, which matches its known lower bound.
We show that it is impossible for an algorithm with one step of memory to achieve acceleration if it only queries the gradient once per batch.
arXiv Detail & Related papers (2020-09-23T20:02:00Z) - Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart
for Nonconvex Optimization [73.38702974136102]
Various types of parameter restart schemes have been proposed for accelerated algorithms to facilitate their practical convergence.
In this paper, we propose a proximal gradient algorithm with momentum and flexible parameter restart for solving nonconvex and nonsmooth problems.
arXiv Detail & Related papers (2020-02-26T16:06:27Z) - Global Convergence and Variance-Reduced Optimization for a Class of
Nonconvex-Nonconcave Minimax Problems [39.13924898972611]
Nonconvex minimax problems appear frequently in emerging machine learning applications, such as generative adversarial networks and adversarial learning.
GDA algorithms with constant step size can potentially diverge even in the convex setting.
The AGDA algorithm is shown to converge globally at a sublinear rate.
arXiv Detail & Related papers (2020-02-22T04:20:37Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that the advantage of adaptive over non-adaptive gradient algorithms in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
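The entry above pairs adaptive, gradient-history step sizes with an optimistic update for min-max training. The short sketch below (same Python/NumPy style as earlier) illustrates that combination on a toy strongly convex-strongly concave saddle problem; it is a generic sketch under stated assumptions, with illustrative names and constants, and is not the specific optimistic adaptive variant analyzed in that paper.

    import numpy as np

    rng = np.random.default_rng(1)
    d = 5
    A = rng.standard_normal((d, d)) / np.sqrt(d)

    def field(z):
        # Joint vector field of min_x max_y 0.5||x||^2 + x^T A y - 0.5||y||^2:
        # descent direction for x, ascent direction for y. Unique saddle point: z = 0.
        x, y = z[:d], z[d:]
        return np.concatenate([x + A @ y, y - A.T @ x])

    z = rng.standard_normal(2 * d)
    g_prev = field(z)
    acc = np.zeros(2 * d)                 # Adagrad-style accumulator of squared gradients
    eta, eps = 0.1, 1e-8
    print("initial ||z|| =", np.linalg.norm(z))
    for _ in range(2000):
        g = field(z)
        acc += g ** 2
        # Optimistic step: extrapolate with the previous gradient, scale per coordinate.
        z = z - (eta / (np.sqrt(acc) + eps)) * (2.0 * g - g_prev)
        g_prev = g
    print("final   ||z|| =", np.linalg.norm(z))   # the iterate should drift toward the origin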