Performance of $\ell_1$ Regularization for Sparse Convex Optimization
- URL: http://arxiv.org/abs/2307.07405v1
- Date: Fri, 14 Jul 2023 15:31:45 GMT
- Title: Performance of $\ell_1$ Regularization for Sparse Convex Optimization
- Authors: Kyriakos Axiotis, Taisuke Yasuda
- Abstract summary: We give the first recovery guarantees for the Group LASSO for sparse convex optimization with vector-valued features.
Our result is the first to theoretically explain the empirical success of the Group LASSO for convex functions under general input instances.
We give the first result for general loss functions for this problem that requires only restricted strong convexity and smoothness.
- Score: 10.37112414795167
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite widespread adoption in practice, guarantees for the LASSO and Group
LASSO are strikingly lacking in settings beyond statistical problems, and these
algorithms are usually considered to be a heuristic in the context of sparse
convex optimization on deterministic inputs. We give the first recovery
guarantees for the Group LASSO for sparse convex optimization with
vector-valued features. We show that if a sufficiently large Group LASSO
regularization is applied when minimizing a strictly convex function $l$, then
the minimizer is a sparse vector supported on vector-valued features with the
largest $\ell_2$ norm of the gradient. Thus, repeating this procedure selects
the same set of features as the Orthogonal Matching Pursuit algorithm, which
admits recovery guarantees for any function $l$ with restricted strong
convexity and smoothness via weak submodularity arguments. This answers open
questions of Tibshirani et al. and Yasuda et al. Our result is the first to
theoretically explain the empirical success of the Group LASSO for convex
functions under general input instances assuming only restricted strong
convexity and smoothness. Our result also generalizes provable guarantees for
the Sequential Attention algorithm, which is a feature selection algorithm
inspired by the attention mechanism proposed by Yasuda et al.
As an application of our result, we give new results for the column subset
selection problem, which is well-studied when the loss is the Frobenius norm or
other entrywise matrix losses. We give the first result for general loss
functions for this problem that requires only restricted strong convexity and
smoothness.
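
As a concrete illustration of the selection procedure in the abstract, the sketch below implements the equivalent greedy loop: compute the restricted minimizer on the current support, pick the feature group whose gradient block has the largest $\ell_2$ norm (which, per the result above, is what a sufficiently large Group LASSO penalty isolates), and repeat. The specific loss (a small logistic regression with a ridge term), the group structure, the data, and all constants are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not from the paper): repeating the "largest gradient group
# l2 norm" selection step, which coincides with Group Orthogonal Matching
# Pursuit. Loss, groups, data, and constants below are illustrative only.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(0)
n, d, block = 200, 30, 3                      # samples, features, features per group
groups = [list(range(i, i + block)) for i in range(0, d, block)]
X = rng.standard_normal((n, d))
y = rng.integers(0, 2, size=n) * 2.0 - 1.0    # labels in {-1, +1}

def loss_and_grad(w):
    """Smooth, strictly convex loss: logistic loss plus a small ridge term."""
    z = y * (X @ w)
    loss = np.mean(np.logaddexp(0.0, -z)) + 1e-3 * (w @ w)
    grad = -(X.T @ (y * expit(-z))) / n + 2e-3 * w
    return loss, grad

def restricted_minimizer(support):
    """Minimize the loss over vectors supported on the selected groups."""
    idx = [j for g in support for j in groups[g]]
    def f(v):
        w = np.zeros(d)
        w[idx] = v
        loss, grad = loss_and_grad(w)
        return loss, grad[idx]
    res = minimize(f, np.zeros(len(idx)), jac=True, method="L-BFGS-B")
    w = np.zeros(d)
    w[idx] = res.x
    return w

def group_omp(k):
    """Repeat the 'largest gradient group l2 norm' selection step k times."""
    support, w = [], np.zeros(d)
    for _ in range(k):
        _, grad = loss_and_grad(w)
        scores = [np.linalg.norm(grad[g]) for g in groups]
        for g in support:                     # never re-select a chosen group
            scores[g] = -np.inf
        support.append(int(np.argmax(scores)))
        w = restricted_minimizer(support)
    return support, w

selected, w_hat = group_omp(k=4)
print("selected groups:", selected)
```

Under restricted strong convexity and smoothness, the abstract's argument transfers OMP-style recovery guarantees (via weak submodularity) to this selection rule; the sketch only illustrates the rule itself, not the analysis.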
Related papers
- Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z) - Nonsmooth Projection-Free Optimization with Functional Constraints [12.20060970177791]
This paper presents a subgradient-based algorithm for constrained nonsmooth convex optimization.
Our proposed algorithm can handle nonsmooth problems with general convex functional inequality constraints.
Similar performance is observed when deterministic subgradients are replaced with stochastic subgradients.
arXiv Detail & Related papers (2023-11-18T23:06:33Z) - Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach [57.92727189589498]
We propose an online convex optimization approach with two different levels of adaptivity.
We obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$, and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave, and convex loss functions, respectively.
arXiv Detail & Related papers (2023-07-17T09:55:35Z) - Efficient First-order Methods for Convex Optimization with Strongly Convex Function Constraints [3.667453772837954]
We show how to minimize a convex function subject to strongly convex function constraints.
We identify the sparsity pattern within a finite number of iterations, a result that appears to have independent significance.
arXiv Detail & Related papers (2022-12-21T16:04:53Z) - Faster Projection-Free Augmented Lagrangian Methods via Weak Proximal Oracle [16.290192687098383]
This paper considers a convex composite optimization problem with affine constraints.
Motivated by high-dimensional applications in which exact projection/proximal computations are not tractable, we propose a projection-free augmented Lagrangian-based method.
arXiv Detail & Related papers (2022-10-25T12:51:43Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - A Simple yet Universal Strategy for Online Convex Optimization [97.64589722655612]
We propose a simple strategy for universal online convex optimization.
The key idea is to construct a set of experts to process the original online functions, and deploy a meta-algorithm over the linearized losses.
Our strategy inherits the theoretical guarantee of any expert designed for strongly convex functions and exponentially concave functions.
arXiv Detail & Related papers (2021-05-08T11:43:49Z) - Recent Theoretical Advances in Non-Convex Optimization [56.88981258425256]
Motivated by recent increased interest in the analysis of optimization algorithms for non-convex optimization in deep networks and other problems in data analysis, we give an overview of recent theoretical results on optimization algorithms for non-convex optimization.
arXiv Detail & Related papers (2020-12-11T08:28:51Z) - Provably Convergent Working Set Algorithm for Non-Convex Regularized Regression [0.0]
This paper proposes a working set algorithm for non-convex regularizers with convergence guarantees.
Our results demonstrate high gain compared to the full-problem solver for both block-coordinate and gradient solvers.
arXiv Detail & Related papers (2020-06-24T07:40:31Z) - Convergence of adaptive algorithms for weakly convex constrained optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.