Smooth over-parameterized solvers for non-smooth structured optimization
- URL: http://arxiv.org/abs/2205.01385v1
- Date: Tue, 3 May 2022 09:23:07 GMT
- Title: Smooth over-parameterized solvers for non-smooth structured optimization
- Authors: Clarice Poon and Gabriel Peyr\'e
- Abstract summary: Non-smoothness encodes structural constraints on the solutions, such as sparsity, group sparsity, low-rank and sharp edges.
We operate a non-convex but smooth over-parametrization of the underlying non-smooth optimization problems.
Our main algorithmic contribution is to apply the Variable Projection (VarPro) method, which defines a new formulation by explicitly minimizing over part of the variables.
- Score: 3.756550107432323
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-smooth optimization is a core ingredient of many imaging or machine
learning pipelines. Non-smoothness encodes structural constraints on the
solutions, such as sparsity, group sparsity, low-rank and sharp edges. It is
also the basis for the definition of robust loss functions and scale-free
functionals such as square-root Lasso. Standard approaches to deal with
non-smoothness leverage either proximal splitting or coordinate descent. These
approaches are effective but usually require parameter tuning, preconditioning
or some sort of support pruning. In this work, we advocate and study a
different route, which operates a non-convex but smooth over-parametrization of
the underlying non-smooth optimization problems. This generalizes quadratic
variational forms that are at the heart of the popular Iterative Reweighted
Least Squares (IRLS). Our main theoretical contribution connects gradient
descent on this reformulation to a mirror descent flow with a varying Hessian
metric. This analysis is crucial to derive convergence bounds that are
dimension-free. This explains the efficiency of the method when using small
grid sizes in imaging. Our main algorithmic contribution is to apply the
Variable Projection (VarPro) method which defines a new formulation by
explicitly minimizing over part of the variables. This leads to a better
conditioning of the minimized functional and improves the convergence of simple
but very efficient gradient-based methods, for instance quasi-Newton solvers.
We exemplify the use of this new solver for the resolution of regularized
regression problems for inverse problems and supervised learning, including
total variation prior and non-convex regularizers.
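For concreteness, here is a minimal sketch (an illustration, not the authors' reference implementation) of the over-parametrization and Variable Projection ideas on the Lasso, min_x 1/2||Ax-b||^2 + lambda*||x||_1. It assumes the classical quadratic variational identity lambda*||x||_1 = min_{u⊙v=x} lambda/2*(||u||^2 + ||v||^2), which turns the non-smooth problem into a smooth one in (u, v); for fixed u, the inner minimization over v is a ridge regression with closed-form solution (the VarPro step), leaving a smooth outer function of u handled by a quasi-Newton solver (L-BFGS). The function name, initialization, and synthetic usage below are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def lasso_varpro(A, b, lam):
    # Hedged sketch: smooth over-parameterization of the Lasso via the
    # Hadamard factorization x = u * v, then Variable Projection over v.
    n, p = A.shape

    def inner_solve(u):
        # Ridge regression with design A @ diag(u), solved through its
        # n x n dual system (cheaper when n < p).
        Au = A * u                                   # A @ diag(u)
        alpha = np.linalg.solve(Au @ Au.T + lam * np.eye(n), b)
        return Au, Au.T @ alpha                      # (design, inner minimizer v)

    def outer(u):
        Au, v = inner_solve(u)
        r = Au @ v - b
        val = 0.5 * r @ r + 0.5 * lam * (u @ u + v @ v)
        # Envelope theorem: v may be treated as constant when differentiating,
        # so grad_u = (A^T r) * v + lam * u.
        grad = (A.T @ r) * v + lam * u
        return val, grad

    res = minimize(outer, np.ones(p), jac=True, method="L-BFGS-B")
    _, v = inner_solve(res.x)
    return res.x * v                                 # recover x = u * v

# Illustrative usage on synthetic data (assumed setup, not from the paper).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 200))
    x_true = np.zeros(200)
    x_true[:5] = 1.0
    b = A @ x_true + 0.01 * rng.standard_normal(50)
    x_hat = lasso_varpro(A, b, lam=0.1)
    print("nonzeros recovered:", np.sum(np.abs(x_hat) > 1e-3))
```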
Related papers
- Gradient-free neural topology optimization [0.0]
Gradient-free algorithms require many more iterations to converge than gradient-based algorithms.
This has made them unviable for topology optimization due to the high computational cost per iteration and high dimensionality of these problems.
We propose a pre-trained neural reparameterization strategy that leads to at least one order of magnitude decrease in iteration count when optimizing the designs in latent space.
arXiv Detail & Related papers (2024-03-07T23:00:49Z) - Constrained Optimization via Exact Augmented Lagrangian and Randomized
Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonconvex optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z) - Infeasible Deterministic, Stochastic, and Variance-Reduction Algorithms for Optimization under Orthogonality Constraints [9.301728976515255]
This article provides new practical and theoretical developments for the landing algorithm.
First, the method is extended to the Stiefel manifold.
We also consider variance reduction algorithms when the cost function is an average of many functions.
arXiv Detail & Related papers (2023-03-29T07:36:54Z) - A framework for bilevel optimization that enables stochastic and global
variance reduction algorithms [17.12280360174073]
Bilevel optimization is the problem of minimizing a value function which involves the arg-minimum of another function.
We introduce a novel framework, in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time.
We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm in our framework, has an $O(\frac{1}{T})$ convergence rate, and that it achieves linear convergence under the Polyak-Łojasiewicz assumption.
arXiv Detail & Related papers (2022-01-31T18:17:25Z) - STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization [74.1615979057429]
We investigate stochastic non-convex optimization problems where the objective is an expectation over smooth loss functions.
Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
arXiv Detail & Related papers (2021-11-01T15:43:36Z) - Smooth Bilevel Programming for Sparse Regularization [5.177947445379688]
Iteratively reweighted least squares (IRLS) is a popular approach to solve sparsity-enforcing regression problems in machine learning.
We show how a surprisingly simple reparametrization of IRLS, coupled with a bilevel scheme, achieves top performance on a wide range of sparsity-enforcing problems.
arXiv Detail & Related papers (2021-06-02T19:18:22Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence against the fact that a constant step-size learns faster, but only up to an error neighborhood.
Rather than fixing the minibatch size and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - Hybrid Variance-Reduced SGD Algorithms For Nonconvex-Concave Minimax
Problems [26.24895953952318]
We develop an algorithm to solve a class of nonconvex-concave minimax problems.
It can also work with either single or double mini-batch derivative estimates.
arXiv Detail & Related papers (2020-06-27T03:05:18Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z) - Implicit differentiation of Lasso-type models for hyperparameter
optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z) - Support recovery and sup-norm convergence rates for sparse pivotal
estimation [79.13844065776928]
In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level.
We show minimax sup-norm convergence rates for non-smoothed and smoothed, single-task and multitask square-root Lasso-type estimators.
arXiv Detail & Related papers (2020-01-15T16:11:04Z)