Smooth Bilevel Programming for Sparse Regularization
- URL: http://arxiv.org/abs/2106.01429v1
- Date: Wed, 2 Jun 2021 19:18:22 GMT
- Title: Smooth Bilevel Programming for Sparse Regularization
- Authors: Clarice Poon and Gabriel Peyré
- Abstract summary: Iteratively reweighted least squares (IRLS) is a popular approach to solve sparsity-enforcing regression problems in machine learning.
We show how a surprisingly simple reparametrization of IRLS, coupled with a bilevel scheme, achieves top performance on a wide range of sparsity-enforcing regularizations.
- Score: 5.177947445379688
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Iteratively reweighted least squares (IRLS) is a popular approach to solve
sparsity-enforcing regression problems in machine learning. State-of-the-art
approaches are more efficient but typically rely on specific coordinate pruning
schemes. In this work, we show how a surprisingly simple reparametrization of
IRLS, coupled with a bilevel resolution (instead of an alternating scheme), is
able to achieve top performance on a wide range of sparsity priors (such as Lasso,
group Lasso and trace norm regularizations), regularization strengths (including
hard constraints), and design matrices (ranging from correlated designs to
differential operators). Similarly to IRLS, our method only involves solving
linear systems, but in sharp contrast, it corresponds to the minimization of
a smooth function. Although this function is non-convex, we show that there are no spurious
minima and that saddle points are "ridable", so that there always exists a
descent direction. We thus advocate for the use of a BFGS quasi-Newton solver,
which makes our approach simple, robust and efficient. We perform a numerical
benchmark of the convergence speed of our algorithm against state-of-the-art
solvers for Lasso, group Lasso, trace norm and linearly constrained problems.
These results highlight the versatility of our approach, removing the need to
use different solvers depending on the specificity of the ML problem under
study.
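The abstract does not spell out the reparametrization, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the Lasso problem min_x 0.5*||A x - y||^2 + lam*||x||_1 and the Hadamard over-parametrization x = u * v, for which ||x||_1 is the minimum of 0.5*(||u||^2 + ||v||^2). The inner minimization over u is a ridge regression solved with one linear system, and the resulting smooth function of v is handed to SciPy's L-BFGS-B, a limited-memory variant standing in for the BFGS solver mentioned above. The function name bilevel_lasso and all variable names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def bilevel_lasso(A, y, lam, v0=None):
    """Sketch: Lasso via the assumed parametrization x = u * v.

    Outer variable: v. Inner variable: u, eliminated in closed form.
    """
    n, p = A.shape
    v0 = np.ones(p) if v0 is None else np.asarray(v0, dtype=float)

    def inner_solve(v):
        # Inner ridge problem in u, solved in dual form with one
        # n-by-n system: (lam * I + A diag(v^2) A^T) alpha = y.
        K = (A * v**2) @ A.T + lam * np.eye(n)
        alpha = np.linalg.solve(K, y)
        return alpha, A.T @ alpha

    def f_and_grad(v):
        alpha, c = inner_solve(v)
        u = v * c  # minimizer of the inner problem
        # Value of the smooth outer function (scaled by 1/lam).
        value = 0.5 * (u @ u) + 0.5 * (v @ v) + 0.5 * lam * (alpha @ alpha)
        # Gradient in v from the envelope theorem: v * (1 - c^2).
        grad = v * (1.0 - c**2)
        return value, grad

    res = minimize(
        f_and_grad, v0, jac=True, method="L-BFGS-B",
        options={"gtol": 1e-10, "ftol": 1e-15, "maxiter": 5000},
    )
    alpha, c = inner_solve(res.x)
    return res.x**2 * c  # x = u * v with u = v * c


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 50))
    x_true = np.zeros(50)
    x_true[:5] = [3.0, -2.0, 1.5, -1.0, 2.0]
    y = A @ x_true + 0.01 * rng.standard_normal(20)
    lam = 0.1 * np.max(np.abs(A.T @ y))
    x_hat = bilevel_lasso(A, y, lam)
    print("entries above 1e-6 in magnitude:", int(np.sum(np.abs(x_hat) > 1e-6)))
```

In this sketch the starting point is the all-ones vector rather than zero, because v = 0 is a stationary point of the outer function. Entries of x that belong outside the support shrink toward zero through v but are not set to exactly zero, so a small threshold is needed to read off the support.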
Related papers
- Constrained Optimization via Exact Augmented Lagrangian and Randomized
Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonconvex optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z)
- Infeasible Deterministic, Stochastic, and Variance-Reduction Algorithms for Optimization under Orthogonality Constraints [9.301728976515255]
This article provides new practical and theoretical developments for the landing algorithm.
First, the method is extended to the Stiefel manifold.
We also consider variance reduction algorithms when the cost function is an average of many functions.
arXiv Detail & Related papers (2023-03-29T07:36:54Z)
- Smooth over-parameterized solvers for non-smooth structured optimization [3.756550107432323]
Non-smoothness encodes structural constraints on the solutions, such as sparsity, group sparsity, low rank and sharp edges.
We operate a non-convex but smooth over-parametrization of the underlying non-smooth optimization problems.
Our main contribution is to apply the Variable Projection (VarPro) method, which defines a new formulation by explicitly minimizing over part of the variables.
arXiv Detail & Related papers (2022-05-03T09:23:07Z)
- Solving weakly supervised regression problem using low-rank manifold
regularization [77.34726150561087]
We solve a weakly supervised regression problem.
By "weakly" we mean that the labels are known for some training points, unknown for others, and uncertain for the rest due to random noise or other reasons such as lack of resources.
In the numerical section, we applied the suggested method to artificial and real datasets using Monte-Carlo modeling.
arXiv Detail & Related papers (2021-04-13T23:21:01Z)
- Hybrid Trilinear and Bilinear Programming for Aligning Partially
Overlapping Point Sets [85.71360365315128]
In many applications, we need algorithms that can align partially overlapping point sets and are invariant to the corresponding transformations; the method here builds on the objective of the RPM algorithm.
We first show that the objective is a cubic bound function. We then utilize the convex envelopes of trilinear and bilinear monomial transformations to derive its lower bound.
We next develop a branch-and-bound (BnB) algorithm which only branches over the transformation variables and runs efficiently.
arXiv Detail & Related papers (2021-01-19T04:24:23Z)
- Conditional gradient methods for stochastically constrained convex
minimization [54.53786593679331]
We propose two novel conditional gradient-based methods for solving structured convex optimization problems.
The most important feature of our framework is that only a subset of the constraints is processed at each iteration.
Our algorithms rely on variance reduction and smoothing used in conjunction with conditional gradient steps, and are accompanied by rigorous convergence guarantees.
arXiv Detail & Related papers (2020-07-07T21:26:35Z)
- Physarum Powered Differentiable Linear Programming Layers and
Applications [48.77235931652611]
We propose an efficient and differentiable solver for general linear programming problems.
We show the use of our solver in a video segmentation task and meta-learning for few-shot learning.
arXiv Detail & Related papers (2020-04-30T01:50:37Z)
- Stochastic gradient algorithms from ODE splitting perspective [0.0]
We present a different view on optimization, which goes back to the splitting schemes for approximate solutions of ODEs.
In this work, we provide a connection between the stochastic gradient descent approach and a first-order splitting scheme for ODEs.
We consider a special case of splitting inspired by machine learning applications, and derive a new upper bound on its global splitting error.
arXiv Detail & Related papers (2020-04-19T22:45:32Z)
- Multi-Objective Matrix Normalization for Fine-grained Visual Recognition [153.49014114484424]
Bilinear pooling achieves great success in fine-grained visual recognition (FGVC).
Recent methods have shown that the matrix power normalization can stabilize the second-order information in bilinear features.
We propose an efficient Multi-Objective Matrix Normalization (MOMN) method that can simultaneously normalize a bilinear representation.
arXiv Detail & Related papers (2020-03-30T08:40:35Z)