Related papers: Smoothing the Edges: Smooth Optimization for Sparse Regularization using Hadamard Overparametrization

Smoothing the Edges: Smooth Optimization for Sparse Regularization using Hadamard Overparametrization

URL: http://arxiv.org/abs/2307.03571v3
Date: Fri, 26 Apr 2024 23:45:41 GMT
Title: Smoothing the Edges: Smooth Optimization for Sparse Regularization using Hadamard Overparametrization
Authors: Chris Kolb, Christian L. Müller, Bernd Bischl, David Rügamer,
Abstract summary: We present a framework for smooth optimization of explicitly regularized objectives for (structured) sparsity. Our method enables fully differentiable approximation-free optimization and is thus compatible with the ubiquitous gradient descent paradigm in deep learning.
Score: 10.009748368458409
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present a framework for smooth optimization of explicitly regularized objectives for (structured) sparsity. These non-smooth and possibly non-convex problems typically rely on solvers tailored to specific models and regularizers. In contrast, our method enables fully differentiable and approximation-free optimization and is thus compatible with the ubiquitous gradient descent paradigm in deep learning. The proposed optimization transfer comprises an overparameterization of selected parameters and a change of penalties. In the overparametrized problem, smooth surrogate regularization induces non-smooth, sparse regularization in the base parametrization. We prove that the surrogate objective is equivalent in the sense that it not only has identical global minima but also matching local minima, thereby avoiding the introduction of spurious solutions. Additionally, our theory establishes results of independent interest regarding matching local minima for arbitrary, potentially unregularized, objectives. We comprehensively review sparsity-inducing parametrizations across different fields that are covered by our general theory, extend their scope, and propose improvements in several aspects. Numerical experiments further demonstrate the correctness and effectiveness of our approach on several sparse learning problems ranging from high-dimensional regression to sparse neural network training.

Related papers

Mirror Descent Under Generalized Smoothness [23.5387392871236]
We introduce a new $ell*$-smoothness concept that measures the norm of Hessian terms of a general norm and its dual. We establish convergence for mirror-descent-type algorithms, matching the rates under the classic smoothness.
arXiv Detail & Related papers (2025-02-02T11:23:10Z)
Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems. We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z)
Distributed Fractional Bayesian Learning for Adaptive Optimization [7.16937736207894]
This paper considers a distributed adaptive optimization problem, where all agents only have access to their local cost functions with a common unknown parameter. We aim to provide valuable insights for addressing parameter uncertainty in distributed optimization problems and simultaneously find the optimal solution.
arXiv Detail & Related papers (2024-04-17T13:09:33Z)
Low-Rank Extragradient Methods for Scalable Semidefinite Optimization [0.0]
We focus on high-dimensional and plausible settings in which the problem admits a low-rank solution. We provide several theoretical results proving that, under these circumstances, the well-known Extragradient method converges to a solution of the constrained optimization problem.
arXiv Detail & Related papers (2024-02-14T10:48:00Z)
Scalable Bayesian Meta-Learning through Generalized Implicit Gradients [64.21628447579772]
Implicit Bayesian meta-learning (iBaML) method broadens the scope of learnable priors, but also quantifies the associated uncertainty. Analytical error bounds are established to demonstrate the precision and efficiency of the generalized implicit gradient over the explicit one.
arXiv Detail & Related papers (2023-03-31T02:10:30Z)
Optimization of Annealed Importance Sampling Hyperparameters [77.34726150561087]
Annealed Importance Sampling (AIS) is a popular algorithm used to estimates the intractable marginal likelihood of deep generative models. We present a parameteric AIS process with flexible intermediary distributions and optimize the bridging distributions to use fewer number of steps for sampling. We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
arXiv Detail & Related papers (2022-09-27T07:58:25Z)
Smooth over-parameterized solvers for non-smooth structured optimization [3.756550107432323]
Non-smoothness encodes structural constraints on the solutions, such as sparsity, group sparsity, low-rank edges and sharp edges. We operate a non-weighted but smooth overparametrization of the underlying nonsmooth optimization problems. Our main contribution is to apply the Variable Projection (VarPro) which defines a new formulation by explicitly minimizing over part of the variables.
arXiv Detail & Related papers (2022-05-03T09:23:07Z)
Distributed Sketching for Randomized Optimization: Exact Characterization, Concentration and Lower Bounds [54.51566432934556]
We consider distributed optimization methods for problems where forming the Hessian is computationally challenging. We leverage randomized sketches for reducing the problem dimensions as well as preserving privacy and improving straggler resilience in asynchronous distributed systems.
arXiv Detail & Related papers (2022-03-18T05:49:13Z)
Analysis of Generalized Bregman Surrogate Algorithms for Nonsmooth Nonconvex Statistical Learning [2.049702429898688]
This paper focuses on a broad Bregman-surrogate framework including the adaptive approximation, mirror, iterative threshold descent, DC programming and many others as examples.
arXiv Detail & Related papers (2021-12-16T20:37:40Z)
Obtaining Adjustable Regularization for Free via Iterate Averaging [43.75491612671571]
Regularization for optimization is a crucial technique to avoid overfitting in machine learning. We establish an averaging scheme that converts the iterates of SGD on an arbitrary strongly convex and smooth objective function to its regularized counterpart. Our approaches can be used for accelerated and preconditioned optimization methods as well.
arXiv Detail & Related papers (2020-08-15T15:28:05Z)
Stochastic batch size for adaptive regularization in deep network optimization [63.68104397173262]
We propose a first-order optimization algorithm incorporating adaptive regularization applicable to machine learning problems in deep learning framework. We empirically demonstrate the effectiveness of our algorithm using an image classification task based on conventional network models applied to commonly used benchmark datasets.
arXiv Detail & Related papers (2020-04-14T07:54:53Z)
Support recovery and sup-norm convergence rates for sparse pivotal estimation [79.13844065776928]
In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level. We show minimax sup-norm convergence rates for non smoothed and smoothed, single task and multitask square-root Lasso-type estimators.
arXiv Detail & Related papers (2020-01-15T16:11:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.