Gradient flows and randomised thresholding: sparse inversion and
classification
- URL: http://arxiv.org/abs/2203.11555v1
- Date: Tue, 22 Mar 2022 09:21:14 GMT
- Title: Gradient flows and randomised thresholding: sparse inversion and
classification
- Authors: Jonas Latz
- Abstract summary: Sparse inversion and classification problems are ubiquitous in modern data science and imaging.
In classification, we consider, e.g., the sum of a data fidelity term and a non-smooth Ginzburg--Landau energy.
Standard (sub)gradient descent methods have been shown to be inefficient when approaching such problems.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparse inversion and classification problems are ubiquitous in modern data
science and imaging. They are often formulated as non-smooth minimisation
problems. In sparse inversion, we minimise, e.g., the sum of a data fidelity
term and an L1/LASSO regulariser. In classification, we consider, e.g., the sum
of a data fidelity term and a non-smooth Ginzburg--Landau energy. Standard
(sub)gradient descent methods have been shown to be inefficient when approaching
such problems. Splitting techniques are much more useful: here, the target
function is partitioned into a sum of two subtarget functions -- each of which
can be efficiently optimised. Splitting proceeds by performing optimisation
steps alternately with respect to each of the two subtarget functions.
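The discrete-time splitting idea can be illustrated with proximal gradient descent (ISTA) on the LASSO target: a gradient step on the smooth data-fidelity subtarget alternates with a proximal (soft-thresholding) step on the non-smooth L1 subtarget. This is a minimal sketch of the general splitting principle, not the continuous-time method of the paper; all matrix sizes and parameter values below are illustrative.

```python
import numpy as np

def ista(A, y, lam, step, n_iter=500):
    """Proximal-gradient (ISTA) sketch: alternately take a gradient step on
    the smooth subtarget 0.5*||Ax - y||^2 and a proximal step on the
    non-smooth subtarget lam*||x||_1 (soft thresholding)."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        # step with respect to the smooth subtarget (data fidelity)
        x = x - step * A.T @ (A @ x - y)
        # step with respect to the non-smooth subtarget (L1 proximal map)
        x = np.sign(x) * np.maximum(np.abs(x) - step * lam, 0.0)
    return x

# tiny usage example: recover a sparse vector from noiseless measurements
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20)
x_true[[2, 7]] = [1.5, -2.0]
y = A @ x_true
x_hat = ista(A, y, lam=0.1, step=1.0 / np.linalg.norm(A, 2) ** 2)
```

The soft-thresholding step is exactly the proximal operator of the L1 term, which is why splitting is efficient here: each subtarget admits a cheap closed-form update even though the sum is non-smooth.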
In this work, we study splitting from a stochastic continuous-time
perspective. Indeed, we define a differential inclusion that follows one of the
two subtarget function's negative subgradient at each point in time. The choice
of the subtarget function is controlled by a binary continuous-time Markov
process. The resulting dynamical system is a stochastic approximation of the
underlying subgradient flow. We investigate this stochastic approximation for
an L1-regularised sparse inversion flow and for a discrete Allen-Cahn equation
minimising a Ginzburg--Landau energy. In both cases, we study the longtime
behaviour of the stochastic dynamical system and its ability to approximate the
underlying subgradient flow at any accuracy. We illustrate our theoretical
findings in a simple sparse estimation problem and also in a low-dimensional
classification problem.
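The stochastic continuous-time perspective described above can be sketched with a toy Euler discretisation: a binary Markov process (simulated via exponential waiting times) selects which subtarget's negative (sub)gradient the state follows at each point in time. This is a simplified illustration under the assumption that the target splits as a smooth least-squares term plus an L1 term; the switching rate, step size, and horizon below are illustrative, not taken from the paper.

```python
import numpy as np

def switched_subgradient_flow(A, y, lam, rate=50.0, T=20.0, dt=1e-3, seed=1):
    """Toy Euler discretisation of a switched subgradient flow for
    f(x) = 0.5*||Ax - y||^2 + lam*||x||_1.  A binary continuous-time Markov
    process with switching rate `rate` controls which subtarget's negative
    (sub)gradient is followed; fast switching approximates the flow of the
    averaged target."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    mode = 0                                  # 0: data fidelity, 1: L1 term
    next_switch = rng.exponential(1.0 / rate)
    t = 0.0
    while t < T:
        if t >= next_switch:                  # Markov jump between subtargets
            mode = 1 - mode
            next_switch = t + rng.exponential(1.0 / rate)
        if mode == 0:
            g = A.T @ (A @ x - y)             # gradient of the smooth subtarget
        else:
            g = lam * np.sign(x)              # a subgradient of lam*||x||_1
        x = x - dt * g
        t += dt
    return x

# usage: the same sparse estimation setup as a sanity check
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20)
x_true[[2, 7]] = [1.5, -2.0]
y = A @ x_true
x_hat = switched_subgradient_flow(A, y, lam=0.1)
```

As the switching rate grows, the trajectory of this system should track the underlying subgradient flow of the (averaged) full target ever more closely, which is the approximation property the paper studies rigorously.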
Related papers
- Efficient Low-rank Identification via Accelerated Iteratively Reweighted Nuclear Norm Minimization [8.879403568685499]
We introduce an adaptive updating strategy for the smoothing parameters.
This strategy transforms the algorithm into one that effectively solves a smooth problem after a few iterations.
We prove global convergence of the proposed algorithm, guaranteeing that every limit point is a critical point.
arXiv Detail & Related papers (2024-06-22T02:37:13Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - On Learning Gaussian Multi-index Models with Gradient Flow [57.170617397894404]
We study gradient flow on the multi-index regression problem for high-dimensional Gaussian data.
We consider a two-timescale algorithm, whereby the low-dimensional link function is learnt with a non-parametric model infinitely faster than the subspace parametrizing the low-rank projection.
arXiv Detail & Related papers (2023-10-30T17:55:28Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic Gradient Langevin Dynamics is one of the most fundamental algorithms for solving nonconvex optimization problems.
In this paper, we study two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - Subgradient methods near active manifolds: saddle point avoidance, local
convergence, and asymptotic normality [4.709588811674973]
We show that aiming and subgradient approximation fully expose the smooth substructure of the problem.
We prove these properties hold for a wide class of problems, including cone reducible/decomposable functions and generic semialgebraic problems.
The normality results appear to be new even in the most classical setting.
arXiv Detail & Related papers (2021-08-26T15:02:16Z) - A Retrospective Approximation Approach for Smooth Stochastic
Optimization [0.2867517731896504]
Stochastic Gradient (SG) is the de facto iterative technique to solve stochastic optimization (SO) problems with a smooth (nonconvex) objective $f$.
arXiv Detail & Related papers (2021-03-07T16:29:36Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum-assisted Single-timescale Stochastic Approximation (MSTSA) -- for tackling bilevel optimization problems.
MSTSA allows us to control the error in the iterates caused by inexact solutions of the lower-level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and
Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z) - Explicit Regularization of Stochastic Gradient Methods through Duality [9.131027490864938]
We propose randomized Dykstra-style algorithms based on randomized dual coordinate ascent.
For accelerated coordinate descent, we obtain a new algorithm that has better convergence properties than existing gradient methods in the interpolating regime.
arXiv Detail & Related papers (2020-03-30T20:44:56Z) - The Wasserstein Proximal Gradient Algorithm [23.143814848127295]
Wasserstein gradient flows are continuous time dynamics that define curves of steepest descent to minimize an objective function over the space of probability measures.
We propose a Forward Backward (FB) discretization scheme that can tackle the case where the objective function is the sum of a smooth term and a nonsmooth geodesically convex term.
arXiv Detail & Related papers (2020-02-07T22:19:32Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of the Optimistic Adagrad algorithm for nonconvex min-max problems.
Our experiments show that the advantage of adaptive over non-adaptive gradient algorithms in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.