Fixed-Point Automatic Differentiation of Forward--Backward Splitting Algorithms for Partly Smooth Functions
- URL: http://arxiv.org/abs/2208.03107v3
- Date: Thu, 24 Oct 2024 18:04:35 GMT
- Title: Fixed-Point Automatic Differentiation of Forward--Backward Splitting Algorithms for Partly Smooth Functions
- Authors: Sheheryar Mehmood, Peter Ochs
- Abstract summary: Implicit Differentiation (ID) and Automatic Differentiation (AD) are applied to the fixed-point iterations of proximal splitting algorithms.
We show that AD of the sequence generated by these algorithms converges to the derivative of the solution mapping.
For a variant of automatic differentiation, which we call Fixed-Point Automatic Differentiation (FPAD), we remedy the memory overhead problem of reverse-mode AD.
- Score: 4.389150156866014
- Abstract: A large class of non-smooth practical optimization problems can be written as the minimization of a sum of smooth and partly smooth functions. We examine such structured problems which also depend on a parameter vector and study the problem of differentiating their solution mapping with respect to the parameter, which has far-reaching applications in sensitivity analysis and parameter learning problems. Under partial smoothness and other mild assumptions, we apply Implicit Differentiation (ID) and Automatic Differentiation (AD) to the fixed-point iterations of proximal splitting algorithms. We show that AD of the sequence generated by these algorithms converges (linearly under further assumptions) to the derivative of the solution mapping. For a variant of automatic differentiation, which we call Fixed-Point Automatic Differentiation (FPAD), we remedy the memory overhead problem of reverse-mode AD and moreover provide faster convergence theoretically. We numerically illustrate the convergence and convergence rates of AD and FPAD on Lasso and Group Lasso problems and demonstrate the working of FPAD on prototypical image denoising problems by learning the regularization term.
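The abstract describes differentiating the solution mapping of a parametric problem (e.g. Lasso) by applying AD to the fixed-point iterations of a forward-backward splitting method. Below is a minimal sketch of that idea in JAX; it is not the authors' code, and the problem instance, iteration count, and names `soft_threshold` and `ista_solution` are illustrative assumptions.

```python
# A minimal sketch: differentiate the Lasso solution map by applying
# forward-mode AD to unrolled forward-backward splitting (ISTA) iterations.
import jax
import jax.numpy as jnp


def soft_threshold(v, tau):
    # Proximal (backward) step: prox of tau * ||.||_1.
    return jnp.sign(v) * jnp.maximum(jnp.abs(v) - tau, 0.0)


def ista_solution(lam, A, b, num_iters=200):
    # Unrolled forward-backward splitting for
    #   min_x 0.5 * ||A x - b||^2 + lam * ||x||_1
    step = 1.0 / jnp.linalg.norm(A, ord=2) ** 2  # 1 / Lipschitz constant of the smooth part
    x = jnp.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)                         # forward (gradient) step
        x = soft_threshold(x - step * grad, step * lam)  # backward (proximal) step
    return x


# Illustrative random problem instance (an assumption, not from the paper).
A = jax.random.normal(jax.random.PRNGKey(0), (20, 10))
b = jax.random.normal(jax.random.PRNGKey(1), (20,))

# Derivative of the (approximate) solution with respect to the regularization
# parameter lam, obtained by forward-mode AD through the iterations.
dx_dlam = jax.jacfwd(ista_solution)(0.1, A, b)
print(dx_dlam.shape)  # (10,)
```

Reverse-mode AD through the same unrolled loop (e.g. via jax.jacrev) has to store every iterate, which is the memory overhead that the paper's FPAD variant is designed to remedy.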
Related papers
- Moreau Envelope ADMM for Decentralized Weakly Convex Optimization [55.2289666758254]
This paper proposes a proximal variant of the alternating direction method of multipliers (ADMM) for distributed optimization.
The results of our numerical experiments indicate that our method is faster and more robust than widely-used approaches.
arXiv Detail & Related papers (2023-08-31T14:16:30Z) - Linearization Algorithms for Fully Composite Optimization [61.20539085730636]
This paper studies first-order algorithms for solving fully composite optimization problems over convex compact sets.
We leverage the structure of the objective by handling its differentiable and non-differentiable components separately, linearizing only the smooth parts.
arXiv Detail & Related papers (2023-02-24T18:41:48Z) - Adaptive Stochastic Optimisation of Nonconvex Composite Objectives [2.1700203922407493]
We propose and analyse a family of generalised composite mirror descent algorithms.
With adaptive step sizes, the proposed algorithms converge without requiring prior knowledge of the problem.
We exploit the low-dimensional structure of the decision sets for high-dimensional problems.
arXiv Detail & Related papers (2022-11-21T18:31:43Z) - Learning to Optimize with Stochastic Dominance Constraints [103.26714928625582]
In this paper, we develop a simple yet efficient approach for the problem of comparing uncertain quantities.
We recast the inner optimization in the Lagrangian as a learning problem for surrogate approximation, which bypasses the apparent intractability.
The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
arXiv Detail & Related papers (2022-11-14T21:54:31Z) - A Riemannian ADMM [7.128839268767137]
This class of problems finds important applications in machine learning and statistics.
We propose a Riemannian ADMM with easily computable steps in each iteration to solve this class of problems.
arXiv Detail & Related papers (2022-11-03T22:12:18Z) - TiAda: A Time-scale Adaptive Algorithm for Nonconvex Minimax
Optimization [24.784754071913255]
Adaptive methods have shown their ability to adjust the stepsizes on the fly in a parameter-agnostic manner.
Current convergence analyses of gradient descent ascent for nonconvex minimax problems require careful tuning of parameters.
arXiv Detail & Related papers (2022-10-31T17:05:36Z) - Continuation Path with Linear Convergence Rate [18.405645120971496]
Path-following algorithms are frequently used in composite optimization problems where a series of subproblems are solved sequentially.
We present a primal-dual analysis of the path-following algorithm and determine how accurately each subproblem should be solved to guarantee a linear convergence rate on a target problem.
Considering optimization with a sparsity-inducing penalty, we analyze the change of the active sets with respect to the regularization parameter.
arXiv Detail & Related papers (2021-12-09T18:42:13Z) - Two-Stage Stochastic Optimization via Primal-Dual Decomposition and Deep
Unrolling [86.85697555068168]
Two-stage stochastic optimization plays a critical role in various engineering and scientific applications.
Efficient algorithms are still lacking, especially when the long-term and short-term variables are coupled in the constraints.
We show that PDD-SSCA can achieve superior performance over existing solutions.
arXiv Detail & Related papers (2021-05-05T03:36:00Z) - Efficient Consensus Model based on Proximal Gradient Method applied to
Convolutional Sparse Problems [2.335152769484957]
We derive and detail a theoretical analysis of an efficient consensus algorithm based on the proximal gradient (PG) approach.
The proposed algorithm is also applied to another particular convolutional problem for the anomaly detection task.
arXiv Detail & Related papers (2020-11-19T20:52:48Z) - A Multi-Agent Primal-Dual Strategy for Composite Optimization over
Distributed Features [52.856801164425086]
We study multi-agent sharing optimization problems with the objective function being the sum of smooth local functions plus a convex (possibly non-smooth) coupling function.
arXiv Detail & Related papers (2020-06-15T19:40:24Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove a $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with a mini-batch size of $1$, constant first- and second-order moment parameters, and possibly unbounded optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)