DADA: Dual Averaging with Distance Adaptation
- URL: http://arxiv.org/abs/2501.10258v1
- Date: Fri, 17 Jan 2025 15:40:03 GMT
- Title: DADA: Dual Averaging with Distance Adaptation
- Authors: Mohammad Moshtaghifar, Anton Rodomanov, Daniil Vankov, Sebastian Stich,
- Abstract summary: We present a novel universal gradient method for solving convex optimization problems.
Our algorithm -- Dual Averaging with Distance Adaptation (DADA) -- is based on the classical scheme of dual averaging.
DADA is applicable to both unconstrained and constrained problems, even when the domain is unbounded.
- Score: 0.0
- License:
- Abstract: We present a novel universal gradient method for solving convex optimization problems. Our algorithm -- Dual Averaging with Distance Adaptation (DADA) -- is based on the classical scheme of dual averaging and dynamically adjusts its coefficients based on observed gradients and the distance between iterates and the starting point, eliminating the need for problem-specific parameters. DADA is a universal algorithm that simultaneously works for a broad spectrum of problem classes, provided the local growth of the objective function around its minimizer can be bounded. Particular examples of such problem classes are nonsmooth Lipschitz functions, Lipschitz-smooth functions, H\"older-smooth functions, functions with high-order Lipschitz derivative, quasi-self-concordant functions, and $(L_0,L_1)$-smooth functions. Crucially, DADA is applicable to both unconstrained and constrained problems, even when the domain is unbounded, without requiring prior knowledge of the number of iterations or desired accuracy.
Related papers
- New Lower Bounds for Stochastic Non-Convex Optimization through Divergence Composition [11.530542389959347]
We present fundamental limits of first-order optimization in a range of non-dimensional settings, including L-Convexity (QC), Quadratic Growth (smoothQG), and Restricted Inequalities (RSI)
arXiv Detail & Related papers (2025-02-19T19:21:00Z) - Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm is only accessible to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z) - Optimization on a Finer Scale: Bounded Local Subgradient Variation Perspective [17.5796206457693]
We study nonsmooth optimization problems under bounded local subgradient variation.
The resulting class of objective encapsulates the classes of objective based on the defined classes.
arXiv Detail & Related papers (2024-03-24T22:42:40Z) - Convex Bi-Level Optimization Problems with Non-smooth Outer Objective
Function [0.0]
We show that Bi-SG tackles bi-level optimization problems and sub-linear rates both in terms of the inner and outer objective functions.
We prove that the distance of the generated sequence to the set of optimal solutions of the bi-level problem converges to zero.
arXiv Detail & Related papers (2023-07-17T05:03:53Z) - Learning Globally Smooth Functions on Manifolds [94.22412028413102]
Learning smooth functions is generally challenging, except in simple cases such as learning linear or kernel models.
This work proposes to overcome these obstacles by combining techniques from semi-infinite constrained learning and manifold regularization.
We prove that, under mild conditions, this method estimates the Lipschitz constant of the solution, learning a globally smooth solution as a byproduct.
arXiv Detail & Related papers (2022-10-01T15:45:35Z) - Fixed-Point Automatic Differentiation of Forward--Backward Splitting Algorithms for Partly Smooth Functions [4.389150156866014]
Implicit (ID) and Automatic Differentiation (AD) are applied to the fixed-point iterations of proximal splitting algorithms.
We show that AD of the sequence generated by these algorithms converges to the derivative of the solution mapping.
For a variant of automatic differentiation, which we call Fixed-Point Automatic Differentiation (FPAD), we remedy the memory overhead problem of the Reverse Mode AD.
arXiv Detail & Related papers (2022-08-05T11:27:55Z) - Universal Online Convex Optimization Meets Second-order Bounds [74.0120666722487]
We propose a simple strategy for universal online convex optimization.
The key idea is to construct a set of experts to process the original online functions, and deploy a meta-algorithm over the linearized losses.
In this way, we can plug in off-the-shelf online solvers as black-box experts to deliver problem-dependent regret bounds.
arXiv Detail & Related papers (2021-05-08T11:43:49Z) - Two-Stage Stochastic Optimization via Primal-Dual Decomposition and Deep
Unrolling [86.85697555068168]
Two-stage algorithmic optimization plays a critical role in various engineering and scientific applications.
There still lack efficient algorithms, especially when the long-term and short-term variables are coupled in the constraints.
We show that PDD-SSCA can achieve superior performance over existing solutions.
arXiv Detail & Related papers (2021-05-05T03:36:00Z) - Global Convergence of Model Function Based Bregman Proximal Minimization
Algorithms [17.740376367999705]
Lipschitz mapping of a continuously differentiable function plays a crucial role in various optimization algorithms.
We propose a globally convergent algorithm called Model $$L$mad property.
arXiv Detail & Related papers (2020-12-24T08:09:22Z) - A Multi-Agent Primal-Dual Strategy for Composite Optimization over
Distributed Features [52.856801164425086]
We study multi-agent sharing optimization problems with the objective function being the sum of smooth local functions plus a convex (possibly non-smooth) coupling function.
arXiv Detail & Related papers (2020-06-15T19:40:24Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $mathcaltilde O(t-1/4)$ rate of convergence for the norm of the gradient of Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.