A globally convergent fast iterative shrinkage-thresholding algorithm
with a new momentum factor for single and multi-objective convex optimization
- URL: http://arxiv.org/abs/2205.05262v1
- Date: Wed, 11 May 2022 04:26:00 GMT
- Title: A globally convergent fast iterative shrinkage-thresholding algorithm
with a new momentum factor for single and multi-objective convex optimization
- Authors: Hiroki Tanabe, Ellen H. Fukuda, and Nobuo Yamashita
- Abstract summary: We show that our proposed method also has a global convergence rate of $O(1/k^2)$ for any $(a,b)$.
We report numerical results with various $(a,b)$, showing that some of these choices give better results than the classical momentum factors.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Convex-composite optimization, which minimizes an objective function
represented by the sum of a differentiable function and a convex one, is widely
used in machine learning and signal/image processing. Fast Iterative
Shrinkage-Thresholding Algorithm (FISTA) is a typical method for solving this problem and
has a global convergence rate of $O(1 / k^2)$. Recently, this has been extended
to multi-objective optimization, together with the proof of the $O(1 / k^2)$
global convergence rate. However, its momentum factor is classical, and the
convergence of its iterates has not been proven. In this work, introducing some
additional hyperparameters $(a, b)$, we propose another accelerated proximal
gradient method with a general momentum factor, which is new even for the
single-objective cases. We show that our proposed method also has a global
convergence rate of $O(1/k^2)$ for any $(a,b)$, and further that the generated
sequence of iterates converges to a weak Pareto solution when $a$ is positive,
an essential property for the finite-time manifold identification. Moreover, we
report numerical results with various $(a,b)$, showing that some of these
choices give better results than the classical momentum factors.
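As a rough illustration of the scheme described above (a proximal gradient step followed by a momentum/extrapolation step), here is a minimal Python sketch on a lasso instance. The momentum sequence $t_k = (k + a)/b$ used below is only an assumed stand-in for the paper's general momentum factor, and the function and parameter names (`accelerated_proximal_gradient`, `b_hp`, `num_iters`) are hypothetical.

```python
# Minimal sketch of an accelerated proximal gradient (FISTA-type) method for
# min_x f(x) + g(x), with f smooth convex and g convex, illustrated on the lasso
#   f(x) = 0.5 * ||A x - b||^2,   g(x) = lam * ||x||_1.
# The momentum sequence t_k = (k + a) / b is an assumed stand-in, not the exact
# factor defined in the paper.
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def accelerated_proximal_gradient(A, b, lam, a=3.0, b_hp=2.0, num_iters=500):
    n = A.shape[1]
    L = np.linalg.norm(A, 2) ** 2                   # Lipschitz constant of grad f
    x_prev = np.zeros(n)
    y = np.zeros(n)
    for k in range(1, num_iters + 1):
        grad = A.T @ (A @ y - b)                    # grad f at the extrapolated point
        x = soft_threshold(y - grad / L, lam / L)   # proximal gradient step
        t_k = (k + a) / b_hp                        # assumed momentum sequence
        t_next = (k + 1 + a) / b_hp
        y = x + ((t_k - 1.0) / t_next) * (x - x_prev)   # momentum (extrapolation) step
        x_prev = x
    return x_prev

# Toy usage: recover a sparse vector from noisy linear measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100)
x_true[:5] = 1.0
b = A @ x_true + 0.01 * rng.standard_normal(50)
x_hat = accelerated_proximal_gradient(A, b, lam=0.1)
```

With these assumed defaults, the momentum coefficient $(t_k - 1)/t_{k+1} = (k+1)/(k+4)$ behaves like the classical Nesterov/FISTA factor; per the abstract, the point of the paper is that a whole family of such factors indexed by $(a, b)$ retains the $O(1/k^2)$ rate, with convergence of the iterates to a weak Pareto solution when $a$ is positive.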
Related papers
- Online Learning Guided Quasi-Newton Methods with Global Non-Asymptotic Convergence [20.766358513158206]
We prove a global convergence rate of $O(\min\{1/k,\ \sqrt{d}/k^{1.25}\})$ in terms of the duality gap.
These results are the first global convergence results to demonstrate a provable advantage of a quasi-Newton method over the extragradient method.
arXiv Detail & Related papers (2024-10-03T16:08:16Z) - Adaptive Variance Reduction for Stochastic Optimization under Weaker Assumptions [26.543628010637036]
We introduce a novel adaptive variance reduction method that achieves an optimal convergence rate of $\mathcal{O}(\log T)$ for non-convex functions.
We also extend the proposed technique to obtain the same optimal rate of $\mathcal{O}(\log T)$ for compositional optimization.
arXiv Detail & Related papers (2024-06-04T04:39:51Z) - Stochastic Nonsmooth Convex Optimization with Heavy-Tailed Noises:
High-Probability Bound, In-Expectation Rate and Initial Distance Adaptation [22.758674468435302]
In a heavy-tailed noise regime, the difference between the stochastic gradient and the true gradient is assumed to have a finite $p$-th moment.
This paper provides a comprehensive analysis of nonsmooth convex optimization with heavy-tailed noises.
arXiv Detail & Related papers (2023-03-22T03:05:28Z) - Explicit Second-Order Min-Max Optimization Methods with Optimal Convergence Guarantee [86.05440220344755]
We propose and analyze inexact regularized Newton-type methods for finding a global saddle point of convex-concave unconstrained min-max optimization problems.
We show that the proposed methods generate iterates that remain within a bounded set and that the iterates converge to an $\epsilon$-saddle point within $O(\epsilon^{-2/3})$ iterations in terms of a restricted gap function.
arXiv Detail & Related papers (2022-10-23T21:24:37Z) - Zero-Order One-Point Estimate with Distributed Stochastic
Gradient-Tracking Technique [23.63073074337495]
We consider a distributed multi-agent optimization problem where each agent holds a local objective function that is smooth and convex.
We extend the distributed gradient-tracking method to the bandit setting, where no estimate of the gradient is available (a generic one-point zero-order gradient estimate is sketched after this list).
We analyze the convergence of this novel technique for smooth and convex objectives using approximation tools.
arXiv Detail & Related papers (2022-10-11T17:04:45Z) - Faster Algorithm and Sharper Analysis for Constrained Markov Decision
Process [56.55075925645864]
The problem of the constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new utilities-dual convex approach is proposed with a novel integration of three ingredients: a regularized policy, a dual regularizer, and Nesterov's accelerated gradient descent on the dual.
This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ for optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z) - Finding Global Minima via Kernel Approximations [90.42048080064849]
We consider the global minimization of smooth functions based solely on function evaluations.
In this paper, we consider an approach that jointly models the function to approximate and finds a global minimum.
arXiv Detail & Related papers (2020-12-22T12:59:30Z) - Accelerate the Warm-up Stage in the Lasso Computation via a Homotopic
Approach [2.538209532048867]
A homotopic method is used to approximate the $\ell_1$ penalty that is used in Lasso-type estimators.
We prove a faster numerical convergence rate than any existing method for computing Lasso-type estimators.
arXiv Detail & Related papers (2020-10-26T22:37:49Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - Revisiting SGD with Increasingly Weighted Averaging: Optimization and
Generalization Perspectives [50.12802772165797]
The averaging technique combines all iterative solutions into a single solution.
Experiments demonstrate the trade-off between optimization and generalization and the effectiveness of increasingly weighted averaging compared with other averaging schemes.
arXiv Detail & Related papers (2020-03-09T18:14:00Z) - A Simple Convergence Proof of Adam and Adagrad [74.24716715922759]
We give a proof of convergence for the Adam and Adagrad algorithms with a rate of $O(d\ln(N)/\sqrt{N})$.
Adam converges with the same rate $O(d\ln(N)/\sqrt{N})$ when used with the default parameters.
arXiv Detail & Related papers (2020-03-05T01:56:17Z)
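As a side note to the zero-order gradient-tracking entry above, here is a minimal sketch of the generic one-point (bandit) gradient estimator, $\hat{g} = (d/\delta)\, f(x + \delta u)\, u$ with $u$ uniform on the unit sphere. This is the standard construction used in zero-order/bandit optimization, not necessarily the exact estimator of that paper, and the names below are hypothetical.

```python
# Minimal sketch of a generic one-point zero-order gradient estimate:
#   g_hat = (d / delta) * f(x + delta * u) * u,  u uniform on the unit sphere.
# A single function query gives a (high-variance) estimate of the gradient of a
# smoothed version of f; gradient-tracking methods can plug such estimates in
# where exact gradients are unavailable.
import numpy as np

def one_point_gradient_estimate(f, x, delta=0.1, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                      # uniform direction on the unit sphere
    return (d / delta) * f(x + delta * u) * u

# Sanity check: averaging many estimates roughly recovers the true gradient.
f = lambda z: np.sum((z - 1.0) ** 2)            # toy smooth convex objective, grad = 2 * (z - 1)
x = np.zeros(3)
rng = np.random.default_rng(0)
avg = np.mean([one_point_gradient_estimate(f, x, rng=rng) for _ in range(200_000)], axis=0)
print(avg, 2 * (x - 1.0))                       # avg is roughly [-2, -2, -2]
```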
This list is automatically generated from the titles and abstracts of the papers on this site.