Approximate Frank-Wolfe Algorithms over Graph-structured Support Sets
- URL: http://arxiv.org/abs/2107.00472v1
- Date: Tue, 29 Jun 2021 19:39:43 GMT
- Title: Approximate Frank-Wolfe Algorithms over Graph-structured Support Sets
- Authors: Baojian Zhou, Yifan Sun
- Abstract summary: Two popular approximation assumptions (\textit{additive} and \textit{multiplicative} gap errors) are not valid for our problem.
A new \textit{approximate dual maximization oracle} (DMO) is proposed, which approximates the inner product rather than the gap.
Our empirical results suggest that even these improved bounds are pessimistic, with significant improvement in recovering real-world images with graph-structured sparsity.
- Score: 27.913569257545554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose approximate Frank-Wolfe (FW) algorithms to solve
convex optimization problems over graph-structured support sets where the
\textit{linear minimization oracle} (LMO) cannot be efficiently obtained in
general. We first demonstrate that two popular approximation assumptions
(\textit{additive} and \textit{multiplicative} gap errors) are not valid for
our problem, in that no cheap gap-approximate LMO exists in general.
Instead, a new \textit{approximate dual maximization oracle} (DMO) is proposed,
which approximates the inner product rather than the gap. When the objective is
$L$-smooth, we prove that the standard FW method using a $\delta$-approximate
DMO converges as $\mathcal{O}(L / \delta t + (1-\delta)(\delta^{-1} +
\delta^{-2}))$ in general, and as $\mathcal{O}(L/(\delta^2(t+2)))$ over a
$\delta$-relaxation of the constraint set. Additionally, when the objective is
$\mu$-strongly convex and the solution is unique, a variant of FW converges to
$\mathcal{O}(L^2\log(t)/(\mu \delta^6 t^2))$ with the same per-iteration
complexity. Our empirical results suggest that even these improved bounds are
pessimistic, with significant improvement in recovering real-world images with
graph-structured sparsity.
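To make the oracle's role concrete, below is a minimal Python sketch of the standard FW loop driven by a $\delta$-approximate DMO: the oracle returns an atom whose inner product with the negative gradient is at least a $\delta$ fraction of the best achievable. The `approx_dmo` interface and the toy $\ell_1$-ball oracle are illustrative assumptions, not the paper's implementation; a graph-structured atom set would plug in its own approximate oracle.

```python
import numpy as np

def frank_wolfe_approx_dmo(grad_f, approx_dmo, x0, T=500):
    """Standard Frank-Wolfe loop driven by a delta-approximate DMO.

    approx_dmo(c) returns an atom v with <c, v> >= delta * max_u <c, u>,
    i.e. it approximates the dual (inner-product) maximization rather
    than the FW gap. (Hypothetical interface, for illustration only.)
    """
    x = x0.copy()
    for t in range(T):
        g = grad_f(x)
        v = approx_dmo(-g)         # stands in for the intractable exact LMO
        gamma = 2.0 / (t + 2.0)    # standard open-loop FW step size
        x = x + gamma * (v - x)    # move toward the returned atom
    return x

# Toy usage: least squares over the l1 ball, where the exact LMO is cheap
# and plays the role of a delta = 1 DMO.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 10)), rng.standard_normal(20)
grad = lambda x: A.T @ (A @ x - b)

def l1_dmo(c):
    i = int(np.argmax(np.abs(c)))  # best signed coordinate atom +/- e_i
    v = np.zeros_like(c)
    v[i] = np.sign(c[i])
    return v

x_hat = frank_wolfe_approx_dmo(grad, l1_dmo, np.zeros(10))
```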
Related papers
- Two-Timescale Gradient Descent Ascent Algorithms for Nonconvex Minimax Optimization [77.3396841985172]
We provide a unified analysis of two-timescale gradient descent ascent (TTGDA) for solving structured nonconvex minimax optimization problems.
Our contribution is to design TTGDA algorithms that are effective beyond the convex-concave setting.
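For reference, here is a minimal sketch of the two-timescale template (hypothetical partial-gradient callables and illustrative stepsizes, not the paper's algorithm): the maximizing player $y$ is updated on a faster timescale than the minimizing player $x$.

```python
def ttgda(grad_x, grad_y, x, y, eta_x=1e-3, eta_y=1e-1, T=1000):
    """Two-timescale GDA for min_x max_y f(x, y), with eta_y >> eta_x."""
    for _ in range(T):
        gx, gy = grad_x(x, y), grad_y(x, y)  # partial gradients of f
        x = x - eta_x * gx                   # slow (minimizing) player
        y = y + eta_y * gy                   # fast (maximizing) player
    return x, y
```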
arXiv Detail & Related papers (2024-08-21T20:14:54Z) - Efficient Continual Finite-Sum Minimization [52.5238287567572]
We propose a key twist on finite-sum minimization, dubbed continual finite-sum minimization.
Our approach significantly improves upon the $\mathcal{O}(n/\epsilon)$ first-order oracle (FO) calls that $\mathrm{StochasticGradientDescent}$ requires.
We also prove that there is no natural first-order method with $\mathcal{O}\left(n/\epsilon^{\alpha}\right)$ gradient complexity for $\alpha < 1/4$, establishing that the first-order complexity of our method is nearly tight.
arXiv Detail & Related papers (2024-06-07T08:26:31Z) - An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue
Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale, and we demonstrate the efficiency and robustness of our methods on large-scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - Tight Bounds for $\gamma$-Regret via the Decision-Estimation Coefficient [88.86699022151598]
We give a statistical characterization of the $\gamma$-regret for arbitrary structured bandit problems.
The $\gamma$-regret emerges in structured bandit problems over a function class $\mathcal{F}$ where finding an exact optimum of $f \in \mathcal{F}$ is intractable.
arXiv Detail & Related papers (2023-03-06T17:54:33Z) - An Optimal Stochastic Algorithm for Decentralized Nonconvex Finite-sum
Optimization [25.21457349137344]
We prove that DEAREST requires at most $\mathcal{O}(mn + \sqrt{mn}\,L\varepsilon^{-2})$ incremental first-order oracle (IFO) calls and $\mathcal{O}(L\varepsilon^{-2}/\sqrt{1-\lambda_W})$ communication rounds.
arXiv Detail & Related papers (2022-10-25T11:37:11Z) - Decomposable Non-Smooth Convex Optimization with Nearly-Linear Gradient
Oracle Complexity [15.18055488087588]
We give an algorithm that minimizes the above convex formulation to $\epsilon$-accuracy in $\widetilde{O}\left(\sum_{i=1}^{n} d_i \log(1/\epsilon)\right)$ gradient computations.
Our main technical contribution is an adaptive procedure to select an $f_i$ term at every iteration via a novel combination of cutting-plane and interior-point methods.
arXiv Detail & Related papers (2022-08-07T20:53:42Z) - Optimal Gradient Sliding and its Application to Distributed Optimization
Under Similarity [121.83085611327654]
We study structured convex optimization problems with additive objective $r := p + q$, where $r$ is $\mu$-strongly convex, in the distributed setting under a similarity condition.
We propose a method whose cost is measured in master-to-agents communications and local gradient calls.
The resulting complexity is much sharper than the $\mathcal{O}(\sqrt{L_q/\mu})$ rate of existing methods.
arXiv Detail & Related papers (2022-05-30T14:28:02Z) - Decentralized Stochastic Variance Reduced Extragradient Method [25.21457349137344]
This paper studies decentralized convex-concave minimax optimization problems of the form $\min_x \max_y f(x,y) \triangleq \frac{1}{m}\sum_{i=1}^{m} f_i(x,y)$.
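For orientation, here is the (centralized, deterministic) extragradient template that such methods build on, as a sketch assuming a smooth convex-concave $f$ with hypothetical partial-gradient callables; the paper's decentralized, variance-reduced variant is substantially more involved.

```python
def extragradient(grad_x, grad_y, x, y, eta=0.1, T=1000):
    """Extragradient for min_x max_y f(x, y): take a probing half-step,
    then update with gradients evaluated at the half-step point."""
    for _ in range(T):
        xh = x - eta * grad_x(x, y)   # extrapolation (half) step
        yh = y + eta * grad_y(x, y)
        x = x - eta * grad_x(xh, yh)  # full step at the extrapolated point
        y = y + eta * grad_y(xh, yh)
    return x, y
```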
arXiv Detail & Related papers (2022-02-01T16:06:20Z) - Lifted Primal-Dual Method for Bilinearly Coupled Smooth Minimax
Optimization [47.27237492375659]
We study the bilinearly coupled minimax problem: $\min_x \max_y\, f(x) + y^\top A x - h(y)$, where $f$ and $h$ are both strongly convex smooth functions.
No known first-order algorithms have hitherto achieved the lower complexity bound of $\Omega\left(\left(\sqrt{L_x/\mu_x} + \|A\|/\sqrt{\mu_x \mu_y}\right)\log(1/\varepsilon)\right)$ for solving this class of problems.
arXiv Detail & Related papers (2022-01-19T05:56:19Z) - The Complexity of Dynamic Least-Squares Regression [11.815510373329337]
We settle the complexity of dynamic least-squares regression.
The goal is to maintain an $\epsilon$-approximate solution to $\min_{\mathbf{x}^{(t)}} \|\mathbf{A}^{(t)} \mathbf{x}^{(t)} - \mathbf{b}^{(t)}\|_2$ for all $t \in [T]$.
arXiv Detail & Related papers (2022-01-01T18:36:17Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of order $\frac{1}{\lambda\epsilon}$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)