Efficient Continual Finite-Sum Minimization
- URL: http://arxiv.org/abs/2406.04731v1
- Date: Fri, 7 Jun 2024 08:26:31 GMT
- Title: Efficient Continual Finite-Sum Minimization
- Authors: Ioannis Mavrothalassitis, Stratis Skoulakis, Leello Tadesse Dadi, Volkan Cevher,
- Abstract summary: We propose a key twist into the finite-sum minimization, dubbed as continual finite-sum minimization.
Our approach significantly improves upon the $mathcalO(n/epsilon)$ FOs that $mathrmStochasticGradientDescent$ requires.
We also prove that there is no natural first-order method with $mathcalOleft(n/epsilonalpharight)$ complexity gradient for $alpha 1/4$, establishing that the first-order complexity of our method is nearly tight.
- Score: 52.5238287567572
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Given a sequence of functions $f_1,\ldots,f_n$ with $f_i:\mathcal{D}\mapsto \mathbb{R}$, finite-sum minimization seeks a point ${x}^\star \in \mathcal{D}$ minimizing $\sum_{j=1}^n f_j(x)/n$. In this work, we propose a key twist into the finite-sum minimization, dubbed as continual finite-sum minimization, that asks for a sequence of points ${x}_1^\star,\ldots,{x}_n^\star \in \mathcal{D}$ such that each ${x}^\star_i \in \mathcal{D}$ minimizes the prefix-sum $\sum_{j=1}^if_j(x)/i$. Assuming that each prefix-sum is strongly convex, we develop a first-order continual stochastic variance reduction gradient method ($\mathrm{CSVRG}$) producing an $\epsilon$-optimal sequence with $\mathcal{\tilde{O}}(n/\epsilon^{1/3} + 1/\sqrt{\epsilon})$ overall first-order oracles (FO). An FO corresponds to the computation of a single gradient $\nabla f_j(x)$ at a given $x \in \mathcal{D}$ for some $j \in [n]$. Our approach significantly improves upon the $\mathcal{O}(n/\epsilon)$ FOs that $\mathrm{StochasticGradientDescent}$ requires and the $\mathcal{O}(n^2 \log (1/\epsilon))$ FOs that state-of-the-art variance reduction methods such as $\mathrm{Katyusha}$ require. We also prove that there is no natural first-order method with $\mathcal{O}\left(n/\epsilon^\alpha\right)$ gradient complexity for $\alpha < 1/4$, establishing that the first-order complexity of our method is nearly tight.
Related papers
- Complexity of Minimizing Projected-Gradient-Dominated Functions with Stochastic First-order Oracles [38.45952947660789]
This work investigates the performance limits of projected first-order methods for minimizing functions under the $(alpha,tau,mathcal)$-projected-dominance property.
arXiv Detail & Related papers (2024-08-03T18:34:23Z) - Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix
Factorization [54.29685789885059]
We introduce efficient $(1+varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem.
The goal is to approximate $mathbfA$ as a product of low-rank factors.
Our techniques generalize to other common variants of the BMF problem.
arXiv Detail & Related papers (2023-06-02T18:55:27Z) - Faster Gradient-Free Algorithms for Nonsmooth Nonconvex Stochastic Optimization [20.54801745090522]
We consider the problem of the form $min_x in mathbbRd f(x) triangleq mathbbE_xi [Fxi]$inf(x)$ Lipschitz.
The recently proposed gradient-free method requires at most $mathcalO( L4 d3/2 epsilon-4 + Goldstein L d3/2 delta-1 epsilon-4)$ zeroth-order complexity
arXiv Detail & Related papers (2023-01-16T13:33:37Z) - An Optimal Algorithm for Strongly Convex Min-min Optimization [79.11017157526815]
Existing optimal first-order methods require $mathcalO(sqrtmaxkappa_x,kappa_y log 1/epsilon)$ of computations of both $nabla_x f(x,y)$ and $nabla_y f(x,y)$.
We propose a new algorithm that only requires $mathcalO(sqrtkappa_x log 1/epsilon)$ of computations of $nabla_x f(x,
arXiv Detail & Related papers (2022-12-29T19:26:12Z) - Decentralized Stochastic Variance Reduced Extragradient Method [25.21457349137344]
This paper studies decentralized convex-concave minimax optimization problems of the form $min_xmax_y fx,y triqfrac1msumi=1m f_i triqfrac1msumi=1m f_i triqfrac1msumi=1m f_i triqfrac1msumi=1m f_i triqfrac1msum
arXiv Detail & Related papers (2022-02-01T16:06:20Z) - The planted matching problem: Sharp threshold and infinite-order phase
transition [25.41713098167692]
We study the problem of reconstructing a perfect matching $M*$ hidden in a randomly weighted $ntimes n$ bipartite graph.
We show that if $sqrtd B(mathcalP,mathcalQ) ge 1+epsilon$ for an arbitrarily small constant $epsilon>0$, the reconstruction error for any estimator is shown to be bounded away from $0$.
arXiv Detail & Related papers (2021-03-17T00:59:33Z) - Linear Bandits on Uniformly Convex Sets [88.3673525964507]
Linear bandit algorithms yield $tildemathcalO(nsqrtT)$ pseudo-regret bounds on compact convex action sets.
Two types of structural assumptions lead to better pseudo-regret bounds.
arXiv Detail & Related papers (2021-03-10T07:33:03Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $gamma in (0,1)$, and an approximation threshold $epsilon > 0$, we provide a model-free algorithm to learn an $epsilon$-optimal policy.
For small enough $epsilon$, we show an improved algorithm with sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - On the Complexity of Minimizing Convex Finite Sums Without Using the
Indices of the Individual Functions [62.01594253618911]
We exploit the finite noise structure of finite sums to derive a matching $O(n2)$-upper bound under the global oracle model.
Following a similar approach, we propose a novel adaptation of SVRG which is both emphcompatible with oracles, and achieves complexity bounds of $tildeO(n2+nsqrtL/mu)log (1/epsilon)$ and $O(nsqrtL/epsilon)$, for $mu>0$ and $mu=0$
arXiv Detail & Related papers (2020-02-09T03:39:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.