Multi-block-Single-probe Variance Reduced Estimator for Coupled
Compositional Optimization
- URL: http://arxiv.org/abs/2207.08540v1
- Date: Mon, 18 Jul 2022 12:03:26 GMT
- Title: Multi-block-Single-probe Variance Reduced Estimator for Coupled
Compositional Optimization
- Authors: Wei Jiang, Gang Li, Yibo Wang, Lijun Zhang, Tianbao Yang
- Abstract summary: We propose a novel estimator named Multi-block-Single-probe Variance Reduced (MSVR) to improve the complexity of solving coupled compositional problems.
Our results improve upon prior ones in several aspects, including the order of sample complexities and the dependence on the strong convexity parameter.
- Score: 49.58290066287418
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Variance reduction techniques such as SPIDER/SARAH/STORM have been
extensively studied to improve the convergence rates of stochastic non-convex
optimization, which usually maintain and update a sequence of estimators for a
single function across iterations. {\it What if we need to track multiple
functional mappings across iterations but only with access to stochastic
samples of $\mathcal{O}(1)$ functional mappings at each iteration?} There is an
important application in solving an emerging family of coupled compositional
optimization problems in the form of $\sum_{i=1}^m f_i(g_i(\mathbf{w}))$, where
$g_i$ is accessible through a stochastic oracle. The key issue is to track and
estimate a sequence of $\mathbf g(\mathbf{w})=(g_1(\mathbf{w}), \ldots,
g_m(\mathbf{w}))$ across iterations, where $\mathbf g(\mathbf{w})$ has $m$
blocks and it is only allowed to probe $\mathcal{O}(1)$ blocks to attain their
stochastic values and Jacobians. To improve the complexity for solving these
problems, we propose a novel stochastic method named Multi-block-Single-probe
Variance Reduced (MSVR) estimator to track the sequence of $\mathbf
g(\mathbf{w})$. It is inspired by STORM but introduces a customized error
correction term to alleviate the noise not only in stochastic samples for the
selected blocks but also in those blocks that are not sampled. With the help of
the MSVR estimator, we develop several algorithms for solving the
aforementioned compositional problems with improved complexities across a
spectrum of settings with non-convex/convex/strongly convex objectives. Our
results improve upon prior ones in several aspects, including the order of
sample complexities and dependence on the strong convexity parameter. Empirical
studies on multi-task deep AUC maximization demonstrate the better performance
of using the new estimator.
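The following is a minimal sketch of what an MSVR-style multi-block tracker could look like, based only on the high-level description in the abstract: a STORM-like moving-average update plus an error-correction term, applied to the $\mathcal{O}(1)$ blocks probed at each iteration. The class name `MSVRTracker`, the toy linear mappings, and in particular the fixed correction weight `gamma` are illustrative assumptions, not the paper's prescribed choices; the paper derives a specific correction coefficient (depending on $m$, the number of probed blocks, and the moving-average weight) designed to also control the error of the blocks that are not sampled.

```python
# Illustrative sketch of an MSVR-style multi-block tracker (not the paper's exact update).
import numpy as np


class MSVRTracker:
    """Tracks u ~ g(w) = (g_1(w), ..., g_m(w)) while probing only a few blocks per step."""

    def __init__(self, m: int, dim: int, beta: float = 0.1, gamma: float = 1.0):
        self.u = np.zeros((m, dim))  # running estimates of g_1(w), ..., g_m(w)
        self.beta = beta             # moving-average weight
        self.gamma = gamma           # error-correction weight (placeholder value)

    def update(self, blocks, g_new, g_old):
        """Update estimates for the probed blocks.

        blocks : indices of the O(1) blocks probed at this iteration
        g_new  : dict mapping i -> stochastic estimate of g_i(w_{t+1})
        g_old  : dict mapping i -> stochastic estimate of g_i(w_t),
                 computed on the same minibatch (STORM-style)
        """
        for i in blocks:
            self.u[i] = ((1.0 - self.beta) * self.u[i]
                         + self.beta * g_new[i]
                         + self.gamma * (g_new[i] - g_old[i]))
        # Blocks outside `blocks` keep their previous estimates this round.
        return self.u


# Toy usage: track g_i(w) = A_i @ w with noisy probes of one block per step.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, dim = 5, 3
    A = rng.normal(size=(m, dim, dim))
    tracker = MSVRTracker(m, dim, beta=0.2, gamma=0.8)
    w_old = rng.normal(size=dim)
    for t in range(200):
        w_new = w_old - 0.01 * rng.normal(size=dim)  # stand-in for an optimizer step
        i = int(rng.integers(m))                     # probe a single block
        noise = 0.01 * rng.normal(size=dim)
        g_new = {i: A[i] @ w_new + noise}
        g_old = {i: A[i] @ w_old + noise}            # same sample, previous iterate
        tracker.update([i], g_new, g_old)
        w_old = w_new
    print(np.linalg.norm(tracker.u - A @ w_old, axis=1))  # per-block tracking error
```

The role of the correction term, as described in the abstract, is to alleviate noise not only in the stochastic samples of the selected blocks but also in the blocks that are not sampled, whose estimates would otherwise go stale between probes under a plain moving-average update.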
Related papers
- Obtaining Lower Query Complexities through Lightweight Zeroth-Order Proximal Gradient Algorithms [65.42376001308064]
We propose two variance-reduced zeroth-order (ZO) gradient estimators for composite optimization problems.
We improve the state-of-the-art function query complexities from $\mathcal{O}\left(\min\left(\frac{d n^{1/2}}{\epsilon^2}, \frac{d}{\epsilon^3}\right)\right)$ to $\tilde{\mathcal{O}}\left(\frac{d}{\epsilon^2}\right)$.
arXiv Detail & Related papers (2024-10-03T15:04:01Z) - An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue
Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how our framework scales, demonstrating the efficiency and robustness of our methods on large-scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation
Constrained Optimization [88.0031283949404]
Many real-world problems have complicated nonconvex functional constraints and use a large number of data points.
Our proposed method outperforms an existing method with the previously best-known result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z) - Optimal Algorithms for Stochastic Multi-Level Compositional Optimization [46.77664277596764]
We solve the problem of multi-level compositional optimization, where the objective function is a composition of multiple smooth but possibly non-convex functions.
To make use of adaptive learning rates, we also propose an adaptive variant of the Stochastic Multi-level Variance Reduction method (SMVR), which achieves the same optimal complexities but converges faster in practice.
arXiv Detail & Related papers (2022-02-15T16:02:32Z) - A Projection-free Algorithm for Constrained Stochastic Multi-level
Composition Optimization [12.096252285460814]
We propose a projection-free conditional gradient-type algorithm for constrained stochastic multi-level composition optimization.
We show that the number of calls to the stochastic oracle and to the linear-minimization oracle required by the proposed algorithm are of order $\mathcal{O}_T(\epsilon^{-2})$ and $\mathcal{O}_T(\epsilon^{-3})$, respectively.
arXiv Detail & Related papers (2022-02-09T06:05:38Z) - DoCoM: Compressed Decentralized Optimization with Near-Optimal Sample
Complexity [25.775517797956237]
This paper proposes the Doubly Compressed Momentum-assisted tracking algorithm (DoCoM) for communication-efficient decentralized optimization.
We show that our algorithm outperforms several state-of-the-art algorithms in practice.
arXiv Detail & Related papers (2022-02-01T07:27:34Z) - Randomized Stochastic Variance-Reduced Methods for Stochastic Bilevel
Optimization [62.87181271021217]
We consider non-convex stochastic bilevel optimization (SBO) problems that have many applications in machine learning.
This paper proposes fast randomized algorithms for non-convex SBO problems.
arXiv Detail & Related papers (2021-05-05T18:28:42Z) - Stochastic Multi-level Composition Optimization Algorithms with
Level-Independent Convergence Rates [12.783783498844022]
We study smooth multi-level composition optimization problems, where the objective function is a nested composition of $T$ functions.
We show that the first algorithm, which is a generalization of [GhaRuswan20] to the $T$-level case, can achieve a sample complexity of $\mathcal{O}(1/\epsilon^6)$.
This is the first time that an online algorithm designed for the (un)constrained multi-level setting obtains the same sample complexity under standard assumptions.
arXiv Detail & Related papers (2020-08-24T15:57:50Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we study gradient-free (zeroth-order) minimax optimization problems in the nonconvex-strongly-concave setting.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z) - Stochastic Proximal Gradient Algorithm with Minibatches. Application to
Large Scale Learning Models [2.384873896423002]
We develop and analyze minibatch variants of the stochastic proximal gradient algorithm for general composite objective functions with nonsmooth components.
We provide complexity bounds for constant and variable stepsize iteration policies, showing that, for minibatch size $N$, $\epsilon$-suboptimality in expected quadratic distance to the optimal solution is attained after $\mathcal{O}(\frac{1}{N\epsilon})$ iterations.
arXiv Detail & Related papers (2020-03-30T10:43:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.