Stochastic Nested Compositional Bi-level Optimization for Robust Feature
Learning
- URL: http://arxiv.org/abs/2307.05384v1
- Date: Tue, 11 Jul 2023 15:52:04 GMT
- Title: Stochastic Nested Compositional Bi-level Optimization for Robust Feature
Learning
- Authors: Xuxing Chen, Krishnakumar Balasubramanian, Saeed Ghadimi
- Abstract summary: We develop and analyze algorithms for solving nested compositional bi-level optimization problems.
Our proposed algorithm does not rely on matrix inversions or mini-batches.
- Score: 11.236838268731804
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop and analyze stochastic approximation algorithms for solving nested
compositional bi-level optimization problems. These problems involve a nested
composition of $T$ potentially non-convex smooth functions in the upper-level,
and a smooth and strongly convex function in the lower-level. Our proposed
algorithm does not rely on matrix inversions or mini-batches and can achieve an
$\epsilon$-stationary solution with an oracle complexity of approximately
$\tilde{O}_T(1/\epsilon^{2})$, assuming the availability of stochastic
first-order oracles for the individual functions in the composition and the
lower-level, which are unbiased and have bounded moments. Here, $\tilde{O}_T$
hides polylog factors and constants that depend on $T$. The key challenge we
address in establishing this result relates to handling three distinct sources
of bias in the stochastic gradients. The first source arises from the
compositional nature of the upper-level, the second stems from the bi-level
structure, and the third emerges due to the utilization of Neumann series
approximations to avoid matrix inversion. To demonstrate the effectiveness of
our approach, we apply it to the problem of robust feature learning for deep
neural networks under covariate shift, showcasing the benefits and advantages
of our methodology in that context.
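To make the setup above concrete, one plausible reading of the nested compositional bi-level template is sketched below; the symbols $f_1,\dots,f_T$, $g$, and $y^*$ are assumed notation, not taken verbatim from the paper.

```latex
% Upper level: nested composition of T potentially non-convex smooth maps;
% lower level: smooth and strongly convex in y.
\min_{x \in \mathbb{R}^{d}} \;
  f_{1}\!\Big( f_{2}\big( \cdots f_{T}\big( y^{*}(x) \big) \cdots \big) \Big)
\qquad \text{subject to} \qquad
  y^{*}(x) = \arg\min_{y \in \mathbb{R}^{p}} \; g(x, y).
```

The third source of bias mentioned in the abstract comes from replacing the lower-level Hessian inverse, which appears in the hypergradient, with a truncated Neumann series. Below is a minimal sketch of that standard trick, assuming only a Hessian-vector-product oracle; the function and parameter names (`neumann_inverse_hvp`, `hvp`, `eta`, `K`) are illustrative, not from the paper.

```python
import numpy as np

def neumann_inverse_hvp(hvp, v, eta, K):
    """Approximate H^{-1} @ v via the truncated Neumann series
    H^{-1} = eta * sum_{k=0}^{inf} (I - eta * H)^k,
    which converges when the eigenvalues of eta * H lie in (0, 1]
    (e.g. eta <= 1/L for an L-smooth, strongly convex lower level).
    Only Hessian-vector products are needed, so no matrix is ever
    formed or inverted."""
    total = v.copy()
    term = v.copy()
    for _ in range(K):
        term = term - eta * hvp(term)  # term <- (I - eta * H) @ term
        total = total + term
    return eta * total

# Illustrative check against a direct solve on a small SPD matrix:
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
H = A.T @ A + np.eye(5)  # symmetric positive definite
v = rng.standard_normal(5)
x = neumann_inverse_hvp(lambda u: H @ u, v, eta=1.0 / np.linalg.norm(H, 2), K=2000)
print(np.allclose(x, np.linalg.solve(H, v), atol=1e-6))
```

Truncating the series at $K$ terms is exactly what introduces the third bias source; a larger $K$ shrinks the bias at the cost of more oracle calls.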
Related papers
- An Accelerated Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness [15.656614304616006]
This paper investigates a class of bilevel optimization problems where the upper-level function is nonconvex with potentially unbounded smoothness and the lower-level problem is strongly convex.
These problems have significant applications in sequential data learning, such as text classification using recurrent neural networks.
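"Unbounded smoothness" is typically formalized through a relaxed $(L_0, L_1)$-smoothness condition along the lines of the display below; the exact assumption used in that paper may differ.

```latex
% Relaxed (L_0, L_1)-smoothness: the local smoothness constant may grow
% with the gradient norm rather than being uniformly bounded.
\|\nabla^{2} f(x)\| \;\le\; L_{0} + L_{1}\,\|\nabla f(x)\|
```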
arXiv Detail & Related papers (2024-09-28T02:30:44Z)
- Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function at the points it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
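As background on such noisy-evaluation (zeroth-order) access, a generic two-point gradient estimator looks like the sketch below; it is illustrative only and is not the estimator analyzed in that paper.

```python
import numpy as np

def two_point_gradient(f, x, h=1e-3, rng=None):
    """Generic two-point zeroth-order gradient estimate: query the
    (possibly noisy) objective at x + h*u and x - h*u along a random
    unit direction u; the directional finite difference, scaled by
    the dimension, approximates the gradient for smooth f as h -> 0."""
    rng = rng or np.random.default_rng()
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)
    return x.size * (f(x + h * u) - f(x - h * u)) / (2.0 * h) * u
```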
arXiv Detail & Related papers (2024-06-28T02:56:22Z)
- Gradient-Free Methods for Deterministic and Stochastic Nonsmooth Nonconvex Optimization [94.19177623349947]
Nonsmooth nonconvex optimization problems emerge in machine learning and business decision making.
Two core challenges impede the development of efficient methods with finite-time convergence guarantees.
Two-phase versions of GFM and SGFM are also proposed and proven to achieve improved large-deviation results.
arXiv Detail & Related papers (2022-09-12T06:53:24Z)
- Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}}\; F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an accelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragradient and Nesterov's acceleration.
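For reference, the classical (non-accelerated) extragradient scheme that AG-EG builds on takes a gradient step to a midpoint and then re-updates from the original iterate using the midpoint's gradients; a minimal sketch under assumed oracles `grad_x`, `grad_y`, not the AG-EG method itself.

```python
import numpy as np

def extragradient(grad_x, grad_y, x, y, eta=0.1, iters=2000):
    """Classical extragradient for min_x max_y L(x, y): an
    extrapolation (midpoint) step followed by a corrected update
    computed from the original iterate with midpoint gradients."""
    for _ in range(iters):
        x_mid = x - eta * grad_x(x, y)        # descent half-step in x
        y_mid = y + eta * grad_y(x, y)        # ascent half-step in y
        x = x - eta * grad_x(x_mid, y_mid)    # corrected update from (x, y)
        y = y + eta * grad_y(x_mid, y_mid)
    return x, y

# Bilinear toy problem L(x, y) = x^T B y, whose unique saddle point is (0, 0):
B = np.array([[2.0, 0.0], [0.0, 1.0]])
x, y = extragradient(lambda x, y: B @ y, lambda x, y: B.T @ x,
                     np.ones(2), np.ones(2))
print(x, y)  # both approach the zero vector
```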
arXiv Detail & Related papers (2022-06-17T06:10:20Z)
- A Projection-free Algorithm for Constrained Stochastic Multi-level Composition Optimization [12.096252285460814]
We propose a projection-free conditional gradient-type algorithm for constrained stochastic multi-level composition optimization.
We show that the number of calls to the stochastic first-order oracle and the linear-minimization oracle required by the proposed algorithm are of order $\mathcal{O}_T(\epsilon^{-2})$ and $\mathcal{O}_T(\epsilon^{-3})$, respectively.
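The oracle pattern behind such projection-free methods is the classical conditional-gradient (Frank-Wolfe) step, which replaces projections with a linear-minimization oracle (LMO). Below is a single-level deterministic sketch over an $\ell_1$-ball, not the stochastic multi-level algorithm of that paper.

```python
import numpy as np

def frank_wolfe_l1(grad, x0, radius=1.0, iters=200):
    """Conditional-gradient template: each iteration calls only a
    linear-minimization oracle over the feasible set. For the l1-ball
    the LMO returns a signed, scaled coordinate vertex, so no
    projection is ever computed."""
    x = x0.copy()
    for t in range(iters):
        g = grad(x)
        i = int(np.argmax(np.abs(g)))      # LMO: argmin_{||v||_1 <= r} <g, v>
        v = np.zeros_like(x)
        v[i] = -radius * np.sign(g[i])
        gamma = 2.0 / (t + 2.0)            # standard Frank-Wolfe step size
        x = (1.0 - gamma) * x + gamma * v  # convex combination stays feasible
    return x
```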
arXiv Detail & Related papers (2022-02-09T06:05:38Z)
- High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth stochastic convex optimization have complexity bounds that depend on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
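Gradient clipping itself is the standard operator sketched below; the contribution of that paper lies in the stepsize rules and high-probability analysis built around it, not in the operator.

```python
import numpy as np

def clip_by_norm(g, c):
    """Rescale g so its Euclidean norm is at most c, the usual
    defense against heavy-tailed gradient noise."""
    norm = np.linalg.norm(g)
    return g if norm <= c else g * (c / norm)
```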
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
- Randomized Stochastic Variance-Reduced Methods for Stochastic Bilevel Optimization [62.87181271021217]
We consider nonconvex stochastic bilevel optimization (SBO) problems that have many applications in machine learning.
This paper proposes fast randomized algorithms for nonconvex SBO problems.
arXiv Detail & Related papers (2021-05-05T18:28:42Z)
- Lower Complexity Bounds of Finite-Sum Optimization Problems: The Results and Construction [18.65143269806133]
We consider Proximal Incremental First-order (PIFO) algorithms which have access to gradient and proximal oracles for each individual component.
We develop a novel approach for constructing adversarial problems, which partitions the tridiagonal matrix of classical examples into $n$ groups.
arXiv Detail & Related papers (2021-03-15T11:20:31Z)
- Stochastic Multi-level Composition Optimization Algorithms with Level-Independent Convergence Rates [12.783783498844022]
We study smooth multi-level composition optimization problems, where the objective function is a nested composition of $T$ functions.
We show that the first algorithm, which is a generalization of [GhaRuswan20] to the $T$-level case, can achieve a sample complexity of $\mathcal{O}(1/\epsilon^{6})$.
This is the first time that such an online algorithm designed for the (un)constrained multi-level setting obtains the same sample complexity as the smooth single-level setting under standard assumptions.
arXiv Detail & Related papers (2020-08-24T15:57:50Z)
- Nearly Optimal Robust Method for Convex Compositional Problems with Heavy-Tailed Noise [46.94819585976063]
We propose robust algorithms for solving convex compositional problems of the form $f(\mathbb{E}_{\xi}\, g(\cdot;\xi)) + r(\cdot)$.
To the best of our knowledge, this is the first work that establishes nearly optimal sub-Gaussian confidence bounds for compositional problems under heavy-tailed assumptions.
arXiv Detail & Related papers (2020-06-17T18:36:05Z)
- Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions for which the chain rule of calculus holds.
arXiv Detail & Related papers (2020-02-10T23:23:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.