Debiasing a First-order Heuristic for Approximate Bi-level Optimization
- URL: http://arxiv.org/abs/2106.02487v2
- Date: Tue, 8 Jun 2021 10:16:35 GMT
- Title: Debiasing a First-order Heuristic for Approximate Bi-level Optimization
- Authors: Valerii Likhosherstov, Xingyou Song, Krzysztof Choromanski, Jared
Davis, Adrian Weller
- Abstract summary: Approximate bi-level optimization (ABLO) consists of (outer-level) optimization problems, involving numerical (inner-level) optimization loops.
There is a lack of theoretical understanding of FOM's convergence properties.
We propose an unbiased FOM enjoying constant memory complexity as a function of $r$.
- Score: 38.068090269482425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Approximate bi-level optimization (ABLO) consists of (outer-level)
optimization problems, involving numerical (inner-level) optimization loops.
While ABLO has many applications across deep learning, it suffers from time and
memory complexity proportional to the length $r$ of its inner optimization
loop. To address this complexity, an earlier first-order method (FOM) was
proposed as a heuristic that omits second derivative terms, yielding
significant speed gains and requiring only constant memory. Despite FOM's
popularity, there is a lack of theoretical understanding of its convergence
properties. We contribute by theoretically characterizing FOM's gradient bias
under mild assumptions. We further demonstrate a rich family of examples where
FOM-based SGD does not converge to a stationary point of the ABLO objective. We
address this concern by proposing an unbiased FOM (UFOM) enjoying constant
memory complexity as a function of $r$. We characterize the introduced
time-variance tradeoff, demonstrate convergence bounds, and find an optimal
UFOM for a given ABLO problem. Finally, we propose an efficient adaptive UFOM
scheme.
Related papers
- Variational Entropic Optimal Transport [67.76725267984578]
We propose Variational Entropic Optimal Transport (VarEOT) for domain translation problems.<n>VarEOT is based on an exact variational reformulation of the log-partition $log mathbbE[exp(cdot)$ as a tractable generalization over an auxiliary positive normalizer.<n> Experiments on synthetic data and unpaired image-to-image translation demonstrate competitive or improved translation quality.
arXiv Detail & Related papers (2026-02-02T15:48:44Z) - Hi-ZFO: Hierarchical Zeroth- and First-Order LLM Fine-Tuning via Importance-Guided Tensor Selection [4.808936079900314]
We propose textbfHi-ZFO (textbfHierarchical textbfZeroth- and textbfFirst-textbfOrder optimization) to synergize FO gradients with ZO estimation.<n>We show that Hi-ZFO consistently achieves superior performance while significantly reducing the training time.
arXiv Detail & Related papers (2026-01-09T03:20:54Z) - VAMO: Efficient Large-Scale Nonconvex Optimization via Adaptive Zeroth Order Variance Reduction [3.130722489512822]
VAMO combines FO mini-batch gradients with ZO finite-difference probes under an ZOG-style framework.<n>VAMO outperforms established FO and ZO methods, offering a faster, more flexible option for improved efficiency.
arXiv Detail & Related papers (2025-05-20T05:31:15Z) - Iterative Interpolation Schedules for Quantum Approximate Optimization Algorithm [1.845978975395919]
We present an iterative method that exploits the smoothness of optimal parameter schedules by expressing them in a basis of functions.
We demonstrate our method achieves better performance with fewer optimization steps than current approaches.
For the largest LABS instance, we achieve near-optimal merit factors with schedules exceeding 1000 layers, an order of magnitude beyond previous methods.
arXiv Detail & Related papers (2025-04-02T12:53:21Z) - Elucidating Subspace Perturbation in Zeroth-Order Optimization: Theory and Practice at Scale [33.38543010618118]
Zeroth-order (ZO) optimization has emerged as a promising alternative to gradient-based backpropagation methods.<n>We show that high dimensionality is the primary bottleneck and introduce the notion of textitsubspace alignment to explain how the subspace perturbations reduce gradient noise and accelerate convergence.<n>We propose an efficient ZO method using block coordinate descent (MeZO-BCD), which perturbs and updates only a subset of parameters at each step.
arXiv Detail & Related papers (2025-01-31T12:46:04Z) - A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO)
We introduce a novel perspective by turning a given BLO problem into a ii optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Enhancing Zeroth-order Fine-tuning for Language Models with Low-rank Structures [21.18741772731095]
Zeroth-order (ZO) algorithms offer a promising alternative by approximating gradients using finite differences of function values.
Existing ZO methods struggle to capture the low-rank gradient structure common in LLM fine-tuning, leading to suboptimal performance.
This paper proposes a low-rank ZO algorithm (LOZO) that effectively captures this structure in LLMs.
arXiv Detail & Related papers (2024-10-10T08:10:53Z) - Double Variance Reduction: A Smoothing Trick for Composite Optimization Problems without First-Order Gradient [40.22217106270146]
Variance reduction techniques are designed to decrease the sampling variance, thereby accelerating convergence rates of first-order (FO) and zeroth-order (ZO) optimization methods.
In composite optimization problems, ZO methods encounter an additional variance called the coordinate-wise variance, which stems from the random estimation.
This paper proposes the Zeroth-order Proximal Double Variance Reduction (ZPDVR) method, which utilizes the averaging trick to reduce both sampling and coordinate-wise variances.
arXiv Detail & Related papers (2024-05-28T02:27:53Z) - Asymptotically Unbiased Instance-wise Regularized Partial AUC
Optimization: Theory and Algorithm [101.44676036551537]
One-way Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC) measures the average performance of a binary classifier.
Most of the existing methods could only optimize PAUC approximately, leading to inevitable biases that are not controllable.
We present a simpler reformulation of the PAUC problem via distributional robust optimization AUC.
arXiv Detail & Related papers (2022-10-08T08:26:22Z) - Gradient-Free Methods for Deterministic and Stochastic Nonsmooth
Nonconvex Optimization [94.19177623349947]
Non-smooth non optimization problems emerge in machine learning and business making.
Two core challenges impede the development of efficient methods with finitetime convergence guarantee.
Two-phase versions of GFM and SGFM are also proposed and proven to achieve improved large-deviation results.
arXiv Detail & Related papers (2022-09-12T06:53:24Z) - Versatile Single-Loop Method for Gradient Estimator: First and Second
Order Optimality, and its Application to Federated Learning [45.78238792836363]
We present a single-loop algorithm named SLEDGE (Single-Loop-E Gradient Estimator) for periodic convergence.
Unlike existing methods, SLEDGE has the advantage of versatility; (ii) second-order optimal, (ii) in the PL region, and (iii) smaller complexity under less of data.
arXiv Detail & Related papers (2022-09-01T11:05:26Z) - FEDNEST: Federated Bilevel, Minimax, and Compositional Optimization [53.78643974257301]
Many contemporary ML problems fall under nested bilevel programming that subsumes minimax and compositional optimization.
We propose FedNest: A federated alternating gradient method to address general nested problems.
arXiv Detail & Related papers (2022-05-04T17:48:55Z) - Global convergence of optimized adaptive importance samplers [0.0]
We analyze the optimized adaptive importance sampler (OAIS) for performing Monte Carlo integration with general proposals.
We derive nonasymptotic bounds for the global gradient of $chi2$-divergence for proposals.
arXiv Detail & Related papers (2022-01-02T19:56:36Z) - Faster Algorithm and Sharper Analysis for Constrained Markov Decision
Process [56.55075925645864]
The problem of constrained decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new utilities-dual convex approach is proposed with novel integration of three ingredients: regularized policy, dual regularizer, and Nesterov's gradient descent dual.
This is the first demonstration that nonconcave CMDP problems can attain the lower bound of $mathcal O (1/epsilon)$ for all complexity optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z) - Distributed stochastic optimization with large delays [59.95552973784946]
One of the most widely used methods for solving large-scale optimization problems is distributed asynchronous gradient descent (DASGD)
We show that DASGD converges to a global optimal implementation model under same delay assumptions.
arXiv Detail & Related papers (2021-07-06T21:59:49Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $mathcaltilde O(t-1/4)$ rate of convergence for the norm of the gradient of Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - UFO-BLO: Unbiased First-Order Bilevel Optimization [42.49533978193117]
We propose a new FOBLO-based unbiased estimate of outer-level gradients, enabling us to theoretically guarantee this convergence.
Our findings are supported by experimental results on Omniglot and Mini-ImageNet, popular few-shot meta-learning benchmarks.
arXiv Detail & Related papers (2020-06-05T18:53:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.