UFO-BLO: Unbiased First-Order Bilevel Optimization
- URL: http://arxiv.org/abs/2006.03631v2
- Date: Mon, 7 Jun 2021 16:35:35 GMT
- Title: UFO-BLO: Unbiased First-Order Bilevel Optimization
- Authors: Valerii Likhosherstov, Xingyou Song, Krzysztof Choromanski, Jared
Davis, Adrian Weller
- Abstract summary: We propose a new FO-BLO-based unbiased estimate of outer-level gradients, enabling us to theoretically guarantee convergence to a stationary point.
Our findings are supported by experimental results on Omniglot and Mini-ImageNet, popular few-shot meta-learning benchmarks.
- Score: 42.49533978193117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bilevel optimization (BLO) is a popular approach with many applications
including hyperparameter optimization, neural architecture search, adversarial
robustness and model-agnostic meta-learning. However, the approach suffers from
time and memory complexity proportional to the length $r$ of its inner
optimization loop, which has led to several modifications being proposed. One
such modification is \textit{first-order} BLO (FO-BLO) which approximates
outer-level gradients by zeroing out second derivative terms, yielding
significant speed gains and requiring only constant memory as $r$ varies.
Despite FO-BLO's popularity, there is a lack of theoretical understanding of
its convergence properties. We make progress by demonstrating a rich family of
examples where FO-BLO-based stochastic optimization does not converge to a
stationary point of the BLO objective. We address this concern by proposing a
new FO-BLO-based unbiased estimate of outer-level gradients, enabling us to
theoretically guarantee this convergence, with no harm to memory and expected
time complexity. Our findings are supported by experimental results on Omniglot
and Mini-ImageNet, popular few-shot meta-learning benchmarks.
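For intuition, the following is a minimal JAX sketch of the MAML-style instantiation of this setup, contrasting the exact hypergradient (backpropagation through all $r$ inner steps, which involves second derivatives) with the FO-BLO estimate. The quadratic losses, step size, and loop length are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of exact BLO vs. FO-BLO hypergradients (illustrative
# losses and hyperparameters; not the paper's experimental setup).
import jax
import jax.numpy as jnp

def inner_loss(w):
    # Lower-level objective; a toy quadratic stands in for the task loss.
    return jnp.sum((w - 1.0) ** 2)

def outer_loss(w):
    # Upper-level objective, evaluated at the inner solution.
    return jnp.sum(w ** 2)

def inner_loop(lam, r=5, alpha=0.1):
    # MAML-style inner optimization: r gradient steps from initialization lam.
    w = lam
    for _ in range(r):
        w = w - alpha * jax.grad(inner_loss)(w)
    return w

lam = jnp.zeros(3)

# Exact hypergradient: autodiff through the whole inner loop. This
# differentiates the update map w -> w - alpha * grad(inner_loss)(w),
# so second derivatives of inner_loss appear and memory grows with r.
exact_grad = jax.grad(lambda l: outer_loss(inner_loop(l)))(lam)

# FO-BLO estimate: zero out all second-derivative terms by treating
# d w_r / d lam as the identity, i.e. just take the outer gradient at
# the inner solution. Constant memory in r, but biased in general.
fo_grad = jax.grad(outer_loss)(inner_loop(lam))
```

The `fo_grad` line is exactly the "zeroing out second derivative terms" approximation: its cost and memory do not grow with $r$, at the price of the bias whose consequences the paper analyzes.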
Related papers
- Hi-ZFO: Hierarchical Zeroth- and First-Order LLM Fine-Tuning via Importance-Guided Tensor Selection [4.808936079900314]
We propose Hi-ZFO (Hierarchical Zeroth- and First-Order optimization) to synergize FO gradients with ZO estimation.
We show that Hi-ZFO consistently achieves superior performance while significantly reducing the training time.
arXiv Detail & Related papers (2026-01-09T03:20:54Z) - Efficient Penalty-Based Bilevel Methods: Improved Analysis, Novel Updates, and Flatness Condition [51.22672287601796]
Penalty-based methods have become popular for solving bilevel optimization (BLO) problems.
They often require inner-loop iterations to solve the lower-level (LL) problem and small outer-loop step sizes to handle the increased smoothness induced by large penalty terms.
This work considers general BLO problems with coupled constraints (CCs) and leverages a novel penalty reformulation that decouples the upper- and lower-level variables.
arXiv Detail & Related papers (2025-11-20T20:48:14Z) - Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum [78.27945336558987]
Decentralized federated learning (DFL) eliminates reliance on a central server.
Non-smooth regularization is often incorporated into machine learning tasks.
We propose a novel DNCFL algorithm to solve these problems.
arXiv Detail & Related papers (2025-04-17T08:32:25Z) - Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning [37.507489928116804]
Large language models (LLMs) excel across various tasks, but standard first-order (FO) fine-tuning demands considerable memory.
We introduce a novel layer-wise divergence analysis that uncovers the distinct update pattern of FO and ZO optimization.
We propose Divergence-driven Zeroth-Order (DiZO) optimization.
arXiv Detail & Related papers (2025-02-05T16:03:17Z) - A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level Optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization problem, where the inner loss function becomes a smooth probability distribution, and the outer loss becomes an expected loss over the inner distribution.
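One plausible Monte Carlo reading of this reformulation (purely illustrative; the paper's exact construction may differ) is to smooth the inner solution with a Gaussian and differentiate the resulting expected outer loss via the reparameterization trick. The toy quadratic inner loss, `sigma`, and sample count below are assumptions.

```python
# Illustrative sketch: outer loss as an expectation over a smoothed
# distribution of inner solutions (not the paper's exact construction).
import jax
import jax.numpy as jnp

def outer_loss(w):
    return jnp.sum(w ** 2)

def expected_outer(lam, key, sigma=0.1, n=32):
    # Toy inner loss ||w - lam||^2 has the closed-form argmin w* = lam;
    # we smooth it into a Gaussian centered at w*.
    w_star = lam
    eps = jax.random.normal(key, (n,) + w_star.shape)
    # Reparameterization keeps the Monte Carlo estimate differentiable in lam.
    return jnp.mean(jax.vmap(outer_loss)(w_star + sigma * eps))

grad_lam = jax.grad(expected_outer)(jnp.ones(3), jax.random.PRNGKey(0))
```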
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Bayesian Optimisation with Unknown Hyperparameters: Regret Bounds Logarithmically Closer to Optimal [18.93478528448966]
We introduce Length scale Balancing (LB) - a novel approach aggregating multiple base surrogate models with varying length scales.
LB adds length scale candidate values while retaining longer scales, balancing exploration and exploitation.
We show that LB is only $\log g(T)$ away from the oracle regret.
arXiv Detail & Related papers (2024-10-14T11:17:00Z) - Enhancing Zeroth-order Fine-tuning for Language Models with Low-rank Structures [21.18741772731095]
Zeroth-order (ZO) algorithms offer a promising alternative by approximating gradients using finite differences of function values.
Existing ZO methods struggle to capture the low-rank gradient structure common in LLM fine-tuning, leading to suboptimal performance.
This paper proposes a low-rank ZO algorithm (LOZO) that effectively captures this structure in LLMs.
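For context, the classic two-point finite-difference estimator that such ZO methods build on can be sketched as follows; this is a generic textbook version, not LOZO's low-rank construction.

```python
# Generic two-point (SPSA-style) zeroth-order gradient estimate; a
# textbook sketch, not the low-rank LOZO variant from the paper.
import jax
import jax.numpy as jnp

def zo_grad(loss, theta, key, mu=1e-3):
    # Probe the loss at theta +/- mu * u along a random direction u;
    # the scaled difference estimates the gradient of a smoothed loss.
    u = jax.random.normal(key, theta.shape)
    diff = loss(theta + mu * u) - loss(theta - mu * u)
    return (diff / (2.0 * mu)) * u

loss = lambda th: jnp.sum(th ** 2)  # stand-in objective
g = zo_grad(loss, jnp.ones(4), jax.random.PRNGKey(0))
```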
arXiv Detail & Related papers (2024-10-10T08:10:53Z) - Contextual Stochastic Bilevel Optimization [50.36775806399861]
We introduce contextual bilevel optimization (CSBO) -- a bilevel optimization framework with the lower-level problem minimizing an expectation conditioned on some contextual information and the upper-level variable.
It is important for applications such as meta-learning, personalized learning, end-to-end learning, and Wasserstein distributionally robust optimization with side information (WDRO-SI).
arXiv Detail & Related papers (2023-10-27T23:24:37Z) - AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models.
AdaLomo achieves results on par with AdamW, while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
arXiv Detail & Related papers (2023-10-16T09:04:28Z) - Towards Extremely Fast Bilevel Optimization with Self-governed
Convergence Guarantees [42.514612465664605]
We propose a single-level formulation to uniformly understand existing explicit and implicit Gradient-based BLOs.
A striking feature of our convergence result is that, compared to the original unaccelerated GBLO versions, the fast BAGDC admits a unified non-asymptotic convergence theory towards stationarity.
arXiv Detail & Related papers (2022-05-20T09:46:10Z) - Value-Function-based Sequential Minimization for Bi-level Optimization [52.39882976848064]
Gradient-based Bi-Level Optimization (BLO) methods have been widely applied to handle modern learning tasks.
There are almost no gradient-based methods able to solve BLO in challenging scenarios, such as BLO with functional constraints and pessimistic BLO.
We provide Bi-level Value-Function-based Sequential Minimization (BVFSM) to address the above issues.
arXiv Detail & Related papers (2021-10-11T03:13:39Z) - Debiasing a First-order Heuristic for Approximate Bi-level Optimization [38.068090269482425]
Approximate bi-level optimization (ABLO) consists of (outer-level) optimization problems whose objectives involve numerical (inner-level) optimization loops.
There is a lack of theoretical understanding of the convergence properties of the first-order method (FOM).
We propose an unbiased FOM enjoying constant memory complexity as a function of $r$.
arXiv Detail & Related papers (2021-06-04T13:46:48Z) - A Generic First-Order Algorithmic Framework for Bi-Level Programming
Beyond Lower-Level Singleton [49.23948907229656]
Bi-level Descent Aggregation is a flexible and modularized algorithmic framework for generic bi-level optimization.
We derive a new methodology to prove the convergence of BDA without the LLS condition.
Our investigations also demonstrate that BDA is indeed compatible with a variety of particular first-order computation modules.
arXiv Detail & Related papers (2020-06-07T05:18:50Z)