Related papers: Tuning-Free Bilevel Optimization: New Algorithms and Convergence Analysis

URL: http://arxiv.org/abs/2410.05140v2
Date: Tue, 8 Oct 2024 21:38:43 GMT
Title: Tuning-Free Bilevel Optimization: New Algorithms and Convergence Analysis
Authors: Yifan Yang, Hao Ban, Minhui Huang, Shiqian Ma, Kaiyi Ji
Abstract summary: We propose two novel tuning-free algorithms, D-TFBO and S-TFBO. D-TFBO employs a double-loop structure with stepsizes adaptively adjusted by the "inverse of cumulative gradient norms" strategy. S-TFBO features a simpler fully single-loop structure that updates three variables simultaneously with a theory-motivated joint design of adaptive stepsizes for all variables.
Score: 21.932550214810533
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Bilevel optimization has recently attracted considerable attention due to its abundant applications in machine learning problems. However, existing methods rely on prior knowledge of problem parameters to determine stepsizes, resulting in significant effort in tuning stepsizes when these parameters are unknown. In this paper, we propose two novel tuning-free algorithms, D-TFBO and S-TFBO. D-TFBO employs a double-loop structure with stepsizes adaptively adjusted by the "inverse of cumulative gradient norms" strategy. S-TFBO features a simpler fully single-loop structure that updates three variables simultaneously with a theory-motivated joint design of adaptive stepsizes for all variables. We provide a comprehensive convergence analysis for both algorithms and show that D-TFBO and S-TFBO respectively require $O(\frac{1}{\epsilon})$ and $O(\frac{1}{\epsilon}\log^4(\frac{1}{\epsilon}))$ iterations to find an $\epsilon$-accurate stationary point, (nearly) matching their well-tuned counterparts using the information of problem parameters. Experiments on various problems show that our methods achieve performance comparable to existing well-tuned approaches, while being more robust to the selection of initial stepsizes. To the best of our knowledge, our methods are the first to completely eliminate the need for stepsize tuning, while achieving theoretical guarantees.
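Illustration: the "inverse of cumulative gradient norms" idea mentioned in the abstract resembles the classic AdaGrad-Norm stepsize rule. The sketch below applies that rule to a plain single-level quadratic, only to show how such a stepsize adapts without knowing the smoothness constant. It is not D-TFBO or S-TFBO, and the names adagrad_norm, quad_grad, eta and b0 are hypothetical choices made for this example.

```python
import math

def adagrad_norm(grad, x0, eta=1.0, b0=1e-2, steps=500):
    """Gradient descent with stepsize eta / sqrt(b0^2 + sum of squared gradient norms).

    Illustrative single-level rule only; not the bilevel algorithms from the paper.
    """
    x = list(x0)
    acc = b0 ** 2  # running sum of squared gradient norms
    for _ in range(steps):
        g = grad(x)
        acc += sum(gi * gi for gi in g)
        step = eta / math.sqrt(acc)  # shrinks automatically as gradients accumulate
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

def quad_grad(x):
    # Gradient of the toy objective f(x) = 0.5 * (x[0]**2 + 10 * x[1]**2).
    return [1.0 * x[0], 10.0 * x[1]]

x_start = [5.0, -3.0]
x_final = adagrad_norm(quad_grad, x_start)
start_norm = math.sqrt(sum(g * g for g in quad_grad(x_start)))
grad_norm = math.sqrt(sum(g * g for g in quad_grad(x_final)))
print(f"gradient norm: {start_norm:.3f} -> {grad_norm:.2e}")
```

The stepsize here is never tuned to the curvature of the toy objective: the accumulated gradient norms alone scale it down to a stable range, which is the behaviour the abstract refers to as robustness to the initial stepsize.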

Related papers

Problem-Parameter-Free Decentralized Bilevel Optimization [31.15538292038612]
Decentralized bilevel optimization has garnered significant attention due to its critical role in solving large-scale machine learning problems. In this paper, we propose AdaSDBO, a fully problem-parameter-free algorithm for decentralized bilevel optimization with a single-loop structure. Through rigorous theoretical analysis, we establish that AdaSDBO achieves a convergence rate of $\widetilde{\mathcal{O}}\left(\frac{1}{T}\right)$, matching the performance of well-tuned state-of-the-art methods up to polylogarithmic factors.
arXiv Detail & Related papers (2025-10-28T10:50:04Z)
Adaptive Algorithms with Sharp Convergence Rates for Stochastic Hierarchical Optimization [31.032959636901086]
We propose novel adaptive algorithms for hierarchical optimization problems. Our algorithms achieve sharp convergence rates without prior knowledge of the noise level. Experiments on synthetic and deep learning tasks demonstrate the effectiveness of our proposed algorithms.
arXiv Detail & Related papers (2025-09-18T20:17:18Z)
Neural Network Training via Stochastic Alternating Minimization with Trainable Step Sizes [3.246129789918632]
The training of deep neural networks is inherently a nonconvex optimization problem. Standard approaches such as stochastic gradient descent (SGD) require simultaneous updates to parameters. We propose a novel method, Stochastic Alternating Minimization with Trainable Step Sizes (SAMT). SAMT achieves better performance with fewer parameter updates compared to state-of-the-art methods.
arXiv Detail & Related papers (2025-08-06T08:23:38Z)
Accelerating Cutting-Plane Algorithms via Reinforcement Learning Surrogates [49.84541884653309]
A current standard approach to solving convex discrete optimization problems is the use of cutting-plane algorithms. Despite the existence of a number of general-purpose cut-generating algorithms, large-scale discrete optimization problems continue to suffer from intractability. We propose a method for accelerating cutting-plane algorithms via reinforcement learning.
arXiv Detail & Related papers (2023-07-17T20:11:56Z)
BiSLS/SPS: Auto-tune Step Sizes for Stable Bi-level Optimization [33.082961718280245]
Existing algorithms involve two coupled learning rates that can be affected by approximation errors when computing hypergradients. We investigate the use of adaptive step-size methods, namely line search (SLS) and Polyak step size (SPS), for computing both the upper and lower-level learning rates. New algorithms, which are available in both SGD and Adam versions, can find large learning rates with minimal tuning and converge faster than corresponding vanilla BO algorithms.
arXiv Detail & Related papers (2023-05-30T00:37:50Z)
Dynamical softassign and adaptive parameter tuning for graph matching [0.7456521449098222]
We study a unified framework for graph matching problems called the constrained gradient algorithms. Our contributed adaptive step size parameter can guarantee the underlying algorithms' convergence. We propose a novel graph matching algorithm: the softassign constrained gradient method.
arXiv Detail & Related papers (2022-08-17T11:25:03Z)
Formal guarantees for heuristic optimization algorithms used in machine learning [6.978625807687497]
Stochastic Gradient Descent (SGD) and its variants have become the dominant methods for large-scale optimization problems in machine learning (ML). We provide formal guarantees for a few convex optimization methods and propose improved algorithms.
arXiv Detail & Related papers (2022-07-31T19:41:22Z)
A Fully Single Loop Algorithm for Bilevel Optimization without Hessian Inverse [121.54116938140754]
We propose a new Hessian inverse free Fully Single Loop Algorithm for bilevel optimization problems. We show that our algorithm converges with the rate of $O(\epsilon^{-2})$.
arXiv Detail & Related papers (2021-12-09T02:27:52Z)
Bolstering Stochastic Gradient Descent with Model Building [0.0]
The stochastic gradient descent method and its variants constitute the core optimization algorithms that achieve good convergence rates. We propose an alternative approach to line search by using a new algorithm based on forward step model building. We show that the proposed algorithm achieves faster convergence and better generalization in well-known test problems.
arXiv Detail & Related papers (2021-11-13T06:54:36Z)
BiAdam: Fast Adaptive Bilevel Optimization Methods [104.96004056928474]
Bilevel optimization has attracted increased interest in machine learning due to its many applications. We provide a useful analysis framework for both the constrained and unconstrained optimization.
arXiv Detail & Related papers (2021-06-21T20:16:40Z)
Randomized Stochastic Variance-Reduced Methods for Stochastic Bilevel Optimization [62.87181271021217]
We consider non-SBO problems that have many applications in machine learning. This paper proposes fast randomized algorithms for non-SBO problems.
arXiv Detail & Related papers (2021-05-05T18:28:42Z)
Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step. Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence with the fact that a constant step-size learns faster in time up to an error. Rather than fixing the minibatch and the step-size at the outset, we propose to allow parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z)
Optimizing generalization on the train set: a novel gradient-based framework to train parameters and hyperparameters simultaneously [0.0]
Generalization is a central problem in Machine Learning. We present a novel approach based on a new measure of risk that allows us to develop novel fully automatic procedures for generalization.
arXiv Detail & Related papers (2020-06-11T18:04:36Z)
Convergence of adaptive algorithms for weakly convex constrained optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of Moreau envelope. Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)

This list is automatically generated from the titles and abstracts of the papers on this site.