Stepsize anything: A unified learning rate schedule for budgeted-iteration training
- URL: http://arxiv.org/abs/2505.24452v3
- Date: Wed, 06 Aug 2025 13:37:34 GMT
- Title: Stepsize anything: A unified learning rate schedule for budgeted-iteration training
- Authors: Anda Tang, Yiming Dong, Yutao Zeng, Xun Zhou, Zhouchen Lin
- Abstract summary: Budgeted-iteration training aims to achieve optimal learning within predetermined budgets. While learning rate schedules govern the performance of different networks and tasks, their design remains largely heuristic and lacks theoretical foundations. We propose the Unified Budget-Aware (UBA) schedule, a theoretically grounded learning rate schedule that consistently outperforms commonly-used schedules.
- Score: 43.52874155421866
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The expanding computational costs and limited resources underscore the critical need for budgeted-iteration training, which aims to achieve optimal learning within predetermined iteration budgets. While learning rate schedules fundamentally govern the performance of different networks and tasks, particularly in budgeted-iteration scenarios, their design remains largely heuristic and lacks theoretical foundations. In addition, selecting the optimal learning rate schedule requires extensive trial and error, making the training process inefficient. In this work, we propose the Unified Budget-Aware (UBA) schedule, a theoretically grounded learning rate schedule that consistently outperforms commonly-used schedules across diverse architectures and tasks under different constrained training budgets. First, we bridge the gap by constructing a novel training budget-aware optimization framework, which explicitly accounts for robustness to landscape curvature variations. From this framework, we derive the UBA schedule, controlled by a single hyper-parameter $\varphi$ that provides a trade-off between flexibility and simplicity, eliminating the need for per-network numerical optimization. Moreover, we establish a theoretical connection between $\varphi$ and the condition number, adding interpretation and justification to our approach, and we prove convergence for different values of $\varphi$. We offer practical guidelines for its selection via theoretical analysis and empirical results. Extensive experiments show that UBA consistently surpasses commonly-used schedules across diverse vision and language tasks, spanning network architectures (e.g., ResNet, OLMo) and scales, under different training-iteration budgets.
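As a minimal sketch of the interface such a schedule exposes, assume a fixed iteration budget, a base learning rate, and the single shape parameter $\varphi$ described above. The rational-decay form used below is an illustrative assumption, not the UBA formula (which the abstract does not reproduce); `budget_aware_lr` is likewise a hypothetical name.

```python
def budget_aware_lr(step: int, total_steps: int, base_lr: float, phi: float) -> float:
    """Hypothetical budget-aware schedule controlled by one shape parameter phi.

    ASSUMPTION: this rational decay is an illustrative stand-in; the actual
    UBA schedule is derived in the paper and is not reproduced here.
    """
    z = min(step / total_steps, 1.0)  # normalized progress through the iteration budget
    return base_lr * (1.0 - z) / (1.0 + phi * z)

# Small phi keeps the rate high for longer; large phi decays it aggressively.
for phi in (0.1, 1.0, 10.0):
    print(phi, [round(budget_aware_lr(t, 1000, 3e-4, phi), 7) for t in (0, 500, 1000)])
```

Sweeping one scalar this way mirrors the flexibility-simplicity trade-off the abstract describes: a single parameter spans schedules from near-constant to sharply decaying, rather than a per-network search over schedule families.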
Related papers
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization [38.57672572913099]
We present a novel framework, AnytimeReasoner, to optimize anytime reasoning performance. We truncate the complete thinking process to fit within sampled token budgets from a prior distribution. We then optimize the thinking and summary policies in a decoupled manner to maximize the cumulative reward.
arXiv Detail & Related papers (2025-05-19T17:58:44Z)
- Scalable Chain of Thoughts via Elastic Reasoning [61.75753924952059]
Elastic Reasoning is a novel framework for scalable chain of thoughts. It separates reasoning into two phases, thinking and solution, with independently allocated budgets. Our approach produces more concise and efficient reasoning even in unconstrained settings.
arXiv Detail & Related papers (2025-05-08T15:01:06Z)
- Optimizing LLM Inference for Database Systems: Cost-Aware Scheduling for Concurrent Requests [8.552242818726347]
This paper analyzes LLM inference performance, focusing on a data management issue in LLM inference. We reveal that the root of the problem is the lack of an adequate resource cost model and optimization strategy for executing multiple concurrent inference requests.
arXiv Detail & Related papers (2024-11-12T00:10:34Z)
- Optimal Query Allocation in Extractive QA with LLMs: A Learning-to-Defer Framework with Theoretical Guarantees [3.4289478404209826]
Large Language Models excel in generative tasks but exhibit inefficiencies in structured text selection. We propose a Learning-to-Defer framework that allocates queries to specialized experts, ensuring high-confidence predictions.
arXiv Detail & Related papers (2024-10-21T08:21:00Z)
- Robustifying and Boosting Training-Free Neural Architecture Search [49.828875134088904]
We propose a robustifying and boosting training-free NAS (RoBoT) algorithm to develop a robust and consistently better-performing metric on diverse tasks.
Remarkably, the expected performance of RoBoT can be theoretically guaranteed, improving over existing training-free NAS approaches.
arXiv Detail & Related papers (2024-03-12T12:24:11Z)
- Overcoming Recency Bias of Normalization Statistics in Continual Learning: Balance and Adaptation [67.77048565738728]
Continual learning involves learning a sequence of tasks and balancing their knowledge appropriately.
We propose Adaptive Balance of BN (AdaB$^2$N), which appropriately incorporates a Bayesian-based strategy to adapt task-wise contributions.
Our approach achieves significant performance gains across a wide range of benchmarks.
arXiv Detail & Related papers (2023-10-13T04:50:40Z)
- When Computing Power Network Meets Distributed Machine Learning: An Efficient Federated Split Learning Framework [6.871107511111629]
CPN-FedSL is a Federated Split Learning (FedSL) framework over a Computing Power Network (CPN). We build a dedicated model to capture the basic settings and learning characteristics (e.g., latency, flow, convergence).
arXiv Detail & Related papers (2023-05-22T12:36:52Z)
- Efficient Training of Multi-task Neural Solver for Combinatorial Optimization [23.694457372640912]
We propose a general and efficient training paradigm to deliver a unified multi-task neural solver. Our method significantly enhances overall performance, whether or not training budgets are constrained, and achieves the best results compared to single-task and multi-task learning approaches.
arXiv Detail & Related papers (2023-05-10T14:20:34Z)
- Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between SSL and DC paradigms.
We show that it is feasible to simultaneously learn a dense and a gated sub-network from scratch in an SSL setting.
The co-evolution of the dense and gated encoders during pre-training offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z)
- Optimization-Derived Learning with Essential Convergence Analysis of Training and Hyper-training [52.39882976848064]
We design a Generalized Krasnoselskii-Mann (GKM) scheme based on fixed-point iterations as our fundamental ODL module.
Under the GKM scheme, a Bilevel Meta Optimization (BMO) algorithmic framework is constructed to solve the optimal training and hyper-training variables together.
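The GKM module builds on the classical Krasnoselskii-Mann iteration, x_{k+1} = (1 - alpha) x_k + alpha T(x_k), which converges to a fixed point of a nonexpansive operator T. The sketch below shows only this vanilla step, with a toy operator chosen for illustration; the paper's generalization is not reproduced here.

```python
import numpy as np

def km_iterate(T, x0, alpha=0.5, iters=200):
    """Classical Krasnoselskii-Mann iteration: x <- (1 - alpha) * x + alpha * T(x).

    Converges to a fixed point when T is nonexpansive; the paper's GKM module
    generalizes this averaging step.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = (1.0 - alpha) * x + alpha * T(x)
    return x

# Toy nonexpansive operator: average of the identity and the projection onto
# the unit ball; its fixed points are exactly the points of the unit ball.
T = lambda x: 0.5 * (x + x / max(1.0, float(np.linalg.norm(x))))
print(km_iterate(T, [3.0, -4.0]))  # approaches (0.6, -0.8) on the unit sphere
```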
arXiv Detail & Related papers (2022-06-16T01:50:25Z)
- RANK-NOSH: Efficient Predictor-Based Architecture Search via Non-Uniform Successive Halving [74.61723678821049]
We propose NOn-uniform Successive Halving (NOSH), a hierarchical scheduling algorithm that terminates the training of underperforming architectures early to avoid wasting budget.
We formulate predictor-based architecture search as learning to rank with pairwise comparisons.
The resulting method, RANK-NOSH, reduces the search budget by 5x while achieving competitive or even better performance than previous state-of-the-art predictor-based methods on various spaces and datasets.
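NOSH's non-uniform budget allocation and pairwise ranking are detailed in the paper; for orientation, here is a minimal sketch of the uniform successive-halving baseline it refines: evaluate every surviving candidate under a small budget, keep the top fraction, and repeat with a larger budget. The function names and the toy noisy evaluation are assumptions for illustration.

```python
import random

def successive_halving(candidates, train_and_eval, init_budget=1, eta=2):
    """Uniform successive halving: score survivors, keep the top 1/eta, grow budget.

    ASSUMPTION: illustrative baseline only; NOSH schedules budgets non-uniformly
    and ranks candidates with a learned pairwise predictor instead of raw scores.
    """
    pool, budget = list(candidates), init_budget
    while len(pool) > 1:
        scores = {c: train_and_eval(c, budget) for c in pool}
        pool = sorted(pool, key=scores.get, reverse=True)[: max(1, len(pool) // eta)]
        budget *= eta  # survivors earn proportionally more training budget
    return pool[0]

# Toy setup: candidate quality equals its index, observed through noise that
# shrinks as the training budget grows.
print(successive_halving(range(16), lambda c, b: c + random.gauss(0.0, 4.0 / b)))
```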
arXiv Detail & Related papers (2021-08-18T07:45:21Z)
- REX: Revisiting Budgeted Training with an Improved Schedule [14.618325490983052]
We propose a novel profile and sampling rate combination called the Reflected Exponential (REX) schedule.
REX outperforms the linear schedule in the low budget regime, while matching or exceeding the performance of several state-of-the-art learning rate schedules.
arXiv Detail & Related papers (2021-07-09T04:17:35Z)
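Since REX is one of the budgeted-training baselines most directly comparable to UBA, a one-line profile may help. A minimal sketch, assuming the commonly cited reflected-exponential form eta(z) = eta_0 (1 - z) / (1/2 + (1 - z)/2) with z = t/T; consult the REX paper for the exact profile and its accompanying sampling rate.

```python
def rex_lr(step: int, total_steps: int, base_lr: float) -> float:
    """REX-style profile sketch (assumed form; see the paper for the exact one):
    eta(z) = base_lr * (1 - z) / (0.5 + 0.5 * (1 - z)), with z = step / total_steps.
    """
    z = min(step / total_steps, 1.0)
    return base_lr * (1.0 - z) / (0.5 + 0.5 * (1.0 - z))

# REX stays above a linear schedule for most of the budget, then drops sharply.
for t in (0, 250, 500, 750, 1000):
    print(t, round(rex_lr(t, 1000, 3e-4), 7), round(3e-4 * (1 - t / 1000), 7))
```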