Learning an Efficient Optimizer via Hybrid-Policy Sub-Trajectory Balance
- URL: http://arxiv.org/abs/2511.00543v1
- Date: Sat, 01 Nov 2025 13:08:28 GMT
- Title: Learning an Efficient Optimizer via Hybrid-Policy Sub-Trajectory Balance
- Authors: Yunchuan Guan, Yu Liu, Ke Zhou, Hui Li, Sen Jia, Zhiqi Shen, Ziyang Wang, Xinglin Zhang, Tao Chen, Jenq-Neng Hwang, Lei Li
- Abstract summary: Recent advances in generative modeling enable neural networks to generate weights without relying on gradient-based optimization. Lo-Hp is a decoupled two-stage weight generation framework that enhances flexibility through learning various optimization policies. We demonstrate that learning solely local optimization policies can address the long-horizon issue while enhancing the generation of globally optimal weights.
- Score: 42.630489353592786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in generative modeling enable neural networks to generate weights without relying on gradient-based optimization. However, current methods are limited by two issues: over-coupling and long horizons. The former tightly binds weight generation to task-specific objectives, thereby limiting the flexibility of the learned optimizer. The latter causes inefficiency and low accuracy during inference, owing to the lack of local constraints. In this paper, we propose Lo-Hp, a decoupled two-stage weight generation framework that enhances flexibility by learning various optimization policies. It adopts a hybrid-policy sub-trajectory balance objective, which integrates on-policy and off-policy learning to capture local optimization policies. Theoretically, we demonstrate that learning solely local optimization policies can address the long-horizon issue while enhancing the generation of globally optimal weights. In addition, we validate Lo-Hp's superior accuracy and inference efficiency in tasks that require frequent weight updates, such as transfer learning, few-shot learning, domain generalization, and large language model adaptation.
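Sub-trajectory balance objectives of this form originate in the GFlowNet literature. As a point of reference, here is a minimal PyTorch sketch of the generic sub-trajectory balance (SubTB) loss for a single sub-trajectory; the hybrid on-/off-policy weighting that Lo-Hp adds on top is not shown, and all tensor names are illustrative rather than taken from the paper.

```python
import torch

def subtb_loss(log_flow_m, log_flow_n, log_pf, log_pb):
    """Sub-trajectory balance for one sub-trajectory s_m -> ... -> s_n:
    (log F(s_m) + sum log P_F) should match (log F(s_n) + sum log P_B)."""
    lhs = log_flow_m + log_pf.sum()
    rhs = log_flow_n + log_pb.sum()
    return (lhs - rhs) ** 2

# Toy values standing in for flow-network and policy-network outputs.
loss = subtb_loss(torch.tensor(0.3), torch.tensor(-0.1),
                  -torch.rand(4), -torch.rand(4))
```

In a GFlowNet-style trainer, `log_pf` and `log_pb` would come from the forward and backward policy networks, and the loss would be averaged over many sampled sub-trajectories.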
Related papers
- Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning [77.92320830700797]
Reinforcement Learning has played a central role in enabling the reasoning capabilities of Large Language Models. We propose a tractable computational framework that tracks and leverages curvature information during policy updates. The algorithm, Curvature-Aware Policy Optimization (CAPO), identifies samples that contribute to unstable updates and masks them out.
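The summary states the idea only at a high level; a minimal sketch of curvature-aware sample masking might look like the following, where the curvature proxy (change in per-sample gradient norm between successive updates) and the masking fraction are assumptions for illustration, not CAPO's actual criterion.

```python
import torch

def mask_unstable_samples(grad_norms_prev, grad_norms_curr, k=0.1):
    """Mask out roughly the fraction k of samples whose per-sample gradient
    norm changed the most between updates (a crude curvature proxy)."""
    curvature_proxy = (grad_norms_curr - grad_norms_prev).abs()
    n_mask = max(1, int(k * curvature_proxy.numel()))
    cutoff = curvature_proxy.topk(n_mask).values[-1]
    return curvature_proxy < cutoff  # True = keep this sample

keep = mask_unstable_samples(torch.rand(64), torch.rand(64))
```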
arXiv Detail & Related papers (2025-10-01T12:29:32Z)
- ACPO: Adaptive Curriculum Policy Optimization for Aligning Vision-Language Models in Complex Reasoning [17.928214942495412]
ACPO employs a dynamic curriculum that orchestrates a principled transition from a stable, near on-policy exploration phase to an efficient, off-policy exploitation phase. We conduct extensive experiments on a suite of challenging multimodal reasoning benchmarks, including MathVista, LogicVista, and MMMU-Pro. Results demonstrate that ACPO consistently outperforms strong baselines such as DAPO and PAPO, achieving state-of-the-art performance, accelerated convergence, and superior training stability.
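A minimal sketch of the kind of on-to-off-policy transition such a curriculum implies; the linear ramp and the warmup fraction are illustrative assumptions, not ACPO's actual schedule.

```python
def off_policy_fraction(step, total_steps, warmup=0.3):
    """Fraction of each batch drawn from the replay buffer:
    0 during an on-policy warmup phase, then ramping linearly to 1."""
    progress = step / total_steps
    if progress < warmup:
        return 0.0
    return min(1.0, (progress - warmup) / (1.0 - warmup))

# e.g. at 65% of training with a 30% warmup, half the batch is off-policy
assert abs(off_policy_fraction(65, 100) - 0.5) < 1e-9
```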
arXiv Detail & Related papers (2025-10-01T09:11:27Z)
- Learning to Optimize Multi-Objective Alignment Through Dynamic Reward Weighting [48.87957020168614]
Prior works in multi-reward learning typically use linear scalarization with fixed weights, which fails to support effective online learning. We introduce two approaches to improving objective alignment: one for online learning, the other for space exploration.
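For contrast with fixed-weight scalarization, the sketch below shows one generic dynamic-weighting scheme that shifts weight toward objectives improving most slowly; it illustrates the general idea, not this paper's specific update rules, and the window size is an arbitrary choice.

```python
import numpy as np

def dynamic_weights(reward_history, eps=1e-8):
    """reward_history: (T, K) array of K objective rewards over T steps.
    Upweight objectives showing the least recent improvement."""
    recent, past = reward_history[-10:].mean(0), reward_history[:-10].mean(0)
    improvement = np.maximum(recent - past, 0.0)
    inv = 1.0 / (improvement + eps)
    return inv / inv.sum()  # simplex-normalized scalarization weights

w = dynamic_weights(np.random.rand(100, 3))
```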
arXiv Detail & Related papers (2025-09-14T21:56:35Z)
- Principled Data Augmentation for Learning to Solve Quadratic Programming Problems [11.574125752787156]
Recently, learning-to-optimize (L2O) methods for linear programs (LPs) and quadratic programs (QPs) have gained traction. MPNNs promise lightweight, data-driven proxies for solving such optimization problems. However, robust L2O MPNNs remain challenging to train in data-scarce settings. This work introduces a principled approach to data augmentation tailored to solving QPs via MPNNs.
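One elementary, provably label-preserving augmentation for QPs: scaling the objective by a positive constant leaves the minimizer unchanged, so each instance yields new training instances with the same solution. This is a generic example of the flavor of augmentation involved, not necessarily the paper's construction.

```python
import numpy as np

def scale_objective(Q, c, rng):
    """min 0.5 x^T Q x + c^T x  s.t. A x <= b  has the same argmin
    after the objective is scaled by any a > 0."""
    a = rng.uniform(0.5, 2.0)
    return a * Q, a * c

rng = np.random.default_rng(0)
Q = np.array([[2.0, 0.0], [0.0, 2.0]])
c = np.array([-2.0, -4.0])
Qa, ca = scale_objective(Q, c, rng)
# unconstrained minimizers coincide: x* = -Q^{-1} c
assert np.allclose(np.linalg.solve(Q, -c), np.linalg.solve(Qa, -ca))
```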
arXiv Detail & Related papers (2025-06-02T14:40:18Z)
- HoP: Homeomorphic Polar Learning for Hard Constrained Optimization [3.8166443770130822]
Constrained optimization demands highly efficient synthetic training approaches. As a data-driven learning method, L2O leverages neural networks to efficiently produce approximate solutions. In all cases, HoP achieves solutions closer to the optimum than existing L2O methods.
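The "homeomorphic polar" idea can be illustrated in the simplest case of a Euclidean ball constraint: decode an unconstrained network output into a direction and a bounded radius, so every prediction is feasible by construction. The general HoP construction handles richer constraint sets; this sketch covers only the ball-constrained special case, and the tanh squashing is an arbitrary choice.

```python
import numpy as np

def polar_decode(z, R=1.0):
    """Map an unconstrained vector z to a point inside the ball ||x|| <= R.
    The direction comes from z itself; the radius from a squashed norm."""
    norm = np.linalg.norm(z) + 1e-12
    direction = z / norm
    radius = R * np.tanh(norm)          # always in [0, R)
    return radius * direction

x = polar_decode(np.random.randn(5) * 10)
assert np.linalg.norm(x) <= 1.0         # feasible by construction
```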
arXiv Detail & Related papers (2025-02-01T03:59:15Z)
- Enhancing Spectrum Efficiency in 6G Satellite Networks: A GAIL-Powered Policy Learning via Asynchronous Federated Inverse Reinforcement Learning [67.95280175998792]
A novel generative adversarial imitation learning (GAIL)-powered policy learning approach is proposed for optimizing beamforming, spectrum allocation, and remote user equipment (RUE) association in 6G satellite networks.
We employ inverse RL (IRL) to automatically learn reward functions without manual tuning.
We show that the proposed MA-AL method outperforms traditional RL approaches, achieving a 14.6% improvement in convergence and reward value.
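For background, GAIL trains a discriminator to separate expert from policy state-action pairs and feeds the policy a surrogate reward derived from the discriminator. Below is a minimal sketch of that surrogate reward; the network shape and input dimensionality are illustrative, and this is the textbook GAIL signal rather than anything specific to this paper's MA-AL method.

```python
import torch
import torch.nn as nn

# Toy discriminator over concatenated (state, action) vectors.
disc = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))

def gail_reward(state_action):
    """Common GAIL surrogate reward: -log(1 - D(s, a))."""
    with torch.no_grad():
        d = torch.sigmoid(disc(state_action))
    return -torch.log(1.0 - d + 1e-8)

r = gail_reward(torch.randn(32, 8))  # batch of 32 (s, a) pairs
```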
arXiv Detail & Related papers (2024-09-27T13:05:02Z)
- Self-Supervised Learning for Large-Scale Preventive Security Constrained DC Optimal Power Flow [20.078717680640214]
Security-Constrained Optimal Power Flow (SCOPF) plays a crucial role in power grid stability but becomes increasingly complex as systems grow.
This paper introduces PDL-SCOPF, a self-supervised end-to-end primal-dual learning framework for producing near-optimal solutions to large-scale SCOPF problems.
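A minimal sketch of the self-supervised primal-dual pattern: the primal model minimizes a Lagrangian of cost plus multiplier-weighted constraint violations while the multipliers ascend on those violations. Function names and shapes are illustrative; the actual PDL-SCOPF losses are considerably more detailed.

```python
import torch

def lagrangian(cost, violations, lam):
    """cost: scalar objective; violations: g(x), feasible when g <= 0;
    lam: nonnegative multiplier estimates."""
    return cost + (lam * violations.clamp(min=0.0)).sum()

def dual_ascent(lam, violations, lr=0.1):
    """Multipliers grow where constraints are violated."""
    return (lam + lr * violations.clamp(min=0.0)).detach()

lam = torch.zeros(3)
g = torch.tensor([0.2, -0.1, 0.0])          # one violated constraint
loss = lagrangian(torch.tensor(1.5), g, lam)  # primal training loss
lam = dual_ascent(lam, g)                     # dual update
```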
arXiv Detail & Related papers (2023-11-29T20:36:35Z)
- Multiplicative update rules for accelerating deep learning training and increasing robustness [69.90473612073767]
We propose an optimization framework that fits a wide range of machine learning algorithms and enables the application of alternative update rules.
We claim that the proposed framework accelerates training while leading to more robust models than the traditionally used additive update rule.
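A classic member of the multiplicative family is the exponentiated-gradient update, which scales each weight instead of shifting it; the sketch below contrasts it with the additive SGD step. This is a generic example of a multiplicative rule, not necessarily the update the paper proposes.

```python
import numpy as np

def additive_step(w, g, lr=0.01):
    return w - lr * g                  # standard additive SGD

def multiplicative_step(w, g, lr=0.01):
    return w * np.exp(-lr * g)         # exponentiated-gradient style

w = np.array([0.5, 1.0, 2.0])
g = np.array([0.3, -0.2, 0.1])
print(additive_step(w, g), multiplicative_step(w, g))
```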
arXiv Detail & Related papers (2023-07-14T06:44:43Z)
- Accelerated Federated Learning with Decoupled Adaptive Optimization [53.230515878096426]
The federated learning (FL) framework enables clients to collaboratively learn a shared model while keeping training data private on the clients.
Recently, many efforts have been made to generalize centralized adaptive optimization methods, such as SGDM, Adam, and AdaGrad, to federated settings.
This work aims to develop novel adaptive optimization methods for FL from the perspective of the dynamics of ordinary differential equations (ODEs).
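For context, one common way to carry a centralized adaptive method into the federated setting is to treat the averaged client update as a pseudo-gradient and run Adam on the server (the FedAdam pattern). The numpy sketch below shows that pattern; it is background for the problem the paper studies, not this paper's ODE-based method.

```python
import numpy as np

def server_adam_step(w, client_deltas, m, v, t, lr=0.1,
                     b1=0.9, b2=0.99, eps=1e-8):
    """Average client updates, treat the mean as a pseudo-gradient,
    and apply one Adam step on the server."""
    g = -np.mean(client_deltas, axis=0)   # pseudo-gradient
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g**2
    m_hat, v_hat = m / (1 - b1**t), v / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

w, m, v = np.zeros(4), np.zeros(4), np.zeros(4)
deltas = [np.random.randn(4) * 0.01 for _ in range(8)]  # 8 clients
w, m, v = server_adam_step(w, deltas, m, v, t=1)
```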
arXiv Detail & Related papers (2022-07-14T22:46:43Z)
- Hyper-Learning for Gradient-Based Batch Size Adaptation [2.944323057176686]
Scheduling the batch size to increase over training is an effective strategy for controlling gradient noise when training deep neural networks.
We introduce Arbiter, a new hyper-optimization algorithm that performs batch size adaptation for learnable schedules.
We demonstrate Arbiter's effectiveness in several illustrative experiments.
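The summary does not spell out Arbiter's mechanics; as a generic illustration of gradient-driven batch-size adaptation, the sketch below grows the batch when the estimated gradient noise scale exceeds the current batch size (a simple critical-batch-size heuristic, not Arbiter itself).

```python
import numpy as np

def next_batch_size(per_sample_grads, batch_size, max_bs=4096):
    """Double the batch when gradient noise dominates the mean gradient."""
    g_mean = per_sample_grads.mean(axis=0)
    noise = per_sample_grads.var(axis=0).sum()     # tr(covariance)
    signal = (g_mean ** 2).sum()                   # ||mean gradient||^2
    noise_scale = noise / (signal + 1e-12)
    return min(max_bs, batch_size * 2) if noise_scale > batch_size else batch_size

bs = next_batch_size(np.random.randn(32, 10), batch_size=32)
```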
arXiv Detail & Related papers (2022-05-17T11:01:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.