A Lyapunov Drift-Plus-Penalty Method Tailored for Reinforcement Learning with Queue Stability
- URL: http://arxiv.org/abs/2506.04291v1
- Date: Wed, 04 Jun 2025 10:56:24 GMT
- Title: A Lyapunov Drift-Plus-Penalty Method Tailored for Reinforcement Learning with Queue Stability
- Authors: Wenhan Xu, Jiashuo Jiang, Lei Deng, Danny Hin-Kwok Tsang
- Abstract summary: In this paper, we investigate the adaptation of the Lyapunov Drift-Plus-Penalty algorithm for reinforcement learning (RL) applications. Our proposed algorithm offers theoretical superiority by effectively balancing the greedy optimization of Lyapunov Drift-Plus-Penalty with the long-term perspective of RL.
- Score: 7.359722946713891
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: With the proliferation of Internet of Things (IoT) devices, the demand for addressing complex optimization challenges has intensified. The Lyapunov Drift-Plus-Penalty algorithm is a widely adopted approach for ensuring queue stability, and some research has preliminarily explored its integration with reinforcement learning (RL). In this paper, we investigate the adaptation of the Lyapunov Drift-Plus-Penalty algorithm for RL applications, deriving an effective method for combining Lyapunov Drift-Plus-Penalty with RL under a set of common and reasonable conditions through rigorous theoretical analysis. Unlike existing approaches that directly merge the two frameworks, our proposed algorithm, termed the Lyapunov drift-plus-penalty method tailored for reinforcement learning with queue stability (LDPTRLQ), offers theoretical superiority by effectively balancing the greedy optimization of Lyapunov Drift-Plus-Penalty with the long-term perspective of RL. Simulation results for multiple problems demonstrate that LDPTRLQ outperforms baseline methods that combine the Lyapunov drift-plus-penalty method with RL, corroborating the validity of our theoretical derivations. The results also demonstrate that the proposed algorithm outperforms other benchmarks in terms of compatibility and stability.
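For context, the classic per-slot drift-plus-penalty rule greedily minimizes V * penalty(a) + sum_i Q_i(t) * (arrivals_i(a) - service_i(a)), where Q_i(t) are the queue backlogs and V trades off penalty minimization against queue stability. The sketch below illustrates this greedy rule and the common practice of negating the same objective to serve as a per-step RL reward; the callables `penalty`, `arrivals`, and `service` and the weight `V` are illustrative assumptions, not the paper's LDPTRLQ implementation.

```python
import numpy as np

# Hedged sketch, not the paper's LDPTRLQ algorithm: the classic per-slot
# Lyapunov drift-plus-penalty rule, plus its use as an RL reward signal.
# `penalty`, `arrivals`, and `service` are hypothetical callables mapping
# an action to a scalar cost and to per-queue arrival/service rates.

def drift_plus_penalty_action(queues, actions, penalty, arrivals, service, V=10.0):
    """Greedy per-slot rule: choose the action minimizing
    V * penalty(a) + sum_i Q_i * (arrivals_i(a) - service_i(a))."""
    def objective(a):
        # Lyapunov drift bound term: backlog-weighted net queue growth.
        drift = float(np.dot(queues, arrivals(a) - service(a)))
        return V * penalty(a) + drift  # penalty weighted by trade-off V
    return min(actions, key=objective)

def drift_plus_penalty_reward(queues, a, penalty, arrivals, service, V=10.0):
    """Negated drift-plus-penalty objective as a per-step RL reward, so an
    agent maximizing long-term return also discourages queue growth."""
    drift = float(np.dot(queues, arrivals(a) - service(a)))
    return -(V * penalty(a) + drift)
```

Under this shaping, the greedy rule corresponds to one-step (myopic) maximization of the reward, whereas an RL agent trained on the same reward optimizes its long-run average; balancing these two perspectives is precisely the tension the paper's analysis addresses.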
Related papers
- Preference Optimization for Combinatorial Optimization Problems [54.87466279363487]
Reinforcement Learning (RL) has emerged as a powerful tool for neural optimization, enabling models to learn to solve complex problems without requiring expert knowledge. Despite significant progress, existing RL approaches face challenges such as diminishing reward signals and inefficient exploration in vast action spaces. We propose Preference Optimization, a novel method that transforms quantitative reward signals into qualitative preference signals via statistical comparison modeling.
arXiv Detail & Related papers (2025-05-13T16:47:00Z) - RL-finetuning LLMs from on- and off-policy data with a single algorithm [53.70731390624718]
We introduce a novel reinforcement learning algorithm (AGRO) for fine-tuning large language models. AGRO leverages the concept of generation consistency, which states that the optimal policy satisfies the notion of consistency across any possible generation of the model. We derive algorithms that find optimal solutions via the sample-based policy gradient and provide theoretical guarantees on their convergence.
arXiv Detail & Related papers (2025-03-25T12:52:38Z) - Adaptive Primal-Dual Method for Safe Reinforcement Learning [9.5147410074115]
We propose, analyze, and evaluate adaptive primal-dual (APD) methods for Safe Reinforcement Learning (SRL).
Two adaptive learning rates are adjusted according to the Lagrangian multipliers so as to optimize the policy in each iteration.
Experiments show that the practical APD algorithm outperforms, or achieves performance comparable to, the constant learning-rate baselines while attaining more stable training.
arXiv Detail & Related papers (2024-02-01T05:53:44Z) - Asynchronous Parallel Reinforcement Learning for Optimizing Propulsive Performance in Fin Ray Control [3.889677386753812]
Fish fin rays constitute a sophisticated control system for ray-finned fish, facilitating versatile locomotion.
Despite extensive research on the kinematics and hydrodynamics of fish locomotion, the intricate control strategies in fin-ray actuation remain largely unexplored.
This study introduces a cutting-edge off-policy DRL algorithm, interacting with a fluid-structure interaction (FSI) environment to acquire intricate fin-ray control strategies tailored for various propulsive performance objectives.
arXiv Detail & Related papers (2024-01-21T00:06:17Z) - Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint [56.74058752955209]
This paper studies the alignment process of generative models with Reinforcement Learning from Human Feedback (RLHF).
We first identify the primary challenge of existing popular methods, such as offline PPO and offline DPO, as a lack of strategic exploration of the environment.
We propose efficient algorithms with finite-sample theoretical guarantees.
arXiv Detail & Related papers (2023-12-18T18:58:42Z) - Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework that acquires exploratory trajectories enabling accurate learning of the hidden reward function.
arXiv Detail & Related papers (2023-05-29T15:00:09Z) - Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z) - False Correlation Reduction for Offline Reinforcement Learning [115.11954432080749]
We propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm.
We empirically show that SCORE achieves state-of-the-art (SoTA) performance with 3.1x acceleration on various tasks in the standard D4RL benchmark.
arXiv Detail & Related papers (2021-10-24T15:34:03Z) - Controlled Deep Reinforcement Learning for Optimized Slice Placement [0.8459686722437155]
We present a hybrid ML-heuristic approach that we name "Heuristically Assisted Deep Reinforcement Learning (HA-DRL)".
The proposed approach leverages recent works on Deep Reinforcement Learning (DRL) for slice placement and Virtual Network Embedding (VNE).
The evaluation results show that the proposed HA-DRL algorithm can accelerate the learning of an efficient slice placement policy.
arXiv Detail & Related papers (2021-08-03T14:54:00Z) - A Reinforcement Learning Formulation of the Lyapunov Optimization: Application to Edge Computing Systems with Queue Stability [12.693545159861857]
A deep reinforcement learning (DRL)-based approach to the Lyapunov optimization is considered to minimize the time-average penalty while maintaining queue stability.
The proposed DRL-based approach is applied to resource allocation in edge computing systems with queue stability, and numerical results demonstrate its successful operation.
arXiv Detail & Related papers (2020-12-14T05:55:26Z)