A Reinforcement Learning Formulation of the Lyapunov Optimization:
Application to Edge Computing Systems with Queue Stability
- URL: http://arxiv.org/abs/2012.07279v2
- Date: Tue, 15 Dec 2020 11:02:51 GMT
- Title: A Reinforcement Learning Formulation of the Lyapunov Optimization:
Application to Edge Computing Systems with Queue Stability
- Authors: Sohee Bae, Seungyul Han, and Youngchul Sung
- Abstract summary: A deep reinforcement learning (DRL)-based approach to the Lyapunov optimization is considered to minimize the time-average penalty while maintaining queue stability.
The proposed DRL-based approach is applied to resource allocation in edge computing systems with queue stability, and numerical results demonstrate its successful operation.
- Score: 12.693545159861857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, a deep reinforcement learning (DRL)-based approach to the
Lyapunov optimization is considered to minimize the time-average penalty while
maintaining queue stability. A suitable construction of state and action spaces
is provided to form a proper Markov decision process (MDP) for the Lyapunov
optimization. A condition for the reward function of reinforcement learning
(RL) for queue stability is derived. Based on the analysis and practical RL
with reward discounting, a class of reward functions is proposed for the
DRL-based approach to the Lyapunov optimization. The proposed DRL-based
approach to the Lyapunov optimization does not require complicated
optimization at each time step and operates with general non-convex and
discontinuous penalty functions. Hence, it provides an alternative to the
conventional drift-plus-penalty (DPP) algorithm for the Lyapunov optimization.
The proposed DRL-based approach is applied to resource allocation in edge
computing systems with queue stability and numerical results demonstrate its
successful operation.
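For intuition, the conventional DPP algorithm greedily minimizes a bound on $V \cdot p(t) + \Delta(t)$ at each step, where $p(t)$ is the penalty, $\Delta(t) = L(Q(t+1)) - L(Q(t))$ is the drift of a quadratic Lyapunov function $L(Q) = \frac{1}{2}\sum_i Q_i^2$, and $V$ trades off penalty against stability. A minimal sketch of a reward in this spirit (the names and the value of V are illustrative; the paper derives the exact class of admissible reward functions):

    import numpy as np

    def lyapunov(q):
        # Quadratic Lyapunov function L(Q) = 0.5 * sum_i Q_i^2.
        return 0.5 * np.sum(q ** 2)

    def reward(q_t, q_next, penalty_t, V=10.0):
        # Negative drift-plus-penalty: r_t = -(V * p_t + L(Q_{t+1}) - L(Q_t)).
        # Maximizing the discounted sum of r_t discourages both a large
        # time-average penalty and unbounded queue growth.
        drift = lyapunov(q_next) - lyapunov(q_t)
        return -(V * penalty_t + drift)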
Related papers
- Accelerated Preference Optimization for Large Language Model Alignment [60.22606527763201]
Reinforcement Learning from Human Feedback (RLHF) has emerged as a pivotal tool for aligning large language models (LLMs) with human preferences.
Direct Preference Optimization (DPO) formulates RLHF as a policy optimization problem without explicitly estimating the reward function.
We propose a general Accelerated Preference Optimization (APO) framework, which unifies many existing preference optimization algorithms.
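At a high level, the acceleration is momentum-based extrapolation applied to the iterates of a preference optimization method; a generic Nesterov-style sketch (grad_fn, lr, and momentum are illustrative placeholders, not the paper's notation):

    def nesterov_step(theta, theta_prev, grad_fn, lr=1e-2, momentum=0.9):
        # Extrapolate past the current iterate, then step on the gradient
        # evaluated at the extrapolated point.
        lookahead = theta + momentum * (theta - theta_prev)
        return lookahead - lr * grad_fn(lookahead)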
arXiv Detail & Related papers (2024-10-08T18:51:01Z)
- Optimization Solution Functions as Deterministic Policies for Offline Reinforcement Learning [7.07623669995408]
We propose an implicit actor-critic (iAC) framework that employs an optimization solution function as a deterministic policy (actor) and a monotone function of the optimization's optimal value as a critic.
We show that the learned policies are robust to the suboptimality of the learned actor parameters via the exponentially decaying sensitivity (EDS) property.
We validate the proposed framework on two real-world applications and show a significant improvement over state-of-the-art (SOTA) offline RL methods.
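A toy illustration of an optimization solution function used as a deterministic actor (an unconstrained quadratic with a closed-form argmin; the paper's actual problem class and parameterization differ):

    import numpy as np

    def iac_actor(state, theta):
        # Policy defined as a = argmin_a 0.5 * a^T Q a + c(state)^T a,
        # with Q positive-definite by construction so the argmin is unique.
        Q = np.diag(np.exp(theta["log_diag"]))
        c = theta["W"] @ state
        return -np.linalg.solve(Q, c)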
arXiv Detail & Related papers (2024-08-27T19:04:32Z)
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.
To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.
Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
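A minimal sketch of such a combined objective (a DPO-style preference term plus an SFT likelihood term; beta and lam are illustrative and the paper's exact weighting may differ):

    import torch.nn.functional as F

    def regularized_preference_loss(logp_chosen, logp_rejected,
                                    ref_chosen, ref_rejected,
                                    beta=0.1, lam=1.0):
        # DPO-style preference loss on implicit reward margins.
        margin = beta * ((logp_chosen - ref_chosen)
                         - (logp_rejected - ref_rejected))
        pref = -F.logsigmoid(margin)
        # SFT regularizer: keep likelihood of the chosen responses high.
        sft = -logp_chosen
        return (pref + lam * sft).mean()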
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
- REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models.
In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL.
We find that REBEL provides a unified approach to language modeling and image generation with stronger or similar performance as PPO and DPO.
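The core update can be sketched as a least-squares regression of relative rewards onto relative log-probability ratios (eta is illustrative; notation simplified from the paper, with inputs assumed to be tensors):

    def rebel_loss(logp_new_a, logp_old_a, logp_new_b, logp_old_b,
                   reward_a, reward_b, eta=1.0):
        # Regress (1/eta) * [log-ratio(y_a) - log-ratio(y_b)] onto
        # the reward difference r(y_a) - r(y_b).
        pred = (1.0 / eta) * ((logp_new_a - logp_old_a)
                              - (logp_new_b - logp_old_b))
        target = reward_a - reward_b
        return ((pred - target) ** 2).mean()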
arXiv Detail & Related papers (2024-04-25T17:20:45Z)
- Analyzing and Enhancing the Backward-Pass Convergence of Unrolled Optimization [50.38518771642365]
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks.
A central challenge in this setting is backpropagation through the solution of an optimization problem, which often lacks a closed form.
This paper provides theoretical insights into the backward pass of unrolled optimization, showing that it is equivalent to the solution of a linear system by a particular iterative method.
A system called Folded Optimization is proposed to construct more efficient backpropagation rules from unrolled solver implementations.
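The equivalence rests on implicit differentiation at a solver fixed point (a standard result; the notation here is generic rather than the paper's): if $x^* = T(x^*, \theta)$, then

$$\Big(I - \frac{\partial T}{\partial x}\Big|_{x^*}\Big)\,\frac{dx^*}{d\theta} = \frac{\partial T}{\partial \theta}\Big|_{x^*},$$

a linear system; unrolling the solver and backpropagating amounts to solving it with a particular iterative method.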
arXiv Detail & Related papers (2023-12-28T23:15:18Z)
- Offline Policy Optimization in RL with Variance Regularization [142.87345258222942]
We propose variance regularization for offline RL algorithms, using stationary distribution corrections.
We show that by using Fenchel duality, we can avoid double sampling issues for computing the gradient of the variance regularizer.
The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithms.
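The double-sampling issue arises from the squared-mean term in $\mathrm{Var}[x] = \mathbb{E}[x^2] - (\mathbb{E}[x])^2$; Fenchel duality linearizes it (a standard identity, not the paper's full derivation):

$$(\mathbb{E}[x])^2 = \max_{\nu}\,\big\{\,2\nu\,\mathbb{E}[x] - \nu^2\,\big\},$$

so the objective becomes linear in the expectation and a single sample per step yields an unbiased gradient estimate.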
arXiv Detail & Related papers (2022-12-29T18:25:01Z)
- Revisiting the Linear-Programming Framework for Offline RL with General Function Approximation [24.577243536475233]
Offline reinforcement learning (RL) concerns finding an optimal policy for sequential decision-making from a pre-collected dataset.
Recent theoretical progress has focused on developing sample-efficient offline RL algorithms with various relaxed assumptions on data coverage and function approximators.
We revisit the linear-programming framework for offline RL, and advance the existing results in several aspects.
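For reference, the classical occupancy-measure LP that this framework generalizes (the standard discounted-MDP formulation; the paper's relaxations for function approximation build on it):

$$\max_{\mu \ge 0} \sum_{s,a} \mu(s,a)\, r(s,a) \quad \text{s.t.} \quad \sum_{a} \mu(s,a) = (1-\gamma)\,\rho_0(s) + \gamma \sum_{s',a'} P(s \mid s', a')\, \mu(s',a') \quad \forall s.$$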
arXiv Detail & Related papers (2022-12-28T15:28:12Z)
- A Reinforcement Learning Approach to Parameter Selection for Distributed Optimization in Power Systems [1.1199585259018459]
We develop an adaptive penalty parameter selection policy for the AC optimal power flow (ACOPF) problem solved via ADMM.
We show that our RL policy demonstrates promise for generalizability, performing well under unseen loading schemes as well as under unseen losses of lines and generators.
This work thus provides a proof-of-concept for using RL for parameter selection in ADMM for power systems applications.
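A sketch of where such a policy plugs in, alongside the classical residual-balancing heuristic it would replace (the thresholds and policy interface are illustrative, not the paper's design):

    def next_rho(rho, primal_res, dual_res, policy=None):
        # An RL policy can map iterate features to the next ADMM penalty
        # parameter; otherwise fall back to residual balancing.
        if policy is not None:
            return policy(rho, primal_res, dual_res)
        if primal_res > 10.0 * dual_res:
            return 2.0 * rho
        if dual_res > 10.0 * primal_res:
            return rho / 2.0
        return rho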
arXiv Detail & Related papers (2021-10-22T18:17:32Z)
- Controlled Deep Reinforcement Learning for Optimized Slice Placement [0.8459686722437155]
We present a hybrid ML-heuristic approach that we name "Heuristically Assisted Deep Reinforcement Learning (HA-DRL)".
The proposed approach leverages recent works on Deep Reinforcement Learning (DRL) for slice placement and Virtual Network Embedding (VNE).
The evaluation results show that the proposed HA-DRL algorithm can accelerate the learning of an efficient slice placement policy.
arXiv Detail & Related papers (2021-08-03T14:54:00Z)
- Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with an improved iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same $O(1/\epsilon^4)$ complexity, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z)
- Optimistic Reinforcement Learning by Forward Kullback-Leibler Divergence Optimization [1.7970523486905976]
This paper addresses a new interpretation of reinforcement learning (RL) as reverse Kullback-Leibler (KL) divergence optimization.
It derives a new optimization method using forward KL divergence.
In a realistic robotic simulation, the proposed method with moderate optimism outperformed one of the state-of-the-art RL methods.
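The distinction can be sketched with Monte-Carlo estimates of the two divergences (log-densities assumed to be tensors; which distribution the samples come from is the key difference):

    def reverse_kl(logp_pi, logp_target):
        # KL(pi || target), samples drawn from pi: mode-seeking.
        return (logp_pi - logp_target).mean()

    def forward_kl(logp_pi, logp_target):
        # KL(target || pi), samples drawn from the target: mass-covering.
        return (logp_target - logp_pi).mean()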
arXiv Detail & Related papers (2021-05-27T08:24:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.