A Reinforcement Learning Formulation of the Lyapunov Optimization:
Application to Edge Computing Systems with Queue Stability
- URL: http://arxiv.org/abs/2012.07279v2
- Date: Tue, 15 Dec 2020 11:02:51 GMT
- Title: A Reinforcement Learning Formulation of the Lyapunov Optimization:
Application to Edge Computing Systems with Queue Stability
- Authors: Sohee Bae, Seungyul Han, and Youngchul Sung
- Abstract summary: A deep reinforcement learning (DRL)-based approach to the Lyapunov optimization is considered to minimize the time-average penalty while maintaining queue stability.
The proposed DRL-based approach is applied to resource allocation in edge computing systems with queue stability, and numerical results demonstrate its successful operation.
- Score: 12.693545159861857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, a deep reinforcement learning (DRL)-based approach to the
Lyapunov optimization is considered to minimize the time-average penalty while
maintaining queue stability. A suitable construction of state and action spaces
is provided to form a proper Markov decision process (MDP) for the Lyapunov
optimization. A condition for the reward function of reinforcement learning
(RL) for queue stability is derived. Based on the analysis and practical RL
with reward discounting, a class of reward functions is proposed for the
DRL-based approach to the Lyapunov optimization. The proposed DRL-based
approach to the Lyapunov optimization does not require complicated
optimization at each time step and operates with general non-convex and
discontinuous penalty functions. Hence, it provides an alternative to the
conventional drift-plus-penalty (DPP) algorithm for the Lyapunov optimization.
The proposed DRL-based approach is applied to resource allocation in edge
computing systems with queue stability and numerical results demonstrate its
successful operation.
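For intuition, the conventional DPP algorithm greedily minimizes a bound on $V \cdot p(t) + \Delta(t)$ at each step, where $p(t)$ is the penalty, $\Delta(t) = L(Q(t+1)) - L(Q(t))$ is the drift of a quadratic Lyapunov function $L(Q) = \frac{1}{2}\sum_i Q_i^2$, and $V$ trades off penalty against stability. A minimal sketch of a reward in this spirit (the names and the value of V are illustrative; the paper derives the exact class of admissible reward functions):

    import numpy as np

    def lyapunov(q):
        # Quadratic Lyapunov function L(Q) = 0.5 * sum_i Q_i^2.
        return 0.5 * np.sum(q ** 2)

    def reward(q_t, q_next, penalty_t, V=10.0):
        # Negative drift-plus-penalty: r_t = -(V * p_t + L(Q_{t+1}) - L(Q_t)).
        # Maximizing the discounted sum of r_t discourages both a large
        # time-average penalty and unbounded queue growth.
        drift = lyapunov(q_next) - lyapunov(q_t)
        return -(V * penalty_t + drift)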
Related papers
- Accelerated Preference Optimization for Large Language Model Alignment [60.22606527763201]
Reinforcement Learning from Human Feedback (RLHF) has emerged as a pivotal tool for aligning large language models (LLMs) with human preferences.
Direct Preference Optimization (DPO) formulates RLHF as a policy optimization problem without explicitly estimating the reward function.
We propose a general Accelerated Preference Optimization (APO) framework, which unifies many existing preference optimization algorithms.
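At a high level, the acceleration is momentum-based extrapolation applied to the iterates of a preference optimization method; a generic Nesterov-style sketch (grad_fn, lr, and momentum are illustrative placeholders, not the paper's notation):

    def nesterov_step(theta, theta_prev, grad_fn, lr=1e-2, momentum=0.9):
        # Extrapolate past the current iterate, then step on the gradient
        # evaluated at the extrapolated point.
        lookahead = theta + momentum * (theta - theta_prev)
        return lookahead - lr * grad_fn(lookahead)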
arXiv Detail & Related papers (2024-10-08T18:51:01Z)
- Optimization Solution Functions as Deterministic Policies for Offline Reinforcement Learning [7.07623669995408]
We propose an implicit actor-critic (iAC) framework that employs an optimization solution function as a deterministic policy (actor) and a monotone function of the optimization's optimal value as a critic.
We show that the learned policies are robust to the suboptimality of the learned actor parameters via the exponentially decaying sensitivity (EDS) property.
We validate the proposed framework on two real-world applications and show a significant improvement over state-of-the-art (SOTA) offline RL methods.
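A toy illustration of an optimization solution function used as a deterministic actor (an unconstrained quadratic with a closed-form argmin; the paper's actual problem class and parameterization differ):

    import numpy as np

    def iac_actor(state, theta):
        # Policy defined as a = argmin_a 0.5 * a^T Q a + c(state)^T a,
        # with Q positive-definite by construction so the argmin is unique.
        Q = np.diag(np.exp(theta["log_diag"]))
        c = theta["W"] @ state
        return -np.linalg.solve(Q, c)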
arXiv Detail & Related papers (2024-08-27T19:04:32Z)
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.
To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.
Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
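A minimal sketch of such a combined objective (a DPO-style preference term plus an SFT likelihood term; beta and lam are illustrative and the paper's exact weighting may differ):

    import torch.nn.functional as F

    def regularized_preference_loss(logp_chosen, logp_rejected,
                                    ref_chosen, ref_rejected,
                                    beta=0.1, lam=1.0):
        # DPO-style preference loss on implicit reward margins.
        margin = beta * ((logp_chosen - ref_chosen)
                         - (logp_rejected - ref_rejected))
        pref = -F.logsigmoid(margin)
        # SFT regularizer: keep likelihood of the chosen responses high.
        sft = -logp_chosen
        return (pref + lam * sft).mean()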
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
- REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models.
In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL.
We find that REBEL provides a unified approach to language modeling and image generation with stronger or similar performance as PPO and DPO.
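The core update can be sketched as a least-squares regression of relative rewards onto relative log-probability ratios (eta is illustrative; notation simplified from the paper, with inputs assumed to be tensors):

    def rebel_loss(logp_new_a, logp_old_a, logp_new_b, logp_old_b,
                   reward_a, reward_b, eta=1.0):
        # Regress (1/eta) * [log-ratio(y_a) - log-ratio(y_b)] onto
        # the reward difference r(y_a) - r(y_b).
        pred = (1.0 / eta) * ((logp_new_a - logp_old_a)
                              - (logp_new_b - logp_old_b))
        target = reward_a - reward_b
        return ((pred - target) ** 2).mean()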
arXiv Detail & Related papers (2024-04-25T17:20:45Z)
- Analyzing and Enhancing the Backward-Pass Convergence of Unrolled Optimization [50.38518771642365]
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks.
A central challenge in this setting is backpropagation through the solution of an optimization problem, which often lacks a closed form.
This paper provides theoretical insights into the backward pass of unrolled optimization, showing that it is equivalent to the solution of a linear system by a particular iterative method.
A system called Folded Optimization is proposed to construct more efficient backpropagation rules from unrolled solver implementations.
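The equivalence rests on implicit differentiation at a solver fixed point (a standard result; the notation here is generic rather than the paper's): if $x^* = T(x^*, \theta)$, then

$$\Big(I - \frac{\partial T}{\partial x}\Big|_{x^*}\Big)\,\frac{dx^*}{d\theta} = \frac{\partial T}{\partial \theta}\Big|_{x^*},$$

a linear system; unrolling the solver and backpropagating amounts to solving it with a particular iterative method.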
arXiv Detail & Related papers (2023-12-28T23:15:18Z)
- Offline Policy Optimization in RL with Variance Regularization [142.87345258222942]
We propose variance regularization for offline RL algorithms, using stationary distribution corrections.
We show that by using Fenchel duality, we can avoid double sampling issues for computing the gradient of the variance regularizer.
The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithms.
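The double-sampling issue arises from the squared-mean term in $\mathrm{Var}[x] = \mathbb{E}[x^2] - (\mathbb{E}[x])^2$; Fenchel duality linearizes it (a standard identity, not the paper's full derivation):

$$(\mathbb{E}[x])^2 = \max_{\nu}\,\big\{\,2\nu\,\mathbb{E}[x] - \nu^2\,\big\},$$

so the objective becomes linear in the expectation and a single sample per step yields an unbiased gradient estimate.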
arXiv Detail & Related papers (2022-12-29T18:25:01Z)
- Revisiting the Linear-Programming Framework for Offline RL with General Function Approximation [24.577243536475233]
Offline reinforcement learning (RL) concerns finding an optimal policy for sequential decision-making from a pre-collected dataset.
Recent theoretical progress has focused on developing sample-efficient offline RL algorithms with various relaxed assumptions on data coverage and function approximators.
We revisit the linear-programming framework for offline RL, and advance the existing results in several aspects.
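For reference, the classical occupancy-measure LP that this framework generalizes (the standard discounted-MDP formulation; the paper's relaxations for function approximation build on it):

$$\max_{\mu \ge 0} \sum_{s,a} \mu(s,a)\, r(s,a) \quad \text{s.t.} \quad \sum_{a} \mu(s,a) = (1-\gamma)\,\rho_0(s) + \gamma \sum_{s',a'} P(s \mid s', a')\, \mu(s',a') \quad \forall s.$$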
arXiv Detail & Related papers (2022-12-28T15:28:12Z)
- A Reinforcement Learning Approach to Parameter Selection for Distributed Optimization in Power Systems [1.1199585259018459]
We develop an adaptive penalty parameter selection policy for the AC optimal power flow (ACOPF) problem solved via ADMM.
We show that our RL policy demonstrates promise for generalizability, performing well under unseen loading schemes as well as under unseen losses of lines and generators.
This work thus provides a proof-of-concept for using RL for parameter selection in ADMM for power systems applications.
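A sketch of where such a policy plugs in, alongside the classical residual-balancing heuristic it would replace (the thresholds and policy interface are illustrative, not the paper's design):

    def next_rho(rho, primal_res, dual_res, policy=None):
        # An RL policy can map iterate features to the next ADMM penalty
        # parameter; otherwise fall back to residual balancing.
        if policy is not None:
            return policy(rho, primal_res, dual_res)
        if primal_res > 10.0 * dual_res:
            return 2.0 * rho
        if dual_res > 10.0 * primal_res:
            return rho / 2.0
        return rho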
arXiv Detail & Related papers (2021-10-22T18:17:32Z)
- Controlled Deep Reinforcement Learning for Optimized Slice Placement [0.8459686722437155]
We present a hybrid ML-heuristic approach that we name "Heuristically Assisted Deep Reinforcement Learning (HA-DRL)".
The proposed approach leverages recent works on Deep Reinforcement Learning (DRL) for slice placement and Virtual Network Embedding (VNE).
The evaluation results show that the proposed HA-DRL algorithm can accelerate the learning of an efficient slice placement policy.
arXiv Detail & Related papers (2021-08-03T14:54:00Z)
- Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with an improved iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same $O(1/\epsilon^4)$ complexity, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z)
- Optimistic Reinforcement Learning by Forward Kullback-Leibler Divergence Optimization [1.7970523486905976]
This paper addresses a new interpretation of reinforcement learning (RL) as reverse Kullback-Leibler (KL) divergence optimization.
It derives a new optimization method using forward KL divergence.
In a realistic robotic simulation, the proposed method with moderate optimism outperformed one of the state-of-the-art RL methods.
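The distinction can be sketched with Monte-Carlo estimates of the two divergences (log-densities assumed to be tensors; which distribution the samples come from is the key difference):

    def reverse_kl(logp_pi, logp_target):
        # KL(pi || target), samples drawn from pi: mode-seeking.
        return (logp_pi - logp_target).mean()

    def forward_kl(logp_pi, logp_target):
        # KL(target || pi), samples drawn from the target: mass-covering.
        return (logp_target - logp_pi).mean()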
arXiv Detail & Related papers (2021-05-27T08:24:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.