Multi-Timescale Control and Communications with Deep Reinforcement
Learning -- Part II: Control-Aware Radio Resource Allocation
- URL: http://arxiv.org/abs/2311.11280v1
- Date: Sun, 19 Nov 2023 09:50:21 GMT
- Authors: Lei Lei, Tong Liu, Kan Zheng, Xuemin (Sherman) Shen
- Abstract summary: We decomposed the multi-timescale control and communications problem in the C-V2X system.
We proposed the MTCC-PC algorithm to learn an optimal PC policy given an RRA policy.
In this paper (Part II), we first focus on the RRA sub-problem in MTCC assuming a PC policy is given, and propose the MTCC-RRA algorithm to learn the RRA policy.
- Score: 15.390800228536536
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Part I of this two-part paper (Multi-Timescale Control and Communications
with Deep Reinforcement Learning -- Part I: Communication-Aware Vehicle
Control), we decomposed the multi-timescale control and communications (MTCC)
problem in a Cellular Vehicle-to-Everything (C-V2X) system into a
communication-aware Deep Reinforcement Learning (DRL)-based platoon control
(PC) sub-problem and a control-aware DRL-based radio resource allocation (RRA)
sub-problem. We focused on the PC sub-problem and proposed the MTCC-PC
algorithm to learn an optimal PC policy given an RRA policy. In this paper
(Part II), we first focus on the RRA sub-problem in MTCC assuming a PC policy
is given, and propose the MTCC-RRA algorithm to learn the RRA policy.
Specifically, we incorporate the PC advantage function in the RRA reward
function, which quantifies the amount of PC performance degradation caused by
observation delay. Moreover, we augment the state space of RRA with PC action
history for a better-informed RRA policy. In addition, we utilize reward
shaping and reward backpropagation prioritized experience replay (RBPER)
techniques to efficiently tackle the multi-agent and sparse reward problems,
respectively. Finally, a sample- and computation-efficient training approach
is proposed to jointly learn the PC and RRA policies in an iterative process.
In order to verify the effectiveness of the proposed MTCC algorithm, we
performed experiments using real driving data for the leading vehicle, where
the performance of MTCC is compared with that of baseline DRL algorithms.
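
A hedged Python sketch of the abstract's two central design choices, the control-aware RRA reward built from the PC advantage and the action-history state augmentation; the callables `pc_q` and `pc_policy` and the exact reward shape are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def rra_reward(pc_q, pc_policy, s_true, s_delayed):
    """Reward the RRA agent with the PC advantage of the delayed action:
    how much worse the controller does because its observation is stale."""
    a_delayed = pc_policy(s_delayed)  # PC action computed from the stale observation
    a_fresh = pc_policy(s_true)       # PC action a fresh observation would produce
    # Advantage of the delayed action in the true state (<= 0 for a good policy);
    # maximizing it pushes the scheduler to deliver observations on time.
    return pc_q(s_true, a_delayed) - pc_q(s_true, a_fresh)

def rra_state(channel_obs, pc_action_history):
    """State augmentation: recent PC actions tell the RRA agent how much the
    controller currently depends on timely observation delivery."""
    return np.concatenate([np.ravel(channel_obs), np.ravel(pc_action_history)])

# Toy usage with stand-in critic and policy
pc_q = lambda s, a: -float(np.sum((np.ravel(s) - np.ravel(a)) ** 2))
pc_policy = lambda s: 0.5 * np.ravel(s)
r = rra_reward(pc_q, pc_policy, np.array([1.0, 0.0]), np.array([0.8, 0.1]))
```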
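
One reading of reward backpropagation prioritized experience replay (RBPER): when a sparse non-zero reward finally arrives, boost the replay priority of the transitions leading up to it in reverse order, so bootstrapped TD targets carry the reward backwards through the trajectory. The sketch below follows that reading; the decay factor and priority bookkeeping are assumptions, not the paper's exact scheme.

```python
import collections
import random

class RBPER:
    """Sketch of reward-backpropagation prioritized replay for sparse rewards."""

    def __init__(self, capacity=100_000, decay=0.8, base_priority=1e-3):
        self.buffer = collections.deque(maxlen=capacity)
        self.priorities = collections.deque(maxlen=capacity)
        self.decay = decay          # how fast the priority boost fades backwards
        self.base = base_priority   # floor so every transition can still be drawn

    def add(self, transition, reward):
        self.buffer.append(transition)
        self.priorities.append(self.base)
        if reward != 0.0:  # sparse reward observed: propagate priority backwards
            boost = abs(reward)
            for i in range(len(self.buffer) - 1, -1, -1):
                self.priorities[i] += boost
                boost *= self.decay
                if boost < self.base:
                    break

    def sample(self, batch_size):
        total = sum(self.priorities)
        weights = [p / total for p in self.priorities]
        idx = random.choices(range(len(self.buffer)), weights=weights, k=batch_size)
        return [self.buffer[i] for i in idx]
```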
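
The iterative joint training reads as alternating optimization: hold one policy fixed while the other learns, then swap. The agent and environment interfaces below (`train`, `policy`) are placeholders for illustration, not the authors' API.

```python
def train_mtcc(pc_agent, rra_agent, env, outer_iters=10):
    """Alternate between the Part II step (learn control-aware RRA under the
    current PC policy) and the Part I step (learn communication-aware PC
    under the updated RRA policy)."""
    for _ in range(outer_iters):
        rra_agent.train(env, partner_policy=pc_agent.policy)  # PC frozen
        pc_agent.train(env, partner_policy=rra_agent.policy)  # RRA frozen
    return pc_agent, rra_agent
```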
Related papers
- Wireless Resource Allocation with Collaborative Distributed and Centralized DRL under Control Channel Attacks [9.981962772130025] (arXiv, 2024-11-16)
We consider a wireless resource allocation problem in a cyber-physical system (CPS) where the control channel is subjected to denial-of-service (DoS) attacks.
We propose a novel concept of collaborative distributed and centralized (CDC) resource allocation to effectively mitigate the impact of these attacks.
We develop a new CDC deep reinforcement learning (CDC-DRL) algorithm, whereas existing DRL frameworks formulate only centralized or only distributed decision-making problems.
arXiv Detail & Related papers (2024-11-16T04:56:23Z) - Deployable Reinforcement Learning with Variable Control Rate [14.838483990647697]
We propose a variant of Reinforcement Learning (RL) with variable control rate.
In this approach, the policy decides the action the agent should take as well as the duration of the time step associated with that action.
We show the efficacy of the proposed SEAC algorithm through a proof-of-concept simulation driving an agent with Newtonian kinematics.
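
A minimal sketch of the variable-control-rate idea, where the policy outputs both an action and how long to hold it; the toy Newtonian dynamics and the hold-time heuristic are assumptions for illustration, not the SEAC implementation.

```python
import numpy as np

def variable_rate_rollout(policy, dynamics, obs, horizon=1.0):
    """The policy returns (action, hold_duration); the simulator integrates the
    dynamics for that duration before the policy is queried again, so the
    control rate adapts to the situation."""
    t = 0.0
    while t < horizon:
        action, dt = policy(obs)
        dt = float(np.clip(dt, 1e-3, horizon - t))  # keep steps positive, in range
        obs = dynamics(obs, action, dt)
        t += dt
    return obs

# Toy 1-D Newtonian agent: obs = (position, velocity), action = acceleration
dynamics = lambda o, a, dt: np.array([o[0] + o[1] * dt + 0.5 * a * dt * dt,
                                      o[1] + a * dt])
policy = lambda o: (-o[0] - o[1],                     # PD-style acceleration
                    0.1 if abs(o[0]) > 0.5 else 0.3)  # hold longer when settled
final_obs = variable_rate_rollout(policy, dynamics, np.array([1.0, 0.0]))
```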
arXiv Detail & Related papers (2024-01-17T15:40:11Z) - Multi-Timescale Control and Communications with Deep Reinforcement
Learning -- Part I: Communication-Aware Vehicle Control [15.390800228536536]
We propose a joint optimization framework of multi-timescale control and communications based on Deep Reinforcement Learning (DRL).
In this paper (Part I), we first decompose the problem into a communication-aware DRL-based PC sub-problem and a control-aware DRL-based RRA sub-problem.
To improve the PC performance under random observation delay, the PC state space is augmented with the observation delay and PC action history.
It is proved that the optimal policy for the augmented state MDP is optimal for the original PC problem with observation delay.
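
The Part I augmentation can be sketched as concatenating the delayed observation with the delay itself and the actions applied since that observation, which is what restores the Markov property under random delay; the flat layout below is illustrative, not the paper's exact encoding.

```python
import numpy as np

def augment_pc_state(delayed_obs, delay_steps, action_history):
    """PC state under observation delay: the stale observation, how stale it
    is, and the actions taken since it was generated."""
    return np.concatenate([
        np.ravel(delayed_obs),
        [float(delay_steps)],      # observation delay in control steps
        np.ravel(action_history),  # PC actions applied since that observation
    ])
```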
arXiv Detail & Related papers (2023-11-19T09:51:58Z) - Learning to Sail Dynamic Networks: The MARLIN Reinforcement Learning
Framework for Congestion Control in Tactical Environments [53.08686495706487]
This paper proposes an RL framework that leverages an accurate and parallelizable emulation environment to reenact the conditions of a tactical network.
We evaluate our RL framework by training a MARLIN agent in conditions replicating a bottleneck link transition between a Satellite Communication (SATCOM) link and a UHF Wide Band (UHF) radio link.
arXiv Detail & Related papers (2023-06-27T16:15:15Z) - Roulette-Wheel Selection-Based PSO Algorithm for Solving the Vehicle
Routing Problem with Time Windows [58.891409372784516]
This paper presents RWPSO, a novel form of the PSO methodology that uses roulette-wheel selection.
Experiments on the Solomon VRPTW benchmark datasets demonstrate that RWPSO is competitive with other state-of-the-art algorithms from the literature.
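
For reference, the classic roulette-wheel (fitness-proportionate) selection operator that RWPSO plugs into PSO; this is the textbook operator, not the paper's full algorithm, and it assumes non-negative fitness values.

```python
import random

def roulette_wheel_select(population, fitness):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitness)
    pick = random.uniform(0.0, total)
    cumulative = 0.0
    for individual, f in zip(population, fitness):
        cumulative += f
        if cumulative >= pick:
            return individual
    return population[-1]  # guard against floating-point round-off
```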
arXiv Detail & Related papers (2023-06-04T09:18:02Z) - MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion
Control in Real Networks [63.24965775030673]
We propose a novel Reinforcement Learning (RL) approach to design generic Congestion Control (CC) algorithms.
Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return.
We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch.
arXiv Detail & Related papers (2023-02-02T18:27:20Z) - Fair and Efficient Distributed Edge Learning with Hybrid Multipath TCP [62.81300791178381]
The bottleneck of distributed edge learning (DEL) over wireless networks has shifted from computing to communication.
Existing TCP-based data networking schemes for DEL are application-agnostic and fail to deliver adjustments according to application layer requirements.
We develop a hybrid multipath TCP (MPTCP) scheme for DEL by combining model-based and deep reinforcement learning (DRL)-based MPTCP.
arXiv Detail & Related papers (2022-11-03T09:08:30Z) - When does return-conditioned supervised learning work for offline
reinforcement learning? [51.899892382786526]
We study the capabilities and limitations of return-conditioned supervised learning.
We find that RCSL returns the optimal policy under a set of assumptions stronger than those needed for the more traditional dynamic programming-based algorithms.
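
A minimal sketch of the RCSL recipe for continuous actions: fit pi(a | s, R) by supervised regression on logged trajectories, then act by conditioning on a high target return. The deterministic MSE form below is one common instantiation, not necessarily the paper's setup.

```python
import torch

def rcsl_loss(policy_net, states, actions, returns_to_go):
    """Supervised loss for a return-conditioned policy: predict the logged
    action from the state concatenated with its return-to-go."""
    inputs = torch.cat([states, returns_to_go.unsqueeze(-1)], dim=-1)
    predicted_actions = policy_net(inputs)
    return torch.nn.functional.mse_loss(predicted_actions, actions)

def rcsl_act(policy_net, state, target_return):
    """At test time, condition on the return you want to achieve."""
    inp = torch.cat([state, torch.tensor([target_return])], dim=-1)
    return policy_net(inp)
```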
arXiv Detail & Related papers (2022-06-02T15:05:42Z) - Combining Reinforcement Learning with Model Predictive Control for
On-Ramp Merging [10.480121529429631]
Two broad classes of techniques have been proposed to solve motion planning problems in autonomous driving: Model Predictive Control (MPC) and Reinforcement Learning (RL).
We first establish the strengths and weaknesses of state-of-the-art MPC and RL-based techniques through simulations.
We subsequently present an algorithm which blends the model-free RL agent with the MPC solution and show that it provides better trade-offs between all metrics -- passenger comfort, efficiency, crash rate and robustness.
arXiv Detail & Related papers (2020-11-17T07:42:11Z) - Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
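
CRR can be read as weighted behavior cloning: regress onto logged actions, but weight each by a function of the critic's advantage so that only actions the critic rates above average are imitated. The exp-advantage variant below assumes a policy object exposing `sample` and `log_prob`, an illustrative interface rather than the paper's code.

```python
import torch

def crr_policy_loss(policy, critic, states, actions, beta=1.0, n_samples=4):
    """Critic-regularized regression, exp-advantage variant (sketch)."""
    with torch.no_grad():
        q = critic(states, actions)  # value of the logged actions
        # Baseline V(s): average Q over actions sampled from the current policy
        sampled_q = torch.stack(
            [critic(states, policy.sample(states)) for _ in range(n_samples)])
        v = sampled_q.mean(dim=0)
        # Clamp to keep the regression weights from exploding
        weights = torch.clamp(torch.exp((q - v) / beta), max=20.0)
    log_prob = policy.log_prob(states, actions)
    return -(weights * log_prob).mean()
```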
arXiv Detail & Related papers (2020-06-26T17:50:26Z) - Stacked Auto Encoder Based Deep Reinforcement Learning for Online
Resource Scheduling in Large-Scale MEC Networks [44.40722828581203]
An online resource scheduling framework is proposed for minimizing the sum of weighted task latency for all Internet of Things (IoT) users.
A deep reinforcement learning (DRL)-based solution is proposed.
A preserved and prioritized experience replay (2p-ER) is introduced to assist the DRL in training the policy network and finding the optimal offloading policy.
arXiv Detail & Related papers (2020-01-24T23:01:15Z)