Constrained Deep Reinforcement Learning for Energy Sustainable Multi-UAV
based Random Access IoT Networks with NOMA
- URL: http://arxiv.org/abs/2002.00073v2
- Date: Wed, 5 Feb 2020 02:03:28 GMT
- Title: Constrained Deep Reinforcement Learning for Energy Sustainable Multi-UAV
based Random Access IoT Networks with NOMA
- Authors: Sami Khairy, Prasanna Balaprakash, Lin X. Cai, Yu Cheng
- Abstract summary: We apply the Non-Orthogonal Multiple Access technique to improve massive channel access of a wireless IoT network where solar-powered Unmanned Aerial Vehicles (UAVs) relay data from IoT devices to remote servers.
IoT devices contend for access to the shared wireless channel using an adaptive $p$-persistent slotted Aloha protocol, and the solar-powered UAVs adopt Successive Interference Cancellation (SIC) to decode multiple received transmissions from IoT devices, improving access efficiency.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we apply the Non-Orthogonal Multiple Access (NOMA) technique
to improve the massive channel access of a wireless IoT network where
solar-powered Unmanned Aerial Vehicles (UAVs) relay data from IoT devices to
remote servers. Specifically, IoT devices contend for access to the shared
wireless channel using an adaptive $p$-persistent slotted Aloha protocol, and
the solar-powered UAVs adopt Successive Interference Cancellation (SIC) to
decode multiple received transmissions from IoT devices, improving access
efficiency. To
enable an energy-sustainable capacity-optimal network, we study the joint
problem of dynamic multi-UAV altitude control and multi-cell wireless channel
access management of IoT devices as a stochastic control problem with multiple
energy constraints. To learn an optimal control policy, we first formulate this
problem as a Constrained Markov Decision Process (CMDP), and propose an online
model-free Constrained Deep Reinforcement Learning (CDRL) algorithm based on
Lagrangian primal-dual policy optimization to solve the CMDP. Extensive
simulations demonstrate that our proposed algorithm learns a cooperative policy
among UAVs in which the altitude of UAVs and channel access probability of IoT
devices are dynamically and jointly controlled to attain the maximal long-term
network capacity while maintaining energy sustainability of UAVs. The proposed
algorithm outperforms Deep RL-based solutions that use reward shaping to account
for energy costs, and achieves a temporal average system capacity that is
$82.4\%$ higher than that of a feasible DRL-based solution and only $6.47\%$
lower than that of the energy-constraint-free system.
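
As a rough illustration of the channel access idea in the abstract, the sketch below computes the expected number of packets decoded per slot under $p$-persistent slotted Aloha when the receiver can resolve up to a fixed number of simultaneous transmissions. The capture model (all packets decoded if at most `sic_limit` devices transmit, none otherwise) and the names `expected_decoded`, `best_access_probability`, and `sic_limit` are assumptions made for illustration only; the paper's actual SIC decoding model and its CDRL controller are not reproduced here.

```python
from math import comb


def expected_decoded(n_devices: int, p: float, sic_limit: int) -> float:
    """Expected number of packets decoded in one slot.

    Assumed toy model (not taken from the paper): each of n_devices
    transmits independently with probability p. If between 1 and
    sic_limit devices transmit, all of their packets are decoded;
    if more transmit, the slot is lost.
    """
    return sum(
        k * comb(n_devices, k) * p**k * (1 - p) ** (n_devices - k)
        for k in range(1, sic_limit + 1)
    )


def best_access_probability(n_devices: int, sic_limit: int, grid: int = 1000) -> float:
    """Grid-search the access probability that maximizes expected_decoded."""
    candidates = [i / grid for i in range(1, grid + 1)]
    return max(candidates, key=lambda p: expected_decoded(n_devices, p, sic_limit))


if __name__ == "__main__":
    n = 10  # hypothetical number of contending IoT devices
    for limit in (1, 2):
        p_star = best_access_probability(n, limit)
        print(limit, p_star, expected_decoded(n, p_star, limit))
```

With `sic_limit=1` this reduces to classical slotted Aloha, where the best access probability is 1/N. Allowing a second decodable packet raises both the best access probability and the resulting per-slot throughput in this toy model, which is the intuition behind adapting $p$ jointly with the receiver's decoding capability.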