Deep Reinforcement Learning-based UAV Navigation and Control: A Soft
Actor-Critic with Hindsight Experience Replay Approach
- URL: http://arxiv.org/abs/2106.01016v1
- Date: Wed, 2 Jun 2021 08:30:14 GMT
- Title: Deep Reinforcement Learning-based UAV Navigation and Control: A Soft
Actor-Critic with Hindsight Experience Replay Approach
- Authors: Myoung Hoon Lee, Jun Moon
- Abstract summary: We propose SACHER (soft actor-critic (SAC) with hindsight experience replay (HER)) as a class of deep reinforcement learning (DRL) algorithms.
We show that SACHER achieves the desired optimal outcomes faster and more accurately than SAC, since HER improves the sample efficiency of SAC.
We apply SACHER to the navigation and control problem of unmanned aerial vehicles (UAVs), where SACHER generates the optimal navigation path.
- Score: 0.9137554315375919
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose SACHER (soft actor-critic (SAC) with hindsight
experience replay (HER)), which constitutes a class of deep reinforcement
learning (DRL) algorithms. SAC is known as an off-policy model-free DRL
algorithm based on the maximum entropy framework, which outperforms earlier DRL
algorithms in terms of exploration, robustness and learning performance.
However, in SAC, maximizing the entropy-augmented objective may degrade the
optimality of the learning outcomes. HER is known as a sample-efficient replay
method that enhances the performance of off-policy DRL algorithms by allowing
them to learn from both failures and successes. We apply HER to SAC and propose
SACHER to improve the learning performance of SAC. More precisely, SACHER
achieves the desired optimal outcomes faster and more accurately than SAC,
since HER improves the sample efficiency of SAC. We apply SACHER to the
navigation and control problem of unmanned aerial vehicles (UAVs), where SACHER
generates the optimal navigation path of the UAV under various obstacles in
operation. Specifically, we show the effectiveness of SACHER in terms of the
tracking error and cumulative reward in UAV operation by comparing them with
those of the state-of-the-art DRL algorithms SAC and DDPG (deep deterministic
policy gradient). Note that SACHER can be applied to arbitrary UAV models in
navigation and control problems.
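
As a minimal sketch of the SAC-plus-HER combination described above (illustrative assumptions throughout, not the authors' implementation): SAC maximizes the entropy-augmented objective J(pi) = E[sum_t r(s_t, a_t) + alpha * H(pi(.|s_t))], and HER relabels stored episodes with the goals they actually achieved, so that failed episodes still yield useful learning signal. The goal-conditioned transition layout, the sparse reward, and the "final" relabeling strategy below are assumptions made for illustration.

    import random

    import numpy as np

    # Hypothetical goal-conditioned transitions: dicts with keys
    # "state", "action", "reward", "next_state", "goal".

    def sparse_reward(achieved, goal, tol=0.05):
        # Illustrative sparse reward: 0 when the goal is reached, -1 otherwise.
        dist = np.linalg.norm(np.asarray(achieved) - np.asarray(goal))
        return 0.0 if dist < tol else -1.0

    def her_relabel(episode):
        # HER "final" strategy: replay the episode as if the state actually
        # reached at the end had been the goal all along, so even a failed
        # episode contains successful, informative transitions.
        achieved = episode[-1]["next_state"]
        relabeled = []
        for transition in episode:
            new = dict(transition)
            new["goal"] = achieved
            new["reward"] = sparse_reward(new["next_state"], achieved)
            relabeled.append(new)
        return list(episode) + relabeled

    replay_buffer = []  # off-policy buffer feeding the SAC updates (capped in practice)

    def store_episode(episode):
        replay_buffer.extend(her_relabel(episode))

    def sample_batch(batch_size=256):
        # Minibatches drawn here feed the usual SAC critic/actor updates, whose
        # actor loss also maximizes the entropy bonus alpha * H(pi(.|s)).
        return random.sample(replay_buffer, min(batch_size, len(replay_buffer)))

Because SAC is off-policy, the relabeled transitions can be mixed freely with the original ones in the same buffer; this compatibility is what lets HER improve the sample efficiency of SAC, as the abstract claims.
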
Related papers
- UAV-enabled Collaborative Beamforming via Multi-Agent Deep Reinforcement Learning [79.16150966434299]
We formulate a UAV-enabled collaborative beamforming multi-objective optimization problem (UCBMOP) to maximize the transmission rate of the UVAA (UAV-enabled virtual antenna array) and minimize the energy consumption of all UAVs.
We use the heterogeneous-agent trust region policy optimization (HATRPO) as the basic framework, and then propose an improved HATRPO algorithm, namely HATRPO-UCB.
arXiv Detail & Related papers (2024-04-11T03:19:22Z)
- DSAC-T: Distributional Soft Actor-Critic with Three Refinements [31.590177154247485]
We introduce an off-policy RL algorithm called distributional soft actor-critic (DSAC).
Standard DSAC has its own shortcomings, including occasionally unstable learning processes and the necessity for task-specific reward scaling.
This paper introduces three important refinements to standard DSAC in order to address these shortcomings.
arXiv Detail & Related papers (2023-10-09T16:52:48Z)
- RLSAC: Reinforcement Learning enhanced Sample Consensus for End-to-End Robust Estimation [74.47709320443998]
We propose RLSAC, a novel Reinforcement Learning enhanced SAmple Consensus framework for end-to-end robust estimation.
RLSAC employs a graph neural network that exploits both data and memory features to guide the direction of exploration when sampling the next minimal set.
Our experimental results demonstrate that RLSAC can learn from features to gradually explore a better hypothesis.
arXiv Detail & Related papers (2023-08-10T03:14:19Z)
- CCE: Sample Efficient Sparse Reward Policy Learning for Robotic Navigation via Confidence-Controlled Exploration [72.24964965882783]
Confidence-Controlled Exploration (CCE) is designed to enhance the training sample efficiency of reinforcement learning algorithms for sparse reward settings such as robot navigation.
CCE is based on a novel relationship we provide between gradient estimation and policy entropy.
We demonstrate through simulated and real-world experiments that CCE outperforms conventional methods that employ constant trajectory lengths and entropy regularization.
arXiv Detail & Related papers (2023-06-09T18:45:15Z)
- PAC-Bayesian Soft Actor-Critic Learning [9.752336113724928]
Actor-critic algorithms address the dual goals of reinforcement learning (RL), policy evaluation and policy improvement, via two separate function approximators, so the critic's estimation quality bottlenecks the policy improvement.
We tackle this bottleneck by employing an existing Probably Approximately Correct (PAC) Bayesian bound, for the first time, as the critic training objective of the Soft Actor-Critic (SAC) algorithm.
arXiv Detail & Related papers (2023-01-30T10:44:15Z)
- Deep Black-Box Reinforcement Learning with Movement Primitives [15.184283143878488]
We present a new algorithm for deep reinforcement learning (RL).
It is based on differentiable trust region layers, which underpin a successful on-policy deep RL algorithm.
We compare our episode-based RL (ERL) algorithm to state-of-the-art step-based algorithms on many complex simulated robotic control tasks.
arXiv Detail & Related papers (2022-10-18T06:34:52Z)
- DL-DRL: A double-level deep reinforcement learning approach for large-scale task scheduling of multi-UAV [65.07776277630228]
We propose a double-level deep reinforcement learning (DL-DRL) approach based on a divide-and-conquer framework (DCF).
Particularly, we design an encoder-decoder structured policy network in our upper-level DRL model to allocate the tasks to different UAVs.
We also exploit an attention-based policy network in our lower-level DRL model to construct the route for each UAV, with the objective of maximizing the number of executed tasks.
arXiv Detail & Related papers (2022-08-04T04:35:53Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward.
Our experiments show that exploration bonus from uncertainty in learned reward improves both feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
- Reinforcement Learning for Robust Missile Autopilot Design [0.0]
This work pioneers the use of Reinforcement Learning as a framework for flight control.
Under the TRPO (trust region policy optimization) methodology, the collected experience is augmented according to HER, stored in a replay buffer, and sampled according to its significance.
Results show that it is possible both to achieve the optimal performance and to improve the agent's robustness to uncertainties.
arXiv Detail & Related papers (2020-11-26T09:30:04Z)
- Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs.
We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.