Deep Reinforcement Learning-based UAV Navigation and Control: A Soft
Actor-Critic with Hindsight Experience Replay Approach
- URL: http://arxiv.org/abs/2106.01016v1
- Date: Wed, 2 Jun 2021 08:30:14 GMT
- Authors: Myoung Hoon Lee, Jun Moon
- Abstract summary: We propose SACHER (soft actor-critic (SAC) with hindsight experience replay (HER)) as a class of deep reinforcement learning (DRL) algorithms.
We show that SACHER achieves the desired optimal outcomes faster and more accurately than SAC, since HER improves the sample efficiency of SAC.
We apply SACHER to the navigation and control problem of unmanned aerial vehicles (UAVs), where SACHER generates the optimal navigation path.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose SACHER (soft actor-critic (SAC) with hindsight
experience replay (HER)), which constitutes a class of deep reinforcement
learning (DRL) algorithms. SAC is known as an off-policy model-free DRL
algorithm based on the maximum entropy framework, which outperforms earlier DRL
algorithms in terms of exploration, robustness and learning performance.
However, in SAC, maximizing the entropy-augmented objective may degrade the
optimality of the learning outcomes. HER is known as a sample-efficient replay
method that enhances the performance of off-policy DRL algorithms by allowing
them to learn from both failures and successes. We apply HER to SAC and propose
SACHER to improve the learning performance of SAC. More precisely, SACHER
achieves the desired optimal outcomes faster and more accurately than SAC,
since HER improves the sample efficiency of SAC. We apply SACHER to the
navigation and control problem of unmanned aerial vehicles (UAVs), where SACHER
generates the optimal navigation path of the UAV in the presence of various
obstacles during operation. Specifically, we show the effectiveness of SACHER in
terms of tracking error and cumulative reward in UAV operation by comparing it
with the state-of-the-art DRL algorithms SAC and DDPG (deep deterministic
policy gradient). Note that SACHER can be applied to arbitrary UAV models in
navigation and control problems.
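The HER mechanism described in the abstract — learning from both failures and successes — works by relabeling stored transitions with goals the agent actually reached. The following is a minimal sketch of the "future" relabeling strategy; the function names, the tuple layout, and the sparse reward are illustrative assumptions, not taken from the paper:

```python
import random

def her_relabel(episode, reward_fn, k=4):
    """For each transition in a (possibly failed) episode, also store k copies
    whose goal is replaced by a state actually achieved later in the episode,
    so the trajectory becomes a success with respect to that hindsight goal."""
    relabeled = []
    for t, (state, action, next_state, goal) in enumerate(episode):
        # Keep the original transition with its original goal and reward.
        relabeled.append((state, action, next_state, goal,
                          reward_fn(next_state, goal)))
        # "future" strategy: sample k states reached at or after step t
        # and treat each as if it had been the desired goal all along.
        future_idx = [random.randint(t, len(episode) - 1) for _ in range(k)]
        for i in future_idx:
            new_goal = episode[i][2]  # an achieved state, reused as the goal
            relabeled.append((state, action, next_state, new_goal,
                              reward_fn(next_state, new_goal)))
    return relabeled

def sparse_reward(achieved, goal):
    """Sparse reward typical of HER setups: 0 on success, -1 otherwise."""
    return 0.0 if achieved == goal else -1.0
```

The relabeled transitions are simply appended to the off-policy replay buffer, which is why HER composes with any off-policy learner such as SAC or DDPG without changing the underlying algorithm.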
Related papers
- Decorrelated Soft Actor-Critic for Efficient Deep Reinforcement Learning [1.2597747768235847]
We propose a novel approach to online decorrelation in deep RL based on the decorrelated backpropagation algorithm.
Experiments on the Atari 100k benchmark show that, compared to the regular SAC baseline, DSAC trains faster in five of the seven games tested.
arXiv Detail & Related papers (2025-01-31T13:38:57Z)
- Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning
Langevin Soft Actor Critic (LSAC) prioritizes enhancing critic learning through uncertainty estimation over policy optimization.
LSAC outperforms or matches the performance of mainstream model-free RL algorithms for continuous control tasks.
Notably, LSAC marks the first successful application of an LMC based Thompson sampling in continuous control tasks with continuous action spaces.
arXiv Detail & Related papers (2025-01-29T18:18:00Z)
- Adaptive Data Exploitation in Deep Reinforcement Learning [50.53705050673944]
We introduce ADEPT, a powerful framework to enhance data efficiency and generalization in deep reinforcement learning (RL).
Specifically, ADEPT adaptively manages the use of sampled data across different learning stages via multi-armed bandit (MAB) algorithms.
We test ADEPT on benchmarks including Procgen, MiniGrid, and PyBullet.
arXiv Detail & Related papers (2025-01-22T04:01:17Z)
- UAV-enabled Collaborative Beamforming via Multi-Agent Deep Reinforcement Learning [79.16150966434299]
We formulate a UAV-enabled collaborative beamforming multi-objective optimization problem (UCBMOP) to maximize the transmission rate of the UAV-enabled virtual antenna array (UVAA) and minimize the energy consumption of all UAVs.
We use the heterogeneous-agent trust region policy optimization (HATRPO) as the basic framework, and then propose an improved HATRPO algorithm, namely HATRPO-UCB.
arXiv Detail & Related papers (2024-04-11T03:19:22Z) - Distributional Soft Actor-Critic with Three Refinements [47.46661939652862]
Reinforcement learning (RL) has shown remarkable success in solving complex decision-making and control tasks.
Many model-free RL algorithms experience performance degradation due to inaccurate value estimation.
This paper introduces three key refinements to DSACv1 to overcome these limitations and further improve Q-value estimation accuracy.
arXiv Detail & Related papers (2023-10-09T16:52:48Z) - RLSAC: Reinforcement Learning enhanced Sample Consensus for End-to-End
Robust Estimation [74.47709320443998]
We propose RLSAC, a novel Reinforcement Learning enhanced SAmple Consensus framework for end-to-end robust estimation.
RLSAC employs a graph neural network that uses both data and memory features to guide the exploration direction for sampling the next minimal set.
Our experimental results demonstrate that RLSAC can learn from features to gradually explore a better hypothesis.
arXiv Detail & Related papers (2023-08-10T03:14:19Z) - PAC-Bayesian Soft Actor-Critic Learning [9.752336113724928]
Actor-critic algorithms address the dual goals of reinforcement learning (RL), policy evaluation and policy improvement, via two separate function approximators.
We tackle this bottleneck by employing an existing Probably Approximately Correct (PAC) Bayesian bound for the first time as the critic training objective of the Soft Actor-Critic (SAC) algorithm.
arXiv Detail & Related papers (2023-01-30T10:44:15Z)
- DL-DRL: A double-level deep reinforcement learning approach for large-scale task scheduling of multi-UAV [65.07776277630228]
We propose a double-level deep reinforcement learning (DL-DRL) approach based on a divide-and-conquer framework (DCF).
Particularly, we design an encoder-decoder structured policy network in our upper-level DRL model to allocate the tasks to different UAVs.
We also exploit another attention-based policy network in our lower-level DRL model to construct the route for each UAV, with the objective of maximizing the number of executed tasks.
arXiv Detail & Related papers (2022-08-04T04:35:53Z)
- Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs.
We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.