Deep Reinforcement Learning-based UAV Navigation and Control: A Soft
Actor-Critic with Hindsight Experience Replay Approach
- URL: http://arxiv.org/abs/2106.01016v1
- Date: Wed, 2 Jun 2021 08:30:14 GMT
- Title: Deep Reinforcement Learning-based UAV Navigation and Control: A Soft
Actor-Critic with Hindsight Experience Replay Approach
- Authors: Myoung Hoon Lee, Jun Moon
- Abstract summary: We propose SACHER (soft actor-critic (SAC) with hindsight experience replay (HER)) as a class of deep reinforcement learning (DRL) algorithms.
We show that SACHER achieves the desired optimal outcomes faster and more accurately than SAC, since HER improves the sample efficiency of SAC.
We apply SACHER to the navigation and control problem of unmanned aerial vehicles (UAVs), where SACHER generates the optimal navigation path.
- Score: 0.9137554315375919
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose SACHER (soft actor-critic (SAC) with hindsight
experience replay (HER)), which constitutes a class of deep reinforcement
learning (DRL) algorithms. SAC is known as an off-policy model-free DRL
algorithm based on the maximum entropy framework, which outperforms earlier DRL
algorithms in terms of exploration, robustness and learning performance.
However, in SAC, maximizing the entropy-augmented objective may degrade the
optimality of the learning outcomes. HER is known as a sample-efficient replay
method that enhances the performance of off-policy DRL algorithms by allowing
them to learn from both failures and successes. We apply HER to SAC and propose
SACHER to improve the learning performance of SAC. More precisely, SACHER
achieves the desired optimal outcomes faster and more accurately than SAC,
since HER improves the sample efficiency of SAC. We apply SACHER to the
navigation and control problem of unmanned aerial vehicles (UAVs), where SACHER
generates the optimal navigation path of the UAV under various obstacles in
operation. Specifically, we show the effectiveness of SACHER in terms of the
tracking error and cumulative reward in UAV operation by comparing them with
those of the state-of-the-art DRL algorithms SAC and DDPG (deep deterministic
policy gradient). Note that SACHER can be applied to arbitrary UAV models in
navigation and control problems.
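
As a minimal sketch of the SAC-plus-HER combination described above (illustrative assumptions throughout, not the authors' implementation): SAC maximizes the entropy-augmented objective J(pi) = E[sum_t r(s_t, a_t) + alpha * H(pi(.|s_t))], and HER relabels stored episodes with the goals they actually achieved, so that failed episodes still yield useful learning signal. The goal-conditioned transition layout, the sparse reward, and the "final" relabeling strategy below are assumptions made for illustration.

    import random

    import numpy as np

    # Hypothetical goal-conditioned transitions: dicts with keys
    # "state", "action", "reward", "next_state", "goal".

    def sparse_reward(achieved, goal, tol=0.05):
        # Illustrative sparse reward: 0 when the goal is reached, -1 otherwise.
        dist = np.linalg.norm(np.asarray(achieved) - np.asarray(goal))
        return 0.0 if dist < tol else -1.0

    def her_relabel(episode):
        # HER "final" strategy: replay the episode as if the state actually
        # reached at the end had been the goal all along, so even a failed
        # episode contains successful, informative transitions.
        achieved = episode[-1]["next_state"]
        relabeled = []
        for transition in episode:
            new = dict(transition)
            new["goal"] = achieved
            new["reward"] = sparse_reward(new["next_state"], achieved)
            relabeled.append(new)
        return list(episode) + relabeled

    replay_buffer = []  # off-policy buffer feeding the SAC updates (capped in practice)

    def store_episode(episode):
        replay_buffer.extend(her_relabel(episode))

    def sample_batch(batch_size=256):
        # Minibatches drawn here feed the usual SAC critic/actor updates, whose
        # actor loss also maximizes the entropy bonus alpha * H(pi(.|s)).
        return random.sample(replay_buffer, min(batch_size, len(replay_buffer)))

Because SAC is off-policy, the relabeled transitions can be mixed freely with the original ones in the same buffer; this compatibility is what lets HER improve the sample efficiency of SAC, as the abstract claims.
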
Related papers
- UAV-enabled Collaborative Beamforming via Multi-Agent Deep Reinforcement Learning [79.16150966434299]
We formulate a UAV-enabled collaborative beamforming multi-objective optimization problem (UCBMOP) to maximize the transmission rate of the UVAA (UAV-enabled virtual antenna array) and minimize the energy consumption of all UAVs.
We use the heterogeneous-agent trust region policy optimization (HATRPO) as the basic framework, and then propose an improved HATRPO algorithm, namely HATRPO-UCB.
arXiv Detail & Related papers (2024-04-11T03:19:22Z)
- DSAC-T: Distributional Soft Actor-Critic with Three Refinements [31.590177154247485]
We introduce an off-policy RL algorithm called distributional soft actor-critic (DSAC).
Standard DSAC has its own shortcomings, including occasionally unstable learning processes and the necessity for task-specific reward scaling.
This paper introduces three important refinements to standard DSAC in order to address these shortcomings.
arXiv Detail & Related papers (2023-10-09T16:52:48Z)
- RLSAC: Reinforcement Learning enhanced Sample Consensus for End-to-End Robust Estimation [74.47709320443998]
We propose RLSAC, a novel Reinforcement Learning enhanced SAmple Consensus framework for end-to-end robust estimation.
RLSAC employs a graph neural network that exploits both data and memory features to guide the direction of exploration when sampling the next minimal set.
Our experimental results demonstrate that RLSAC can learn from features to gradually explore a better hypothesis.
arXiv Detail & Related papers (2023-08-10T03:14:19Z)
- CCE: Sample Efficient Sparse Reward Policy Learning for Robotic Navigation via Confidence-Controlled Exploration [72.24964965882783]
Confidence-Controlled Exploration (CCE) is designed to enhance the training sample efficiency of reinforcement learning algorithms for sparse reward settings such as robot navigation.
CCE is based on a novel relationship we provide between gradient estimation and policy entropy.
We demonstrate through simulated and real-world experiments that CCE outperforms conventional methods that employ constant trajectory lengths and entropy regularization.
arXiv Detail & Related papers (2023-06-09T18:45:15Z)
- PAC-Bayesian Soft Actor-Critic Learning [9.752336113724928]
Actor-critic algorithms address the dual goals of reinforcement learning (RL), policy evaluation and policy improvement, via two separate function approximators, so the critic's estimation quality bottlenecks the policy improvement.
We tackle this bottleneck by employing an existing Probably Approximately Correct (PAC) Bayesian bound, for the first time, as the critic training objective of the Soft Actor-Critic (SAC) algorithm.
arXiv Detail & Related papers (2023-01-30T10:44:15Z)
- Deep Black-Box Reinforcement Learning with Movement Primitives [15.184283143878488]
We present a new algorithm for deep reinforcement learning (RL).
It is based on differentiable trust region layers, which underpin a successful on-policy deep RL algorithm.
We compare our episode-based RL (ERL) algorithm to state-of-the-art step-based algorithms on many complex simulated robotic control tasks.
arXiv Detail & Related papers (2022-10-18T06:34:52Z)
- DL-DRL: A double-level deep reinforcement learning approach for large-scale task scheduling of multi-UAV [65.07776277630228]
We propose a double-level deep reinforcement learning (DL-DRL) approach based on a divide-and-conquer framework (DCF).
Particularly, we design an encoder-decoder structured policy network in our upper-level DRL model to allocate the tasks to different UAVs.
We also exploit an attention-based policy network in our lower-level DRL model to construct the route for each UAV, with the objective of maximizing the number of executed tasks.
arXiv Detail & Related papers (2022-08-04T04:35:53Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward.
Our experiments show that exploration bonus from uncertainty in learned reward improves both feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
- Reinforcement Learning for Robust Missile Autopilot Design [0.0]
This work pioneers the use of Reinforcement Learning as a framework for flight control.
Under the TRPO (trust region policy optimization) methodology, the collected experience is augmented according to HER, stored in a replay buffer, and sampled according to its significance.
Results show that it is possible both to achieve the optimal performance and to improve the agent's robustness to uncertainties.
arXiv Detail & Related papers (2020-11-26T09:30:04Z)
- Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs.
We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.