Asynchronous Curriculum Experience Replay: A Deep Reinforcement Learning
Approach for UAV Autonomous Motion Control in Unknown Dynamic Environments
- URL: http://arxiv.org/abs/2207.01251v1
- Date: Mon, 4 Jul 2022 08:19:39 GMT
- Title: Asynchronous Curriculum Experience Replay: A Deep Reinforcement Learning
Approach for UAV Autonomous Motion Control in Unknown Dynamic Environments
- Authors: Zijian Hu, Xiaoguang Gao, Kaifang Wan, Qianglong Wang, Yiwei Zhai
- Abstract summary: Unmanned aerial vehicles (UAVs) have been widely used in military warfare.
We formulate the autonomous motion control (AMC) problem as a Markov decision process (MDP)
We propose an advanced deep reinforcement learning (DRL) method that allows UAVs to execute complex tasks in large-scale dynamic three-dimensional (3D) environments.
- Score: 2.635402406262781
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unmanned aerial vehicles (UAVs) have been widely used in military warfare. In
this paper, we formulate the autonomous motion control (AMC) problem as a
Markov decision process (MDP) and propose an advanced deep reinforcement
learning (DRL) method that allows UAVs to execute complex tasks in large-scale
dynamic three-dimensional (3D) environments. To overcome the limitations of the
prioritized experience replay (PER) algorithm and improve performance, the
proposed asynchronous curriculum experience replay (ACER) uses multiple
threads to update priorities asynchronously, assigns true priorities, and
applies a temporary experience pool to make higher-quality experiences
available for learning. A first-in-useless-out (FIUO) experience pool is also
introduced to ensure that stored experiences retain a higher use value. In addition, combined
with curriculum learning (CL), a more reasonable training paradigm of sampling
experiences from simple to difficult is designed for training UAVs. By training
in a complex unknown environment constructed based on the parameters of a real
UAV, the proposed ACER improves the convergence speed by 24.66% and the
convergence result by 5.59% compared to the state-of-the-art twin delayed deep
deterministic policy gradient (TD3) algorithm. The testing experiments carried
out in environments with different complexities demonstrate the strong
robustness and generalization ability of the ACER agent.
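The abstract's two replay-buffer ideas can be illustrated with a minimal Python sketch: a first-in-useless-out pool that evicts the least useful experience (rather than the oldest) when full, and curriculum-style sampling that only draws experiences at or below a growing difficulty cap. This is an assumption-laden illustration, not the authors' implementation: the class name `FIUOPool`, the use of priority as the "use value", and the `difficulty` field are all hypothetical, and the paper's multithreaded asynchronous priority update is omitted for brevity.

```python
import heapq
import random

class FIUOPool:
    """Illustrative first-in-useless-out (FIUO) experience pool.

    When the pool is full, the experience with the lowest use value
    (approximated here by its priority) is evicted, not the oldest one.
    Hypothetical sketch; not the ACER authors' implementation.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []   # min-heap of [priority, seq, experience]
        self.seq = 0      # insertion counter as a tie-breaker for heapq

    def add(self, experience, priority):
        entry = [priority, self.seq, experience]
        self.seq += 1
        if len(self.items) < self.capacity:
            heapq.heappush(self.items, entry)
        elif priority > self.items[0][0]:
            # Pool is full: replace the least useful stored experience.
            heapq.heapreplace(self.items, entry)

    def sample(self, batch_size, difficulty_cap):
        """Curriculum-style sampling: only experiences whose difficulty is
        at or below the current cap are eligible; raising the cap over
        training realizes the simple-to-difficult paradigm."""
        eligible = [e for _, _, e in self.items
                    if e["difficulty"] <= difficulty_cap]
        return random.sample(eligible, min(batch_size, len(eligible)))

# Fill a 3-slot pool; the fourth add evicts the lowest-priority entry.
pool = FIUOPool(capacity=3)
for i, (prio, diff) in enumerate([(0.1, 1), (0.9, 2), (0.5, 1), (0.8, 3)]):
    pool.add({"id": i, "difficulty": diff}, prio)

# Early in training, only low-difficulty experiences are sampled.
batch = pool.sample(batch_size=10, difficulty_cap=2)
```

In a full training loop, `difficulty_cap` would be scheduled upward with training progress, and a background thread would rewrite the stored priorities as the critic's TD errors change.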
Related papers
- Efficient Diversity-based Experience Replay for Deep Reinforcement Learning [14.96744975805832]
This paper proposes a novel approach, diversity-based experience replay (DBER), which leverages the deterministic point process to prioritize diverse samples in state realizations.
We conducted extensive experiments on Robotic Manipulation tasks in MuJoCo, Atari games, and realistic in-door environments in Habitat.
arXiv Detail & Related papers (2024-10-27T15:51:27Z)
- Dual Test-time Training for Out-of-distribution Recommender System [91.15209066874694]
We propose a novel Dual Test-Time-Training framework for OOD Recommendation, termed DT3OR.
In DT3OR, we incorporate a model adaptation mechanism during the test-time phase to carefully update the recommendation model.
To the best of our knowledge, this paper is the first work to address OOD recommendation via a test-time-training strategy.
arXiv Detail & Related papers (2024-07-22T13:27:51Z)
- Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning [53.3760591018817]
We propose a new benchmarking environment for aquatic navigation using recent advances in the integration between game engines and Deep Reinforcement Learning.
Specifically, we focus on PPO, one of the most widely accepted algorithms, and we propose advanced training techniques.
Our empirical evaluation shows that a well-designed combination of these ingredients can achieve promising results.
arXiv Detail & Related papers (2024-05-30T23:20:23Z)
- UAV-enabled Collaborative Beamforming via Multi-Agent Deep Reinforcement Learning [79.16150966434299]
We formulate a UAV-enabled collaborative beamforming multi-objective optimization problem (UCBMOP) to maximize the transmission rate of the UVAA and minimize the energy consumption of all UAVs.
We use the heterogeneous-agent trust region policy optimization (HATRPO) as the basic framework, and then propose an improved HATRPO algorithm, namely HATRPO-UCB.
arXiv Detail & Related papers (2024-04-11T03:19:22Z)
- A Dual Curriculum Learning Framework for Multi-UAV Pursuit-Evasion in Diverse Environments [15.959963737956848]
This paper addresses multi-UAV pursuit-evasion, where a group of drones cooperate to capture a fast evader in a confined environment with obstacles.
Existing algorithms, which simplify the pursuit-evasion problem, often lack expressive coordination strategies and struggle to capture the evader in extreme scenarios.
We introduce a dual curriculum learning framework, named DualCL, which addresses multi-UAV pursuit-evasion in diverse environments and demonstrates zero-shot transfer ability to unseen scenarios.
arXiv Detail & Related papers (2023-12-19T15:39:09Z)
- Contrastive Initial State Buffer for Reinforcement Learning [25.849626996870526]
In Reinforcement Learning, the trade-off between exploration and exploitation poses a complex challenge for achieving efficient learning from limited samples.
We introduce the concept of a Contrastive Initial State Buffer, which strategically selects states from past experiences and uses them to initialize the agent in the environment.
We validate our approach on two complex robotic tasks without relying on any prior information about the environment.
arXiv Detail & Related papers (2023-09-18T13:26:40Z)
- Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection [54.92703325989853]
We propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues.
No human annotations are involved in our framework during the whole training process.
Our framework reports significant performance compared with existing USOD methods.
arXiv Detail & Related papers (2021-12-07T11:54:06Z)
- Transferable Deep Reinforcement Learning Framework for Autonomous Vehicles with Joint Radar-Data Communications [69.24726496448713]
We propose an intelligent optimization framework based on the Markov Decision Process (MDP) to help the AV make optimal decisions.
We then develop an effective learning algorithm leveraging recent advances of deep reinforcement learning techniques to find the optimal policy for the AV.
We show that the proposed transferable deep reinforcement learning framework reduces the obstacle miss detection probability by the AV up to 67% compared to other conventional deep reinforcement learning approaches.
arXiv Detail & Related papers (2021-05-28T08:45:37Z)
- A Vision Based Deep Reinforcement Learning Algorithm for UAV Obstacle Avoidance [1.2693545159861856]
We present two techniques for improving exploration for UAV obstacle avoidance.
The first is a convergence-based approach that uses convergence error to iterate through unexplored actions and temporal threshold to balance exploration and exploitation.
The second is a guidance-based approach which uses a Gaussian mixture distribution to compare previously seen states to a predicted next state in order to select the next action.
arXiv Detail & Related papers (2021-03-11T01:15:26Z)
- Experience Replay with Likelihood-free Importance Weights [123.52005591531194]
We propose to reweight experiences based on their likelihood under the stationary distribution of the current policy.
We apply the proposed approach empirically on two competitive methods, Soft Actor Critic (SAC) and Twin Delayed Deep Deterministic policy gradient (TD3).
arXiv Detail & Related papers (2020-06-23T17:17:44Z)
- Experience Augmentation: Boosting and Accelerating Off-Policy Multi-Agent Reinforcement Learning [6.374722265790792]
We present Experience Augmentation, which enables a time-efficient and boosted learning based on a fast, fair and thorough exploration to the environment.
We demonstrate our approach by combining it with MADDPG and verifying the performance in two homogeneous environments and one heterogeneous environment.
arXiv Detail & Related papers (2020-05-19T13:57:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.