Asynchronous Curriculum Experience Replay: A Deep Reinforcement Learning
Approach for UAV Autonomous Motion Control in Unknown Dynamic Environments
- URL: http://arxiv.org/abs/2207.01251v1
- Date: Mon, 4 Jul 2022 08:19:39 GMT
- Title: Asynchronous Curriculum Experience Replay: A Deep Reinforcement Learning
Approach for UAV Autonomous Motion Control in Unknown Dynamic Environments
- Authors: Zijian Hu, Xiaoguang Gao, Kaifang Wan, Qianglong Wang, Yiwei Zhai
- Abstract summary: Unmanned aerial vehicles (UAVs) have been widely used in military warfare.
We formulate the autonomous motion control (AMC) problem as a Markov decision process (MDP)
We propose an advanced deep reinforcement learning (DRL) method that allows UAVs to execute complex tasks in large-scale dynamic three-dimensional (3D) environments.
- Score: 2.635402406262781
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unmanned aerial vehicles (UAVs) have been widely used in military warfare. In
this paper, we formulate the autonomous motion control (AMC) problem as a
Markov decision process (MDP) and propose an advanced deep reinforcement
learning (DRL) method that allows UAVs to execute complex tasks in large-scale
dynamic three-dimensional (3D) environments. To overcome the limitations of the
prioritized experience replay (PER) algorithm and improve performance, the
proposed asynchronous curriculum experience replay (ACER) uses multiple
threads to update priorities asynchronously, assigns true priorities, and
applies a temporary experience pool to make higher-quality experiences
available for learning. A first-in-useless-out (FIUO) experience pool is also
introduced to ensure that stored experiences retain a higher use value. In addition, combined
with curriculum learning (CL), a more reasonable training paradigm of sampling
experiences from simple to difficult is designed for training UAVs. By training
in a complex unknown environment constructed based on the parameters of a real
UAV, the proposed ACER improves the convergence speed by 24.66% and the
convergence result by 5.59% compared to the state-of-the-art twin delayed deep
deterministic policy gradient (TD3) algorithm. The testing experiments carried
out in environments with different complexities demonstrate the strong
robustness and generalization ability of the ACER agent.
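The abstract's two replay-buffer ideas can be illustrated with a minimal Python sketch: a first-in-useless-out pool that evicts the least useful experience (rather than the oldest) when full, and curriculum-style sampling that only draws experiences at or below a growing difficulty cap. This is an assumption-laden illustration, not the authors' implementation: the class name `FIUOPool`, the use of priority as the "use value", and the `difficulty` field are all hypothetical, and the paper's multithreaded asynchronous priority update is omitted for brevity.

```python
import heapq
import random

class FIUOPool:
    """Illustrative first-in-useless-out (FIUO) experience pool.

    When the pool is full, the experience with the lowest use value
    (approximated here by its priority) is evicted, not the oldest one.
    Hypothetical sketch; not the ACER authors' implementation.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []   # min-heap of [priority, seq, experience]
        self.seq = 0      # insertion counter as a tie-breaker for heapq

    def add(self, experience, priority):
        entry = [priority, self.seq, experience]
        self.seq += 1
        if len(self.items) < self.capacity:
            heapq.heappush(self.items, entry)
        elif priority > self.items[0][0]:
            # Pool is full: replace the least useful stored experience.
            heapq.heapreplace(self.items, entry)

    def sample(self, batch_size, difficulty_cap):
        """Curriculum-style sampling: only experiences whose difficulty is
        at or below the current cap are eligible; raising the cap over
        training realizes the simple-to-difficult paradigm."""
        eligible = [e for _, _, e in self.items
                    if e["difficulty"] <= difficulty_cap]
        return random.sample(eligible, min(batch_size, len(eligible)))

# Fill a 3-slot pool; the fourth add evicts the lowest-priority entry.
pool = FIUOPool(capacity=3)
for i, (prio, diff) in enumerate([(0.1, 1), (0.9, 2), (0.5, 1), (0.8, 3)]):
    pool.add({"id": i, "difficulty": diff}, prio)

# Early in training, only low-difficulty experiences are sampled.
batch = pool.sample(batch_size=10, difficulty_cap=2)
```

In a full training loop, `difficulty_cap` would be scheduled upward with training progress, and a background thread would rewrite the stored priorities as the critic's TD errors change.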
Related papers
- Efficient Diversity-based Experience Replay for Deep Reinforcement Learning [14.96744975805832]
This paper proposes a novel approach, diversity-based experience replay (DBER), which leverages the deterministic point process to prioritize diverse samples in state realizations.
We conducted extensive experiments on Robotic Manipulation tasks in MuJoCo, Atari games, and realistic in-door environments in Habitat.
arXiv Detail & Related papers (2024-10-27T15:51:27Z)
- Dual Test-time Training for Out-of-distribution Recommender System [91.15209066874694]
We propose a novel Dual Test-Time-Training framework for OOD Recommendation, termed DT3OR.
In DT3OR, we incorporate a model adaptation mechanism during the test-time phase to carefully update the recommendation model.
To the best of our knowledge, this paper is the first work to address OOD recommendation via a test-time-training strategy.
arXiv Detail & Related papers (2024-07-22T13:27:51Z)
- Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning [53.3760591018817]
We propose a new benchmarking environment for aquatic navigation using recent advances in the integration between game engines and Deep Reinforcement Learning.
Specifically, we focus on PPO, one of the most widely accepted algorithms, and we propose advanced training techniques.
Our empirical evaluation shows that a well-designed combination of these ingredients can achieve promising results.
arXiv Detail & Related papers (2024-05-30T23:20:23Z)
- UAV-enabled Collaborative Beamforming via Multi-Agent Deep Reinforcement Learning [79.16150966434299]
We formulate a UAV-enabled collaborative beamforming multi-objective optimization problem (UCBMOP) to maximize the transmission rate of the UVAA and minimize the energy consumption of all UAVs.
We use the heterogeneous-agent trust region policy optimization (HATRPO) as the basic framework, and then propose an improved HATRPO algorithm, namely HATRPO-UCB.
arXiv Detail & Related papers (2024-04-11T03:19:22Z)
- A Dual Curriculum Learning Framework for Multi-UAV Pursuit-Evasion in Diverse Environments [15.959963737956848]
This paper addresses multi-UAV pursuit-evasion, where a group of drones cooperate to capture a fast evader in a confined environment with obstacles.
Existing algorithms, which simplify the pursuit-evasion problem, often lack expressive coordination strategies and struggle to capture the evader in extreme scenarios.
We introduce a dual curriculum learning framework, named DualCL, which addresses multi-UAV pursuit-evasion in diverse environments and demonstrates zero-shot transfer ability to unseen scenarios.
arXiv Detail & Related papers (2023-12-19T15:39:09Z)
- Contrastive Initial State Buffer for Reinforcement Learning [25.849626996870526]
In Reinforcement Learning, the trade-off between exploration and exploitation poses a complex challenge for achieving efficient learning from limited samples.
We introduce the concept of a Contrastive Initial State Buffer, which strategically selects states from past experiences and uses them to initialize the agent in the environment.
We validate our approach on two complex robotic tasks without relying on any prior information about the environment.
arXiv Detail & Related papers (2023-09-18T13:26:40Z)
- Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection [54.92703325989853]
We propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues.
No human annotations are involved in our framework during the whole training process.
Our framework reports significant performance compared with existing USOD methods.
arXiv Detail & Related papers (2021-12-07T11:54:06Z)
- Transferable Deep Reinforcement Learning Framework for Autonomous Vehicles with Joint Radar-Data Communications [69.24726496448713]
We propose an intelligent optimization framework based on the Markov Decision Process (MDP) to help the AV make optimal decisions.
We then develop an effective learning algorithm leveraging recent advances of deep reinforcement learning techniques to find the optimal policy for the AV.
We show that the proposed transferable deep reinforcement learning framework reduces the obstacle miss detection probability by the AV up to 67% compared to other conventional deep reinforcement learning approaches.
arXiv Detail & Related papers (2021-05-28T08:45:37Z)
- A Vision Based Deep Reinforcement Learning Algorithm for UAV Obstacle Avoidance [1.2693545159861856]
We present two techniques for improving exploration for UAV obstacle avoidance.
The first is a convergence-based approach that uses convergence error to iterate through unexplored actions and temporal threshold to balance exploration and exploitation.
The second is a guidance-based approach which uses a Gaussian mixture distribution to compare previously seen states to a predicted next state in order to select the next action.
arXiv Detail & Related papers (2021-03-11T01:15:26Z)
- Experience Replay with Likelihood-free Importance Weights [123.52005591531194]
We propose to reweight experiences based on their likelihood under the stationary distribution of the current policy.
We apply the proposed approach empirically on two competitive methods, Soft Actor Critic (SAC) and Twin Delayed Deep Deterministic policy gradient (TD3).
arXiv Detail & Related papers (2020-06-23T17:17:44Z)
- Experience Augmentation: Boosting and Accelerating Off-Policy Multi-Agent Reinforcement Learning [6.374722265790792]
We present Experience Augmentation, which enables a time-efficient and boosted learning based on a fast, fair and thorough exploration to the environment.
We demonstrate our approach by combining it with MADDPG and verifying the performance in two homogeneous environments and one heterogeneous environment.
arXiv Detail & Related papers (2020-05-19T13:57:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.