Homotopy Based Reinforcement Learning with Maximum Entropy for
Autonomous Air Combat
- URL: http://arxiv.org/abs/2112.01328v1
- Date: Wed, 1 Dec 2021 09:37:55 GMT
- Title: Homotopy Based Reinforcement Learning with Maximum Entropy for
Autonomous Air Combat
- Authors: Yiwen Zhu, Zhou Fang, Yuan Zheng, Wenya Wei
- Abstract summary: Reinforcement learning (RL) can significantly shorten the decision time by using neural networks.
However, the sparse-reward problem limits its convergence speed, and an artificial prior-experience reward can easily divert convergence away from the optimum of the original task.
We propose a homotopy-based soft actor-critic method (HSAC) that addresses these problems by following the homotopy path between the original task with sparse reward and an auxiliary task with an artificial prior-experience reward.
- Score: 3.839929995011407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intelligent decision-making for unmanned combat aerial vehicles
(UCAVs) has long been a challenging problem. Conventional search methods can
hardly satisfy the real-time demands of highly dynamic air combat scenarios.
Reinforcement learning (RL) can significantly shorten the decision time by
using neural networks. However, the sparse-reward problem limits its
convergence speed, and an artificial prior-experience reward can easily divert
convergence away from the optimum of the original task, which raises great
difficulties for applying RL to air combat. In this paper, we propose a
homotopy-based soft actor-critic method (HSAC) that addresses these problems
by following the homotopy path between the original task with sparse reward
and an auxiliary task with an artificial prior-experience reward. The
convergence and feasibility of this method are also proved in this paper. To
confirm the feasibility of our method, we first construct a detailed 3D air
combat simulation environment for training RL-based methods, and we apply our
method to both the task of attacking a horizontally flying UCAV and a
self-play confrontation task. Experimental results show that our method
outperforms methods that use only the sparse reward or only the artificial
prior-experience reward: the agent trained by our method reaches a win rate
above 98.3% in the attack task and an average win rate of 67.4% against
agents trained by the other two methods.
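
The central mechanism admits a compact illustration. The fragment below is a
minimal Python sketch of one plausible realization of the homotopy path: a
reward that interpolates between the auxiliary prior-experience reward and the
original sparse reward, with the interpolation coefficient annealed over
training. The coefficient name, the linear schedule, and the function
signatures are illustrative assumptions, not the paper's exact formulation.

    # Illustrative sketch of a homotopy path between reward functions;
    # the linear annealing schedule and the names below are assumptions,
    # not the paper's exact formulation.

    def homotopy_reward(r_sparse, r_prior, lam):
        """Blend the auxiliary prior-experience reward with the sparse
        task reward: lam = 1 is the pure auxiliary task, lam = 0 the
        original sparse-reward task."""
        return lam * r_prior + (1.0 - lam) * r_sparse

    def anneal(step, total_steps):
        """Walk along the homotopy path from the auxiliary task
        (lam = 1) toward the original task (lam = 0) as training
        progresses."""
        return max(0.0, 1.0 - step / total_steps)

    # Inside a soft actor-critic training loop, the stored transition
    # would use the blended reward in place of either raw signal:
    #   lam = anneal(global_step, total_steps)
    #   r = homotopy_reward(r_sparse, r_prior, lam)
    #   replay_buffer.add(obs, action, r, next_obs, done)
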
Related papers
- Autonomous Decision Making for UAV Cooperative Pursuit-Evasion Game with Reinforcement Learning [50.33447711072726]
This paper proposes a deep reinforcement learning-based model for decision-making in a multi-role UAV cooperative pursuit-evasion game.
The proposed method enables autonomous decision-making of the UAVs in pursuit-evasion game scenarios.
arXiv Detail & Related papers (2024-11-05T10:45:30Z)
- Multi-UAV Pursuit-Evasion with Online Planning in Unknown Environments by Deep Reinforcement Learning [16.761470423715338]
Multi-UAV pursuit-evasion poses a key challenge for UAV swarm intelligence.
We introduce an evader prediction-enhanced network to tackle partial observability in cooperative strategy learning.
We derive a feasible policy via a two-stage reward refinement and deploy the policy on real quadrotors in a zero-shot manner.
arXiv Detail & Related papers (2024-09-24T08:40:04Z)
- UAV-enabled Collaborative Beamforming via Multi-Agent Deep Reinforcement Learning [79.16150966434299]
We formulate a UAV-enabled collaborative beamforming multi-objective optimization problem (UCBMOP) to maximize the transmission rate of the UAV-enabled virtual antenna array (UVAA) and minimize the energy consumption of all UAVs.
We use the heterogeneous-agent trust region policy optimization (HATRPO) as the basic framework, and then propose an improved HATRPO algorithm, namely HATRPO-UCB.
arXiv Detail & Related papers (2024-04-11T03:19:22Z)
- Learning Multi-Pursuit Evasion for Safe Targeted Navigation of Drones [0.0]
This paper proposes a novel approach, asynchronous multi-stage deep reinforcement learning (AMS-DRL), to train adversarial neural networks.
AMS-DRL evolves adversarial agents in a pursuit-evasion game where the pursuers and the evader are trained asynchronously in a bipartite-graph fashion.
We evaluate our method in extensive simulations and show that it outperforms baselines with higher navigation success rates.
arXiv Detail & Related papers (2023-04-07T01:59:16Z)
- Reinforcement Learning Based Self-play and State Stacking Techniques for Noisy Air Combat Environment [1.7403133838762446]
The complexity of air combat arises from aggressive close-range maneuvers and agile enemy behaviors.
In this study, we developed an air combat simulation, which provides noisy observations to the agents.
We present a state stacking method for noisy RL environments as a noise reduction technique.
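
State stacking in this sense is commonly implemented as a sliding window over
recent observations; the wrapper below is a minimal Python sketch of that idea
(the window length k and the interface are illustrative assumptions, not
details taken from the paper).

    from collections import deque
    import numpy as np

    class StackedObservations:
        """Keep the last k noisy observations and expose their
        concatenation as the agent's state, so the policy can average
        out part of the observation noise (k is an assumed
        hyperparameter)."""

        def __init__(self, k):
            self.k = k
            self.frames = deque(maxlen=k)

        def reset(self, obs):
            for _ in range(self.k):
                self.frames.append(obs)
            return np.concatenate(self.frames)

        def step(self, obs):
            self.frames.append(obs)
            return np.concatenate(self.frames)
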
arXiv Detail & Related papers (2023-03-06T12:23:23Z)
- DL-DRL: A double-level deep reinforcement learning approach for large-scale task scheduling of multi-UAV [65.07776277630228]
We propose a double-level deep reinforcement learning (DL-DRL) approach based on a divide-and-conquer framework (DCF).
Particularly, we design an encoder-decoder structured policy network in our upper-level DRL model to allocate the tasks to different UAVs.
We also exploit another attention-based policy network in our lower-level DRL model to construct the route for each UAV, with the objective of maximizing the number of executed tasks.
arXiv Detail & Related papers (2022-08-04T04:35:53Z)
- Asynchronous Curriculum Experience Replay: A Deep Reinforcement Learning Approach for UAV Autonomous Motion Control in Unknown Dynamic Environments [2.635402406262781]
Unmanned aerial vehicles (UAVs) have been widely used in military warfare.
We formulate the autonomous motion control (AMC) problem as a Markov decision process (MDP).
We propose an advanced deep reinforcement learning (DRL) method that allows UAVs to execute complex tasks in large-scale dynamic three-dimensional (3D) environments.
arXiv Detail & Related papers (2022-07-04T08:19:39Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward.
Our experiments show that exploration bonus from uncertainty in learned reward improves both feedback- and sample-efficiency of preference-based RL algorithms.
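
A common way to turn learned-reward uncertainty into an exploration bonus is
ensemble disagreement: train several reward models on the same preference data
and use the spread of their predictions as an intrinsic reward. The sketch
below illustrates that idea; the ensemble form, the standard-deviation novelty
measure, and the scale beta are assumptions rather than the paper's exact
design.

    import numpy as np

    def intrinsic_bonus(reward_models, obs, action, beta=1.0):
        """Exploration bonus from disagreement among an ensemble of
        learned reward models; beta scales the bonus (both the
        std-based novelty measure and beta are illustrative)."""
        preds = np.array([model(obs, action) for model in reward_models])
        return beta * preds.std()

    # The agent would then be trained on r_hat + intrinsic_bonus(...)
    # rather than on the learned reward alone.
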
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
- Solving reward-collecting problems with UAVs: a comparison of online optimization and Q-learning [2.4251007104039006]
We study the problem of identifying a short path from a designated start to a goal, while collecting all rewards and avoiding adversaries that move randomly on the grid.
We present a comparison of three methods to solve this problem: a Deep Q-Learning model, an $\varepsilon$-greedy tabular Q-Learning model, and an online optimization framework.
Our experiments, designed using simple grid-world environments with random adversaries, showcase how these approaches work and compare them in terms of performance, accuracy, and computational time.
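
Of the three methods compared, the epsilon-greedy tabular Q-learning baseline
is the simplest to state; a minimal Python sketch follows (the hyperparameter
values are illustrative, not those used in the paper).

    import numpy as np

    def epsilon_greedy(Q, s, n_actions, eps=0.1):
        """Take a random action with probability eps, else the greedy one."""
        if np.random.rand() < eps:
            return np.random.randint(n_actions)
        return int(np.argmax(Q[s]))

    def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
        """Standard tabular Q-learning update."""
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
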
arXiv Detail & Related papers (2021-11-30T22:27:24Z)
- Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs.
We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z)
- Soft Hindsight Experience Replay [77.99182201815763]
Soft Hindsight Experience Replay (SHER) is a novel approach based on HER and Maximum Entropy Reinforcement Learning (MERL).
We evaluate SHER on OpenAI robotic manipulation tasks with sparse rewards.
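
The hindsight half of SHER can be sketched compactly: transitions from a
failed episode are relabeled with a goal the agent actually achieved, so the
sparse reward becomes informative for the replaced goal. The fragment below
assumes the "final" relabeling strategy of HER and a dictionary transition
format; both are illustrative choices, not the paper's exact design.

    def hindsight_relabel(episode, compute_reward):
        """Relabel each transition in an episode with the goal that was
        actually achieved at its end (the 'final' HER strategy);
        compute_reward(achieved, goal) recomputes the sparse reward."""
        final_goal = episode[-1]["achieved_goal"]
        return [
            {**t,
             "goal": final_goal,
             "reward": compute_reward(t["achieved_goal"], final_goal)}
            for t in episode
        ]
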
arXiv Detail & Related papers (2020-02-06T03:57:04Z)