Reinforcement Learning Based Self-play and State Stacking Techniques for Noisy Air Combat Environment
- URL: http://arxiv.org/abs/2303.03068v1
- Date: Mon, 6 Mar 2023 12:23:23 GMT
- Title: Reinforcement Learning Based Self-play and State Stacking Techniques for Noisy Air Combat Environment
- Authors: Ahmet Semih Tasbas, Safa Onur Sahin, Nazim Kemal Ure
- Abstract summary: The complexity of air combat arises from aggressive close-range maneuvers and agile enemy behaviors.
In this study, we developed an air combat simulation, which provides noisy observations to the agents.
We present a state stacking method for noisy RL environments as a noise reduction technique.
- Score: 1.7403133838762446
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning (RL) has recently proven itself a powerful
instrument for solving complex problems and has even surpassed human performance
in several challenging applications. This suggests that RL algorithms can be
applied to the autonomous air combat problem, which has been studied for many
years. The complexity of air combat arises from aggressive close-range maneuvers
and agile enemy behaviors. In addition to these complexities, real-life scenarios
may involve uncertainties due to sensor errors, which prevent accurate estimation
of the enemy's actual position. Autonomous aircraft should therefore be successful
even in noisy environments. In this study, we developed an air combat simulation
that provides noisy observations to the agents, making the air combat problem
even more challenging. To address this, we present a state stacking method for
noisy RL environments as a noise reduction technique. In our extensive set of
experiments, the proposed method significantly outperforms the baseline
algorithms in terms of winning ratio, and the performance improvement is even
more pronounced at high noise levels. In addition, we incorporate a self-play
scheme into our training process by periodically updating the enemy with a
frozen copy of the training agent. In this way, the training agent fights
against an enemy with progressively smarter strategies, which improves the
performance and robustness of the agents. In our simulations, we demonstrate
that the self-play scheme provides significant performance gains compared to
classical RL training.
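To make the noisy-observation setup concrete, here is a minimal sketch of a sensor-error model of the kind the abstract describes. The paper does not specify the noise distribution or publish code, so the function name `noisy_enemy_observation`, the Gaussian noise, and the `noise_std` parameter are all assumptions:

```python
import numpy as np

def noisy_enemy_observation(true_state, noise_std, rng=None):
    """Hypothetical sensor-error model (Gaussian noise is an assumption):
    the agent never sees the enemy's true state, only a perturbed
    measurement whose error scales with `noise_std`."""
    rng = rng or np.random.default_rng()
    return true_state + rng.normal(0.0, noise_std, size=true_state.shape)
```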
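The state stacking method itself can likewise be sketched as an observation wrapper. Again, the authors publish no code, so the class `StackedObservationWrapper`, the gym-style `reset`/`step` interface, and the `stack_size` default are hypothetical; the idea is simply that concatenating the last few noisy observations lets the policy network act as a learned filter, consistent with the reported gains at high noise levels:

```python
import numpy as np
from collections import deque

class StackedObservationWrapper:
    """Hypothetical sketch of state stacking as a noise-reduction technique:
    the agent receives the last `stack_size` noisy observations concatenated,
    so the policy network can implicitly average out sensor noise."""

    def __init__(self, env, stack_size=4):
        self.env = env
        self.stack_size = stack_size
        self.frames = deque(maxlen=stack_size)

    def reset(self):
        obs = self.env.reset()
        # Fill the stack with copies of the first observation so the
        # stacked shape is fixed from the very first step.
        for _ in range(self.stack_size):
            self.frames.append(obs)
        return np.concatenate(list(self.frames))

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)  # the oldest noisy observation drops out
        return np.concatenate(list(self.frames)), reward, done, info
```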
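Finally, the self-play scheme described above reduces to a short training loop in which the enemy is periodically replaced by a frozen snapshot of the training agent. Everything here (the `act`/`observe` hooks, the two-player `env.step`, the `update_interval` value) is a hypothetical interface, not the authors' implementation:

```python
import copy

def train_with_self_play(agent, env, total_steps, update_interval=50_000):
    """Hypothetical sketch of the self-play scheme: every `update_interval`
    steps the enemy policy becomes a frozen (never-updated) copy of the
    current training agent, so the opponent grows progressively smarter."""
    enemy = copy.deepcopy(agent)  # initial opponent: a snapshot of the agent
    obs, enemy_obs = env.reset()
    for step in range(1, total_steps + 1):
        action = agent.act(obs)
        enemy_action = enemy.act(enemy_obs)  # frozen policy, never trained
        (obs, enemy_obs), reward, done = env.step(action, enemy_action)
        agent.observe(reward, done)          # hypothetical RL update hook
        if done:
            obs, enemy_obs = env.reset()
        if step % update_interval == 0:
            enemy = copy.deepcopy(agent)     # periodic opponent refresh
    return agent
```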
Related papers
- Intercepting Unauthorized Aerial Robots in Controlled Airspace Using Reinforcement Learning [2.519319150166215]
The proliferation of unmanned aerial vehicles (UAVs) in controlled airspace presents significant risks.
This work addresses the need for robust, adaptive systems capable of managing such threats through the use of Reinforcement Learning (RL).
We present a novel approach utilizing RL to train fixed-wing UAV pursuer agents for intercepting dynamic evader targets.
arXiv Detail & Related papers (2024-07-09T14:45:47Z)
- Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning [53.3760591018817]
We propose a new benchmarking environment for aquatic navigation using recent advances in the integration between game engines and Deep Reinforcement Learning.
Specifically, we focus on PPO, one of the most widely accepted algorithms, and we propose advanced training techniques.
Our empirical evaluation shows that a well-designed combination of these ingredients can achieve promising results.
arXiv Detail & Related papers (2024-05-30T23:20:23Z)
- Adversarial Attacks on Reinforcement Learning Agents for Command and Control [6.05332129899857]
Recent work has shown that learning-based approaches are highly susceptible to adversarial perturbations.
In this paper, we investigate the robustness of an agent trained for a Command and Control task in an environment controlled by an adversary.
We empirically show that an agent trained using these algorithms is highly susceptible to noise injected by the adversary.
arXiv Detail & Related papers (2024-05-02T19:28:55Z)
- Rethinking Closed-loop Training for Autonomous Driving [82.61418945804544]
We present the first empirical study which analyzes the effects of different training benchmark designs on the success of learning agents.
We propose trajectory value learning (TRAVL), an RL-based driving agent that performs planning with multistep look-ahead.
Our experiments show that TRAVL can learn much faster and produce safer maneuvers compared to all the baselines.
arXiv Detail & Related papers (2023-06-27T17:58:39Z)
- Autonomous Agent for Beyond Visual Range Air Combat: A Deep Reinforcement Learning Approach [0.2578242050187029]
This work contributes to developing an agent based on deep reinforcement learning capable of acting in a beyond visual range (BVR) air combat simulation environment.
The paper presents an overview of building an agent representing a high-performance fighter aircraft that can learn and improve its role in BVR combat over time.
It also examines, via virtual simulation, a real pilot's ability to interact with the trained agent in the same environment and compares their performances.
arXiv Detail & Related papers (2023-04-19T13:54:37Z)
- Anchored Learning for On-the-Fly Adaptation -- Extended Technical Report [45.123633153460034]
This study presents "anchor critics", a novel strategy for enhancing the robustness of reinforcement learning (RL) agents in crossing the sim-to-real gap.
We identify that naive fine-tuning approaches lead to catastrophic forgetting, where policies maintain high rewards on frequently encountered states but lose performance on rarer, yet critical scenarios.
Evaluations demonstrate that our approach enables behavior retention in sim-to-sim gymnasium tasks and in sim-to-real scenarios with racing quadrotors, achieving a near-50% reduction in power consumption while maintaining controllable, stable flight.
arXiv Detail & Related papers (2023-01-17T16:16:53Z)
- Accelerated Policy Learning with Parallel Differentiable Simulation [59.665651562534755]
We present a differentiable simulator and a new policy learning algorithm (SHAC).
Our algorithm alleviates problems with local minima through a smooth critic function.
We show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms.
arXiv Detail & Related papers (2022-04-14T17:46:26Z)
- Homotopy Based Reinforcement Learning with Maximum Entropy for Autonomous Air Combat [3.839929995011407]
The reinforcement learning (RL) method can significantly shorten decision time by using neural networks.
However, the sparse reward problem limits its convergence speed, and an artificial prior-experience reward can easily deviate the policy from the optimal convergence direction of the original task.
We propose a homotopy-based soft actor-critic method (HSAC) that addresses these problems by following the homotopy path between the original task with sparse reward and an auxiliary task with an artificial prior-experience reward.
arXiv Detail & Related papers (2021-12-01T09:37:55Z)
- Improving Robustness of Reinforcement Learning for Power System Control with Adversarial Training [71.7750435554693]
We show that several state-of-the-art RL agents proposed for power system control are vulnerable to adversarial attacks.
Specifically, we use an adversary Markov Decision Process to learn an attack policy, and demonstrate the potency of our attack.
We propose to use adversarial training to increase the robustness of RL agents against attacks and to avoid infeasible operational decisions.
arXiv Detail & Related papers (2021-10-18T00:50:34Z)
- Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs.
We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z)
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of the replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm outperforms all solutions from the well-known MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)