Maneuver Decision-Making Through Automatic Curriculum Reinforcement
Learning Without Handcrafted Reward Functions
- URL: http://arxiv.org/abs/2307.06152v1
- Date: Wed, 12 Jul 2023 13:20:18 GMT
- Title: Maneuver Decision-Making Through Automatic Curriculum Reinforcement
Learning Without Handcrafted Reward Functions
- Authors: Zhang Hong-Peng
- Abstract summary: We propose an automatic curriculum reinforcement learning method, which enables agents to learn effective decisions in air combat from scratch.
The range of initial states is used to distinguish curricula of different difficulty levels.
As sub-tasks change, agents gradually learn to complete a series of sub-tasks from easy to difficult.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Maneuver decision-making is the core of unmanned combat aerial
vehicles in autonomous air combat. To solve this problem, we propose an
automatic curriculum reinforcement learning method that enables agents to
learn effective air-combat decisions from scratch. The range of initial states
is used to distinguish curricula of different difficulty levels, so that
maneuver decision-making is divided into a series of sub-tasks from easy to
difficult, and test results are used to switch between sub-tasks. As the
sub-tasks change, agents gradually learn to complete the whole series from
easy to difficult, enabling them to make effective maneuvering decisions in
various states without the effort of designing handcrafted reward functions.
Ablation studies show that the automatic curriculum learning proposed in this
article is an essential component of training through reinforcement learning:
without curriculum learning, agents cannot learn effective decisions.
Simulation experiments show that, after training, agents make effective
decisions given different states, including tracking, attacking, and escaping,
and that these decisions are both rational and interpretable.
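The abstract's core mechanism can be sketched in code: difficulty is keyed to the range of initial states, and test results, not a handcrafted reward, decide when to move to the next sub-task. This is a minimal illustrative sketch under assumed names and thresholds (`InitialStateCurriculum`, `promote_at`, the linear widening step), none of which come from the paper.

```python
import random

class InitialStateCurriculum:
    """Widen the initial-state range as the agent passes evaluation tests.

    Illustrative sketch of the paper's automatic curriculum: a wider range
    of initial states defines a harder sub-task; promotion is driven by
    test success rate rather than a handcrafted reward function.
    """

    def __init__(self, base_range=0.1, step=0.1, max_range=1.0, promote_at=0.8):
        self.range = base_range        # current half-width of the initial-state interval
        self.step = step               # how much to widen the interval on promotion
        self.max_range = max_range     # hardest curriculum level
        self.promote_at = promote_at   # test success rate required to advance

    def sample_initial_state(self):
        # Harder curricula draw episode initial states from a wider interval.
        return random.uniform(-self.range, self.range)

    def update(self, test_success_rate):
        # Test results are used to change sub-tasks: widen the range only
        # once the agent reliably passes evaluation at the current level.
        if test_success_rate >= self.promote_at:
            self.range = min(self.range + self.step, self.max_range)
        return self.range
```

In a training loop, `sample_initial_state` would seed each episode and `update` would be called after each periodic evaluation, replacing any hand-tuned reward shaping with a schedule over initial conditions.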
Related papers
- Optimising Human-AI Collaboration by Learning Convincing Explanations [62.81395661556852]
We propose a method for a collaborative system that remains safe by having a human make the decisions.
Ardent enables efficient and effective decision-making by adapting to individual preferences for explanations.
arXiv Detail & Related papers (2023-11-13T16:00:16Z) - Hierarchical Multi-Agent Reinforcement Learning for Air Combat
Maneuvering [40.06500618820166]
We propose a hierarchical multi-agent reinforcement learning framework for air-to-air combat with multiple heterogeneous agents.
Low-level policies are trained for accurate unit combat control. The commander policy is trained on mission targets given pre-trained low-level policies.
arXiv Detail & Related papers (2023-09-20T12:16:00Z) - Maneuver Decision-Making For Autonomous Air Combat Through Curriculum
Learning And Reinforcement Learning With Sparse Rewards [0.0]
Three curricula of air combat maneuver decision-making are designed: angle curriculum, distance curriculum and hybrid curriculum.
The training results show that angle curriculum can increase the speed and stability of training, and improve the performance of the agent.
The maneuver decision results are consistent with the characteristics of the missile.
arXiv Detail & Related papers (2023-02-12T02:29:12Z) - Rethinking Learning Dynamics in RL using Adversarial Networks [79.56118674435844]
We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space.
The main contribution of our work is to formulate an adversarial training regime for reinforcement learning with the help of entropy-regularized policy gradient formulation.
arXiv Detail & Related papers (2022-01-27T19:51:09Z) - Persistent Reinforcement Learning via Subgoal Curricula [114.83989499740193]
Value-accelerated Persistent Reinforcement Learning (VaPRL) generates a curriculum of initial states.
VaPRL reduces the interventions required by three orders of magnitude compared to episodic reinforcement learning.
arXiv Detail & Related papers (2021-07-27T16:39:45Z) - Automatic Curricula via Expert Demonstrations [6.651864489482536]
We propose Automatic Curricula via Expert Demonstrations (ACED) as a reinforcement learning (RL) approach.
ACED extracts curricula from expert demonstration trajectories by dividing demonstrations into sections and initializing training episodes to states sampled from different sections of demonstrations.
We show that a combination of ACED with behavior cloning allows pick-and-place tasks to be learned with as few as 1 demonstration and block stacking tasks to be learned with 20 demonstrations.
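The ACED summary describes a concrete procedure: split expert demonstrations into ordered sections and reset training episodes to states sampled from a chosen section. A minimal sketch of that idea, with helper names (`split_demo`, `sample_reset_state`) that are illustrative assumptions rather than the ACED codebase:

```python
import random

def split_demo(demo, n_sections):
    """Divide an ordered list of demonstration states into roughly equal sections."""
    size = max(1, len(demo) // n_sections)
    return [demo[i:i + size] for i in range(0, len(demo), size)][:n_sections]

def sample_reset_state(sections, stage):
    """Reset an episode to a state from the section matching the curriculum stage.

    Early stages draw starts from early sections; clamping lets a fully
    advanced curriculum keep sampling from the final section.
    """
    stage = min(stage, len(sections) - 1)
    return random.choice(sections[stage])
```

The curriculum then consists of advancing `stage` over training, so episodes start progressively closer to the beginning of the demonstrated behavior (or, in the reverse convention, closer to the goal).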
arXiv Detail & Related papers (2021-06-16T22:21:09Z) - An Empowerment-based Solution to Robotic Manipulation Tasks with Sparse
Rewards [14.937474939057596]
It is important for robotic manipulators to learn to accomplish tasks even if they are only provided with very sparse instruction signals.
This paper proposes an intrinsic motivation approach that can be easily integrated into any standard reinforcement learning algorithm.
arXiv Detail & Related papers (2020-10-15T19:06:21Z) - Bridging the Imitation Gap by Adaptive Insubordination [88.35564081175642]
We show that when the teaching agent makes decisions with access to privileged information, this information is marginalized during imitation learning.
We propose 'Adaptive Insubordination' (ADVISOR) to address this gap.
ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration.
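The ADVISOR summary turns on one mechanism: a dynamic weight blending an imitation loss with a reward-based RL loss. The sketch below shows only that blending step; the scalar-weight interface is an illustrative assumption (ADVISOR itself derives per-state weights from an auxiliary policy's disagreement with the expert).

```python
def blended_loss(imitation_loss, rl_loss, weight):
    """Blend imitation and reinforcement losses.

    weight in [0, 1]: 1.0 means pure imitation, 0.0 means pure
    reward-based reinforcement learning. Varying the weight over
    training switches between imitation and exploration on the fly.
    """
    assert 0.0 <= weight <= 1.0
    return weight * imitation_loss + (1.0 - weight) * rl_loss
```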
arXiv Detail & Related papers (2020-07-23T17:59:57Z) - Emergent Real-World Robotic Skills via Unsupervised Off-Policy
Reinforcement Learning [81.12201426668894]
We develop efficient reinforcement learning methods that acquire diverse skills without any reward function, and then repurpose these skills for downstream tasks.
We show that our proposed algorithm provides substantial improvement in learning efficiency, making reward-free real-world training feasible.
We also demonstrate that the learned skills can be composed using model predictive control for goal-oriented navigation, without any additional training.
arXiv Detail & Related papers (2020-04-27T17:38:53Z) - Scalable Multi-Task Imitation Learning with Autonomous Improvement [159.9406205002599]
We build an imitation learning system that can continuously improve through autonomous data collection.
We leverage the robot's own trials as demonstrations for tasks other than the one that the robot actually attempted.
In contrast to prior imitation learning approaches, our method can autonomously collect data with sparse supervision for continuous improvement.
arXiv Detail & Related papers (2020-02-25T18:56:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.