Maneuver Decision-Making Through Proximal Policy Optimization And Monte
Carlo Tree Search
- URL: http://arxiv.org/abs/2309.08611v1
- Date: Mon, 28 Aug 2023 14:48:49 GMT
- Title: Maneuver Decision-Making Through Proximal Policy Optimization And Monte
Carlo Tree Search
- Authors: Zhang Hong-Peng
- Abstract summary: Maneuver decision-making can be regarded as a Markov decision process and can be addressed by reinforcement learning.
Agents use random actions in the early stages of training, which makes it difficult to get rewards and learn how to make effective decisions.
A method based on proximal policy optimization and Monte Carlo tree search is proposed.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Maneuver decision-making can be regarded as a Markov decision process and can
be addressed by reinforcement learning. However, original reinforcement learning
algorithms struggle to solve the maneuver decision-making problem. One reason
is that agents take random actions in the early stages of training, which makes
it difficult to obtain rewards and learn how to make effective decisions. To
address this issue, a method based on proximal policy optimization and Monte
Carlo tree search is proposed. The method uses proximal policy optimization to
train the agent and uses the outcomes of air combat as targets for training the
value network. Then, based on the value network and the visit count of each
node, Monte Carlo tree search is used to find actions with higher expected
returns than random actions, which improves training performance. Ablation
studies and simulation experiments indicate that agents trained by the
proposed method adapt their decisions to different states, demonstrating that
the method solves the maneuver decision-making problem that the original
reinforcement learning algorithm cannot.
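Read as described, the method has two coupled parts: a value network trained with PPO (with air-combat outcomes as regression targets) and a tree search that combines that network's estimates with per-node visit counts so that, even early in training, the selected actions have higher expected returns than uniform random ones. The following is a minimal, hypothetical Python sketch of such a combination; the names (Node, select_action, policy_fn, value_fn, step_fn, c_puct) and the PUCT-style selection rule are illustrative assumptions, not the paper's actual code or interfaces.

```python
# A minimal, hypothetical sketch (not the paper's code): a PPO-trained value
# network evaluates leaves, and a visit-count-guided tree search chooses
# actions with higher expected return than uniform random ones.
import math


class Node:
    def __init__(self, prior):
        self.prior = prior        # action probability from the PPO actor
        self.visit_count = 0
        self.value_sum = 0.0
        self.children = {}        # maps action index -> child Node

    def value(self):
        # Mean value estimate; 0.0 for unvisited nodes.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def select_action(root_state, num_simulations, policy_fn, value_fn, step_fn,
                  c_puct=1.5):
    """Run a simplified MCTS and return the most-visited root action.

    policy_fn(state) -> iterable of action priors, value_fn(state) -> scalar,
    step_fn(state, action) -> next_state; these stand in for the trained PPO
    actor, the critic, and the air-combat simulator. Terminal states and
    exploration noise are omitted for brevity.
    """
    root = Node(prior=1.0)
    for _ in range(num_simulations):
        node, state, path = root, root_state, [root]
        # Selection: descend with a PUCT-style rule mixing value estimates,
        # policy priors, and visit counts.
        while node.children:
            total_visits = sum(c.visit_count for c in node.children.values())
            action, node = max(
                node.children.items(),
                key=lambda kv: kv[1].value()
                + c_puct * kv[1].prior
                * math.sqrt(total_visits + 1) / (1 + kv[1].visit_count),
            )
            state = step_fn(state, action)
            path.append(node)
        # Expansion: children receive priors from the PPO policy.
        for a, p in enumerate(policy_fn(state)):
            node.children[a] = Node(prior=p)
        # Evaluation: the PPO critic scores the leaf instead of a random rollout.
        leaf_value = value_fn(state)
        # Backup: propagate the estimate to every node on the visited path.
        for n in path:
            n.visit_count += 1
            n.value_sum += leaf_value
    return max(root.children.items(), key=lambda kv: kv[1].visit_count)[0]
```

In a setup like this, the search output would replace uniform random action selection during early rollouts, so that the trajectories fed to the PPO update already carry above-random returns.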
Related papers
- Sample-Efficient Multi-Objective Learning via Generalized Policy
Improvement Prioritization [8.836422771217084]
Multi-objective reinforcement learning (MORL) algorithms tackle sequential decision problems where agents may have different preferences.
We introduce a novel algorithm that uses Generalized Policy Improvement (GPI) to define principled, formally-derived prioritization schemes.
We empirically show that our method outperforms state-of-the-art MORL algorithms in challenging multi-objective tasks.
arXiv Detail & Related papers (2023-01-18T20:54:40Z) - Policy Gradient Algorithms with Monte Carlo Tree Learning for Non-Markov Decision Processes [3.9311044240639568]
Policy gradient (PG) is a reinforcement learning (RL) approach that optimizes a parameterized policy model to maximize expected return via gradient ascent.
While PG can work well even in non-Markovian environments, it may encounter plateaus or peakiness issues.
In this work, we first introduce Monte Carlo Tree Learning (MCTL), an adaptation of MCTS for online RL. We then explore a combined policy approach of PG and MCTL to leverage their strengths.
arXiv Detail & Related papers (2022-06-02T12:21:40Z) - Reward Uncertainty for Exploration in Preference-based Reinforcement
Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward.
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms (an illustrative sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-05-24T23:22:10Z) - Inverse Online Learning: Understanding Non-Stationary and Reactionary
Policies [79.60322329952453]
We show how to develop interpretable representations of how agents make decisions.
By understanding the decision-making processes underlying a set of observed trajectories, we cast the policy inference problem as the inverse to this online learning problem.
We introduce a practical algorithm for retrospectively estimating such perceived effects, alongside the process through which agents update them.
Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.
arXiv Detail & Related papers (2022-03-14T17:40:42Z) - Improving Human Sequential Decision-Making with Reinforcement Learning [29.334511328067777]
We design a novel machine learning algorithm that is capable of extracting "best practices" from trace data.
Our algorithm selects the tip that best bridges the gap between the actions taken by human workers and those taken by the optimal policy.
Experiments show that the tips generated by our algorithm can significantly improve human performance.
arXiv Detail & Related papers (2021-08-19T02:57:58Z) - MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven
Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z) - Modularity in Reinforcement Learning via Algorithmic Independence in
Credit Assignment [79.5678820246642]
We show that certain action-value methods are more sample efficient than policy-gradient methods on transfer problems that require only sparse changes to a sequence of previously optimal decisions.
We generalize the recently proposed societal decision-making framework as a more granular formalism than the Markov decision process.
arXiv Detail & Related papers (2021-06-28T21:29:13Z) - A new soft computing method for integration of expert's knowledge in
reinforcement learning problems [1.11412540857944]
The proposed fuzzy nonlinear mapping assigns each member of the action set to its probability of being chosen in the next step.
A user tunable parameter is introduced to control the action selection policy, which determines the agent's greedy behavior.
Simulation results indicate that including fuzzy logic within the reinforcement learning in the proposed manner improves the learning algorithm's convergence rate.
arXiv Detail & Related papers (2021-06-13T20:41:29Z) - Identification of Unexpected Decisions in Partially Observable
Monte-Carlo Planning: a Rule-Based Approach [78.05638156687343]
We propose a methodology for analyzing POMCP policies by inspecting their traces.
The proposed method explores local properties of policy behavior to identify unexpected decisions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation.
arXiv Detail & Related papers (2020-12-23T15:09:28Z) - Automatic Discovery of Interpretable Planning Strategies [9.410583483182657]
We introduce AI-Interpret, a method for transforming idiosyncratic policies into simple and interpretable descriptions.
We show that providing the decision rules generated by AI-Interpret as flowcharts significantly improved people's planning strategies and decisions.
arXiv Detail & Related papers (2020-05-24T12:24:52Z) - Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
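As a side note on the entry "Reward Uncertainty for Exploration in Preference-based Reinforcement Learning" above, the core idea of an exploration bonus derived from uncertainty in a learned reward can be sketched as disagreement among an ensemble of reward models. The snippet below is a minimal, hypothetical illustration; exploration_bonus, shaped_reward, and the beta weighting are assumptions, not the paper's actual method or API.

```python
# A hypothetical sketch: the intrinsic reward is the disagreement (standard
# deviation) among an ensemble of learned reward models; names and weighting
# are assumptions, not the paper's API.
from statistics import pstdev


def exploration_bonus(reward_models, state, action):
    """Disagreement of independently trained reward predictors on (state, action)."""
    predictions = [model(state, action) for model in reward_models]
    return pstdev(predictions)


def shaped_reward(reward_models, state, action, beta=0.1):
    # Mean learned reward plus a weighted novelty bonus; beta trades off
    # exploiting the current reward estimate against exploring uncertain pairs.
    predictions = [model(state, action) for model in reward_models]
    mean_reward = sum(predictions) / len(predictions)
    return mean_reward + beta * exploration_bonus(reward_models, state, action)
```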
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.