Online Reinforcement Learning-Based Dynamic Adaptive Evaluation Function for Real-Time Strategy Tasks
- URL: http://arxiv.org/abs/2501.03824v1
- Date: Tue, 07 Jan 2025 14:36:33 GMT
- Title: Online Reinforcement Learning-Based Dynamic Adaptive Evaluation Function for Real-Time Strategy Tasks
- Authors: Weilong Yang, Jie Zhang, Xunyun Liu, Yanqing Ye,
- Abstract summary: Effective evaluation of real-time strategy tasks requires adaptive mechanisms to cope with dynamic and unpredictable environments.
This study proposes a method to improve evaluation functions for real-time responsiveness to battle-field situation changes.
- Score: 5.115170525117103
- License:
- Abstract: Effective evaluation of real-time strategy tasks requires adaptive mechanisms to cope with dynamic and unpredictable environments. This study proposes a method to improve evaluation functions for real-time responsiveness to battle-field situation changes, utilizing an online reinforcement learning-based dynam-ic weight adjustment mechanism within the real-time strategy game. Building on traditional static evaluation functions, the method employs gradient descent in online reinforcement learning to update weights dynamically, incorporating weight decay techniques to ensure stability. Additionally, the AdamW optimizer is integrated to adjust the learning rate and decay rate of online reinforcement learning in real time, further reducing the dependency on manual parameter tun-ing. Round-robin competition experiments demonstrate that this method signifi-cantly enhances the application effectiveness of the Lanchester combat model evaluation function, Simple evaluation function, and Simple Sqrt evaluation function in planning algorithms including IDABCD, IDRTMinimax, and Port-folio AI. The method achieves a notable improvement in scores, with the en-hancement becoming more pronounced as the map size increases. Furthermore, the increase in evaluation function computation time induced by this method is kept below 6% for all evaluation functions and planning algorithms. The pro-posed dynamic adaptive evaluation function demonstrates a promising approach for real-time strategy task evaluation.
Related papers
- Efficient Off-Policy Learning for High-Dimensional Action Spaces [22.129001951441015]
Existing off-policy reinforcement learning algorithms often rely on an explicit state-action-value function representation.
We present an efficient approach that utilizes only a state-value function as the critic for off-policy deep reinforcement learning.
arXiv Detail & Related papers (2024-03-07T12:45:51Z) - RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potential suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z) - Value Summation: A Novel Scoring Function for MPC-based Model-based
Reinforcement Learning [4.473327661758546]
This paper proposes a novel scoring function for the planning module of MPC-based reinforcement learning methods.
The proposed method enhances the learning efficiency of existing MPC-based MBRL methods using the discounted sum of values.
The results demonstrate that the proposed method outperforms the current state-of-the-art algorithms in terms of learning efficiency and average reward return.
arXiv Detail & Related papers (2022-09-16T20:52:39Z) - Stabilizing Q-learning with Linear Architectures for Provably Efficient
Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z) - Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value is more robust compared to deep reinforcement learning algorithm and the non-robust version of the algorithm.
arXiv Detail & Related papers (2021-05-25T19:48:35Z) - Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem)
AdaRem adjusts the parameter-wise learning rate according to whether the direction of one parameter changes in the past is aligned with the direction of the current gradient.
Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
arXiv Detail & Related papers (2020-10-21T14:49:00Z) - Meta-Gradient Reinforcement Learning with an Objective Discovered Online [54.15180335046361]
We propose an algorithm based on meta-gradient descent that discovers its own objective, flexibly parameterised by a deep neural network.
Because the objective is discovered online, it can adapt to changes over time.
On the Atari Learning Environment, the meta-gradient algorithm adapts over time to learn with greater efficiency.
arXiv Detail & Related papers (2020-07-16T16:17:09Z) - META-Learning Eligibility Traces for More Sample Efficient Temporal
Difference Learning [2.0559497209595823]
We propose a meta-learning method for adjusting the eligibility trace parameter, in a state-dependent manner.
The adaptation is achieved with the help of auxiliary learners that learn distributional information about the update targets online.
We prove that, under some assumptions, the proposed method improves the overall quality of the update targets, by minimizing the overall target error.
arXiv Detail & Related papers (2020-06-16T03:41:07Z) - Gradient Monitored Reinforcement Learning [0.0]
We focus on the enhancement of training and evaluation performance in reinforcement learning algorithms.
We propose an approach to steer the learning in the weight parameters of a neural network based on the dynamic development and feedback from the training process itself.
arXiv Detail & Related papers (2020-05-25T13:45:47Z) - Weighted Aggregating Stochastic Gradient Descent for Parallel Deep
Learning [8.366415386275557]
Solution involves a reformation of the objective function for optimization in neural network models.
We introduce a decentralized weighted aggregating scheme based on the performance of local workers.
To validate the new method, we benchmark our schemes against several popular algorithms.
arXiv Detail & Related papers (2020-04-07T23:38:29Z) - Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.