Style-Preserving Policy Optimization for Game Agents
- URL: http://arxiv.org/abs/2506.16995v2
- Date: Thu, 07 Aug 2025 16:00:32 GMT
- Title: Style-Preserving Policy Optimization for Game Agents
- Authors: Lingfeng Li, Yunlong Lu, Yongyi Wang, Wenxin Li,
- Abstract summary: Mixed Proximal Policy Optimization (MPPO) is designed to improve the proficiency of existing suboptimal agents while retaining their distinct styles.<n>MPPO achieves proficiency levels comparable to, or even superior to, pure online algorithms while preserving demonstrators' play styles.
- Score: 0.8678250057211367
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Proficient game agents with diverse play styles enrich the gaming experience and enhance the replay value of games. However, recent advancements in game AI based on reinforcement learning have predominantly focused on improving proficiency, whereas methods based on evolution algorithms generate agents with diverse play styles but exhibit subpar performance compared to RL methods. To address this gap, this paper proposes Mixed Proximal Policy Optimization (MPPO), a method designed to improve the proficiency of existing suboptimal agents while retaining their distinct styles. MPPO unifies loss objectives for both online and offline samples and introduces an implicit constraint to approximate demonstrator policies by adjusting the empirical distribution of samples. Empirical results across environments of varying scales demonstrate that MPPO achieves proficiency levels comparable to, or even superior to, pure online algorithms while preserving demonstrators' play styles. This work presents an effective approach for generating highly proficient and diverse game agents, ultimately contributing to more engaging gameplay experiences.
Related papers
- Mxplainer: Explain and Learn Insights by Imitating Mahjong Agents [0.8088999193162028]
This paper introduces Mxplainer, a parameterized search algorithm that can be converted into an equivalent neural network to learn the parameters of black-box agents.<n>Experiments conducted on AI and human player data demonstrate that the learned parameters provide human-understandable insights into these agents' characteristics and play styles.
arXiv Detail & Related papers (2025-06-17T07:07:13Z) - Preference-based opponent shaping in differentiable games [3.373994463906893]
We propose a novel Preference-based Opponent Shaping (PBOS) method to enhance the strategy learning process by shaping agents' preferences towards cooperation.<n>We verify the performance of PBOS algorithm in a variety of differentiable games.
arXiv Detail & Related papers (2024-12-04T06:49:21Z) - From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.<n>We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z) - Toward Optimal LLM Alignments Using Two-Player Games [86.39338084862324]
In this paper, we investigate alignment through the lens of two-agent games, involving iterative interactions between an adversarial and a defensive agent.
We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents.
Experimental results in safety scenarios demonstrate that learning in such a competitive environment not only fully trains agents but also leads to policies with enhanced generalization capabilities for both adversarial and defensive agents.
arXiv Detail & Related papers (2024-06-16T15:24:50Z) - Reinforcement Learning for High-Level Strategic Control in Tower Defense Games [47.618236610219554]
In strategy games, one of the most important aspects of game design is maintaining a sense of challenge for players.
We propose an automated approach that combines traditional scripted methods with reinforcement learning.
Results show that combining a learned approach, such as reinforcement learning, with a scripted AI produces a higher-performing and more robust agent than using only AI.
arXiv Detail & Related papers (2024-06-12T08:06:31Z) - Best Response Shaping [1.0874100424278175]
LOLA and POLA agents learn reciprocity-based cooperative policies by differentiation through a few look-ahead optimization steps of their opponent.
Because they consider a few optimization steps, a learning opponent that takes many steps to optimize its return may exploit them.
In response, we introduce a novel approach, Best Response Shaping (BRS), which differentiates through an opponent approxing the best response.
arXiv Detail & Related papers (2024-04-05T22:03:35Z) - DanZero+: Dominating the GuanDan Game through Reinforcement Learning [95.90682269990705]
We develop an AI program for an exceptionally complex and popular card game called GuanDan.
We first put forward an AI program named DanZero for this game.
In order to further enhance the AI's capabilities, we apply policy-based reinforcement learning algorithm to GuanDan.
arXiv Detail & Related papers (2023-12-05T08:07:32Z) - Learning Diverse Risk Preferences in Population-based Self-play [23.07952140353786]
Current self-play algorithms optimize the agent to maximize expected win-rates against its current or historical copies.
We introduce diversity from the perspective that agents could have diverse risk preferences in the face of uncertainty.
We show that our method achieves comparable or superior performance in competitive games.
arXiv Detail & Related papers (2023-05-19T06:56:02Z) - Building a 3-Player Mahjong AI using Deep Reinforcement Learning [9.603486077267693]
We present Meowjong, an AI for Sanma using deep reinforcement learning.
Meowjong's models achieve test accuracies comparable with AIs for 4-player Mahjong.
Being the first ever AI in Sanma, we claim that Meowjong stands as a state-of-the-art in this game.
arXiv Detail & Related papers (2022-02-25T17:41:43Z) - A Ranking Game for Imitation Learning [22.028680861819215]
We treat imitation as a two-player ranking-based Stackelberg game between a $textitpolicy$ and a $textitreward$ function.
This game encompasses a large subset of both inverse reinforcement learning (IRL) methods and methods which learn from offline preferences.
We theoretically analyze the requirements of the loss function used for ranking policy performances to facilitate near-optimal imitation learning at equilibrium.
arXiv Detail & Related papers (2022-02-07T19:38:22Z) - Learning Meta Representations for Agents in Multi-Agent Reinforcement
Learning [12.170248966278281]
In multi-agent reinforcement learning, behaviors that agents learn in a single Markov Game (MG) are typically confined to the given agent number.
In this work, our focus is on creating agents that can generalize across population-varying MGs.
Instead of learning a unimodal policy, each agent learns a policy set comprising effective strategies across a variety of games.
arXiv Detail & Related papers (2021-08-30T04:30:53Z) - Semi-On-Policy Training for Sample Efficient Multi-Agent Policy
Gradients [51.749831824106046]
We introduce semi-on-policy (SOP) training as an effective and computationally efficient way to address the sample inefficiency of on-policy policy gradient methods.
We show that our methods perform as well or better than state-of-the-art value-based methods on a variety of SMAC tasks.
arXiv Detail & Related papers (2021-04-27T19:37:01Z) - Policy Fusion for Adaptive and Customizable Reinforcement Learning
Agents [137.86426963572214]
We show how to combine distinct behavioral policies to obtain a meaningful "fusion" policy.
We propose four different policy fusion methods for combining pre-trained policies.
We provide several practical examples and use-cases for how these methods are indeed useful for video game production and designers.
arXiv Detail & Related papers (2021-04-21T16:08:44Z) - Portfolio Search and Optimization for General Strategy Game-Playing [58.896302717975445]
We propose a new algorithm for optimization and action-selection based on the Rolling Horizon Evolutionary Algorithm.
For the optimization of the agents' parameters and portfolio sets we study the use of the N-tuple Bandit Evolutionary Algorithm.
An analysis of the agents' performance shows that the proposed algorithm generalizes well to all game-modes and is able to outperform other portfolio methods.
arXiv Detail & Related papers (2021-04-21T09:28:28Z) - Generating Diverse and Competitive Play-Styles for Strategy Games [58.896302717975445]
We propose Portfolio Monte Carlo Tree Search with Progressive Unpruning for playing a turn-based strategy game (Tribes)
We show how it can be parameterized so a quality-diversity algorithm (MAP-Elites) is used to achieve different play-styles while keeping a competitive level of play.
Our results show that this algorithm is capable of achieving these goals even for an extensive collection of game levels beyond those used for training.
arXiv Detail & Related papers (2021-04-17T20:33:24Z) - On the Power of Refined Skat Selection [1.3706331473063877]
Skat is a fascinating card game, show-casing many of the intrinsic challenges for modern AI systems.
We propose hard expert-rules and a scoring function based on refined skat evaluation features.
Experiments emphasize the impact of the refined skat putting algorithm on the playing performance of the bots.
arXiv Detail & Related papers (2021-04-07T08:54:58Z) - Forgetful Experience Replay in Hierarchical Reinforcement Learning from
Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all the solutions for the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z) - Enhanced Rolling Horizon Evolution Algorithm with Opponent Model
Learning: Results for the Fighting Game AI Competition [9.75720700239984]
We propose a novel algorithm that combines Rolling Horizon Evolution Algorithm (RHEA) with opponent model learning.
Our proposed bot with the policy-gradient-based opponent model is the only one without using Monte-Carlo Tree Search (MCTS) among top five bots in the 2019 competition.
arXiv Detail & Related papers (2020-03-31T04:44:33Z) - Suphx: Mastering Mahjong with Deep Reinforcement Learning [114.68233321904623]
We design an AI for Mahjong, named Suphx, based on deep reinforcement learning with some newly introduced techniques.
Suphx has demonstrated stronger performance than most top human players in terms of stable rank.
This is the first time that a computer program outperforms most top human players in Mahjong.
arXiv Detail & Related papers (2020-03-30T16:18:16Z) - Efficient exploration of zero-sum stochastic games [83.28949556413717]
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
arXiv Detail & Related papers (2020-02-24T20:30:38Z) - Provable Self-Play Algorithms for Competitive Reinforcement Learning [48.12602400021397]
We study self-play in competitive reinforcement learning under the setting of Markov games.
We show that a self-play algorithm achieves regret $tildemathcalO(sqrtT)$ after playing $T$ steps of the game.
We also introduce an explore-then-exploit style algorithm, which achieves a slightly worse regret $tildemathcalO(T2/3)$, but is guaranteed to run in time even in the worst case.
arXiv Detail & Related papers (2020-02-10T18:44:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.