Related papers: Learning Controllable and Diverse Player Behaviors in Multi-Agent Environments

URL: http://arxiv.org/abs/2512.10835v1
Date: Thu, 11 Dec 2025 17:26:24 GMT
Title: Learning Controllable and Diverse Player Behaviors in Multi-Agent Environments
Authors: Atahan Cilan, Atay Özgövde
Abstract summary: This paper introduces a reinforcement learning framework that enables controllable and diverse player behaviors without relying on human gameplay data. We define player behavior in an N-dimensional continuous space and uniformly sample target behavior vectors from a region that encompasses a subset representing real human styles. A single PPO-based multi-agent policy can reproduce new or unseen play styles without retraining.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper introduces a reinforcement learning framework that enables controllable and diverse player behaviors without relying on human gameplay data. Existing approaches often require large-scale player trajectories, train separate models for different player types, or provide no direct mapping between interpretable behavioral parameters and the learned policy, limiting their scalability and controllability. We define player behavior in an N-dimensional continuous space and uniformly sample target behavior vectors from a region that encompasses the subset representing real human styles. During training, each agent receives both its current and target behavior vectors as input, and the reward is based on the normalized reduction in distance between them. This allows the policy to learn how actions influence behavioral statistics, enabling smooth control over attributes such as aggressiveness, mobility, and cooperativeness. A single PPO-based multi-agent policy can reproduce new or unseen play styles without retraining. Experiments conducted in a custom multi-player Unity game show that the proposed framework produces significantly greater behavioral diversity than a win-only baseline and reliably matches specified behavior vectors across diverse targets. The method offers a scalable solution for automated playtesting, game balancing, human-like behavior simulation, and replacing disconnected players in online games.
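The training signal described in the abstract (uniformly sampled target vectors, an observation that carries both the current and the target behavior vector, and a reward based on the normalized reduction in distance between them) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the distance metric or the normalizer, so the sketch assumes Euclidean distance and normalization by the diagonal of an axis-aligned sampling box. All function and parameter names are hypothetical.

```python
import numpy as np


def sample_target(rng, low, high):
    """Uniformly sample a target behavior vector from the box [low, high].

    Assumption: the sampling region is an axis-aligned box.
    """
    return rng.uniform(low, high)


def behavior_reward(prev_behavior, curr_behavior, target, max_dist):
    """Reward the reduction in distance to the target since the last step.

    Assumption: Euclidean distance, normalized by max_dist (here the box
    diagonal), so the reward is positive when the agent's behavior
    statistics move toward the target and negative when they move away.
    """
    d_prev = np.linalg.norm(target - prev_behavior)
    d_curr = np.linalg.norm(target - curr_behavior)
    return (d_prev - d_curr) / max_dist


def build_observation(game_obs, curr_behavior, target):
    """Concatenate the game observation with current and target behavior."""
    return np.concatenate([game_obs, curr_behavior, target])


# Example with three hypothetical behavior dimensions
# (e.g. aggressiveness, mobility, cooperativeness), each scaled to [0, 1].
low, high = np.zeros(3), np.ones(3)
max_dist = np.linalg.norm(high - low)

rng = np.random.default_rng(0)
target = sample_target(rng, low, high)

prev_b = np.array([0.0, 0.0, 0.0])
curr_b = np.array([0.5, 0.5, 0.5])
reward_toward = behavior_reward(prev_b, curr_b, np.ones(3), max_dist)
reward_away = behavior_reward(curr_b, prev_b, np.ones(3), max_dist)
obs = build_observation(np.zeros(4), curr_b, target)
```

Because the target vector is part of the observation, a single policy trained this way can be steered at inference time by changing the target, which is consistent with the abstract's claim of reproducing unseen play styles without retraining.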

Related papers

Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents [56.25101378553328]
We present Game-TARS, a generalist game agent trained with a unified, scalable action space anchored to human-aligned keyboard-mouse inputs. Game-TARS is pre-trained on over 500B tokens with diverse trajectories and multimodal data. Experiments show that Game-TARS achieves about 2 times the success rate of the previous SOTA model on open-world Minecraft tasks.
arXiv Detail & Related papers (2025-10-27T17:43:51Z)
A Multimodal Architecture for Endpoint Position Prediction in Team-based Multiplayer Games [42.059466998190224]
This paper presents a multimodal architecture for predicting future player locations on a dynamic time horizon. The architecture makes efficient use of the multimodal game state including image inputs, numerical and categorical features, as well as dynamic game data.
arXiv Detail & Related papers (2025-07-28T09:51:49Z)
Generating Personas for Games with Multimodal Adversarial Imitation Learning [47.70823327747952]
Reinforcement learning has been widely successful in producing agents capable of playing games at a human level. Going beyond reinforcement learning is necessary to model a wide range of human playstyles. This paper presents a novel imitation learning approach to generate multiple persona policies for playtesting.
arXiv Detail & Related papers (2023-08-15T06:58:19Z)
Learnable Behavior Control: Breaking Atari Human World Records via Sample-Efficient Behavior Selection [80.35510218548693]
We propose a general framework called Learnable Behavioral Control (LBC) to address the limitation. Our agents have achieved 10077.52% mean human normalized score and surpassed 24 human world records within 1B training frames.
arXiv Detail & Related papers (2023-05-09T08:00:23Z)
Pick Your Battles: Interaction Graphs as Population-Level Objectives for Strategic Diversity [49.68758494467258]
We study how to construct diverse populations of agents by carefully structuring how individuals within a population interact. Our approach is based on interaction graphs, which control the flow of information between agents during training. We provide evidence for the importance of diversity in multi-agent training and analyse the effect of applying different interaction graphs on the training trajectories, diversity and performance of populations in a range of games.
arXiv Detail & Related papers (2021-10-08T11:29:52Z)
Policy Fusion for Adaptive and Customizable Reinforcement Learning Agents [137.86426963572214]
We show how to combine distinct behavioral policies to obtain a meaningful "fusion" policy. We propose four different policy fusion methods for combining pre-trained policies. We provide several practical examples and use-cases for how these methods are indeed useful for video game production and designers.
arXiv Detail & Related papers (2021-04-21T16:08:44Z)
PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD). We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi \Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
Opponent Learning Awareness and Modelling in Multi-Objective Normal Form Games [5.0238343960165155]
It is essential for an agent to learn about the behaviour of other agents in the system. We present the first study of the effects of such opponent modelling on multi-objective multi-agent interactions with non-linear utilities.
arXiv Detail & Related papers (2020-11-14T12:35:32Z)
Learning to Model Opponent Learning [11.61673411387596]
Multi-Agent Reinforcement Learning (MARL) considers settings in which a set of coexisting agents interact with one another and their environment. This poses a great challenge for value function-based algorithms whose convergence usually relies on the assumption of a stationary environment. We develop a novel approach to modelling an opponent's learning dynamics which we term Learning to Model Opponent Learning (LeMOL).
arXiv Detail & Related papers (2020-06-06T17:19:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.