Preference-based opponent shaping in differentiable games
- URL: http://arxiv.org/abs/2412.03072v1
- Date: Wed, 04 Dec 2024 06:49:21 GMT
- Title: Preference-based opponent shaping in differentiable games
- Authors: Xinyu Qiao, Yudong Hu, Congying Han, Weiyan Wu, Tiande Guo
- Abstract summary: We propose a novel Preference-based Opponent Shaping (PBOS) method to enhance the strategy learning process by shaping agents' preferences towards cooperation.
We verify the performance of the PBOS algorithm in a variety of differentiable games.
- Score: 3.373994463906893
- Abstract: Strategy learning in multi-agent game environments is a challenging problem. Since each agent's reward is determined by the joint strategy, a greedy learning strategy that aims to maximize its own reward may fall into a local optimum. Recent studies have proposed opponent modeling and shaping methods for such environments. These methods improve the efficiency of strategy learning by modeling the strategies and update processes of other agents. However, they often rely on simple predictions of opponent strategy changes. Because they do not model behavioral preferences such as cooperation and competition, they are usually applicable only to predefined scenarios and lack generalization. In this paper, we propose a novel Preference-based Opponent Shaping (PBOS) method that enhances strategy learning by shaping agents' preferences towards cooperation. We introduce a preference parameter, incorporated into the agent's loss function, which allows the agent to take the opponent's loss directly into account when updating its strategy. We update the preference parameters concurrently with strategy learning so that agents can adapt to any cooperative or competitive game environment. Through a series of experiments, we verify the performance of the PBOS algorithm in a variety of differentiable games. The results show that PBOS guides agents to learn appropriate preference parameters and thereby achieve better reward distributions across multiple game environments.
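Reading the abstract literally, the shaped objective for agent i can be written as L_i' = L_i + α_i · L_j, with the preference parameter α_i learned alongside the strategy. Below is a minimal PyTorch sketch of that idea on a toy bilinear game; the game, step sizes, and the one-step meta-gradient update for α are illustrative assumptions, not the authors' published algorithm.

```python
import torch

# Toy bilinear two-player game with quadratic regularizers (an
# illustrative stand-in for a differentiable game).
A = torch.tensor([[1.0, -1.0], [-1.0, 1.0]])

x = torch.zeros(2, requires_grad=True)    # player 1's strategy
y = torch.zeros(2, requires_grad=True)    # player 2's strategy
a1 = torch.zeros((), requires_grad=True)  # player 1's preference alpha_1
a2 = torch.zeros((), requires_grad=True)  # player 2's preference alpha_2

def losses(x, y):
    base = x @ A @ y
    return base + 0.1 * x.dot(x), -base + 0.1 * y.dot(y)

lr, meta_lr = 0.1, 0.01
for _ in range(200):
    l1, l2 = losses(x, y)
    # Strategy step: each player descends its *shaped* loss L_i + alpha_i * L_j.
    g1 = torch.autograd.grad(l1 + a1 * l2, x, create_graph=True)[0]
    g2 = torch.autograd.grad(l2 + a2 * l1, y, create_graph=True)[0]
    x_next, y_next = x - lr * g1, y - lr * g2
    # Preference step (assumed form): tune alpha_i so the player's *own*
    # loss after the joint strategy update decreases (one-step meta-gradient).
    f1, f2 = losses(x_next, y_next)
    ga1 = torch.autograd.grad(f1, a1, retain_graph=True)[0]
    ga2 = torch.autograd.grad(f2, a2)[0]
    with torch.no_grad():
        x.copy_(x_next)
        y.copy_(y_next)
        a1 -= meta_lr * ga1
        a2 -= meta_lr * ga2
```

With a positive α_i, the agent effectively treats part of the opponent's loss as its own, which is how a cooperative preference enters the strategy update.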
Related papers
- Best Response Shaping [1.0874100424278175]
LOLA and POLA agents learn reciprocity-based cooperative policies by differentiating through a few look-ahead optimization steps of their opponent.
Because they consider only a few optimization steps, a learning opponent that takes many steps to optimize its return may exploit them.
In response, we introduce a novel approach, Best Response Shaping (BRS), which differentiates through an opponent approximating the best response.
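For context, the look-ahead idea these methods share can be sketched in a few lines: the agent differentiates its own loss through a simulated gradient step of the opponent. This is a generic LOLA-style sketch, not the BRS paper's code; BRS replaces the few-step look-ahead with an opponent trained to approximate the best response.

```python
import torch

def lookahead_grad(loss1_fn, loss2_fn, x, y, opp_lr=0.1):
    """Gradient of player 1's loss after simulating one gradient step
    of the opponent (LOLA-style look-ahead; illustrative sketch only).

    x, y are leaf tensors with requires_grad=True; loss*_fn are
    differentiable loss functions of the joint strategy."""
    l2 = loss2_fn(x, y)
    gy = torch.autograd.grad(l2, y, create_graph=True)[0]
    y_look = y - opp_lr * gy              # opponent's simulated update
    l1 = loss1_fn(x, y_look)              # own loss at the look-ahead point
    return torch.autograd.grad(l1, x)[0]  # differentiates through y_look
```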
arXiv Detail & Related papers (2024-04-05T22:03:35Z)
- Fast Peer Adaptation with Context-aware Exploration [63.08444527039578]
We propose a peer identification reward for learning agents in multi-agent games.
This reward motivates the agent to learn a context-aware policy for effective exploration and fast adaptation.
We evaluate our method on diverse testbeds that involve competitive (Kuhn Poker), cooperative (PO-Overcooked), or mixed (Predator-Prey-W) games with peer agents.
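One plausible reading of the peer identification reward is an intrinsic bonus for behavior that makes the current peer identifiable from the interaction context, e.g. the log-probability a learned classifier assigns to the true peer identity. The sketch below is that interpretation only; the paper's exact reward may differ.

```python
import torch
import torch.nn.functional as F

def peer_id_reward(classifier, context, true_peer_id):
    """Hypothetical intrinsic reward: how confidently a learned
    classifier identifies the true peer from the context observed so
    far. `classifier` maps a context tensor to (num_peers,) logits."""
    logits = classifier(context)
    return F.log_softmax(logits, dim=-1)[true_peer_id]
```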
arXiv Detail & Related papers (2024-02-04T13:02:27Z)
- All by Myself: Learning Individualized Competitive Behaviour with a Contrastive Reinforcement Learning optimization [57.615269148301515]
In a competitive game scenario, a set of agents must learn decisions that maximize their own goals while minimizing their adversaries' goals.
We propose a novel model composed of three neural layers that learn a representation of the competitive game, map the strategies of specific opponents, and learn how to disrupt them.
Our experiments demonstrate that our model achieves better performance when playing against offline, online, and competitive-specific models, in particular when playing against the same opponent multiple times.
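The three-layer structure described above might look roughly like the following PyTorch sketch: a game-representation stage, an opponent-strategy mapping stage, and a disruption (action) head. All sizes, wiring, and names are assumptions for illustration.

```python
import torch
import torch.nn as nn

class CompetitiveModel(nn.Module):
    """Illustrative three-stage network mirroring the summary; not the
    paper's architecture."""
    def __init__(self, obs_dim, opp_dim, act_dim, hidden=64):
        super().__init__()
        self.represent = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.map_opponent = nn.Sequential(nn.Linear(hidden, opp_dim), nn.ReLU())
        self.disrupt = nn.Linear(hidden + opp_dim, act_dim)

    def forward(self, obs):
        h = self.represent(obs)       # representation of the game state
        z = self.map_opponent(h)      # embedding of the opponent's strategy
        return self.disrupt(torch.cat([h, z], dim=-1))  # counter-action logits
```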
arXiv Detail & Related papers (2023-10-02T08:11:07Z)
- Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of a continuous-action game without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
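The two ingredients named in the summary can be sketched as follows: a randomized policy network that turns noise into actions (so the network itself represents a mixed strategy), and a zeroth-order payoff-gradient estimate for updating it without analytic gradients. This is a generic sketch of those ingredients, not the paper's algorithm.

```python
import torch
import torch.nn as nn

class RandomizedPolicy(nn.Module):
    """A mixed strategy over a continuous action space: pushing
    Gaussian noise through the network induces a distribution over
    actions (illustrative sketch)."""
    def __init__(self, noise_dim, act_dim, hidden=32):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(nn.Linear(noise_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, act_dim))

    def sample(self, n):
        return self.net(torch.randn(n, self.noise_dim))

def es_gradient(payoff_fn, params, sigma=0.1, pop=32):
    """Zeroth-order (evolution-strategies-style) payoff-gradient
    estimate for a flat parameter vector; payoff_fn returns a float."""
    eps = torch.randn(pop, params.numel())
    scores = torch.tensor([payoff_fn(params + sigma * e) for e in eps])
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)
    return (scores @ eps) / (pop * sigma)
```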
arXiv Detail & Related papers (2022-11-29T05:16:41Z)
- Game Theoretic Rating in N-player general-sum games with Equilibria [26.166859475522106]
We propose novel algorithms suitable for N-player, general-sum rating of strategies in normal-form games according to the payoff rating system.
This enables well-established solution concepts, such as equilibria, to be leveraged to efficiently rate strategies in games with complex strategic interactions.
arXiv Detail & Related papers (2022-10-05T12:33:03Z)
- Mixed Strategies for Security Games with General Defending Requirements [37.02840909260615]
The Stackelberg security game is played between a defender and an attacker, where the defender needs to allocate a limited amount of resources to multiple targets.
We propose an efficient, close-to-optimal Patching algorithm that computes mixed strategies using only a few pure strategies.
arXiv Detail & Related papers (2022-04-26T08:56:39Z)
- Conditional Imitation Learning for Multi-Agent Games [89.897635970366]
We study the problem of conditional multi-agent imitation learning, where we have access to joint trajectory demonstrations at training time.
We propose a novel approach to address the difficulties of scalability and data scarcity.
Our model learns a low-rank subspace over ego and partner agent strategies, then infers and adapts to a new partner strategy by interpolating in the subspace.
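A minimal sketch of the low-rank idea, under illustrative assumptions: known partner strategies (approximately) span a k-dimensional subspace, and adapting to a new partner amounts to inferring its coefficients in that subspace from observed behavior. The helper names and the least-squares fit are assumptions, not the paper's method.

```python
import torch

def fit_partner_coeffs(basis, observed):
    """basis: (k, d) learned strategy basis; observed: (d,) behavior
    statistics of the new partner. Returns coefficients c such that
    basis.T @ c approximates the observed behavior."""
    sol = torch.linalg.lstsq(basis.T, observed.unsqueeze(-1))
    return sol.solution.squeeze(-1)

def adapted_strategy(basis, coeffs):
    # Interpolate within the learned subspace to get a strategy
    # matched to the inferred partner.
    return coeffs @ basis
```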
arXiv Detail & Related papers (2022-01-05T04:40:13Z)
- Portfolio Search and Optimization for General Strategy Game-Playing [58.896302717975445]
We propose a new algorithm for optimization and action-selection based on the Rolling Horizon Evolutionary Algorithm.
For the optimization of the agents' parameters and portfolio sets, we study the use of the N-tuple Bandit Evolutionary Algorithm.
An analysis of the agents' performance shows that the proposed algorithm generalizes well to all game-modes and is able to outperform other portfolio methods.
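For reference, a generic Rolling Horizon Evolutionary Algorithm loop looks like the sketch below: evolve fixed-length action sequences against a forward model, then execute only the first action of the best sequence. The paper's agents search over portfolios of scripts rather than raw actions and tune parameters with the N-tuple Bandit Evolutionary Algorithm; this sketch shows only the generic RHEA core.

```python
import random

def rhea_act(state, forward_model, evaluate, actions, horizon=10,
             pop_size=20, generations=30, mut_rate=0.2):
    """Generic RHEA sketch: evolve action sequences, return the first
    action of the best one. `forward_model(state, action)` simulates
    a step; `evaluate(state)` scores a state."""
    def rollout_value(seq):
        s = state
        for a in seq:
            s = forward_model(s, a)
        return evaluate(s)

    pop = [[random.choice(actions) for _ in range(horizon)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=rollout_value, reverse=True)
        elite = pop[: pop_size // 2]          # keep the best half
        children = [[a if random.random() > mut_rate else random.choice(actions)
                     for a in parent]         # mutate each elite sequence
                    for parent in elite]
        pop = elite + children
    return max(pop, key=rollout_value)[0]     # rolling horizon: act, then replan
```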
arXiv Detail & Related papers (2021-04-21T09:28:28Z)
- On the Impossibility of Convergence of Mixed Strategies with No Regret Learning [10.515544361834241]
We study convergence properties of the mixed strategies that result from a general class of optimal no-regret learning strategies.
We consider the class of strategies whose information set at each step is the empirical average of the opponent's realized play.
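As a concrete instance of the strategy class studied, consider a smoothed best response to the empirical average of the opponent's realized play; the paper's point is precisely that the mixed strategies produced by such no-regret rules need not converge. The softmax form below is an illustrative choice, not the paper's construction.

```python
import numpy as np

def smoothed_response(payoff, opp_counts, eta=1.0):
    """payoff[i, j]: our payoff for action i vs opponent action j;
    opp_counts: counts of the opponent's realized actions so far.
    Returns a mixed strategy responding to the empirical average."""
    empirical = opp_counts / opp_counts.sum()   # opponent's empirical mix
    values = payoff @ empirical                 # expected payoff per action
    z = np.exp(eta * (values - values.max()))   # numerically stable softmax
    return z / z.sum()
```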
arXiv Detail & Related papers (2020-12-03T18:02:40Z)
- Learning to Model Opponent Learning [11.61673411387596]
Multi-Agent Reinforcement Learning (MARL) considers settings in which a set of coexisting agents interact with one another and their environment.
This poses a great challenge for value function-based algorithms whose convergence usually relies on the assumption of a stationary environment.
We develop a novel approach to modelling an opponent's learning dynamics, which we term Learning to Model Opponent Learning (LeMOL).
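Modelling an opponent's learning, rather than just its current policy, could be sketched as a recurrent predictor of the opponent's next-step policy from the observed interaction history. The architecture below is an assumption for illustration, not LeMOL's design.

```python
import torch.nn as nn

class OpponentLearningModel(nn.Module):
    """Hypothetical sketch: predict the opponent's *next* policy from
    the history of its observed behavior, capturing how it learns."""
    def __init__(self, obs_dim, policy_dim, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, policy_dim),
                                  nn.Softmax(dim=-1))

    def forward(self, history):        # history: (batch, time, obs_dim)
        out, _ = self.rnn(history)
        return self.head(out[:, -1])   # predicted next-step policy
```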
arXiv Detail & Related papers (2020-06-06T17:19:04Z)
- Efficient exploration of zero-sum stochastic games [83.28949556413717]
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
arXiv Detail & Related papers (2020-02-24T20:30:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.