Offline Fictitious Self-Play for Competitive Games
- URL: http://arxiv.org/abs/2403.00841v1
- Date: Thu, 29 Feb 2024 11:36:48 GMT
- Title: Offline Fictitious Self-Play for Competitive Games
- Authors: Jingxiao Chen, Weiji Xie, Weinan Zhang, Yong Yu, Ying Wen
- Abstract summary: This paper introduces Off-FSP, the first practical model-free offline RL algorithm for competitive games.
- Score: 34.445740191223614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline Reinforcement Learning (RL) has received significant interest due to
its ability to improve policies from previously collected datasets without
online interaction. Despite its success in the single-agent setting, offline
multi-agent RL remains a challenge, especially in competitive games. First,
without knowledge of the game structure, it is impossible to interact with
the opponents and thus to conduct self-play, the major learning paradigm for
competitive games. Second, real-world datasets cannot cover the entire state
and action space of the game, which creates barriers to identifying a Nash
equilibrium (NE). To address these issues, this paper introduces Off-FSP, the
first practical model-free offline RL algorithm for competitive games. We
start by simulating interactions with various opponents by re-weighting the
fixed dataset with importance sampling. This technique allows us to learn
best responses to different opponents and to employ the Offline Self-Play
learning framework. Within this framework, we further implement Fictitious
Self-Play (FSP) to approximate an NE. On partially covered real-world
datasets, our method shows the potential to approach an NE when combined with
any single-agent offline RL method. Experimental results in Leduc Hold'em
Poker show that our method significantly improves performance compared with
state-of-the-art baselines.
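The abstract combines two ingredients: re-weighting the fixed dataset with importance sampling so it looks as if it had been collected against a chosen opponent, and folding the resulting best responses into a fictitious-self-play average. The sketch below illustrates both on a toy rock-paper-scissors matrix game. It is an illustration under stated assumptions, not the paper's implementation: the behavior policies, dataset size, and greedy best response (standing in for the pluggable single-agent offline RL method) are all made up.

    # Toy illustration of the two ideas above -- NOT the paper's implementation.
    # Assumptions: a rock-paper-scissors matrix game, made-up behavior policies,
    # and a greedy best response standing in for a full single-agent offline RL
    # method. Off-FSP itself targets sequential games such as Leduc Hold'em.
    import numpy as np

    rng = np.random.default_rng(0)

    # Row player's payoff matrix for rock-paper-scissors (zero-sum, NE = uniform).
    PAYOFF = np.array([[ 0, -1,  1],
                       [ 1,  0, -1],
                       [-1,  1,  0]], dtype=float)
    N_ACTIONS = 3

    # Fixed offline dataset: joint actions and rewards logged under known
    # behavior policies (the denominator policies for importance sampling).
    behavior_self = np.array([0.5, 0.3, 0.2])
    behavior_opp  = np.array([0.2, 0.5, 0.3])   # must cover every opponent action
    N = 50_000
    a_self = rng.choice(N_ACTIONS, size=N, p=behavior_self)
    a_opp  = rng.choice(N_ACTIONS, size=N, p=behavior_opp)
    reward = PAYOFF[a_self, a_opp]

    # Aggregate the dataset once per (own action, opponent action) cell.
    counts     = np.zeros((N_ACTIONS, N_ACTIONS))
    reward_sum = np.zeros((N_ACTIONS, N_ACTIONS))
    np.add.at(counts, (a_self, a_opp), 1.0)
    np.add.at(reward_sum, (a_self, a_opp), reward)

    def estimated_returns(opp_strategy):
        """Self-normalized importance-sampling estimate of our expected payoff
        for each own action, as if the logged opponent had played `opp_strategy`."""
        w = opp_strategy / behavior_opp             # per opponent-action IS weight
        num = (reward_sum * w).sum(axis=1)
        den = np.maximum((counts * w).sum(axis=1), 1e-8)
        return num / den

    def best_response(opp_strategy):
        """Stand-in for an arbitrary single-agent offline RL method: pick the
        pure strategy with the highest estimated value against `opp_strategy`."""
        br = np.zeros(N_ACTIONS)
        br[int(np.argmax(estimated_returns(opp_strategy)))] = 1.0
        return br

    # Offline fictitious self-play: repeatedly best-respond to the current average
    # strategy and fold the response into the average. RPS is symmetric, so one
    # average strategy stands in for both players.
    avg = np.ones(N_ACTIONS) / N_ACTIONS
    for t in range(1, 5001):
        avg += (best_response(avg) - avg) / (t + 1)

    print("Average strategy:", np.round(avg, 3), "- the NE is uniform [1/3, 1/3, 1/3]")

In the real setting, estimated_returns and best_response would be replaced by an off-the-shelf single-agent offline RL algorithm trained on the re-weighted dataset, which is the plug-in property the abstract highlights.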
Related papers
- Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks [59.50879251101105]
We propose Hokoff, a comprehensive set of pre-collected datasets that covers offline RL and offline MARL.
This data is derived from Honor of Kings, a recognized Multiplayer Online Battle Arena (MOBA) game.
We also introduce a novel baseline algorithm tailored for the inherent hierarchical action space of the game.
arXiv Detail & Related papers (2024-08-20T05:38:50Z)
- In-Context Exploiter for Extensive-Form Games [38.24471816329584]
We introduce a novel method, In-Context Exploiter (ICE), to train a single model that can act as any player in the game and adaptively exploit opponents entirely by in-context learning.
Our ICE algorithm involves generating diverse opponent strategies, collecting interaction history data with a reinforcement learning algorithm, and training a transformer-based agent within a well-designed curriculum learning framework.
arXiv Detail & Related papers (2024-08-10T14:59:09Z)
- SEABO: A Simple Search-Based Method for Offline Imitation Learning [57.2723889718596]
Offline reinforcement learning (RL) has attracted much attention due to its ability to learn from static offline datasets.
We propose a simple yet effective search-based offline IL method, tagged SEABO.
We show that SEABO can achieve competitive performance to offline RL algorithms with ground-truth rewards, given only a single expert trajectory.
arXiv Detail & Related papers (2024-02-06T08:48:01Z)
- Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning [93.99377042564919]
This paper tries to build more flexible constraints for value estimation without impeding the exploration of potential advantages.
The key idea is to leverage off-the-shelf RL simulators, which can be easily interacted with in an online manner, as the "test bed" for offline policies.
We introduce CoWorld, a model-based RL approach that mitigates cross-domain discrepancies in state and reward spaces.
arXiv Detail & Related papers (2023-05-24T15:45:35Z)
- RvS: What is Essential for Offline RL via Supervised Learning? [77.91045677562802]
Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL.
In every environment suite we consider, simply maximizing likelihood with a two-layer feedforward network is competitive. (A minimal sketch of this recipe appears after this list.)
These results also probe the limits of existing RvS methods, which are comparatively weak on random data.
arXiv Detail & Related papers (2021-12-20T18:55:16Z)
- Offline Reinforcement Learning Hands-On [60.36729294485601]
Offline RL aims to turn large datasets into powerful decision-making engines without any online interactions with the environment.
This work aims to reflect upon these efforts from a practitioner viewpoint.
We experimentally validate that diversity and high-return examples in the data are crucial to the success of offline RL.
arXiv Detail & Related papers (2020-11-29T14:45:02Z)
- Learning to Play No-Press Diplomacy with Best Response Policy Iteration [31.367850729299665]
We apply deep reinforcement learning methods to Diplomacy, a 7-player board game.
We show that our agents convincingly outperform the previous state-of-the-art, and game theoretic equilibrium analysis shows that the new process yields consistent improvements.
arXiv Detail & Related papers (2020-06-08T14:33:31Z)
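As referenced in the RvS entry above, the following is a minimal sketch of the "maximize likelihood with a two-layer feedforward network" recipe, read as outcome-conditioned behavior cloning. The toy dataset, network width, and training loop are illustrative assumptions, not code from that paper.

    # Minimal sketch of RvS-style offline RL as outcome-conditioned behavior
    # cloning: a two-layer feedforward net maps (state, outcome) to action logits
    # and is trained by maximum likelihood. All data and hyperparameters below
    # are placeholders for illustration.
    import torch
    import torch.nn as nn

    STATE_DIM, N_ACTIONS, HIDDEN = 8, 4, 256

    # Two-layer feedforward policy over (state, outcome) -> action logits.
    policy = nn.Sequential(
        nn.Linear(STATE_DIM + 1, HIDDEN), nn.ReLU(),
        nn.Linear(HIDDEN, N_ACTIONS),
    )
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

    # Placeholder offline dataset: states, discrete actions, and a per-step
    # outcome (e.g., normalized return-to-go) from logged trajectories.
    states = torch.randn(4096, STATE_DIM)
    actions = torch.randint(0, N_ACTIONS, (4096,))
    rtg = torch.rand(4096, 1)

    for step in range(200):
        idx = torch.randint(0, states.shape[0], (256,))
        inp = torch.cat([states[idx], rtg[idx]], dim=1)
        loss = nn.functional.cross_entropy(policy(inp), actions[idx])  # max likelihood
        opt.zero_grad()
        loss.backward()
        opt.step()

    # At test time, condition on the desired outcome (e.g., a high return-to-go)
    # and act on the resulting action distribution.
    test_state = torch.randn(1, STATE_DIM)
    desired_rtg = torch.ones(1, 1)
    action = policy(torch.cat([test_state, desired_rtg], dim=1)).argmax(dim=1)
    print("chosen action:", action.item())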