Closed Drafting as a Case Study for First-Principle Interpretability,
Memory, and Generalizability in Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2310.20654v3
- Date: Fri, 17 Nov 2023 17:01:26 GMT
- Title: Closed Drafting as a Case Study for First-Principle Interpretability,
Memory, and Generalizability in Deep Reinforcement Learning
- Authors: Ryan Rezai and Jason Wang
- Abstract summary: We study the interpretability, generalizability, and memory of Deep Q-Network (DQN) models playing closed drafting games.
We use a popular family of closed drafting games called "Sushi Go Party", in which we achieve state-of-the-art performance.
- Score: 3.018656336329545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Closed drafting or "pick and pass" is a popular game mechanic where each
round players select a card or other playable element from their hand and pass
the rest to the next player. In this paper, we establish first-principle
methods for studying the interpretability, generalizability, and memory of Deep
Q-Network (DQN) models playing closed drafting games. In particular, we use a
popular family of closed drafting games called "Sushi Go Party", in which we
achieve state-of-the-art performance. We fit decision rules to interpret the
decision-making strategy of trained DRL agents by comparing them to the ranking
preferences of different types of human players. As Sushi Go Party can be
expressed as a set of closely-related games based on the set of cards in play,
we quantify the generalizability of DRL models trained on various sets of
cards, establishing a method to benchmark agent performance as a function of
environment unfamiliarity. Using the explicitly calculable memory of other
players' hands in closed drafting games, we create measures of the ability of
DRL models to learn memory.
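Because hands rotate around the table and picks are revealed each round, the contents of other players' hands are exactly recoverable once a hand has been seen. A minimal sketch of this "explicitly calculable memory" (class and function names are hypothetical, not from the paper's code):

```python
import random

def deal(num_players: int, hand_size: int, deck: list) -> list:
    """Deal one starting hand per player from a shuffled deck."""
    random.shuffle(deck)
    return [deck[i * hand_size:(i + 1) * hand_size] for i in range(num_players)]

class MemoryTracker:
    """Perfect-recall record of every hand one seat has held.

    Hands rotate, so a seat sees each hand once per cycle; combined with
    the picks revealed every round, the contents of all other hands are
    exactly calculable -- the baseline a learned memory is measured against.
    """

    def __init__(self):
        self.seen = {}  # hand_id -> set of cards believed to remain in it

    def observe_hand(self, hand_id: int, cards: list) -> None:
        """Record a hand's full contents while we are holding it."""
        self.seen[hand_id] = set(cards)

    def observe_pick(self, hand_id: int, card) -> None:
        """Remove a publicly revealed pick from the remembered hand."""
        if hand_id in self.seen:
            self.seen[hand_id].discard(card)

    def known_hand(self, hand_id: int) -> set:
        """Exact remaining contents of a hand we have seen before."""
        return self.seen.get(hand_id, set())

# Toy setup: 3 players, hands rotate left, picks are revealed each round.
hands = deal(3, 3, [f"card{i}" for i in range(9)])
tracker = MemoryTracker()          # the memory of seat 0
tracker.observe_hand(0, hands[0])  # seat 0 holds hand 0 in round 1
```

Comparing a DQN that receives such reconstructed hands against one that does not gives a direct handle on how much of this memory the network actually learns to use.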
Related papers
- All by Myself: Learning Individualized Competitive Behaviour with a
Contrastive Reinforcement Learning optimization [57.615269148301515]
In a competitive game scenario, agents must learn decisions that maximize their own goals while minimizing their adversaries' goals.
We propose a novel model composed of three neural layers that learn a representation of the competitive game, map the strategies of specific opponents, and disrupt them.
Our experiments demonstrate that our model achieves better performance when playing against offline, online, and competitive-specific models, in particular when playing against the same opponent multiple times.
arXiv Detail & Related papers (2023-10-02T08:11:07Z)
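As a rough illustration of the three-component idea above, a PyTorch sketch (module names, sizes, and wiring are assumptions; the paper's actual architecture and contrastive objective are not reproduced here):

```python
import torch
import torch.nn as nn

class CompetitiveAgent(nn.Module):
    """Three cooperating components, mirroring the summary above."""

    def __init__(self, state_dim: int = 64, opp_dim: int = 32, n_actions: int = 8):
        super().__init__()
        # 1) learn a representation of the competitive game
        self.game_encoder = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU())
        # 2) learn to map the strategy of a specific opponent
        self.opponent_encoder = nn.Sequential(nn.Linear(opp_dim, 64), nn.ReLU())
        # 3) learn to disrupt it: a policy conditioned on both embeddings
        self.policy = nn.Linear(128 + 64, n_actions)

    def forward(self, state: torch.Tensor, opp_history: torch.Tensor) -> torch.Tensor:
        g = self.game_encoder(state)
        o = self.opponent_encoder(opp_history)
        return self.policy(torch.cat([g, o], dim=-1))

agent = CompetitiveAgent()
action_logits = agent(torch.randn(1, 64), torch.randn(1, 32))
```

A contrastive loss on the opponent embeddings (pulling together episodes against the same opponent, pushing apart different ones) would supply the "Contrastive" part of the title; the exact objective is in the paper.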
- Beyond the Meta: Leveraging Game Design Parameters for Patch-Agnostic Esport Analytics [4.1692797498685685]
Esport games comprise a sizeable fraction of the global games market and are its fastest-growing segment.
Compared to traditional sports, esport titles change rapidly in terms of mechanics as well as rules.
This paper extracts information from game design (i.e. patch notes) and uses clustering techniques to propose a new form of character representation.
arXiv Detail & Related papers (2023-05-29T11:05:20Z)
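A toy version of the patch-agnostic pipeline described above: characters are vectors of design parameters (the kind patch notes change), and clustering groups them into roles that remain comparable across patches. All names and numbers below are invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical per-character design parameters scraped from patch notes;
# columns: attack damage, attack speed, base health, move speed.
characters = ["Ashe", "Garen", "Lux", "Zed"]   # illustrative names
params = np.array([
    [59.0, 0.66, 570.0, 325.0],
    [66.0, 0.62, 620.0, 340.0],
    [54.0, 0.63, 560.0, 330.0],
    [63.0, 0.65, 584.0, 345.0],
])

# Normalize so no single stat dominates, then cluster into roles.
z = (params - params.mean(axis=0)) / params.std(axis=0)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(z)

# A character is now represented by its cluster (role) rather than its
# name, so the representation stays comparable across patches.
print(dict(zip(characters, labels)))
```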
- SPRING: Studying the Paper and Reasoning to Play Games [102.5587155284795]
We propose a novel approach, SPRING, to read the game's original academic paper and use the knowledge learned to reason about and play the game through a large language model (LLM).
In experiments, we study the quality of in-context "reasoning" induced by different forms of prompts under the setting of the Crafter open-world environment.
Our experiments suggest that LLMs, when prompted with consistent chain-of-thought, have great potential in completing sophisticated high-level trajectories.
arXiv Detail & Related papers (2023-05-24T18:14:35Z)
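A skeletal version of the SPRING-style control loop: context from the game's paper plus the current observation is turned into a chain-of-thought prompt, and the reply is parsed into an action. `query_llm` is a placeholder for any LLM call, and the prompt format is an assumption, not the paper's:

```python
def query_llm(prompt: str) -> str:
    """Placeholder for a call to any large language model API."""
    raise NotImplementedError

PAPER_CONTEXT = "..."  # text extracted from the game's original academic paper

def choose_action(observation: str, valid_actions: list) -> str:
    """Compose a chain-of-thought prompt and parse the reply into an action."""
    prompt = (
        f"Game manual:\n{PAPER_CONTEXT}\n\n"
        f"Current observation: {observation}\n"
        f"Valid actions: {', '.join(valid_actions)}\n"
        "Think step by step about which action best advances your goals, "
        "then end your answer with exactly one valid action."
    )
    answer = query_llm(prompt)
    # Take the last valid action mentioned in the reply as the decision.
    mentioned = [a for a in valid_actions if a in answer]
    return mentioned[-1] if mentioned else valid_actions[0]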
- Learning Chess With Language Models and Transformers [0.0]
Representing a board game and its positions in text-based notation enables NLP applications.
BERT models are applied, first to the simple game of Nim, to analyze performance in the presence of noise in a few-shot learning setup.
The model practically learns the rules of chess and can survive games against Stockfish at a category-A rating level.
arXiv Detail & Related papers (2022-09-24T01:22:59Z)
- Principal Trade-off Analysis [79.16635054977068]
We show "Principal Trade-off Analysis" (PTA), a decomposition method that embeds games into a low-dimensional feature space.
PTA represents an arbitrary two-player zero-sum game as the weighted sum of pairs of 2D feature planes.
We demonstrate the validity of PTA on a quartet of games (Kuhn poker, RPS+2, Blotto, and Pokemon).
arXiv Detail & Related papers (2022-06-09T18:16:28Z)
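One concrete way to obtain such 2D feature planes for an antisymmetric evaluation matrix is a real Schur decomposition, whose 2x2 rotational blocks each define one plane and its weight. A sketch (the example matrix is rock-paper-scissors; the paper's exact procedure may differ):

```python
import numpy as np
from scipy.linalg import schur

# Antisymmetric evaluation matrix of a toy zero-sum game:
# F[i, j] > 0 means strategy i beats strategy j (rock-paper-scissors here).
F = np.array([[ 0.,  1., -1.],
              [-1.,  0.,  1.],
              [ 1., -1.,  0.]])

# The real Schur form of a skew-symmetric matrix is block-diagonal with
# 2x2 rotational blocks; each block is one 2D feature plane with a weight.
T, Z = schur(F, output="real")

k = 0
while k < T.shape[0] - 1:
    if abs(T[k + 1, k]) > 1e-9:      # a 2x2 block starts at position k
        weight = T[k, k + 1]         # strength of this trade-off plane
        plane = Z[:, k:k + 2]        # row i: strategy i's 2D coordinates
        print(f"plane weight {weight:.3f}")
        print(plane)
        k += 2
    else:
        k += 1                       # 1x1 (transitive / null) component
```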
- Discovering Multi-Agent Auto-Curricula in Two-Player Zero-Sum Games [31.97631243571394]
We introduce a framework, LMAC, that automates the discovery of the update rule without explicit human design.
Surprisingly, even without human design, the discovered MARL algorithms achieve performance competitive with, or better than, hand-designed alternatives.
We show that LMAC is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO.
arXiv Detail & Related papers (2021-06-04T22:30:25Z)
- Generating Diverse and Competitive Play-Styles for Strategy Games [58.896302717975445]
We propose Portfolio Monte Carlo Tree Search with Progressive Unpruning for playing a turn-based strategy game (Tribes).
We show how it can be parameterized so that a quality-diversity algorithm (MAP-Elites) can be used to achieve different play-styles while maintaining a competitive level of play.
Our results show that this algorithm is capable of achieving these goals even for an extensive collection of game levels beyond those used for training.
arXiv Detail & Related papers (2021-04-17T20:33:24Z)
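A minimal MAP-Elites loop of the kind the play-styles paper describes: an archive keeps the best agent configuration per cell of a play-style descriptor space. The descriptor, parameters, and evaluation below are placeholders, not the paper's:

```python
import random

def evaluate(params: dict) -> tuple:
    """Return (win rate, play-style descriptor) for one agent config.

    Stand-in evaluation; a real version would play games with these
    parameters. Hypothetical descriptor: (aggressiveness bucket,
    expansion bucket), each discretized into 5 bins.
    """
    win_rate = random.random()
    descriptor = (random.randrange(5), random.randrange(5))
    return win_rate, descriptor

def map_elites(iterations: int = 1000) -> dict:
    archive = {}  # descriptor cell -> (fitness, params)
    for _ in range(iterations):
        # Mutate a random elite, or sample fresh parameters early on.
        if archive:
            _, parent = random.choice(list(archive.values()))
            params = {k: v + random.gauss(0, 0.1) for k, v in parent.items()}
        else:
            params = {"aggression": random.random(), "expansion": random.random()}
        fitness, cell = evaluate(params)
        # Keep the candidate only if it beats the cell's current elite.
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, params)
    return archive

elites = map_elites()  # one competitive agent per play-style cell
```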
- Markov Cricket: Using Forward and Inverse Reinforcement Learning to Model, Predict And Optimize Batting Performance in One-Day International Cricket [0.8122270502556374]
We model one-day international cricket games as Markov processes, applying forward and inverse Reinforcement Learning (RL) to develop three novel tools for the game.
We show that, when used as a proxy for remaining scoring resources, this approach outperforms the state-of-the-art Duckworth-Lewis-Stern method by a factor of 3 to 10.
We envisage our prediction and simulation techniques may provide a fairer alternative for estimating final scores in interrupted games, while the inferred reward model may provide useful insights for the professional game to optimize playing strategy.
arXiv Detail & Related papers (2021-03-07T13:11:16Z)
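The "remaining scoring resources" idea can be made concrete with a toy Markov model of an innings: expected further runs from a state (balls left, wickets left) follow a one-step recursion, and the ratio to a full innings plays the role of a DLS-style resource table. The outcome probabilities below are invented, not fitted:

```python
from functools import lru_cache

# Hypothetical per-ball outcome distribution: (runs scored, wicket lost, prob).
OUTCOMES = [(0, False, 0.35), (1, False, 0.30), (2, False, 0.10),
            (4, False, 0.12), (6, False, 0.05), (0, True, 0.08)]

@lru_cache(maxsize=None)
def expected_runs(balls_left: int, wickets_left: int) -> float:
    """Expected further runs from this state under the toy model."""
    if balls_left == 0 or wickets_left == 0:
        return 0.0
    total = 0.0
    for runs, wicket, p in OUTCOMES:
        nxt = expected_runs(balls_left - 1, wickets_left - (1 if wicket else 0))
        total += p * (runs + nxt)
    return total

# Resource proxy analogous to Duckworth-Lewis-Stern tables:
full = expected_runs(300, 10)        # a full 50-over innings
print(expected_runs(120, 5) / full)  # fraction of scoring resources remaining
```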
- Individualized Context-Aware Tensor Factorization for Online Games Predictions [6.602875221541352]
We present the Neural Individualized Context-aware Embeddings (NICE) model for predicting user performance and game outcomes.
Our proposed method identifies individual behavioral differences in different contexts by learning latent representations of users and contexts.
Using a dataset from the MOBA game League of Legends, we demonstrate that our model substantially improves the prediction of winning outcome, individual user performance, and user engagement.
arXiv Detail & Related papers (2021-02-22T20:46:02Z)
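The heart of such context-aware factorization models, sketched with random stand-ins for learned parameters: predictions come from interactions between user and context embeddings, so the same user scores differently in different contexts. Dimensions and indices below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_contexts, dim = 100, 20, 8

# Latent representations; in the real model these are learned, not random.
user_emb = rng.normal(size=(n_users, dim))
context_emb = rng.normal(size=(n_contexts, dim))
user_bias = rng.normal(size=n_users)

def predict_performance(user: int, context: int) -> float:
    """Score = user/context interaction plus an individual baseline."""
    return float(user_emb[user] @ context_emb[context] + user_bias[user])

# The same user is predicted to perform differently in different contexts
# (e.g., champion picked, team composition) via the dot product.
print(predict_performance(3, 7), predict_performance(3, 12))
```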
- DeepCrawl: Deep Reinforcement Learning for Turn-based Strategy Games [137.86426963572214]
We introduce DeepCrawl, a fully-playable Roguelike prototype for iOS and Android in which all agents are controlled by policy networks trained using Deep Reinforcement Learning (DRL).
Our aim is to understand whether recent advances in DRL can be used to develop convincing behavioral models for non-player characters in videogames.
arXiv Detail & Related papers (2020-12-03T13:53:29Z)
- Faster Algorithms for Optimal Ex-Ante Coordinated Collusive Strategies in Extensive-Form Zero-Sum Games [123.76716667704625]
We focus on the problem of finding an optimal strategy for a team of two players that faces an opponent in an imperfect-information zero-sum extensive-form game.
In that setting, it is known that the best the team can do is sample a profile of potentially randomized strategies (one per player) from a joint (a.k.a. correlated) probability distribution at the beginning of the game.
We provide an algorithm that computes such an optimal distribution using only profiles in which a single team member randomizes.
arXiv Detail & Related papers (2020-09-21T17:51:57Z)
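The representational restriction in the last entry can be shown directly: the team's ex-ante plan is a distribution over strategy profiles in which exactly one member randomizes per profile, and the profile is sampled once, before play begins. All probabilities below are illustrative:

```python
import random

# Each profile assigns a strategy to players A and B; in every profile
# exactly one of the two randomizes, which is the restriction used above.
profiles = [
    # (probability of profile, A's strategy, B's strategy)
    (0.6, {"a1": 0.7, "a2": 0.3}, {"b1": 1.0}),   # only A randomizes
    (0.4, {"a1": 1.0}, {"b2": 0.5, "b3": 0.5}),   # only B randomizes
]

def sample(mixed: dict) -> str:
    """Draw one pure strategy from a mixed strategy."""
    return random.choices(list(mixed), weights=list(mixed.values()))[0]

def play_once() -> tuple:
    """Sample a profile ex ante (before the game), then each player plays it."""
    _, a_strat, b_strat = random.choices(
        profiles, weights=[p for p, _, _ in profiles])[0]
    return sample(a_strat), sample(b_strat)

print(play_once())
```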