Efficacy of Language Model Self-Play in Non-Zero-Sum Games
- URL: http://arxiv.org/abs/2406.18872v2
- Date: Mon, 09 Dec 2024 02:45:45 GMT
- Title: Efficacy of Language Model Self-Play in Non-Zero-Sum Games
- Authors: Austen Liao, Nicholas Tomlin, Dan Klein,
- Abstract summary: We empirically investigate whether techniques like self-play can effectively be used to improve language models.
We finetune language models in self-play over multiple rounds of filtered behavior cloning in Deal or No Deal.
We find that language models improve substantially in self-play, achieving 14-17x higher scores in task reward after finetuning.
- Score: 38.644991461153275
- License:
- Abstract: Game-playing agents like AlphaGo have achieved superhuman performance through self-play, which is theoretically guaranteed to yield optimal policies in competitive games. However, most language tasks are partially or fully cooperative, so it is an open question whether techniques like self-play can effectively be used to improve language models. We empirically investigate this question in a negotiation game setting known as Deal or No Deal (DoND). Crucially, the objective in DoND can be modified to produce a fully cooperative game, a strictly competitive one, or anything in between. We finetune language models in self-play over multiple rounds of filtered behavior cloning in DoND for each of these objectives and evaluate them in self-play and in collaboration with humans. We find that language models improve substantially in self-play, achieving 14-17x higher scores in task reward after finetuning. Further, the trained models generalize to both cooperation and competition with humans, scoring 2.5-6x higher than base models. We view these results as an early promising sign for language model self-play in cooperative settings, despite a lack of theoretical guarantees.
Related papers
- Multi-agent KTO: Reinforcing Strategic Interactions of Large Language Model in Language Game [32.791648070823776]
Werewolf is a social deduction game that tests language understanding.
We develop the Multi-agent Kahneman & Tversky's Optimization (MaKTO)
MaKTO achieves a 61% average win rate across various models.
arXiv Detail & Related papers (2025-01-24T04:09:03Z) - Understanding Players as if They Are Talking to the Game in a Customized Language: A Pilot Study [3.4333699338998693]
This pilot study explores the application of language models (LMs) to model game event sequences.
We transform raw event data into textual sequences and pretraining a Longformer model on this data.
The results demonstrate the potential of self-supervised LMs in enhancing game design and personalization without relying on ground-truth labels.
arXiv Detail & Related papers (2024-10-24T09:59:10Z) - Robustifying Language Models with Test-Time Adaptation [17.96043752001886]
Large-scale language models achieved state-of-the-art performance over a number of language tasks.
They fail on adversarial language examples, which are sentences optimized to fool the language models but with similar semantic meanings for humans.
We show that we can reverse many language adversarial attacks by adapting the input sentence with predictions from masked words.
arXiv Detail & Related papers (2023-10-29T22:37:54Z) - Improving Language Model Negotiation with Self-Play and In-Context
Learning from AI Feedback [97.54519989641388]
We study whether multiple large language models (LLMs) can autonomously improve each other in a negotiation game by playing, reflecting, and criticizing.
Only a subset of the language models we consider can self-play and improve the deal price from AI feedback.
arXiv Detail & Related papers (2023-05-17T11:55:32Z) - Promptable Game Models: Text-Guided Game Simulation via Masked Diffusion
Models [68.85478477006178]
We present a Promptable Game Model (PGM) for neural video game simulators.
It allows a user to play the game by prompting it with high- and low-level action sequences.
Most captivatingly, our PGM unlocks the director's mode, where the game is played by specifying goals for the agents in the form of a prompt.
Our method significantly outperforms existing neural video game simulators in terms of rendering quality and unlocks applications beyond the capabilities of the current state of the art.
arXiv Detail & Related papers (2023-03-23T17:43:17Z) - Language Models are General-Purpose Interfaces [109.45478241369655]
We propose to use language models as a general-purpose interface to various foundation models.
A collection of pretrained encoders perceive diverse modalities (such as vision, and language)
We propose a semi-causal language modeling objective to jointly pretrain the interface and the modular encoders.
arXiv Detail & Related papers (2022-06-13T17:34:22Z) - What Language Model Architecture and Pretraining Objective Work Best for
Zero-Shot Generalization? [50.84738303888189]
We present a large-scale evaluation of modeling choices and their impact on zero-shot generalization.
We train models with over 5 billion parameters for more than 170 billion tokens.
We find that pretrained causal decoder models can be efficiently adapted into non-causal decoder models.
arXiv Detail & Related papers (2022-04-12T14:19:49Z) - The Interplay of Task Success and Dialogue Quality: An in-depth
Evaluation in Task-Oriented Visual Dialogues [6.02280861819024]
We show that in the popular end-to-end approach, this choice prevents the model from learning to generate linguistically richer dialogues.
We show that in GuessWhat, models could increase their accuracy if they learn to ground, encode, and decode also words that do not occur frequently in the training set.
arXiv Detail & Related papers (2021-03-20T10:13:30Z) - Read Like Humans: Autonomous, Bidirectional and Iterative Language
Modeling for Scene Text Recognition [80.446770909975]
Linguistic knowledge is of great benefit to scene text recognition.
How to effectively model linguistic rules in end-to-end deep networks remains a research challenge.
We propose an autonomous, bidirectional and iterative ABINet for scene text recognition.
arXiv Detail & Related papers (2021-03-11T06:47:45Z) - Keep CALM and Explore: Language Models for Action Generation in
Text-based Games [27.00685301984832]
We propose the Contextual Action Language Model (CALM) to generate a compact set of action candidates at each game state.
We combine CALM with a reinforcement learning agent which re-ranks the generated action candidates to maximize in-game rewards.
arXiv Detail & Related papers (2020-10-06T17:36:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.