TextArena
- URL: http://arxiv.org/abs/2504.11442v1
- Date: Tue, 15 Apr 2025 17:55:20 GMT
- Title: TextArena
- Authors: Leon Guertler, Bobby Cheng, Simon Yu, Bo Liu, Leshem Choshen, Cheston Tan,
- Abstract summary: TextArena is an open-source collection of competitive text-based games for training and evaluation of agentic behavior in Large Language Models (LLMs). It spans 57+ unique environments (including single-player, two-player, and multi-player setups) and allows for easy evaluation of model capabilities via an online-play system. TextArena emphasizes ease of adding new games, adapting the framework, testing models, playing against the models, and training models.
- Score: 13.269790016084178
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: TextArena is an open-source collection of competitive text-based games for training and evaluation of agentic behavior in Large Language Models (LLMs). It spans 57+ unique environments (including single-player, two-player, and multi-player setups) and allows for easy evaluation of model capabilities via an online-play system (against humans and other submitted models) with real-time TrueSkill scores. Traditional benchmarks rarely assess dynamic social skills such as negotiation, theory of mind, and deception, creating a gap that TextArena addresses. Designed with research, community and extensibility in mind, TextArena emphasizes ease of adding new games, adapting the framework, testing models, playing against the models, and training models. Detailed documentation of environments, games, leaderboard, and examples is available at https://github.com/LeonGuertler/TextArena and https://www.textarena.ai/.
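The abstract mentions real-time TrueSkill scores for online play. As a rough illustration of how such a rating moves after one game, the sketch below applies the standard two-player TrueSkill update for a decisive result (no draws, no skill-drift term). It is not taken from the TextArena code base: the function name `update_1v1`, the starting rating (mu = 25, sigma = 25/3), and the performance noise `BETA = 25/6` are assumptions based on commonly used defaults.

```python
import math

# Assumed performance noise; 25/6 is a conventional default, not a TextArena setting.
BETA = 25.0 / 6.0


def _pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal cumulative distribution, via erfc for accuracy in the left tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def update_1v1(winner, loser, beta=BETA):
    """Simplified two-player TrueSkill update for a win/loss result.

    `winner` and `loser` are (mu, sigma) pairs. Draws and the dynamics
    factor of the full algorithm are deliberately left out of this sketch.
    """
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser

    # Total uncertainty about the difference in performances.
    c = math.sqrt(2.0 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2)
    t = (mu_w - mu_l) / c

    # Correction terms of the truncated Gaussian; v is large for upsets.
    v = _pdf(t) / _cdf(t)
    w = v * (v + t)

    new_mu_w = mu_w + (sigma_w ** 2 / c) * v
    new_mu_l = mu_l - (sigma_l ** 2 / c) * v
    new_sigma_w = sigma_w * math.sqrt(1.0 - (sigma_w ** 2 / c ** 2) * w)
    new_sigma_l = sigma_l * math.sqrt(1.0 - (sigma_l ** 2 / c ** 2) * w)
    return (new_mu_w, new_sigma_w), (new_mu_l, new_sigma_l)


if __name__ == "__main__":
    fresh = (25.0, 25.0 / 3.0)  # assumed starting rating for both players
    print(update_1v1(fresh, fresh))
```

After a game between two equally rated players, the winner's mean rises and the loser's mean falls by the same amount, and both uncertainties shrink. A production leaderboard would also need draw handling and multi-player games, which this sketch omits.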
Related papers
- AMONGAGENTS: Evaluating Large Language Models in the Interactive Text-Based Social Deduction Game [12.384945632524424]
This paper focuses on creating proxies of human behavior in simulated environments, with Among Us utilized as a tool for studying simulated human behavior.
Our work demonstrates that state-of-the-art large language models (LLMs) can effectively grasp the game rules and make decisions based on the current context.
arXiv Detail & Related papers (2024-07-23T14:34:38Z) - ScriptWorld: Text Based Environment For Learning Procedural Knowledge [2.0491741153610334]
ScriptWorld is a text-based environment for teaching agents about real-world daily chores.
We provide gaming environments for 10 daily activities and perform a detailed analysis of the proposed environment.
We leverage features obtained from pre-trained language models in the RL agents.
arXiv Detail & Related papers (2023-07-08T05:43:03Z) - Large Language Models are Diverse Role-Players for Summarization Evaluation [82.31575622685902]
A document summary's quality can be assessed by human annotators on various criteria, both objective ones like grammar and correctness, and subjective ones like informativeness, succinctness, and appeal.
Most automatic evaluation methods, such as BLEU/ROUGE, may not be able to adequately capture the above dimensions.
We propose a new LLM-based evaluation framework that compares generated text and reference text from both objective and subjective aspects.
arXiv Detail & Related papers (2023-03-27T10:40:59Z) - Promptable Game Models: Text-Guided Game Simulation via Masked Diffusion Models [68.85478477006178]
We present a Promptable Game Model (PGM) for neural video game simulators.
It allows a user to play the game by prompting it with high- and low-level action sequences.
Most captivatingly, our PGM unlocks the director's mode, where the game is played by specifying goals for the agents in the form of a prompt.
Our method significantly outperforms existing neural video game simulators in terms of rendering quality and unlocks applications beyond the capabilities of the current state of the art.
arXiv Detail & Related papers (2023-03-23T17:43:17Z) - Grounding Language Models to Images for Multimodal Inputs and Outputs [89.30027812161686]
We propose an efficient method to ground pretrained text-only language models to the visual domain.
We process arbitrarily interleaved image-and-text data, and generate text interleaved with retrieved images.
arXiv Detail & Related papers (2023-01-31T18:33:44Z) - Infusing Commonsense World Models with Graph Knowledge [89.27044249858332]
We study the setting of generating narratives in an open world text adventure game.
A graph representation of the underlying game state can be used to train models that consume and output both grounded graph representations and natural language descriptions and actions.
arXiv Detail & Related papers (2023-01-13T19:58:27Z) - Immersive Text Game and Personality Classification [1.9171404264679484]
Immersive Text Game allows the player to choose a story and a character, and interact with other characters in the story in an immersive manner.
The game builds on several recent models, including a text generation language model, an information extraction model, a commonsense reasoning model, and a psychology evaluation model.
arXiv Detail & Related papers (2022-03-20T18:37:03Z) - Pre-trained Language Models as Prior Knowledge for Playing Text-based Games [2.423547527175808]
In this paper, we improve the semantic understanding of the agent by proposing a simple RL with LM framework.
We perform a detailed study of our framework to demonstrate how our model outperforms all existing agents on the popular game, Zork1.
Our proposed approach also performs comparably to the state-of-the-art models on the other set of text games.
arXiv Detail & Related papers (2021-07-18T10:28:48Z) - Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition [80.446770909975]
Linguistic knowledge is of great benefit to scene text recognition.
How to effectively model linguistic rules in end-to-end deep networks remains a research challenge.
We propose an autonomous, bidirectional and iterative ABINet for scene text recognition.
arXiv Detail & Related papers (2021-03-11T06:47:45Z) - Keep CALM and Explore: Language Models for Action Generation in Text-based Games [27.00685301984832]
We propose the Contextual Action Language Model (CALM) to generate a compact set of action candidates at each game state.
We combine CALM with a reinforcement learning agent which re-ranks the generated action candidates to maximize in-game rewards.
arXiv Detail & Related papers (2020-10-06T17:36:29Z) - Interactive Fiction Game Playing as Multi-Paragraph Reading Comprehension with Reinforcement Learning [94.50608198582636]
Interactive Fiction (IF) games with real human-written natural language texts provide a new natural evaluation for language understanding techniques.
We take a novel perspective on IF game solving and reformulate it as Multi-Passage Reading Comprehension (MPRC) tasks.
arXiv Detail & Related papers (2020-10-05T23:09:20Z) - Learning Dynamic Belief Graphs to Generalize on Text-Based Games [55.59741414135887]
Playing text-based games requires skills in processing natural language and sequential decision making.
In this work, we investigate how an agent can plan and generalize in text-based games using graph-structured representations learned end-to-end from raw text.
arXiv Detail & Related papers (2020-02-21T04:38:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.