Playing NetHack with LLMs: Potential & Limitations as Zero-Shot Agents
- URL: http://arxiv.org/abs/2403.00690v1
- Date: Fri, 1 Mar 2024 17:22:16 GMT
- Title: Playing NetHack with LLMs: Potential & Limitations as Zero-Shot Agents
- Authors: Dominik Jeurissen, Diego Perez-Liebana, Jeremy Gow, Duygu Cakmak and James Kwan
- Abstract summary: Large Language Models (LLMs) have shown great success as high-level planners for zero-shot game-playing agents.
We present NetPlay, the first LLM-powered zero-shot agent for the challenging roguelike NetHack.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have shown great success as high-level planners
for zero-shot game-playing agents. However, these agents are primarily
evaluated on Minecraft, where long-term planning is relatively straightforward.
In contrast, agents tested in dynamic robot environments face limitations due
to simplistic environments with only a few objects and interactions. To fill
this gap in the literature, we present NetPlay, the first LLM-powered zero-shot
agent for the challenging roguelike NetHack. NetHack is a particularly
challenging environment due to its diverse set of items and monsters, complex
interactions, and many ways to die.
NetPlay uses an architecture designed for dynamic robot environments,
modified for NetHack. Like previous approaches, it prompts the LLM to choose
from predefined skills and tracks past interactions to enhance decision-making.
Given NetHack's unpredictable nature, NetPlay detects important game events to
interrupt running skills, enabling it to react to unforeseen circumstances.
While NetPlay demonstrates considerable flexibility and proficiency in
interacting with NetHack's mechanics, it struggles with ambiguous task
descriptions and a lack of explicit feedback. Our findings show that NetPlay
performs best with detailed context information, indicating the need for
methods that dynamically supply context for complex games such as NetHack.
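The control flow the abstract describes (an LLM choosing among predefined skills, tracked interaction history, and event-driven interruption) can be pictured with a short sketch. This is our illustration under stated assumptions, not NetPlay's actual code: the skill set and the `query_llm` and `important_event` helpers are hypothetical stand-ins.

```python
# Hypothetical sketch of the loop described in the NetPlay abstract: the LLM
# picks a predefined skill, the skill runs step by step, and important game
# events interrupt it so the agent can re-plan. All names are illustrative.
from typing import Callable, Dict, Iterator, List

SKILLS: Dict[str, Callable[[], Iterator[str]]] = {
    # Each skill is a generator yielding low-level game actions.
    "explore": lambda: iter(["move_north", "move_east"]),
    "fight_nearest_monster": lambda: iter(["attack"]),
    "eat_food": lambda: iter(["eat"]),
}

def query_llm(history: List[str]) -> str:
    """Stand-in for prompting an LLM with the skill list and past interactions."""
    return "explore"  # a real agent would parse the LLM's chosen skill here

def important_event(observation: str) -> bool:
    """Stand-in for NetPlay's event detection (e.g. a monster appears)."""
    return "monster" in observation

def run_agent(env, max_steps: int = 100) -> None:
    history: List[str] = []
    observation = env.reset()
    for _ in range(max_steps):
        skill_name = query_llm(history)
        history.append(f"chose skill: {skill_name}")
        for action in SKILLS[skill_name]():
            observation = env.step(action)
            # Interrupt the running skill on unforeseen events and re-plan.
            if important_event(observation):
                history.append(f"interrupted by event: {observation}")
                break
```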
Related papers
- Tachikuma: Understading Complex Interactions with Multi-Character and
Novel Objects by Large Language Models [67.20964015591262]
We introduce a benchmark named Tachikuma, comprising a Multiple character and novel Object based interaction Estimation task and a supporting dataset.
The dataset captures log data from real-time communications during gameplay, providing diverse, grounded, and complex interactions for further explorations.
We present a simple prompting baseline and evaluate its performance, demonstrating its effectiveness in enhancing interaction understanding.
arXiv Detail & Related papers (2023-07-24T07:40:59Z)
- Scaling Laws for Imitation Learning in Single-Agent Games [28.257046559127875]
We investigate whether carefully scaling up model and data size can bring similar improvements in the imitation learning setting for single-agent games.
We first demonstrate our findings on a variety of Atari games, and thereafter focus on the extremely challenging game of NetHack.
We find that IL loss and mean return scale smoothly with the compute budget and are strongly correlated, resulting in power laws for training compute-optimal IL agents.
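To make the reported power-law relationship concrete, a fit of loss against compute on log-log axes looks like the sketch below; the data points and exponent are synthetic placeholders, not the paper's measurements.

```python
# Illustrative power-law fit of imitation-learning loss vs. training compute,
# L(C) = a * C^(-b). The data points below are synthetic, not the paper's.
import numpy as np

compute = np.array([1e14, 1e15, 1e16, 1e17, 1e18])  # FLOPs (synthetic)
loss = 2.5 * compute ** -0.05                        # synthetic L(C)

# A power law is linear in log-log space: log L = log a - b * log C.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), deg=1)
b, a = -slope, np.exp(intercept)
print(f"fitted L(C) = {a:.2f} * C^(-{b:.3f})")
```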
arXiv Detail & Related papers (2023-07-18T16:43:03Z)
- LuckyMera: a Modular AI Framework for Building Hybrid NetHack Agents [7.23273667916516]
Roguelike video games offer a good trade-off between environment complexity and computational cost.
We present LuckyMera, a flexible, modular, extensible and configurable AI framework built around NetHack.
LuckyMera comes with a set of off-the-shelf symbolic and neural modules (called "skills"): these modules can be either hard-coded behaviors or neural Reinforcement Learning approaches.
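A minimal sketch of what such a pluggable skill interface might look like, assuming a shared act() contract for hard-coded and learned modules; this is an assumption about the design, not LuckyMera's actual API.

```python
# Hypothetical skill interface in the spirit of LuckyMera's hybrid modules:
# a skill is either a hard-coded behavior or a learned policy behind one API.
from abc import ABC, abstractmethod

class Skill(ABC):
    @abstractmethod
    def act(self, observation: dict) -> int:
        """Map a game observation to a NetHack action index."""

class HardcodedPray(Skill):
    def act(self, observation: dict) -> int:
        return 62  # illustrative action index, not NetHack's real keymap

class NeuralSkill(Skill):
    def __init__(self, policy) -> None:
        self.policy = policy  # e.g. a trained RL policy network

    def act(self, observation: dict) -> int:
        return int(self.policy(observation))
```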
arXiv Detail & Related papers (2023-07-17T14:46:59Z)
- NetHack is Hard to Hack [37.24009814390211]
In the NeurIPS 2021 NetHack Challenge, symbolic agents outperformed neural approaches by over four times in median game score.
We present an extensive study on neural policy learning for NetHack.
We produce a state-of-the-art neural agent that surpasses previous fully neural policies by 127% in offline settings and 25% in online settings on median game score.
arXiv Detail & Related papers (2023-05-30T17:30:17Z)
- Promptable Game Models: Text-Guided Game Simulation via Masked Diffusion Models [68.85478477006178]
We present a Promptable Game Model (PGM) for neural video game simulators.
It allows a user to play the game by prompting it with high- and low-level action sequences.
Most captivatingly, our PGM unlocks the director's mode, where the game is played by specifying goals for the agents in the form of a prompt.
Our method significantly outperforms existing neural video game simulators in terms of rendering quality and unlocks applications beyond the capabilities of the current state of the art.
arXiv Detail & Related papers (2023-03-23T17:43:17Z)
- Insights From the NeurIPS 2021 NetHack Challenge [40.52602443114554]
The first NeurIPS 2021 NetHack Challenge showcased community-driven progress in AI.
It served as a direct comparison between neural (e.g., deep RL) and symbolic AI, as well as hybrid systems.
No agent got close to winning the game, illustrating NetHack's suitability as a long-term benchmark for AI research.
arXiv Detail & Related papers (2022-03-22T17:01:07Z)
- Deep Policy Networks for NPC Behaviors that Adapt to Changing Design Parameters in Roguelike Games [137.86426963572214]
Turn-based strategy games such as Roguelikes present unique challenges to Deep Reinforcement Learning (DRL).
We propose two network architectures to better handle complex categorical state spaces and to mitigate the need for retraining forced by design decisions.
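One standard way to handle large categorical state spaces is to embed the categorical IDs before convolving over the grid; the PyTorch sketch below shows that general technique and is our illustration, not the paper's proposed architecture.

```python
# Illustrative PyTorch encoder for a categorical grid state (e.g. tile IDs);
# one plausible treatment of categorical spaces, not the paper's network.
import torch
import torch.nn as nn

class CategoricalGridEncoder(nn.Module):
    def __init__(self, num_tile_types: int, embed_dim: int = 16):
        super().__init__()
        self.embed = nn.Embedding(num_tile_types, embed_dim)
        self.conv = nn.Conv2d(embed_dim, 32, kernel_size=3, padding=1)

    def forward(self, grid: torch.Tensor) -> torch.Tensor:
        # grid: (batch, height, width) of integer tile IDs
        x = self.embed(grid)             # -> (B, H, W, embed_dim)
        x = x.permute(0, 3, 1, 2)        # -> (B, embed_dim, H, W)
        return torch.relu(self.conv(x))  # -> (B, 32, H, W)

encoder = CategoricalGridEncoder(num_tile_types=100)
features = encoder(torch.randint(0, 100, (1, 21, 79)))  # NetHack-sized map
```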
arXiv Detail & Related papers (2020-12-07T08:47:25Z)
- DeepCrawl: Deep Reinforcement Learning for Turn-based Strategy Games [137.86426963572214]
We introduce DeepCrawl, a fully-playable Roguelike prototype for iOS and Android in which all agents are controlled by policy networks trained using Deep Reinforcement Learning (DRL).
Our aim is to understand whether recent advances in DRL can be used to develop convincing behavioral models for non-player characters in videogames.
arXiv Detail & Related papers (2020-12-03T13:53:29Z)
- The NetHack Learning Environment [79.06395964379107]
We present the NetHack Learning Environment (NLE), a procedurally generated rogue-like environment for Reinforcement Learning research.
We argue that NetHack is sufficiently complex to drive long-term research on problems such as exploration, planning, skill acquisition, and language-conditioned RL.
We demonstrate empirical success for early stages of the game using a distributed Deep RL baseline and Random Network Distillation exploration.
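NLE exposes NetHack through the classic Gym interface; the snippet below mirrors the random-agent usage from the NLE README.

```python
# Minimal NLE usage: a random agent in the NetHackScore-v0 environment.
import gym
import nle  # noqa: F401  # importing registers the NetHack environments

env = gym.make("NetHackScore-v0")
obs = env.reset()
done = False
while not done:
    # Sample random actions; a real agent would choose based on `obs`.
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```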
arXiv Detail & Related papers (2020-06-24T14:12:56Z)
- Learning to Simulate Dynamic Environments with GameGAN [109.25308647431952]
In this paper, we aim to learn a simulator by simply watching an agent interact with an environment.
We introduce GameGAN, a generative model that learns to visually imitate a desired game by ingesting screenplay and keyboard actions during training.
arXiv Detail & Related papers (2020-05-25T14:10:17Z)