Playing NetHack with LLMs: Potential & Limitations as Zero-Shot Agents
- URL: http://arxiv.org/abs/2403.00690v1
- Date: Fri, 1 Mar 2024 17:22:16 GMT
- Title: Playing NetHack with LLMs: Potential & Limitations as Zero-Shot Agents
- Authors: Dominik Jeurissen, Diego Perez-Liebana, Jeremy Gow, Duygu Cakmak and James Kwan
- Abstract summary: Large Language Models (LLMs) have shown great success as high-level planners for zero-shot game-playing agents.
We present NetPlay, the first LLM-powered zero-shot agent for the challenging roguelike NetHack.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have shown great success as high-level planners
for zero-shot game-playing agents. However, these agents are primarily
evaluated on Minecraft, where long-term planning is relatively straightforward.
In contrast, agents tested in dynamic robot environments face limitations due
to simplistic environments with only a few objects and interactions. To fill
this gap in the literature, we present NetPlay, the first LLM-powered zero-shot
agent for the challenging roguelike NetHack. NetHack is a particularly
challenging environment due to its diverse set of items and monsters, complex
interactions, and many ways to die.
NetPlay uses an architecture designed for dynamic robot environments,
modified for NetHack. Like previous approaches, it prompts the LLM to choose
from predefined skills and tracks past interactions to enhance decision-making.
Given NetHack's unpredictable nature, NetPlay detects important game events to
interrupt running skills, enabling it to react to unforeseen circumstances.
While NetPlay demonstrates considerable flexibility and proficiency in
interacting with NetHack's mechanics, it struggles with ambiguous task
descriptions and a lack of explicit feedback. Our findings show that NetPlay
performs best with detailed context information, indicating the need for
methods that dynamically supply context for complex games such as NetHack.
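The control flow the abstract describes (an LLM choosing among predefined skills, tracked interaction history, and event-driven interruption) can be pictured with a short sketch. This is our illustration under stated assumptions, not NetPlay's actual code: the skill set and the `query_llm` and `important_event` helpers are hypothetical stand-ins.

```python
# Hypothetical sketch of the loop described in the NetPlay abstract: the LLM
# picks a predefined skill, the skill runs step by step, and important game
# events interrupt it so the agent can re-plan. All names are illustrative.
from typing import Callable, Dict, Iterator, List

SKILLS: Dict[str, Callable[[], Iterator[str]]] = {
    # Each skill is a generator yielding low-level game actions.
    "explore": lambda: iter(["move_north", "move_east"]),
    "fight_nearest_monster": lambda: iter(["attack"]),
    "eat_food": lambda: iter(["eat"]),
}

def query_llm(history: List[str]) -> str:
    """Stand-in for prompting an LLM with the skill list and past interactions."""
    return "explore"  # a real agent would parse the LLM's chosen skill here

def important_event(observation: str) -> bool:
    """Stand-in for NetPlay's event detection (e.g. a monster appears)."""
    return "monster" in observation

def run_agent(env, max_steps: int = 100) -> None:
    history: List[str] = []
    observation = env.reset()
    for _ in range(max_steps):
        skill_name = query_llm(history)
        history.append(f"chose skill: {skill_name}")
        for action in SKILLS[skill_name]():
            observation = env.step(action)
            # Interrupt the running skill on unforeseen events and re-plan.
            if important_event(observation):
                history.append(f"interrupted by event: {observation}")
                break
```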
Related papers
- Tachikuma: Understading Complex Interactions with Multi-Character and
Novel Objects by Large Language Models [67.20964015591262]
We introduce a benchmark named Tachikuma, comprising a Multiple character and novel Object based interaction Estimation task and a supporting dataset.
The dataset captures log data from real-time communications during gameplay, providing diverse, grounded, and complex interactions for further explorations.
We present a simple prompting baseline and evaluate its performance, demonstrating its effectiveness in enhancing interaction understanding.
arXiv Detail & Related papers (2023-07-24T07:40:59Z)
- Scaling Laws for Imitation Learning in Single-Agent Games [28.257046559127875]
We investigate whether carefully scaling up model and data size can bring similar improvements in the imitation learning setting for single-agent games.
We first demonstrate our findings on a variety of Atari games, and thereafter focus on the extremely challenging game of NetHack.
We find that IL loss and mean return scale smoothly with the compute budget and are strongly correlated, resulting in power laws for training compute-optimal IL agents.
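To make the reported power-law relationship concrete, a fit of loss against compute on log-log axes looks like the sketch below; the data points and exponent are synthetic placeholders, not the paper's measurements.

```python
# Illustrative power-law fit of imitation-learning loss vs. training compute,
# L(C) = a * C^(-b). The data points below are synthetic, not the paper's.
import numpy as np

compute = np.array([1e14, 1e15, 1e16, 1e17, 1e18])  # FLOPs (synthetic)
loss = 2.5 * compute ** -0.05                        # synthetic L(C)

# A power law is linear in log-log space: log L = log a - b * log C.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), deg=1)
b, a = -slope, np.exp(intercept)
print(f"fitted L(C) = {a:.2f} * C^(-{b:.3f})")
```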
arXiv Detail & Related papers (2023-07-18T16:43:03Z)
- LuckyMera: a Modular AI Framework for Building Hybrid NetHack Agents [7.23273667916516]
Roguelike video games offer a good trade-off between environment complexity and computational cost.
We present LuckyMera, a flexible, modular, extensible and configurable AI framework built around NetHack.
LuckyMera comes with a set of off-the-shelf symbolic and neural modules (called "skills"): these modules can be either hard-coded behaviors or neural Reinforcement Learning approaches.
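A minimal sketch of what such a pluggable skill interface might look like, assuming a shared act() contract for hard-coded and learned modules; this is an assumption about the design, not LuckyMera's actual API.

```python
# Hypothetical skill interface in the spirit of LuckyMera's hybrid modules:
# a skill is either a hard-coded behavior or a learned policy behind one API.
from abc import ABC, abstractmethod

class Skill(ABC):
    @abstractmethod
    def act(self, observation: dict) -> int:
        """Map a game observation to a NetHack action index."""

class HardcodedPray(Skill):
    def act(self, observation: dict) -> int:
        return 62  # illustrative action index, not NetHack's real keymap

class NeuralSkill(Skill):
    def __init__(self, policy) -> None:
        self.policy = policy  # e.g. a trained RL policy network

    def act(self, observation: dict) -> int:
        return int(self.policy(observation))
```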
arXiv Detail & Related papers (2023-07-17T14:46:59Z)
- NetHack is Hard to Hack [37.24009814390211]
In the NeurIPS 2021 NetHack Challenge, symbolic agents outperformed neural approaches by over four times in median game score.
We present an extensive study on neural policy learning for NetHack.
We produce a state-of-the-art neural agent that surpasses previous fully neural policies by 127% in offline settings and 25% in online settings on median game score.
arXiv Detail & Related papers (2023-05-30T17:30:17Z)
- Promptable Game Models: Text-Guided Game Simulation via Masked Diffusion Models [68.85478477006178]
We present a Promptable Game Model (PGM) for neural video game simulators.
It allows a user to play the game by prompting it with high- and low-level action sequences.
Most captivatingly, our PGM unlocks the director's mode, where the game is played by specifying goals for the agents in the form of a prompt.
Our method significantly outperforms existing neural video game simulators in terms of rendering quality and unlocks applications beyond the capabilities of the current state of the art.
arXiv Detail & Related papers (2023-03-23T17:43:17Z)
- Insights From the NeurIPS 2021 NetHack Challenge [40.52602443114554]
The first NeurIPS 2021 NetHack Challenge showcased community-driven progress in AI.
It served as a direct comparison between neural (e.g., deep RL) and symbolic AI, as well as hybrid systems.
No agent got close to winning the game, illustrating NetHack's suitability as a long-term benchmark for AI research.
arXiv Detail & Related papers (2022-03-22T17:01:07Z)
- Deep Policy Networks for NPC Behaviors that Adapt to Changing Design Parameters in Roguelike Games [137.86426963572214]
Turn-based strategy games such as Roguelikes present unique challenges to Deep Reinforcement Learning (DRL).
We propose two network architectures to better handle complex categorical state spaces and to mitigate the need for retraining forced by design decisions.
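One standard way to handle large categorical state spaces is to embed the categorical IDs before convolving over the grid; the PyTorch sketch below shows that general technique and is our illustration, not the paper's proposed architecture.

```python
# Illustrative PyTorch encoder for a categorical grid state (e.g. tile IDs);
# one plausible treatment of categorical spaces, not the paper's network.
import torch
import torch.nn as nn

class CategoricalGridEncoder(nn.Module):
    def __init__(self, num_tile_types: int, embed_dim: int = 16):
        super().__init__()
        self.embed = nn.Embedding(num_tile_types, embed_dim)
        self.conv = nn.Conv2d(embed_dim, 32, kernel_size=3, padding=1)

    def forward(self, grid: torch.Tensor) -> torch.Tensor:
        # grid: (batch, height, width) of integer tile IDs
        x = self.embed(grid)             # -> (B, H, W, embed_dim)
        x = x.permute(0, 3, 1, 2)        # -> (B, embed_dim, H, W)
        return torch.relu(self.conv(x))  # -> (B, 32, H, W)

encoder = CategoricalGridEncoder(num_tile_types=100)
features = encoder(torch.randint(0, 100, (1, 21, 79)))  # NetHack-sized map
```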
arXiv Detail & Related papers (2020-12-07T08:47:25Z)
- DeepCrawl: Deep Reinforcement Learning for Turn-based Strategy Games [137.86426963572214]
We introduce DeepCrawl, a fully-playable Roguelike prototype for iOS and Android in which all agents are controlled by policy networks trained using Deep Reinforcement Learning (DRL).
Our aim is to understand whether recent advances in DRL can be used to develop convincing behavioral models for non-player characters in videogames.
arXiv Detail & Related papers (2020-12-03T13:53:29Z)
- The NetHack Learning Environment [79.06395964379107]
We present the NetHack Learning Environment (NLE), a procedurally generated rogue-like environment for Reinforcement Learning research.
We argue that NetHack is sufficiently complex to drive long-term research on problems such as exploration, planning, skill acquisition, and language-conditioned RL.
We demonstrate empirical success for early stages of the game using a distributed Deep RL baseline and Random Network Distillation exploration.
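NLE exposes NetHack through the classic Gym interface; the snippet below mirrors the random-agent usage from the NLE README.

```python
# Minimal NLE usage: a random agent in the NetHackScore-v0 environment.
import gym
import nle  # noqa: F401  # importing registers the NetHack environments

env = gym.make("NetHackScore-v0")
obs = env.reset()
done = False
while not done:
    # Sample random actions; a real agent would choose based on `obs`.
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```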
arXiv Detail & Related papers (2020-06-24T14:12:56Z)
- Learning to Simulate Dynamic Environments with GameGAN [109.25308647431952]
In this paper, we aim to learn a simulator by simply watching an agent interact with an environment.
We introduce GameGAN, a generative model that learns to visually imitate a desired game by ingesting screenplay and keyboard actions during training.
arXiv Detail & Related papers (2020-05-25T14:10:17Z)