Cultivating Game Sense for Yourself: Making VLMs Gaming Experts
- URL: http://arxiv.org/abs/2503.21263v1
- Date: Thu, 27 Mar 2025 08:40:47 GMT
- Title: Cultivating Game Sense for Yourself: Making VLMs Gaming Experts
- Authors: Wenxuan Lu, Jiangyang He, Zhanqiu Zhang, Yiwen Guo, Tianning Zang
- Abstract summary: We propose a paradigm shift in gameplay agent design. Instead of directly controlling gameplay, the VLM develops specialized execution modules tailored for tasks like shooting and combat. These modules handle real-time game interactions, elevating the VLM to a high-level developer.
- Score: 23.370716496046217
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Developing agents capable of fluid gameplay in first/third-person games without API access remains a critical challenge in Artificial General Intelligence (AGI). Recent efforts leverage Vision Language Models (VLMs) as direct controllers, frequently pausing the game to analyze screens and plan actions through language reasoning. However, this inefficient paradigm fundamentally restricts agents to basic and non-fluent interactions: relying on isolated VLM reasoning for each action makes it impossible to handle tasks requiring high reactivity (e.g., FPS shooting) or dynamic adaptability (e.g., ACT combat). To address this, we propose a paradigm shift in gameplay agent design: instead of directly controlling gameplay, the VLM develops specialized execution modules tailored for tasks like shooting and combat. These modules handle real-time game interactions, elevating the VLM to a high-level developer. Building upon this paradigm, we introduce GameSense, a gameplay agent framework in which the VLM develops task-specific game sense modules by observing task execution and leveraging vision tools and neural network training pipelines. These modules encapsulate action-feedback logic, ranging from direct action rules to neural network-based decisions. Experiments demonstrate that our framework is the first to achieve fluent gameplay in diverse genres, including ACT, FPS, and Flappy Bird, setting a new benchmark for game-playing agents.
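The abstract's core mechanism — the VLM acting as an offline developer of modules that then run the real-time loop on their own — can be pictured with a short sketch. The Python below is a minimal illustration under assumed names (`Frame`, `GameSenseModule`, `RuleModule`, `NeuralModule`, and `play` are hypothetical, not from the paper's released code): a rule module covers a simple reactive task like Flappy Bird, a neural module stands in for a policy produced by the training pipeline, and the play loop never calls the VLM per frame.

```python
# Hypothetical sketch of the "VLM as developer" paradigm: the VLM is not in
# the per-frame loop; it authors a module once, and the module alone handles
# real-time interaction. All names are illustrative, not the paper's API.
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class Frame:
    """A captured game screen plus features extracted by a vision tool."""
    pixels: bytes
    features: dict


class GameSenseModule(Protocol):
    """Action-feedback logic developed offline by the VLM."""
    def act(self, frame: Frame) -> str: ...


class RuleModule:
    """Direct action rule, e.g. for Flappy Bird: flap whenever the bird
    drops below the gap center reported by the vision tool."""
    def act(self, frame: Frame) -> str:
        if frame.features["bird_y"] > frame.features["gap_center_y"]:
            return "flap"
        return "noop"


class NeuralModule:
    """Neural decision logic; `policy` stands in for a small network the
    training pipeline produced for a reactive task like FPS aiming."""
    def __init__(self, policy: Callable[[Frame], str]):
        self.policy = policy

    def act(self, frame: Frame) -> str:
        return self.policy(frame)


def play(module: GameSenseModule, frames: list) -> list:
    """Real-time loop: no VLM inference per step, so no pausing the game."""
    return [module.act(f) for f in frames]


if __name__ == "__main__":
    frames = [Frame(b"", {"bird_y": 120.0, "gap_center_y": 100.0}),
              Frame(b"", {"bird_y": 90.0, "gap_center_y": 100.0})]
    print(play(RuleModule(), frames))  # ['flap', 'noop']
```

The point of the paradigm is visible in `play()`: per-step latency is bounded by the module, not by a VLM inference call, which is what makes reactive genres like FPS feasible.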
Related papers
- Agents Play Thousands of 3D Video Games [26.290863972751428]
We present PORTAL, a novel framework for developing artificial intelligence agents capable of playing thousands of 3D video games.
By transforming decision-making problems into language modeling tasks, our approach leverages large language models (LLMs) to generate behavior trees.
Our framework introduces a hybrid policy structure that combines rule-based nodes with neural network components, enabling both high-level strategic reasoning and precise low-level control (a minimal illustrative sketch follows this list).
arXiv Detail & Related papers (2025-03-17T16:42:34Z) - AVA: Attentive VLM Agent for Mastering StarCraft II [56.07921367623274]
We introduce Attentive VLM Agent (AVA), a multimodal StarCraft II agent that aligns artificial agent perception with the human gameplay experience. Our agent incorporates RGB visual inputs and natural language observations that more closely simulate human cognitive processes during gameplay.
arXiv Detail & Related papers (2025-03-07T12:54:25Z) - DVM: Towards Controllable LLM Agents in Social Deduction Games [16.826397707182963]
Large Language Models (LLMs) have advanced the capabilities of game agents in social deduction games (SDGs). We present DVM, a novel framework for developing controllable LLM agents for SDGs. We demonstrate DVM's implementation on one of the most popular SDGs, Werewolf.
arXiv Detail & Related papers (2025-01-12T03:11:20Z) - Can VLMs Play Action Role-Playing Games? Take Black Myth Wukong as a Study Case [20.14197375326218]
This research aims to provide new insights and directions for applying multimodal agents in complex action game environments.
We select an ARPG, "Black Myth: Wukong", as a research platform to explore the capability boundaries of existing vision language models.
We will release a human operation dataset containing recorded gameplay videos and operation logs, including mouse and keyboard actions.
arXiv Detail & Related papers (2024-09-19T16:30:25Z) - Scaling Instructable Agents Across Many Simulated Worlds [70.97268311053328]
Our goal is to develop an agent that can accomplish anything a human can do in any simulated 3D environment.
Our approach focuses on language-driven generality while imposing minimal assumptions.
Our agents interact with environments in real-time using a generic, human-like interface.
arXiv Detail & Related papers (2024-03-13T17:50:32Z) - DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent) [73.10899129264375]
This paper explores DoraemonGPT, a comprehensive and conceptually elegant system driven by LLMs to understand dynamic scenes.
Given a video with a question/task, DoraemonGPT begins by converting the input video into a symbolic memory that stores task-related attributes.
We extensively evaluate DoraemonGPT's effectiveness on three benchmarks and several in-the-wild scenarios.
arXiv Detail & Related papers (2024-01-16T14:33:09Z) - Promptable Game Models: Text-Guided Game Simulation via Masked Diffusion Models [68.85478477006178]
We present a Promptable Game Model (PGM) for neural video game simulators.
It allows a user to play the game by prompting it with high- and low-level action sequences.
Most captivatingly, our PGM unlocks the director's mode, where the game is played by specifying goals for the agents in the form of a prompt.
Our method significantly outperforms existing neural video game simulators in terms of rendering quality and unlocks applications beyond the capabilities of the current state of the art.
arXiv Detail & Related papers (2023-03-23T17:43:17Z) - Learning to Simulate Dynamic Environments with GameGAN [109.25308647431952]
In this paper, we aim to learn a simulator by simply watching an agent interact with an environment.
We introduce GameGAN, a generative model that learns to visually imitate a desired game by ingesting screenplay and keyboard actions during training.
arXiv Detail & Related papers (2020-05-25T14:10:17Z) - Neural MMO v1.3: A Massively Multiagent Game Environment for Training and Evaluating Neural Networks [48.5733173329785]
We present Neural MMO, a massively multiagent game environment inspired by MMOs.
We discuss our progress on two more general challenges in multiagent systems engineering for AI research: distributed infrastructure and game IO.
arXiv Detail & Related papers (2020-01-31T18:50:02Z)
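As a concrete illustration of the mechanisms summarized above, here is a minimal sketch of the hybrid rule/neural behavior-tree policy described in the PORTAL entry. All names (`Node`, `Condition`, `Selector`, `NeuralLeaf`) are assumptions for illustration, not PORTAL's actual API, and the example tree stands in for one an LLM might generate.

```python
# Minimal sketch of a hybrid behavior tree: rule-based nodes supply
# high-level strategy, a neural leaf supplies low-level control.
# Names are illustrative assumptions, not PORTAL's actual API.
from typing import Callable, Optional


class Node:
    def tick(self, obs: dict) -> Optional[str]:
        raise NotImplementedError


class Condition(Node):
    """Rule-based node: gates a subtree on a predicate over observations."""
    def __init__(self, pred: Callable[[dict], bool], child: Node):
        self.pred, self.child = pred, child

    def tick(self, obs: dict) -> Optional[str]:
        return self.child.tick(obs) if self.pred(obs) else None


class Selector(Node):
    """Tries children in order; returns the first action produced."""
    def __init__(self, *children: Node):
        self.children = children

    def tick(self, obs: dict) -> Optional[str]:
        for child in self.children:
            action = child.tick(obs)
            if action is not None:
                return action
        return None


class NeuralLeaf(Node):
    """Neural node: delegates precise low-level control to a policy net
    (a plain callable stands in for the network here)."""
    def __init__(self, policy: Callable[[dict], str]):
        self.policy = policy

    def tick(self, obs: dict) -> Optional[str]:
        return self.policy(obs)


# Example tree an LLM might emit: retreat when badly hurt, otherwise
# let the learned combat policy decide between firing and exploring.
tree = Selector(
    Condition(lambda o: o["health"] < 0.2, NeuralLeaf(lambda o: "retreat")),
    NeuralLeaf(lambda o: "fire" if o["enemy_visible"] else "explore"),
)
print(tree.tick({"health": 0.9, "enemy_visible": 1.0}))  # fire
```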