NitroGen: An Open Foundation Model for Generalist Gaming Agents
- URL: http://arxiv.org/abs/2601.02427v1
- Date: Sun, 04 Jan 2026 16:24:50 GMT
- Title: NitroGen: An Open Foundation Model for Generalist Gaming Agents
- Authors: Loïc Magne, Anas Awadalla, Guanzhi Wang, Yinzhen Xu, Joshua Belofsky, Fengyuan Hu, Joohwan Kim, Ludwig Schmidt, Georgia Gkioxari, Jan Kautz, Yisong Yue, Yejin Choi, Yuke Zhu, Linxi "Jim" Fan
- Abstract summary: NitroGen is a vision-action foundation model for generalist gaming agents. It is trained on 40,000 hours of gameplay videos across more than 1,000 games.
- Score: 101.41866522979548
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We introduce NitroGen, a vision-action foundation model for generalist gaming agents that is trained on 40,000 hours of gameplay videos across more than 1,000 games. We incorporate three key ingredients: 1) an internet-scale video-action dataset constructed by automatically extracting player actions from publicly available gameplay videos, 2) a multi-game benchmark environment that can measure cross-game generalization, and 3) a unified vision-action model trained with large-scale behavior cloning. NitroGen exhibits strong competence across diverse domains, including combat encounters in 3D action games, high-precision control in 2D platformers, and exploration in procedurally generated worlds. It transfers effectively to unseen games, achieving up to 52% relative improvement in task success rates over models trained from scratch. We release the dataset, evaluation suite, and model weights to advance research on generalist embodied agents.
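The third ingredient, large-scale behavior cloning over video-action pairs, can be illustrated with the minimal sketch below. The network architecture, action vocabulary size, and variable names are illustrative assumptions only; the abstract does not specify NitroGen's actual model or training code.

```python
# Hypothetical sketch of behavior cloning on (game frame, extracted action)
# pairs. Architecture and hyperparameters are placeholder assumptions, not
# NitroGen's actual implementation.
import torch
import torch.nn as nn

class VisionActionPolicy(nn.Module):
    def __init__(self, num_actions=512, embed_dim=256):
        super().__init__()
        # Toy vision encoder standing in for a large pretrained backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim), nn.ReLU(),
        )
        # Classification head over a discretized keyboard/mouse action vocabulary.
        self.action_head = nn.Linear(embed_dim, num_actions)

    def forward(self, frames):
        return self.action_head(self.encoder(frames))

def bc_step(policy, optimizer, frames, actions):
    """One behavior-cloning update: cross-entropy between the policy's
    predicted action and the action extracted from the gameplay video."""
    logits = policy(frames)
    loss = nn.functional.cross_entropy(logits, actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    policy = VisionActionPolicy()
    optimizer = torch.optim.AdamW(policy.parameters(), lr=3e-4)
    frames = torch.randn(8, 3, 128, 128)   # dummy video frames
    actions = torch.randint(0, 512, (8,))  # dummy extracted actions
    print(bc_step(policy, optimizer, frames, actions))
```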
Related papers
- Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents [56.25101378553328]
We present Game-TARS, a generalist game agent trained with a unified, scalable action space anchored to human-aligned keyboard-mouse inputs. Game-TARS is pre-trained on over 500B tokens with diverse trajectories and multimodal data. Experiments show that Game-TARS achieves roughly twice the success rate of the previous state-of-the-art model on open-world Minecraft tasks.
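A unified keyboard-mouse action space of the kind described in this abstract could be represented, for example, as below. The field names, key set, and mouse discretization are assumptions for illustration only, not Game-TARS's actual action format.

```python
# Hypothetical illustration of a unified keyboard-mouse action space shared
# across games. Field names and discretization are assumed for this sketch.
from dataclasses import dataclass, field

KEYS = ["w", "a", "s", "d", "space", "shift", "mouse_left", "mouse_right"]
MOUSE_BINS = list(range(-5, 6))  # coarse bins for mouse movement per axis

@dataclass
class Action:
    keys_down: set = field(default_factory=set)  # subset of KEYS held this step
    mouse_dx: int = 0                            # binned horizontal mouse delta
    mouse_dy: int = 0                            # binned vertical mouse delta

def encode(action: Action) -> list[int]:
    """Flatten an action into integer tokens usable by any game."""
    key_tokens = [i for i, k in enumerate(KEYS) if k in action.keys_down]
    dx_token = len(KEYS) + MOUSE_BINS.index(action.mouse_dx)
    dy_token = len(KEYS) + len(MOUSE_BINS) + MOUSE_BINS.index(action.mouse_dy)
    return key_tokens + [dx_token, dy_token]

if __name__ == "__main__":
    a = Action(keys_down={"w", "mouse_left"}, mouse_dx=2, mouse_dy=-1)
    print(encode(a))  # [0, 6, 15, 23]
```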
arXiv Detail & Related papers (2025-10-27T17:43:51Z) - Learning to play: A Multimodal Agent for 3D Game-Play [2.5663091969883993]
We first describe our dataset of human game-play, collected across a large variety of 3-D first-person games. We show the resulting model is capable of playing a variety of 3-D games and responding to text input.
arXiv Detail & Related papers (2025-10-19T09:45:15Z) - Pixels to Play: A Foundation Model for 3D Gameplay [4.380638021267298]
We introduce Pixels2Play-0.1 (P2P0.1), a foundation model that learns to play a wide range of 3D video games with recognizable human-like behavior.
arXiv Detail & Related papers (2025-08-19T22:24:50Z) - GameFactory: Creating New Games with Generative Interactive Videos [50.368593726912856]
Generative videos have the potential to revolutionize game development by autonomously creating new content. We present GameFactory, a framework for action-controlled scene-generalizable game video generation. Experimental results demonstrate that GameFactory effectively generates open-domain action-controllable game videos.
arXiv Detail & Related papers (2025-01-14T18:57:21Z) - GameGen-X: Interactive Open-world Game Video Generation [10.001128258269675]
We introduce GameGen-X, the first diffusion transformer model specifically designed for both generating and interactively controlling open-world game videos. It simulates an array of game engine features, such as innovative characters, dynamic environments, complex actions, and diverse events. It provides interactive controllability, predicting and altering future content based on the current clip, thus allowing for gameplay simulation.
arXiv Detail & Related papers (2024-11-01T17:59:17Z) - Promptable Game Models: Text-Guided Game Simulation via Masked Diffusion Models [68.85478477006178]
We present a Promptable Game Model (PGM) for neural video game simulators.
It allows a user to play the game by prompting it with high- and low-level action sequences.
Most captivatingly, our PGM unlocks the director's mode, where the game is played by specifying goals for the agents in the form of a prompt.
Our method significantly outperforms existing neural video game simulators in terms of rendering quality and unlocks applications beyond the capabilities of the current state of the art.
arXiv Detail & Related papers (2023-03-23T17:43:17Z) - Multi-Game Decision Transformers [49.257185338595434]
We show that a single transformer-based model can play a suite of up to 46 Atari games simultaneously at close-to-human performance.
We compare several approaches in this multi-game setting, such as online and offline RL methods and behavioral cloning.
We find that our Multi-Game Decision Transformer models offer the best scalability and performance.
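As background for the Decision Transformer approach referenced above, a return-conditioned sequence model can be sketched as follows. The layer sizes, the tiny encoder stack, and the omission of a causal attention mask are simplifications assumed here for brevity, not the paper's implementation.

```python
# Hypothetical sketch of return-conditioned sequence modeling in the
# Decision Transformer style: actions are predicted from interleaved
# (return-to-go, state, action) tokens. Sizes and the missing causal mask
# are simplifications, not the paper's actual model.
import torch
import torch.nn as nn

class TinyDecisionTransformer(nn.Module):
    def __init__(self, state_dim=64, num_actions=18, embed_dim=128):
        super().__init__()
        self.embed_return = nn.Linear(1, embed_dim)
        self.embed_state = nn.Linear(state_dim, embed_dim)
        self.embed_action = nn.Embedding(num_actions, embed_dim)
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.predict_action = nn.Linear(embed_dim, num_actions)

    def forward(self, returns_to_go, states, actions):
        # returns_to_go: (B, T, 1), states: (B, T, state_dim), actions: (B, T)
        r = self.embed_return(returns_to_go)
        s = self.embed_state(states)
        a = self.embed_action(actions)
        # Interleave tokens as (R_1, s_1, a_1, R_2, s_2, a_2, ...).
        B, T, D = r.shape
        tokens = torch.stack((r, s, a), dim=2).reshape(B, 3 * T, D)
        hidden = self.transformer(tokens)
        # Read out an action prediction at each state-token position.
        return self.predict_action(hidden[:, 1::3])

if __name__ == "__main__":
    model = TinyDecisionTransformer()
    rtg = torch.randn(2, 10, 1)
    states = torch.randn(2, 10, 64)
    actions = torch.randint(0, 18, (2, 10))
    print(model(rtg, states, actions).shape)  # torch.Size([2, 10, 18])
```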
arXiv Detail & Related papers (2022-05-30T16:55:38Z) - Neural MMO v1.3: A Massively Multiagent Game Environment for Training and Evaluating Neural Networks [48.5733173329785]
We present Neural MMO, a massively multiagent game environment inspired by MMOs.
We discuss our progress on two more general challenges in multiagent systems engineering for AI research: distributed infrastructure and game IO.
arXiv Detail & Related papers (2020-01-31T18:50:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.