Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing
- URL: http://arxiv.org/abs/2601.04575v1
- Date: Thu, 08 Jan 2026 04:06:17 GMT
- Title: Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing
- Authors: Yuguang Yue, Irakli Salia, Samuel Hunt, Chris Green, Wenzhe Shi, Jonathan J Hunt
- Abstract summary: We release all data (8300+ hours of high-quality human gameplay), training and inference code, and pretrained checkpoints under an open license. We show that our best model is capable of playing a variety of 3D video games at a level competitive with human play. We first show in a simple toy problem that, for some types of causal reasoning, increasing both the amount of training data and the depth of the network results in the model learning a more causal policy.
- Score: 2.5663091969883993
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Behavior cloning is enjoying a resurgence in popularity as scaling both model and data sizes proves to provide a strong starting point for many tasks of interest. In this work, we introduce an open recipe for training a video game playing foundation model designed for real-time inference on a consumer GPU. We release all data (8300+ hours of high-quality human gameplay), training and inference code, and pretrained checkpoints under an open license. We show that our best model is capable of playing a variety of 3D video games at a level competitive with human play. We use this recipe to systematically examine the scaling laws of behavior cloning to understand how the model's performance and causal reasoning vary with model and data scale. We first show in a simple toy problem that, for some types of causal reasoning, increasing both the amount of training data and the depth of the network results in the model learning a more causal policy. We then systematically study how causality varies with the number of parameters (and depth) and training steps in scaled models of up to 1.2 billion parameters, and we find scaling results similar to those observed in the toy problem.
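The core objective the abstract refers to can be illustrated with a minimal sketch: behavior cloning fits a policy to (observation, expert action) pairs by supervised learning. The example below is purely illustrative and is not the paper's model or data; the linear policy, the synthetic "expert", and all constants are assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_expert_data(n, d=4):
    """Synthetic 'expert' demonstrations: a binary action determined by
    a fixed linear rule on the observation (a stand-in for human gameplay)."""
    w_true = rng.normal(size=d)
    obs = rng.normal(size=(n, d))
    actions = (obs @ w_true > 0).astype(float)
    return obs, actions

def train_bc(obs, actions, steps=500, lr=0.5):
    """Behavior cloning as supervised learning: fit a logistic-regression
    policy by gradient descent on the cross-entropy imitation loss."""
    w = np.zeros(obs.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(obs @ w)))      # policy's action probability
        grad = obs.T @ (p - actions) / len(obs)   # cross-entropy gradient
        w -= lr * grad
    return w

obs, actions = make_expert_data(2000)
w = train_bc(obs, actions)
accuracy = (((obs @ w) > 0).astype(float) == actions).mean()
print(f"imitation accuracy: {accuracy:.2f}")
```

Scaling this objective, per the abstract, means growing both the demonstration dataset and the policy network's parameter count and depth, rather than changing the loss.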
Related papers
- Learning to play: A Multimodal Agent for 3D Game-Play [2.5663091969883993]
We first describe our dataset of human game-play, collected across a large variety of 3-D first-person games. We show the resulting model is capable of playing a variety of 3-D games and responding to text input.
arXiv Detail & Related papers (2025-10-19T09:45:15Z)
- Look-ahead Reasoning with a Learned Model in Imperfect Information Games [3.4935179780034242]
This paper introduces an algorithm that learns an abstracted model of an imperfect information game directly from agent-environment interaction. During test time, this trained model is used to perform look-ahead reasoning. We empirically demonstrate that with sufficient capacity, LAMIR learns the exact underlying game structure, and with limited capacity, it still learns a valuable abstraction.
arXiv Detail & Related papers (2025-10-06T17:26:56Z)
- Do Larger Language Models Generalize Better? A Scaling Law for Implicit Reasoning at Pretraining Time [73.22651918134808]
This work shows counterintuitive effects of model size scaling and provides new insights into the relationship between scaling and reasoning in language models (LMs). We pretrain LMs from scratch on a synthetic implicit multihop reasoning environment designed to replicate the structure and distribution of real-world large-scale knowledge graphs. We then assess the LMs' ability to complete the missing edges in the graph, which requires multi-hop reasoning that can be viewed as a simplification of implicit reasoning during real-world pretraining.
arXiv Detail & Related papers (2025-04-04T17:57:22Z)
- Scaling Inference-Efficient Language Models [3.271571137474847]
We show that model architecture affects inference latency: models of the same size can differ in latency by up to 3.5x. We modify the Chinchilla scaling laws to co-optimize the model parameter count, the number of training tokens, and the model architecture. We release the Morph-1B model, which improves inference latency by 1.8x while maintaining accuracy on downstream tasks.
arXiv Detail & Related papers (2025-01-30T03:16:44Z)
- How Far is Video Generation from World Model: A Physical Law Perspective [101.24278831609249]
OpenAI's Sora highlights the potential of video generation for developing world models that adhere to physical laws. However, whether video generation models can discover such laws purely from visual data, without human priors, remains an open question. In this work, we evaluate video generation models across three key scenarios: in-distribution, out-of-distribution, and generalization.
arXiv Detail & Related papers (2024-11-04T18:53:05Z)
- A Hitchhiker's Guide to Scaling Law Estimation [56.06982415792523]
Scaling laws predict the loss of a target machine learning model by extrapolating from easier-to-train models with fewer parameters or smaller training sets. We estimate more than 1000 scaling laws, then derive a set of best practices for estimating scaling laws in new model families.
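The extrapolation described above is commonly done by fitting a power law to (model size, loss) pairs. The sketch below shows the basic log-log least-squares procedure; the sizes, losses, and exponent are synthetic values invented for illustration, not numbers from any of the papers listed here.

```python
import numpy as np

# Synthetic (model size, loss) pairs that follow an exact power law
# L(N) = a * N**(-b), with a = 4.2 and b = 0.08 chosen arbitrarily.
sizes = np.array([1e6, 1e7, 1e8, 1e9])
losses = 4.2 * sizes ** -0.08

# In log space the power law is linear: log L = log a - b * log N,
# so ordinary least squares recovers the coefficients.
X = np.stack([np.ones_like(sizes), np.log(sizes)], axis=1)
coef, *_ = np.linalg.lstsq(X, np.log(losses), rcond=None)
a_hat, b_hat = np.exp(coef[0]), -coef[1]

print(f"fitted scaling law: L(N) = {a_hat:.2f} * N^(-{b_hat:.3f})")
```

Real loss curves are noisy and often need an irreducible-loss offset term, which is why papers such as the one above derive best practices for when such simple fits can be trusted.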
arXiv Detail & Related papers (2024-10-15T17:59:10Z)
- A Tale of Tails: Model Collapse as a Change of Scaling Laws [11.6055501181235]
We ask: How will the scaling laws change in the inevitable regime where synthetic data makes its way into the training corpus?
We develop a theoretical framework of model collapse through the lens of scaling laws.
We discover a wide range of decay phenomena, analyzing loss of scaling, shifted scaling with number of generations, the "un-learning" of skills, and grokking when mixing human and synthesized data.
arXiv Detail & Related papers (2024-02-10T21:06:34Z)
- A Dynamical Model of Neural Scaling Laws [79.59705237659547]
We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization.
Our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
arXiv Detail & Related papers (2024-02-02T01:41:38Z)
- Generation of Games for Opponent Model Differentiation [2.164100958962259]
Previous results show that modeling human behavior can significantly improve the performance of the algorithms.
In this work, we use data gathered by psychologists who identified personality types that increase the likelihood of performing malicious acts.
We created a novel model that links its parameters to psychological traits.
arXiv Detail & Related papers (2023-11-28T13:45:03Z)
- STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model can achieve comparable performance while using far fewer trainable parameters, and achieves high speed in training and inference.
arXiv Detail & Related papers (2021-07-15T02:53:11Z)
- Model-Based Reinforcement Learning for Atari [89.3039240303797]
We show how video prediction models can enable agents to solve Atari games with fewer interactions than model-free methods.
Our experiments evaluate SimPLe on a range of Atari games in the low-data regime of 100k interactions between the agent and the environment.
arXiv Detail & Related papers (2019-03-01T15:40:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.