Finite Automata Extraction: Low-data World Model Learning as Programs from Gameplay Video
- URL: http://arxiv.org/abs/2508.11836v1
- Date: Fri, 15 Aug 2025 23:05:37 GMT
- Title: Finite Automata Extraction: Low-data World Model Learning as Programs from Gameplay Video
- Authors: Dave Goel, Matthew Guzdial, Anurag Sarkar,
- Abstract summary: We propose an approach that learns a neurosymbolic world model from gameplay video represented as programs in a novel domain-specific language.<n>Compared to prior world model approaches, FAE learns a more precise model of the environment and more general code than prior spatial-based approaches.
- Score: 3.213440505903066
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: World models are defined as a compressed spatial and temporal learned representation of an environment. The learned representation is typically a neural network, making transfer of the learned environment dynamics and explainability a challenge. In this paper, we propose an approach, Finite Automata Extraction (FAE), that learns a neuro-symbolic world model from gameplay video represented as programs in a novel domain-specific language (DSL): Retro Coder. Compared to prior world model approaches, FAE learns a more precise model of the environment and more general code than prior DSL-based approaches.
Related papers
- Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling [51.40150411616207]
We introduce Latent Particle World Model (LPWM), a self-supervised object-centric world model scaled to real-world multi-object datasets.<n>LPWM autonomously discovers keypoints, bounding boxes, and object masks directly from video data.<n>Our architecture is trained end-to-end purely from videos and supports flexible conditioning on actions, language, and image goals.
arXiv Detail & Related papers (2026-03-04T19:36:08Z) - Can World Models Benefit VLMs for World Dynamics? [59.73433292793044]
We investigate the capabilities when world model priors are transferred into Vision-Language Models.<n>We name our best-performing variant Dynamic Vision Aligner (DyVA)<n>We find DyVA to surpass both open-source and proprietary baselines, achieving state-of-the-art or comparable performance.
arXiv Detail & Related papers (2025-10-01T13:07:05Z) - Continual Learning for Generative AI: From LLMs to MLLMs and Beyond [56.29231194002407]
We present a comprehensive survey of continual learning methods for mainstream generative AI models.<n>We categorize these approaches into three paradigms: architecture-based, regularization-based, and replay-based.<n>We analyze continual learning setups for different generative models, including training objectives, benchmarks, and core backbones.
arXiv Detail & Related papers (2025-06-16T02:27:25Z) - PoE-World: Compositional World Modeling with Products of Programmatic Experts [41.07916209987106]
Learning how the world works is central to building AI agents that can adapt to complex environments.<n>Recent advances in program synthesis using Large Language Models (LLMs) give an alternate approach which learns world models represented as source code.<n>We show that this approach can learn complex world models from just a few observations. We evaluate the learned world models by embedding them in a model-based planning agent, demonstrating efficient performance and generalization to unseen levels on Atari's Pong and Montezuma's Revenge.
arXiv Detail & Related papers (2025-05-16T03:28:42Z) - SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning [50.98341607245458]
Masked video modeling is an effective paradigm for video self-supervised learning (SSL)<n>This paper introduces a novel SSL approach for video representation learning, dubbed as SMILE, by infusing both spatial and motion semantics.<n>We establish a new self-supervised video learning paradigm capable of learning strong video representations without requiring any natural video data.
arXiv Detail & Related papers (2025-04-01T08:20:55Z) - Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning [93.58897637077001]
This paper tries to learn and understand underlying semantic variations from distracting videos via offline-to-online latent distillation and flexible disentanglement constraints.<n>We pretrain the action-free video prediction model offline with disentanglement regularization to extract semantic knowledge from distracting videos.<n>For finetuning in the online environment, we exploit the knowledge from the pretrained model and introduce a disentanglement constraint to the world model.
arXiv Detail & Related papers (2025-03-11T13:50:22Z) - Multimodal Dreaming: A Global Workspace Approach to World Model-Based Reinforcement Learning [6.747704665696368]
In Reinforcement Learning (RL), world models aim to capture how the environment evolves in response to the agent's actions.<n>We show that performing the dreaming process inside the latent space allows for training with fewer environment steps.<n>We conclude that the combination of GW with World Models holds great potential for improving decision-making in RL agents.
arXiv Detail & Related papers (2025-02-28T15:24:17Z) - Learning to Walk from Three Minutes of Real-World Data with Semi-structured Dynamics Models [9.318262213262866]
We introduce a novel framework for learning semi-structured dynamics models for contact-rich systems.
We make accurate long-horizon predictions with substantially less data than prior methods.
We validate our approach on a real-world Unitree Go1 quadruped robot.
arXiv Detail & Related papers (2024-10-11T18:11:21Z) - One-shot World Models Using a Transformer Trained on a Synthetic Prior [37.027893127637036]
One-Shot World Model (OSWM) is a transformer world model that is learned in an in-context learning fashion from purely synthetic data.
OSWM is able to quickly adapt to the dynamics of a simple grid world, as well as the CartPole gym and a custom control environment.
arXiv Detail & Related papers (2024-09-21T09:39:32Z) - Pre-training Contextualized World Models with In-the-wild Videos for
Reinforcement Learning [54.67880602409801]
In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks.
We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling.
Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
arXiv Detail & Related papers (2023-05-29T14:29:12Z) - Masked World Models for Visual Control [90.13638482124567]
We introduce a visual model-based RL framework that decouples visual representation learning and dynamics learning.
We demonstrate that our approach achieves state-of-the-art performance on a variety of visual robotic tasks.
arXiv Detail & Related papers (2022-06-28T18:42:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.