Object-Centric World Models for Causality-Aware Reinforcement Learning
- URL: http://arxiv.org/abs/2511.14262v1
- Date: Tue, 18 Nov 2025 08:53:09 GMT
- Title: Object-Centric World Models for Causality-Aware Reinforcement Learning
- Authors: Yosuke Nishimoto, Takashi Matsubara
- Abstract summary: We propose Slot Transformer Imagination with CAusality-aware reinforcement learning (STICA), a unified framework in which object-centric Transformers serve as the world model and causality-aware policy and value networks guide decision-making. Experiments on object-rich benchmarks demonstrate that STICA consistently outperforms state-of-the-art agents in both sample efficiency and final performance.
- Score: 13.063093054280946
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: World models have been developed to support sample-efficient deep reinforcement learning agents. However, it remains challenging for world models to accurately replicate environments that are high-dimensional, non-stationary, and composed of multiple objects with rich interactions since most world models learn holistic representations of all environmental components. By contrast, humans perceive the environment by decomposing it into discrete objects, facilitating efficient decision-making. Motivated by this insight, we propose Slot Transformer Imagination with CAusality-aware reinforcement learning (STICA), a unified framework in which object-centric Transformers serve as the world model and causality-aware policy and value networks. STICA represents each observation as a set of object-centric tokens, together with tokens for the agent action and the resulting reward, enabling the world model to predict token-level dynamics and interactions. The policy and value networks then estimate token-level cause-effect relations and use them in the attention layers, yielding causality-guided decision-making. Experiments on object-rich benchmarks demonstrate that STICA consistently outperforms state-of-the-art agents in both sample efficiency and final performance.
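The abstract describes policy and value networks that estimate token-level cause-effect relations and inject them into attention layers. A minimal sketch of that idea, assuming (the paper does not specify this exact form) that the estimated causal scores act as an additive log-bias on the attention logits over object, action, and reward tokens:

```python
# Hedged sketch of causality-guided attention over object-centric tokens.
# The additive log-bias formulation and all shapes here are assumptions,
# not the paper's exact architecture.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causality_guided_attention(tokens, w_q, w_k, w_v, cause_effect):
    """tokens: (T, d) object/action/reward tokens.
    cause_effect: (T, T) estimated cause-effect scores in [0, 1], used as an
    additive bias so attention concentrates on likely causal pairs."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    logits = q @ k.T / np.sqrt(q.shape[-1])
    # log of the causal score, clipped to avoid -inf for zero scores
    logits = logits + np.log(np.clip(cause_effect, 1e-6, 1.0))
    return softmax(logits, axis=-1) @ v

rng = np.random.default_rng(0)
T, d = 5, 8
tokens = rng.normal(size=(T, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
# toy causal scores: each token fully "causes" itself, weakly affects others
cause_effect = np.eye(T) * 0.5 + 0.5
out = causality_guided_attention(tokens, w_q, w_k, w_v, cause_effect)
print(out.shape)  # (5, 8)
```

A score of 0 drives the bias toward negative infinity, masking non-causal pairs, while a score of 1 leaves the logit unchanged, so the mechanism degrades gracefully to standard attention.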
Related papers
- Continual learning and refinement of causal models through dynamic predicate invention [0.6198237241838559]
We propose a framework for constructing symbolic causal world models entirely online. We leverage the power of Meta-Interpretive Learning and predicate invention to find semantically meaningful and reusable abstractions.
arXiv Detail & Related papers (2026-02-19T10:08:31Z) - Aligning Agentic World Models via Knowledgeable Experience Learning [68.85843641222186]
We introduce WorldMind, a framework that constructs a symbolic World Knowledge Repository by synthesizing environmental feedback. WorldMind achieves superior performance compared to baselines, with remarkable cross-model and cross-environment transferability.
arXiv Detail & Related papers (2026-01-19T17:33:31Z) - Object-Centric World Models Meet Monte Carlo Tree Search [49.12393425510251]
We introduce ObjectZero, a novel reinforcement learning (RL) algorithm that leverages the power of object-level representations to model dynamic environments. Our method employs Graph Neural Networks (GNNs) to capture intricate interactions among multiple objects. We trained the algorithm in a complex setting teeming with diverse, interactive objects, demonstrating its ability to effectively learn and predict object dynamics.
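The summary above mentions GNNs that capture interactions among objects. A toy message-passing step illustrating that component, where the update rule, layer sizes, and adjacency encoding are illustrative assumptions rather than the paper's design:

```python
# Toy GNN message-passing step over object states; the tanh update rule
# and the 0/1 interaction adjacency are assumptions for illustration.
import numpy as np

def message_passing_step(node_states, adjacency, w_msg, w_upd):
    """node_states: (N, d) per-object features.
    adjacency: (N, N) binary matrix of which objects interact."""
    messages = adjacency @ (node_states @ w_msg)    # sum messages from neighbours
    return np.tanh(node_states @ w_upd + messages)  # update each object state

rng = np.random.default_rng(2)
N, d = 3, 4
states = rng.normal(size=(N, d))
adjacency = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]], dtype=float)      # a chain of interacting objects
w_msg = rng.normal(size=(d, d))
w_upd = rng.normal(size=(d, d))
new_states = message_passing_step(states, adjacency, w_msg, w_upd)
print(new_states.shape)  # (3, 4)
```

Stacking such steps lets information about one object's motion propagate to the objects it interacts with, which is the property these object-centric dynamics models rely on.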
arXiv Detail & Related papers (2026-01-10T15:59:17Z) - From Word to World: Can Large Language Models be Implicit Text-based World Models? [82.47317196099907]
Agentic reinforcement learning increasingly relies on experience-driven scaling. World models offer a potential way to improve learning efficiency through simulated experience. We study whether large language models can reliably serve this role and under what conditions they meaningfully benefit agents.
arXiv Detail & Related papers (2025-12-21T17:28:42Z) - When Object-Centric World Models Meet Policy Learning: From Pixels to Policies, and Where It Breaks [24.669692812050645]
We introduce a fully unsupervised, disentangled object-centric world model that learns object-level latents directly from pixels. DLPWM achieves strong reconstruction and prediction performance, including robustness to several out-of-distribution (OOD) visual variations. Our results suggest that, although object-centric perception supports robust visual modeling, achieving stable control requires mitigating latent drift.
arXiv Detail & Related papers (2025-11-08T21:09:44Z) - Learning Interactive World Model for Object-Centric Reinforcement Learning [27.710001478315288]
We introduce a unified framework that learns structured representations of both objects and their interactions within a world model. FIOC-WM captures environment dynamics with disentangled and modular representations of object interactions. On simulated robotic and embodied-AI benchmarks, FIOC-WM improves policy-learning sample efficiency and generalization over world-model baselines.
arXiv Detail & Related papers (2025-11-04T03:35:58Z) - Dyn-O: Building Structured World Models with Object-Centric Representations [42.65409148846005]
We introduce Dyn-O, an enhanced structured world model built upon object-centric representations. Compared to prior work in object-centric representations, Dyn-O improves in both learning representations and modeling dynamics. We find that our method can learn object-centric world models directly from pixel observations, outperforming DreamerV3 in rollout prediction accuracy.
arXiv Detail & Related papers (2025-07-04T05:06:15Z) - World Models for Cognitive Agents: Transforming Edge Intelligence in Future Networks [55.90051810762702]
We present a comprehensive overview of world models, highlighting their architecture, training paradigms, and applications across prediction, generation, planning, and causal reasoning. We propose Wireless Dreamer, a novel world model-based reinforcement learning framework tailored for wireless edge intelligence optimization.
arXiv Detail & Related papers (2025-05-31T06:43:00Z) - SPARTAN: A Sparse Transformer Learning Local Causation [63.29645501232935]
Causal structures play a central role in world models that flexibly adapt to changes in the environment.
We present the SPARse TrANsformer World model (SPARTAN), a Transformer-based world model that learns local causal structures between entities in a scene.
By applying sparsity regularisation on the attention pattern between object-factored tokens, SPARTAN identifies sparse local causal models that accurately predict future object states.
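The SPARTAN summary describes sparsity regularisation applied to the attention pattern between object-factored tokens. One common way to express such a penalty, shown here as a sketch using row-wise attention entropy (the specific penalty form is an assumption, not necessarily SPARTAN's):

```python
# Sketch of a sparsity penalty on an attention pattern between object tokens.
# Using mean row entropy as the penalty is an illustrative choice; a low
# entropy row means the token attends to few causes, i.e. a sparse local model.
import numpy as np

def attention_weights(tokens, w_q, w_k):
    q, k = tokens @ w_q, tokens @ w_k
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=-1, keepdims=True)

def sparsity_penalty(attn, eps=1e-8):
    """Mean entropy of each token's attention row; minimised during training
    alongside the prediction loss to encourage sparse causal structure."""
    return float(-(attn * np.log(attn + eps)).sum(axis=-1).mean())

rng = np.random.default_rng(1)
tokens = rng.normal(size=(4, 6))
w_q = rng.normal(size=(6, 6))
w_k = rng.normal(size=(6, 6))
attn = attention_weights(tokens, w_q, w_k)
print(sparsity_penalty(attn))
```

Adding this term to the world-model loss pushes each object token to attend only to the few entities that actually influence its next state, which is how sparse local causal models can emerge from attention.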
arXiv Detail & Related papers (2024-11-11T11:42:48Z) - Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z) - Relational Object-Centric Actor-Critic [44.99833362998488]
Recent works highlight that disentangled object representations can aid policy learning in image-based, object-centric reinforcement learning tasks. This paper proposes a novel object-centric reinforcement learning algorithm that integrates actor-critic and model-based approaches. We evaluate our method in a simulated 3D robotic environment and a 2D environment with compositional structure.
arXiv Detail & Related papers (2023-10-26T06:05:12Z) - Robust and Controllable Object-Centric Learning through Energy-based Models [95.68748828339059]
The proposed method is a conceptually simple and general approach to learning object-centric representations through an energy-based model. We show that it can be easily integrated into existing architectures and can effectively extract high-quality object-centric representations.
arXiv Detail & Related papers (2022-10-11T15:11:15Z) - Bridging the Gap to Real-World Object-Centric Learning [66.55867830853803]
We show that reconstructing features from models trained in a self-supervised manner is a sufficient training signal for object-centric representations to arise in a fully unsupervised way.
Our approach, DINOSAUR, significantly outperforms existing object-centric learning models on simulated data.
arXiv Detail & Related papers (2022-09-29T15:24:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.