Object-Centric World Models Meet Monte Carlo Tree Search
- URL: http://arxiv.org/abs/2601.06604v1
- Date: Sat, 10 Jan 2026 15:59:17 GMT
- Title: Object-Centric World Models Meet Monte Carlo Tree Search
- Authors: Rodion Vakhitov, Leonid Ugadiarov, Aleksandr Panov,
- Abstract summary: We introduce ObjectZero, a novel reinforcement learning (RL) algorithm that leverages the power of object-level representations to model dynamic environments.<n>Our method employs Graph Neural Networks (GNNs) to capture intricate interactions among multiple objects.<n>We trained the algorithm in a complex setting teeming with diverse, interactive objects, demonstrating its ability to effectively learn and predict object dynamics.
- Score: 49.12393425510251
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce ObjectZero, a novel reinforcement learning (RL) algorithm that leverages the power of object-level representations to model dynamic environments more effectively. Unlike traditional approaches that process the world as a single undifferentiated input, our method employs Graph Neural Networks (GNNs) to capture intricate interactions among multiple objects. These objects, which can be manipulated and interact with each other, serve as the foundation for our model's understanding of the environment. We trained the algorithm in a complex setting teeming with diverse, interactive objects, demonstrating its ability to effectively learn and predict object dynamics. Our results highlight that a structured world model operating on object-centric representations can be successfully integrated into a model-based RL algorithm utilizing Monte Carlo Tree Search as a planning module.
Related papers
- Learning Interactive World Model for Object-Centric Reinforcement Learning [27.710001478315288]
We introduce a unified framework that learns structured representations of both objects and their interactions within a world model.<n>FIOC-WM captures environment dynamics with disentangled and modular representations of object interactions.<n>On simulated robotic and embodied-AI benchmarks, FIOC-WM improves policy-learning sample efficiency and generalization over world-model baselines.
arXiv Detail & Related papers (2025-11-04T03:35:58Z) - SOLD: Slot Object-Centric Latent Dynamics Models for Relational Manipulation Learning from Pixels [16.020835290802548]
Slot-Attention for Object-centric Latent Dynamics is a novel model-based reinforcement learning algorithm.<n>It learns object-centric dynamics models in an unsupervised manner from pixel inputs.<n>We demonstrate that the structured latent space not only improves model interpretability but also provides a valuable input space for behavior models to reason over.
arXiv Detail & Related papers (2024-10-11T14:03:31Z) - Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction [19.59151245929067]
We study whether giving an agent an object-centric mapping (describing a set of items and their attributes) allow for more efficient learning.<n>We find this problem is best solved hierarchically by modelling items at a higher level of state abstraction to pixels, and attribute change at a higher level of temporal abstraction to primitive actions.<n>We propose a fully model-based algorithm that learns a discriminative world model, plans to explore efficiently with only a count-based intrinsic reward, and can subsequently plan to reach any discovered (abstract) states.
arXiv Detail & Related papers (2024-08-21T17:59:31Z) - Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z) - Slot Structured World Models [0.0]
State-of-the-art approaches use a feedforward encoder to extract object embeddings and a latent graph neural network to model the interaction between these object embeddings.
We introduce it Slot Structured World Models (SSWM), a class of world models that combines an it object-centric encoder with a latent graph-based dynamics model.
arXiv Detail & Related papers (2024-01-08T21:19:30Z) - Detecting Any Human-Object Interaction Relationship: Universal HOI
Detector with Spatial Prompt Learning on Foundation Models [55.20626448358655]
This study explores the universal interaction recognition in an open-world setting through the use of Vision-Language (VL) foundation models and large language models (LLMs)
Our design includes an HO Prompt-guided Decoder (HOPD), facilitates the association of high-level relation representations in the foundation model with various HO pairs within the image.
For open-category interaction recognition, our method supports either of two input types: interaction phrase or interpretive sentence.
arXiv Detail & Related papers (2023-11-07T08:27:32Z) - Relational Object-Centric Actor-Critic [44.99833362998488]
Recent works highlight that disentangled object representations can aid policy learning in image-based, object-centric reinforcement learning tasks.<n>This paper proposes a novel object-centric reinforcement learning algorithm that integrates actor-critic and model-based approaches.<n>We evaluate our method in a simulated 3D robotic environment and a 2D environment with compositional structure.
arXiv Detail & Related papers (2023-10-26T06:05:12Z) - Continual Visual Reinforcement Learning with A Life-Long World Model [55.05017177980985]
We present a new continual learning approach for visual dynamics modeling.<n>We first introduce the life-long world model, which learns task-specific latent dynamics.<n>Then, we address the value estimation challenge for previous tasks with the exploratory-conservative behavior learning approach.
arXiv Detail & Related papers (2023-03-12T05:08:03Z) - Bridging the Gap to Real-World Object-Centric Learning [66.55867830853803]
We show that reconstructing features from models trained in a self-supervised manner is a sufficient training signal for object-centric representations to arise in a fully unsupervised way.
Our approach, DINOSAUR, significantly out-performs existing object-centric learning models on simulated data.
arXiv Detail & Related papers (2022-09-29T15:24:47Z) - Model-free Representation Learning and Exploration in Low-rank MDPs [64.72023662543363]
We present the first model-free representation learning algorithms for low rank MDPs.
Key algorithmic contribution is a new minimax representation learning objective.
Result can accommodate general function approximation to scale to complex environments.
arXiv Detail & Related papers (2021-02-14T00:06:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.