Real-World Robot Control by Deep Active Inference With a Temporally Hierarchical World Model
- URL: http://arxiv.org/abs/2512.01924v1
- Date: Mon, 01 Dec 2025 17:41:01 GMT
- Title: Real-World Robot Control by Deep Active Inference With a Temporally Hierarchical World Model
- Authors: Kentaro Fujii, Shingo Murata
- Abstract summary: Deep active inference is a framework that accounts for human goal-directed and exploratory actions. We propose a novel deep active inference framework that consists of a world model, an action model, and an abstract world model. We evaluate the framework on object-manipulation tasks with a real-world robot.
- Score: 0.7284556903703034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robots in uncertain real-world environments must perform both goal-directed and exploratory actions. However, most deep learning-based control methods neglect exploration and struggle under uncertainty. To address this, we adopt deep active inference, a framework that accounts for human goal-directed and exploratory actions. Yet, conventional deep active inference approaches face challenges due to limited environmental representation capacity and high computational cost in action selection. We propose a novel deep active inference framework that consists of a world model, an action model, and an abstract world model. The world model encodes environmental dynamics into hidden state representations at slow and fast timescales. The action model compresses action sequences into abstract actions using vector quantization, and the abstract world model predicts future slow states conditioned on the abstract action, enabling low-cost action selection. We evaluate the framework on object-manipulation tasks with a real-world robot. Results show that it achieves high success rates across diverse manipulation tasks and switches between goal-directed and exploratory actions in uncertain settings, while making action selection computationally tractable. These findings highlight the importance of modeling multiple timescale dynamics and abstracting actions and state transitions.
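The abstract names three components: a world model with fast and slow hidden states, an action model that compresses action sequences into discrete abstract actions via vector quantization, and an abstract world model that predicts future slow states from abstract actions. The sketch below shows one plausible way to wire these pieces together in PyTorch; all module names, dimensions, the timescale ratio k, and update rules are illustrative assumptions, not the authors' implementation.
```python
# Minimal sketch of the three components named in the abstract: a temporally
# hierarchical world model, a VQ-based action model, and an abstract world model.
# Dimensions, the timescale ratio k, and all module choices are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HierarchicalWorldModel(nn.Module):
    """Encodes observations into a fast state (every step) and a slow state (every k steps)."""

    def __init__(self, obs_dim=64, fast_dim=128, slow_dim=32, k=5):
        super().__init__()
        self.k = k
        self.fast_rnn = nn.GRUCell(obs_dim + slow_dim, fast_dim)
        self.slow_rnn = nn.GRUCell(fast_dim, slow_dim)

    def forward(self, obs_seq):                        # obs_seq: (T, B, obs_dim)
        T, B, _ = obs_seq.shape
        fast = obs_seq.new_zeros(B, self.fast_rnn.hidden_size)
        slow = obs_seq.new_zeros(B, self.slow_rnn.hidden_size)
        slow_states = []
        for t in range(T):
            fast = self.fast_rnn(torch.cat([obs_seq[t], slow], dim=-1), fast)
            if (t + 1) % self.k == 0:                   # slow state updates on a coarser timescale
                slow = self.slow_rnn(fast, slow)
                slow_states.append(slow)
        return torch.stack(slow_states)                 # (T // k, B, slow_dim)


class VQActionModel(nn.Module):
    """Compresses a k-step action chunk into one discrete abstract action via vector quantization."""

    def __init__(self, action_dim=7, k=5, code_dim=16, n_codes=32):
        super().__init__()
        self.enc = nn.Linear(action_dim * k, code_dim)
        self.dec = nn.Linear(code_dim, action_dim * k)
        self.codebook = nn.Embedding(n_codes, code_dim)

    def forward(self, action_chunk):                    # (B, k * action_dim)
        z = self.enc(action_chunk)
        idx = torch.cdist(z, self.codebook.weight).argmin(dim=-1)   # nearest codebook entry
        q = self.codebook(idx)
        q_st = z + (q - z).detach()                      # straight-through gradient
        recon = self.dec(q_st)
        vq_loss = F.mse_loss(q, z.detach()) + 0.25 * F.mse_loss(z, q.detach())
        return recon, idx, vq_loss


class AbstractWorldModel(nn.Module):
    """Predicts the next slow state given the current slow state and an abstract action code."""

    def __init__(self, slow_dim=32, code_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(slow_dim + code_dim, 128), nn.ELU(), nn.Linear(128, slow_dim)
        )

    def forward(self, slow_state, abstract_action):
        return self.net(torch.cat([slow_state, abstract_action], dim=-1))
```
In this reading, action selection could roll out candidate abstract actions (codebook entries) through the abstract world model over slow states and score them with an expected-free-energy-style objective, which is what would make planning cheap compared with step-by-step rollouts at the fast timescale.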
Related papers
- DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos [110.98100817695307]
We introduce DreamDojo, a foundation world model that learns diverse interactions and dexterous controls from 44k hours of egocentric human videos. Our work enables several important applications based on generative world models, including live teleoperation, policy evaluation, and model-based planning.
arXiv Detail & Related papers (2026-02-06T18:49:43Z) - EgoActor: Grounding Task Planning into Spatial-aware Egocentric Actions for Humanoid Robots via Visual-Language Models [31.768426199719816]
We propose EgoActing, a task that requires directly grounding high-level instructions into diverse, precise, spatially aware humanoid actions. We further instantiate this task by introducing EgoActor, a unified and scalable vision-language model (VLM) that can predict locomotion primitives. We leverage broad supervision over egocentric RGB-only data from real-world demonstrations, spatial-reasoning question answering, and simulated environment demonstrations.
arXiv Detail & Related papers (2026-02-04T13:04:56Z) - Generalizable Geometric Prior and Recurrent Spiking Feature Learning for Humanoid Robot Manipulation [90.90219129619344]
This paper presents R-prior-S, a novel Recurrent Geometric-prior Policy with Spiking features. To ground high-level reasoning in physical reality, we leverage lightweight 2D geometric inductive biases. To address the data-efficiency issue in robotic action generation, we introduce a Recursive Adaptive Spiking Network.
arXiv Detail & Related papers (2026-01-13T23:36:30Z) - Ctrl-World: A Controllable Generative World Model for Robot Manipulation [53.71061464925014]
Generalist robot policies can perform a wide range of manipulation skills, but evaluating and improving their ability with unfamiliar objects and instructions remains a significant challenge. World models offer a promising, scalable alternative by enabling policies to roll out within an imagination space.
arXiv Detail & Related papers (2025-10-11T09:13:10Z) - ExoPredicator: Learning Abstract Models of Dynamic Worlds for Robot Planning [77.49815848173613]
We propose a framework for abstract world models that jointly learns symbolic state representations and causal processes for both endogenous actions and mechanisms. Across five simulated tabletop robotics environments, the learned models enable fast planning that generalizes to held-out tasks with more objects and more complex goals, outperforming a range of baselines.
arXiv Detail & Related papers (2025-09-30T13:44:34Z) - ImaginationPolicy: Towards Generalizable, Precise and Reliable End-to-End Policy for Robotic Manipulation [46.06124092071133]
We propose a novel Chain of Moving Oriented Keypoints (CoMOK) formulation for robotic manipulation. Our formulation is used as the action representation of a neural policy, which can be trained in an end-to-end fashion.
arXiv Detail & Related papers (2025-09-25T07:29:07Z) - Dexplore: Scalable Neural Control for Dexterous Manipulation from Reference-Scoped Exploration [58.4036440289082]
Hand-object motion capture (MoCap) offers large-scale, contact-rich demonstrations and holds promise for dexterous robotic manipulation. We introduce Dexplore, a unified single-loop optimization that performs retargeting and tracking to learn robot control policies directly from MoCap at scale.
arXiv Detail & Related papers (2025-09-11T17:59:07Z) - Humanoid Agent via Embodied Chain-of-Action Reasoning with Multimodal Foundation Models for Zero-Shot Loco-Manipulation [23.43820490179566]
Humanoid-COA integrates foundation model reasoning with an Embodied Chain-of-Action mechanism for zero-shot loco-manipulation. Our framework substantially outperforms prior baselines across manipulation, locomotion, and loco-manipulation tasks.
arXiv Detail & Related papers (2025-04-13T11:37:32Z) - Learning Coordinated Bimanual Manipulation Policies using State Diffusion and Inverse Dynamics Models [22.826115023573205]
We infuse the predictive nature of human manipulation strategies into robot imitation learning. We train a diffusion model to predict future states and compute robot actions that achieve the predicted states. Our framework consistently outperforms state-of-the-art state-to-action mapping policies.
arXiv Detail & Related papers (2025-03-30T01:25:35Z) - CAIMAN: Causal Action Influence Detection for Sample-efficient Loco-manipulation [17.94272840532448]
We present CAIMAN, a reinforcement learning framework that encourages robots to gain control over other entities in the environment. We empirically demonstrate CAIMAN's superior sample efficiency and adaptability to diverse scenarios in simulation.
arXiv Detail & Related papers (2025-02-02T16:16:53Z) - Action abstractions for amortized sampling [49.384037138511246]
We propose an approach to incorporate the discovery of action abstractions, or high-level actions, into the policy optimization process.
Our approach involves iteratively extracting action subsequences commonly used across many high-reward trajectories and chunking them into a single action that is added to the action space.
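As a rough illustration of that chunking idea (the n-gram length, reward threshold, and function name below are assumptions, not the paper's algorithm), one could count action subsequences across high-reward trajectories and promote the most frequent one to a new macro-action:
```python
# Illustrative sketch: count action subsequences (n-grams) over high-reward
# trajectories and return the most frequent one as a candidate macro-action.
# The threshold and subsequence length are arbitrary assumptions.
from collections import Counter


def extract_macro_action(trajectories, rewards, reward_threshold=1.0, length=3):
    """Return the most common length-`length` action subsequence among
    trajectories whose reward exceeds `reward_threshold`, or None."""
    counts = Counter()
    for actions, r in zip(trajectories, rewards):
        if r < reward_threshold:
            continue
        for i in range(len(actions) - length + 1):
            counts[tuple(actions[i:i + length])] += 1
    if not counts:
        return None
    subseq, _ = counts.most_common(1)[0]
    return subseq                      # this tuple would be added to the action space as one action


# Example with discrete action ids per trajectory
trajs = [[0, 1, 2, 1, 2, 3], [1, 2, 1, 2, 3, 0], [3, 3, 0, 1]]
rews = [2.0, 1.5, 0.2]
print(extract_macro_action(trajs, rews))   # e.g. (1, 2, 1), one of the frequent chunks
```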
arXiv Detail & Related papers (2024-10-19T19:22:50Z) - Active Exploration for Robotic Manipulation [40.39182660794481]
This paper proposes a model-based active exploration approach that enables efficient learning in sparse-reward robotic manipulation tasks.
We evaluate our proposed algorithm in simulation and on a real robot, trained from scratch with our method.
arXiv Detail & Related papers (2022-10-23T18:07:51Z) - Curious Exploration via Structured World Models Yields Zero-Shot Object Manipulation [19.840186443344]
We propose to use structured world models to incorporate inductive biases in the control loop to achieve sample-efficient exploration.
Our method generates free-play behavior that starts to interact with objects early on and develops more complex behavior over time.
arXiv Detail & Related papers (2022-06-22T22:08:50Z)