A Step Toward World Models: A Survey on Robotic Manipulation
- URL: http://arxiv.org/abs/2511.02097v2
- Date: Mon, 10 Nov 2025 03:45:44 GMT
- Title: A Step Toward World Models: A Survey on Robotic Manipulation
- Authors: Peng-Fei Zhang, Ying Cheng, Xiaofan Sun, Shijie Wang, Fengling Li, Lei Zhu, Heng Tao Shen
- Abstract summary: We look at approaches that exhibit the core capabilities of world models through a review of methods in robotic manipulation. We analyze their roles across perception, prediction, and control, identify key challenges and solutions, and distill the core components, capabilities, and functions that a fully realized world model should possess.
- Score: 58.8419978790227
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Autonomous agents are increasingly expected to operate in complex, dynamic, and uncertain environments, performing tasks such as manipulation, navigation, and decision-making. Achieving these capabilities requires agents to understand the underlying mechanisms and dynamics of the world, moving beyond reactive control or simple replication of observed states. This motivates the development of world models as internal representations that encode environmental states, capture dynamics, and support prediction, planning, and reasoning. Despite growing interest, the definition, scope, architectures, and essential capabilities of world models remain ambiguous. In this survey, we go beyond prescribing a fixed definition and limiting our scope to methods explicitly labeled as world models. Instead, we examine approaches that exhibit the core capabilities of world models through a review of methods in robotic manipulation. We analyze their roles across perception, prediction, and control, identify key challenges and solutions, and distill the core components, capabilities, and functions that a fully realized world model should possess. Building on this analysis, we aim to motivate further development toward generalizable and practical world models for robotics.
Related papers
- Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks [43.59401259468559]
We argue that a robust world model should not be a loose collection of capabilities but a normative framework that integrally incorporates interaction, perception, symbolic reasoning, and spatial representation. This work aims to guide future research toward more general, robust, and principled models of the world.
arXiv Detail & Related papers (2026-02-02T04:42:44Z)
- SmallWorlds: Assessing Dynamics Understanding of World Models in Isolated Environments [15.243547292947397]
We introduce the SmallWorld Benchmark, a testbed designed to assess world model capability under isolated and precisely controlled dynamics. We conduct comprehensive experiments in the fully observable state space on representative architectures including Recurrent State Space Model, Transformer, Diffusion model, and Neural ODE. The experimental results reveal how effectively these models capture environment structure and how their predictions deteriorate over extended rollouts.
arXiv Detail & Related papers (2025-11-28T18:56:02Z)
- Edge General Intelligence Through World Models and Agentic AI: Fundamentals, Solutions, and Challenges [87.02855999212817]
Edge General Intelligence (EGI) represents a transformative evolution of edge computing, where distributed agents possess the capability to perceive, reason, and act autonomously. World models act as proactive internal simulators that not only predict but also actively imagine future trajectories, reason under uncertainty, and plan multi-step actions with foresight. This survey bridges the gap by offering a comprehensive analysis of how world models can empower agentic artificial intelligence (AI) systems at the edge.
arXiv Detail & Related papers (2025-08-13T07:29:40Z)
- AI in a vat: Fundamental limits of efficient world modelling for agent sandboxing and interpretability [84.52205243353761]
Recent work proposes using world models to generate controlled virtual environments in which AI agents can be tested before deployment. We investigate ways of simplifying world models that remain agnostic to the AI agent under evaluation.
arXiv Detail & Related papers (2025-04-06T20:35:44Z)
- A Survey of World Models for Autonomous Driving [55.520179689933904]
Recent breakthroughs in autonomous driving have been propelled by advances in robust world modeling. World models offer high-fidelity representations of the driving environment that integrate multi-sensor data, semantic cues, and temporal dynamics. Future research must address key challenges in self-supervised representation learning, multimodal fusion, and advanced simulation.
arXiv Detail & Related papers (2025-01-20T04:00:02Z)
- Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey [61.39993881402787]
World models and video generation are pivotal technologies in the domain of autonomous driving.
This paper investigates the relationship between these two technologies.
By analyzing the interplay between video generation and world models, this survey identifies critical challenges and future research directions.
arXiv Detail & Related papers (2024-11-05T08:58:35Z)
- Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond [90.63687738298125]
General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI). In this survey, we embark on a comprehensive exploration of the latest advancements in world models. We examine challenges and limitations of world models, and discuss their potential future directions.
arXiv Detail & Related papers (2024-05-06T14:37:07Z)
- A Survey on Robotics with Foundation Models: toward Embodied AI [30.999414445286757]
Recent advances in computer vision, natural language processing, and multi-modality learning have shown that the foundation models have superhuman capabilities for specific tasks.
This survey aims to provide a comprehensive and up-to-date overview of foundation models in robotics, focusing on autonomous manipulation and encompassing high-level planning and low-level control.
arXiv Detail & Related papers (2024-02-04T07:55:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.