Advancing Open-source World Models
- URL: http://arxiv.org/abs/2601.20540v1
- Date: Wed, 28 Jan 2026 12:37:01 GMT
- Title: Advancing Open-source World Models
- Authors: Robbyant Team, Zelin Gao, Qiuyu Wang, Yanhong Zeng, Jiapeng Zhu, Ka Leong Cheng, Yixuan Li, Hanlin Wang, Yinghao Xu, Shuailei Ma, Yihang Chen, Jie Liu, Yansong Cheng, Yao Yao, Jiayi Zhu, Yihao Meng, Kecheng Zheng, Qingyan Bai, Jingye Chen, Zehong Shen, Yue Yu, Xing Zhu, Yujun Shen, Hao Ouyang
- Abstract summary: LingBot-World is an open-sourced world simulator stemming from video generation. It maintains high fidelity and robust dynamics in a broad spectrum of environments. It supports real-time interactivity, achieving a latency of under 1 second when producing 16 frames per second.
- Score: 92.17462908419326
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present LingBot-World, an open-sourced world simulator stemming from video generation. Positioned as a top-tier world model, LingBot-World offers the following features. (1) It maintains high fidelity and robust dynamics in a broad spectrum of environments, including realism, scientific contexts, cartoon styles, and beyond. (2) It enables a minute-level horizon while preserving contextual consistency over time, which is also known as "long-term memory". (3) It supports real-time interactivity, achieving a latency of under 1 second when producing 16 frames per second. We provide public access to the code and model in an effort to narrow the divide between open-source and closed-source technologies. We believe our release will empower the community with practical applications across areas like content creation, gaming, and robot learning.
Related papers
- Beyond Pixel Histories: World Models with Persistent 3D State [50.4601060508243]
PERSIST is a new paradigm of world model which simulates the evolution of a latent 3D scene. We show substantial improvements in spatial memory, 3D consistency, and long-horizon stability over existing methods.
arXiv Detail & Related papers (2026-03-03T19:58:31Z)
- DreamWorld: Unified World Modeling in Video Generation [32.857497363728584]
We introduce DreamWorld, a unified framework that integrates complementary world knowledge into video generators. We show that DreamWorld improves world consistency, outperforming Wan2.1 by 2.26 points on VBench.
arXiv Detail & Related papers (2026-02-28T05:02:39Z)
- Web World Models [60.208836336654315]
We introduce the Web World Model (WWM), a middle ground where world state and "physics" are implemented in ordinary web code. We build a suite of WWMs on a realistic web stack, including an infinite travel atlas grounded in real geography, fictional galaxy explorers, web-scale encyclopedic and narrative worlds, and simulation- and game-like environments. Our results suggest that web stacks themselves can serve as a scalable substrate for world models, enabling controllable yet open-ended environments.
arXiv Detail & Related papers (2025-12-29T18:31:45Z)
- Yume-1.5: A Text-Controlled Interactive World Generation Model [78.93049063633084]
Yume-1.5 is a novel framework designed to generate realistic, interactive, and continuous worlds from a single image or text prompt. It achieves this through a carefully designed framework that supports keyboard-based exploration of the generated worlds.
arXiv Detail & Related papers (2025-12-26T17:52:49Z)
- WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling [34.486078065308995]
WorldPlay is a streaming video diffusion model that enables real-time, interactive world modeling with long-term geometric consistency. We use a Dual Action Representation to enable robust action control in response to the user's keyboard and mouse inputs. We also propose Context Forcing, a novel distillation method designed for memory-aware models.
arXiv Detail & Related papers (2025-12-16T17:22:46Z)
- UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation [61.98887854225878]
We introduce UnityVideo, a unified framework for world-aware video generation. Our approach features two core components: (1) dynamic noising to unify heterogeneous training paradigms, and (2) a modality switcher with an in-context learner. We demonstrate that UnityVideo achieves superior video quality, consistency, and improved alignment with physical world constraints.
arXiv Detail & Related papers (2025-12-08T18:59:01Z)
- LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation [35.4193352348583]
We propose a simple yet effective 3D world generation framework that streamlines the industrial production pipeline of 3D environments. LatticeWorld creates large-scale 3D interactive worlds with dynamic agents, featuring competitive multi-agent interaction. LatticeWorld achieves over a $90\times$ increase in industrial production efficiency.
arXiv Detail & Related papers (2025-09-05T17:22:33Z)
- Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model [15.16063778402193]
Matrix-Game 2.0 is an interactive world model that generates long videos on-the-fly via few-step auto-regressive diffusion. It can generate high-quality minute-level videos across diverse scenes at an ultra-fast speed of 25 FPS.
arXiv Detail & Related papers (2025-08-18T15:28:53Z)
- Video World Models with Long-term Spatial Memory [110.530715838396]
We introduce a novel framework to enhance long-term consistency of video world models. Our framework includes mechanisms to store and retrieve information from the long-term spatial memory. Our evaluations show improved quality, consistency, and context length compared to relevant baselines.
arXiv Detail & Related papers (2025-06-05T17:42:34Z)
- Open-Sora: Democratizing Efficient Video Production for All [15.68402186082992]
We create Open-Sora, an open-source video generation model designed to produce high-fidelity video content. Open-Sora supports a wide spectrum of visual generation tasks, including text-to-image generation, text-to-video generation, and image-to-video generation. By embracing the open-source principle, Open-Sora democratizes full access to all the training, inference, and data preparation code as well as model weights.
arXiv Detail & Related papers (2024-12-29T08:52:49Z)
- WonderWorld: Interactive 3D Scene Generation from a Single Image [38.83667648993784]
We present WonderWorld, a novel framework for interactive 3D scene generation. WonderWorld generates connected and diverse 3D scenes in less than 10 seconds on a single A6000 GPU.
arXiv Detail & Related papers (2024-06-13T17:59:10Z)
- Self-supervised novel 2D view synthesis of large-scale scenes with efficient multi-scale voxel carving [77.07589573960436]
We introduce an efficient multi-scale voxel carving method to generate novel views of real scenes.
Our final high-resolution output is efficiently self-trained on data automatically generated by the voxel carving module.
We demonstrate the effectiveness of our method on highly complex and large-scale scenes in real environments.
arXiv Detail & Related papers (2023-06-26T13:57:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.