WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion
- URL: http://arxiv.org/abs/2512.19678v1
- Date: Mon, 22 Dec 2025 18:53:50 GMT
- Title: WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion
- Authors: Hanyang Kong, Xingyi Yang, Xiaoxu Zheng, Xinchao Wang
- Abstract summary: WorldWarp is a framework that couples a 3D structural anchor with a 2D generative refiner. WorldWarp maintains consistency across video chunks by dynamically updating the 3D cache at every step. It achieves state-of-the-art fidelity by ensuring that 3D logic guides structure while diffusion logic perfects texture.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating long-range, geometrically consistent video presents a fundamental dilemma: while consistency demands strict adherence to 3D geometry in pixel space, state-of-the-art generative models operate most effectively in a camera-conditioned latent space. This disconnect causes current methods to struggle with occluded areas and complex camera trajectories. To bridge this gap, we propose WorldWarp, a framework that couples a 3D structural anchor with a 2D generative refiner. To establish geometric grounding, WorldWarp maintains an online 3D geometric cache built via Gaussian Splatting (3DGS). By explicitly warping historical content into novel views, this cache acts as a structural scaffold, ensuring each new frame respects prior geometry. However, static warping inevitably leaves holes and artifacts due to occlusions. We address this using a Spatio-Temporal Diffusion (ST-Diff) model designed for a "fill-and-revise" objective. Our key innovation is a spatio-temporal varying noise schedule: blank regions receive full noise to trigger generation, while warped regions receive partial noise to enable refinement. By dynamically updating the 3D cache at every step, WorldWarp maintains consistency across video chunks. Consequently, it achieves state-of-the-art fidelity by ensuring that 3D logic guides structure while diffusion logic perfects texture. Project page: https://hyokong.github.io/worldwarp-page/
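The spatially varying noise schedule described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `spatiotemporal_noise`, the partial noise level of 0.4, and the DDPM-style square-root mixing form are all assumptions made for the example; the paper does not specify these details here.

```python
import numpy as np

def spatiotemporal_noise(warped_frame, hole_mask, partial_level=0.4, rng=None):
    """Apply a per-pixel noise schedule to a warped frame (illustrative sketch).

    Holes left by warping (hole_mask == True) receive full noise, so the
    diffusion model generates content from scratch there; warped regions
    receive only partial noise, so the model refines existing structure.
    """
    rng = rng or np.random.default_rng(0)
    noise = rng.standard_normal(warped_frame.shape)
    # Per-pixel noise level: 1.0 inside holes, `partial_level` elsewhere.
    t = np.where(hole_mask, 1.0, partial_level)
    # DDPM-style forward mixing (assumed form): sqrt(1-t)*x + sqrt(t)*eps.
    # At t = 1 the pixel is pure noise; at t < 1 the warped signal survives.
    noised = np.sqrt(1.0 - t) * warped_frame + np.sqrt(t) * noise
    return noised, t
```

In a full pipeline, `warped_frame` would be the 3DGS cache rendered into the novel view and `hole_mask` the pixels the warp could not cover; the denoiser then runs its reverse process from this mixed state rather than from pure noise everywhere.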
Related papers
- RnG: A Unified Transformer for Complete 3D Modeling from Partial Observations [70.83499963694238]
RnG (Reconstruction and Generation) is a novel feed-forward Transformer that unifies reconstruction and generation. It reconstructs visible geometry and generates plausible, coherent unseen geometry and appearance. Our method achieves state-of-the-art performance in both generalizable 3D reconstruction and novel view generation.
arXiv Detail & Related papers (2026-03-01T17:25:32Z)
- Interp3D: Correspondence-aware Interpolation for Generative Textured 3D Morphing [63.141976759536625]
We propose Interp3D, a training-free framework for textured 3D morphing. It harnesses generative priors and adopts a progressive alignment principle to ensure both geometric fidelity and texture coherence. For comprehensive evaluation, we construct a dedicated dataset, Interp3DData, with graded difficulty levels, and assess generation results on fidelity, transition smoothness, and plausibility.
arXiv Detail & Related papers (2026-01-20T16:03:22Z)
- EA3D: Online Open-World 3D Object Extraction from Streaming Videos [55.48835711373918]
We present ExtractAnything3D (EA3D), a unified online framework for open-world 3D object extraction. Given a streaming video, EA3D dynamically interprets each frame using vision-language and 2D vision foundation encoders to extract object-level knowledge. A recurrent joint optimization module directs the model's attention to regions of interest, simultaneously enhancing both geometric reconstruction and semantic understanding.
arXiv Detail & Related papers (2025-10-29T03:56:41Z)
- WorldGrow: Generating Infinite 3D World [75.81531067447203]
We tackle the challenge of generating an infinitely extendable 3D world -- large, continuous environments with coherent geometry and realistic appearance. We propose WorldGrow, a hierarchical framework for unbounded 3D scene synthesis. Our method features three core components: (1) a data curation pipeline that extracts high-quality scene blocks for training, making the 3D structured latent representations suitable for scene generation; (2) a 3D block inpainting mechanism that enables context-aware scene extension; and (3) a coarse-to-fine generation strategy that ensures both global layout plausibility and local geometric/textural fidelity.
arXiv Detail & Related papers (2025-10-24T17:39:52Z)
- 3D Scene Prompting for Scene-Consistent Camera-Controllable Video Generation [55.29423122177883]
3DScenePrompt is a framework that generates the next chunk from arbitrary-length input. It enables camera control while preserving scene consistency. Our framework significantly outperforms existing methods in scene consistency, camera controllability, and generation quality.
arXiv Detail & Related papers (2025-10-16T17:55:25Z)
- GeoPurify: A Data-Efficient Geometric Distillation Framework for Open-Vocabulary 3D Segmentation [57.8059956428009]
Recent attempts to transfer features from 2D Vision-Language Models to 3D semantic segmentation expose a persistent trade-off. We propose GeoPurify, which applies a small Student Affinity Network to 2D VLM-generated 3D point features, using geometric priors distilled from a 3D self-supervised teacher model. Benefiting from latent geometric information and the learned affinity network, GeoPurify effectively mitigates the trade-off and achieves superior data efficiency.
arXiv Detail & Related papers (2025-10-02T16:37:56Z)
- WonderVerse: Extendable 3D Scene Generation with Video Generative Models [28.002645364066005]
We introduce WonderVerse, a framework for generating extendable 3D scenes. WonderVerse leverages the powerful world-level priors embedded within video generative foundation models. It is compatible with various 3D reconstruction methods, allowing both efficient and high-quality generation.
arXiv Detail & Related papers (2025-03-12T08:44:51Z)
- GaussRender: Learning 3D Occupancy with Gaussian Rendering [86.89653628311565]
GaussRender is a module that improves 3D occupancy learning by enforcing projective consistency. Our method penalizes 3D configurations that produce inconsistent 2D projections, thereby enforcing a more coherent 3D structure.
arXiv Detail & Related papers (2025-02-07T16:07:51Z)
- Deep Geometric Moments Promote Shape Consistency in Text-to-3D Generation [27.43973967994717]
MT3D is a text-to-3D generative model that leverages a high-fidelity 3D object to overcome viewpoint bias. By incorporating geometric details from a 3D asset, MT3D enables the creation of diverse and geometrically consistent objects.
arXiv Detail & Related papers (2024-08-12T06:25:44Z)