AnchorWeave: World-Consistent Video Generation with Retrieved Local Spatial Memories
- URL: http://arxiv.org/abs/2602.14941v1
- Date: Mon, 16 Feb 2026 17:23:08 GMT
- Title: AnchorWeave: World-Consistent Video Generation with Retrieved Local Spatial Memories
- Authors: Zun Wang, Han Lin, Jaehong Yoon, Jaemin Cho, Yue Zhang, Mohit Bansal
- Abstract summary: Existing memory-based approaches often condition generation on globally reconstructed 3D scenes by rendering anchor videos from geometry reconstructed from past frames. We introduce AnchorWeave, a memory-augmented video generation framework that replaces a single misaligned global memory with multiple clean local geometric memories. Experiments demonstrate that AnchorWeave significantly improves long-term scene consistency while maintaining strong visual quality.
- Score: 78.78355829813793
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Maintaining spatial world consistency over long horizons remains a central challenge for camera-controllable video generation. Existing memory-based approaches often condition generation on globally reconstructed 3D scenes by rendering anchor videos from geometry reconstructed from past frames. However, reconstructing a global 3D scene from multiple views inevitably introduces cross-view misalignment, as pose and depth estimation errors cause the same surfaces to be reconstructed at slightly different 3D locations across views. When fused, these inconsistencies accumulate into noisy geometry that contaminates the conditioning signals and degrades generation quality. We introduce AnchorWeave, a memory-augmented video generation framework that replaces a single misaligned global memory with multiple clean local geometric memories and learns to reconcile their cross-view inconsistencies. To this end, AnchorWeave performs coverage-driven local memory retrieval aligned with the target trajectory and integrates the selected local memories through a multi-anchor weaving controller during generation. Extensive experiments demonstrate that AnchorWeave significantly improves long-term scene consistency while maintaining strong visual quality, with ablation and analysis studies further validating the effectiveness of local geometric conditioning, multi-anchor control, and coverage-driven retrieval.
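The abstract names two components, coverage-driven local memory retrieval and a multi-anchor weaving controller, but gives no pseudocode. Purely as an illustration of the retrieval side, the sketch below greedily selects local point-cloud memories that maximize marginal coverage of the target trajectory's image planes; the data layout, grid discretization, and greedy set-cover rule are all assumptions for illustration, not the authors' algorithm.

```python
import numpy as np

def covered_cells(points, pose_w2c, K, hw, grid=16):
    """Cells of a target image plane hit by projected memory points.

    points: (N, 3) world-frame point cloud of one local memory.
    pose_w2c: 4x4 world-to-camera extrinsics; K: 3x3 intrinsics; hw: (H, W).
    """
    R, t = pose_w2c[:3, :3], pose_w2c[:3, 3]
    cam = points @ R.T + t                    # world -> camera coordinates
    front = cam[:, 2] > 1e-6                  # keep points in front of the camera
    uv = (cam[front, :2] / cam[front, 2:3]) @ K[:2, :2].T + K[:2, 2]
    h, w = hw
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    cells = (uv[ok, 0] // (w / grid)) * grid + (uv[ok, 1] // (h / grid))
    return set(cells.astype(int).tolist())

def retrieve_local_memories(memories, target_poses, K, hw, k=4):
    """Greedy set cover over (target view, image cell) pairs: each step
    picks the local memory that adds the most not-yet-covered cells."""
    cover = []
    for pts in memories:
        c = set()
        for v, pose in enumerate(target_poses):
            c |= {(v, cell) for cell in covered_cells(pts, pose, K, hw)}
        cover.append(c)
    chosen, covered = [], set()
    for _ in range(min(k, len(memories))):
        gains = [len(c - covered) for c in cover]
        best = int(np.argmax(gains))
        if gains[best] == 0:
            break                             # nothing new left to cover
        chosen.append(best)
        covered |= cover[best]
    return chosen                             # indices of selected memories
```

In the paper's pipeline the selected memories would then be rendered into anchor videos and fused by the multi-anchor weaving controller during generation; that fusion step is not sketched here.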
Related papers
- WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories [36.79437857022868]
WorldStereo is a novel framework that bridges camera-guided video generation and 3D reconstruction. We show that WorldStereo acts as a powerful world model, tackling diverse scene generation tasks with high-fidelity 3D results.
arXiv Detail & Related papers (2026-03-02T16:36:56Z)
- Scalable Adaptation of 3D Geometric Foundation Models via Weak Supervision from Internet Video [76.32954467706581]
We propose SAGE, a framework for Scalable Adaptation of GEometric foundation models from raw video streams. We use a hierarchical mining pipeline to transform videos into training trajectories and hybrid supervision. Experiments show that SAGE significantly enhances zero-shot generalization, reducing Chamfer Distance by 20-42% on unseen benchmarks.
arXiv Detail & Related papers (2026-02-08T09:53:21Z)
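The SAGE entry above reports Chamfer Distance reductions of 20-42%. As a reference for the metric itself, here is a minimal NumPy version; the squared-distance, mean-aggregated variant shown is one common convention and may differ from the exact definition used by those benchmarks.

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Uses squared Euclidean distances and mean aggregation over nearest
    neighbors in both directions; brute force, O(N*M) memory.
    """
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)  # (N, M) pairwise
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())
```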
- Geometry-Aware Rotary Position Embedding for Consistent Video World Model [48.914346802616414]
ViewRope is a geometry-aware encoding that injects camera-ray directions directly into video transformer self-attention layers. Geometry-Aware Frame-Sparse Attention exploits these geometric cues to selectively attend to relevant historical frames. Our results demonstrate that ViewRope substantially improves long-term consistency while reducing computational costs.
arXiv Detail & Related papers (2026-02-08T08:01:16Z)
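The ViewRope entry above does not say how camera-ray directions become rotary phases; the toy sketch below shows one way the general idea could look, computing per-pixel world-frame rays and applying RoPE-style rotations keyed to each ray's azimuth and elevation. This parameterization and the pose conventions are assumptions for illustration, not ViewRope's actual encoding.

```python
import torch

def camera_rays(K: torch.Tensor, R_c2w: torch.Tensor, hw: tuple) -> torch.Tensor:
    """Unit ray directions in the world frame, one per pixel, shape (H*W, 3)."""
    h, w = hw
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=torch.float32),
        torch.arange(w, dtype=torch.float32),
        indexing="ij",
    )
    pix = torch.stack([xs + 0.5, ys + 0.5, torch.ones_like(xs)], dim=-1)
    dirs = pix.reshape(-1, 3) @ torch.linalg.inv(K).T  # camera-frame rays
    dirs = dirs @ R_c2w.T                              # rotate into world frame
    return dirs / dirs.norm(dim=-1, keepdim=True)

def ray_rope(q: torch.Tensor, k: torch.Tensor, rays: torch.Tensor, base: float = 1e4):
    """Rotate (q, k) channel pairs by phases derived from each token's ray
    angles -- a RoPE analogue over viewing direction instead of token index.
    q, k: (num_tokens, head_dim) for one head, head_dim divisible by 4."""
    az = torch.atan2(rays[:, 1], rays[:, 0])      # azimuth of each ray
    el = torch.asin(rays[:, 2].clamp(-1.0, 1.0))  # elevation of each ray
    half = q.shape[-1] // 2
    freqs = base ** (-torch.arange(half // 2, dtype=q.dtype) / (half // 2))
    phases = torch.cat([az[:, None] * freqs, el[:, None] * freqs], dim=-1)
    cos, sin = phases.cos(), phases.sin()

    def rot(x):
        x1, x2 = x[..., 0::2], x[..., 1::2]
        return torch.stack([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1).flatten(-2)

    return rot(q), rot(k)
```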
- EvoWorld: Evolving Panoramic World Generation with Explicit 3D Memory [40.346684158976494]
EvoWorld bridges panoramic video generation with evolving 3D memory to enable spatially consistent long-horizon exploration. Unlike prior state-of-the-art methods that only synthesize videos, our key insight lies in exploiting this evolving 3D reconstruction as explicit spatial guidance. To evaluate long-range exploration capabilities, we introduce the first comprehensive benchmark spanning synthetic outdoor environments, Habitat indoor scenes, and challenging real-world scenarios.
arXiv Detail & Related papers (2025-10-01T17:59:38Z)
- IGFuse: Interactive 3D Gaussian Scene Reconstruction via Multi-Scans Fusion [15.837932667195037]
IGFuse is a novel framework that reconstructs interactive Gaussian scenes by fusing observations from multiple scans. Our method constructs segmentation-aware Gaussian fields and enforces bi-directional photometric and semantic consistency across scans. IGFuse enables high-fidelity rendering and object-level scene manipulation without dense observations or complex pipelines.
arXiv Detail & Related papers (2025-08-18T17:59:47Z)
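The IGFuse entry above mentions bi-directional photometric and semantic consistency across scans. A minimal sketch of what such a term could look like, assuming each scan yields its own renderable Gaussian field and that `render_a` / `render_b` are hypothetical callables mapping a camera pose to an (RGB, semantic logits) pair:

```python
import torch
import torch.nn.functional as F

def cross_scan_consistency(render_a, render_b, poses, lam_sem: float = 0.1):
    """Renders of the two per-scan fields under shared poses should agree
    photometrically (L1 on RGB) and semantically (L1 on label distributions).
    Gradients flow into both fields, making the constraint bi-directional.

    render_a / render_b: pose -> (rgb [3, H, W], sem_logits [C, H, W]).
    """
    total = torch.tensor(0.0)
    for pose in poses:
        rgb_a, sem_a = render_a(pose)
        rgb_b, sem_b = render_b(pose)
        photo = F.l1_loss(rgb_a, rgb_b)
        sem = F.l1_loss(F.softmax(sem_a, dim=0), F.softmax(sem_b, dim=0))
        total = total + photo + lam_sem * sem
    return total / len(poses)
```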
- VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory [55.73900731190389]
We introduce Surfel-Indexed View Memory (VMem), a memory module that remembers past views by indexing them geometrically based on the 3D surface elements (surfels) they have observed. VMem enables efficient retrieval of the most relevant past views when generating new ones.
arXiv Detail & Related papers (2025-06-23T17:59:56Z)
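A minimal sketch of the surfel-indexing idea from the VMem entry above: each stored surfel remembers which past view observed it, and retrieval projects the surfels into the query camera and lets the visible ones vote for their views. The data layout and the simple vote-count ranking are assumptions, not VMem's implementation.

```python
import numpy as np
from collections import defaultdict

class SurfelViewMemory:
    def __init__(self):
        self.positions = []  # surfel centers in the world frame
        self.observers = []  # id of the view that contributed each surfel

    def add_view(self, view_id: int, surfel_xyz: np.ndarray):
        """Register a past view via the (S, 3) surfels it observed."""
        for p in surfel_xyz:
            self.positions.append(p)
            self.observers.append(view_id)

    def retrieve(self, pose_w2c: np.ndarray, K: np.ndarray, hw: tuple, top_k: int = 4):
        """Return ids of past views whose surfels are most visible from the
        query camera (4x4 world-to-camera pose, 3x3 intrinsics K)."""
        if not self.positions:
            return []
        pts = np.asarray(self.positions)
        cam = pts @ pose_w2c[:3, :3].T + pose_w2c[:3, 3]      # world -> camera
        front = cam[:, 2] > 1e-6
        uv = (cam[:, :2] / np.clip(cam[:, 2:3], 1e-6, None)) @ K[:2, :2].T + K[:2, 2]
        h, w = hw
        vis = front & (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
        votes = defaultdict(int)
        for i in np.flatnonzero(vis):
            votes[self.observers[i]] += 1                     # one vote per visible surfel
        return sorted(votes, key=votes.get, reverse=True)[:top_k]
```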
- Video World Models with Long-term Spatial Memory [110.530715838396]
We introduce a novel framework to enhance the long-term consistency of video world models. Our framework includes mechanisms to store and retrieve information from a long-term spatial memory. Our evaluations show improved quality, consistency, and context length compared to relevant baselines.
arXiv Detail & Related papers (2025-06-05T17:42:34Z)
- Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation [66.95956271144982]
We present Voyager, a novel video diffusion framework that generates world-consistent 3D point-cloud sequences from a single image. Unlike existing approaches, Voyager achieves end-to-end scene generation and reconstruction with inherent consistency across frames.
arXiv Detail & Related papers (2025-06-04T17:59:04Z)