Related papers: WorldMirror: Universal 3D World Reconstruction with Any-Prior Prompting

WorldMirror: Universal 3D World Reconstruction with Any-Prior Prompting

URL: http://arxiv.org/abs/2510.10726v1
Date: Sun, 12 Oct 2025 17:59:09 GMT
Title: WorldMirror: Universal 3D World Reconstruction with Any-Prior Prompting
Authors: Yifan Liu, Zhiyuan Min, Zhenwei Wang, Junta Wu, Tengfei Wang, Yixuan Yuan, Yawei Luo, Chunchao Guo,
Abstract summary: We present WorldMirror, an all-in-one, feed-forward model for versatile 3D geometric prediction tasks.<n>Our framework flexibly integrates diverse geometric priors, including camera poses, intrinsics, and depth maps.<n>WorldMirror achieves state-of-the-art performance across diverse benchmarks from camera, point map, depth, and surface normal estimation to novel view synthesis.
Score: 51.69408870574092
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present WorldMirror, an all-in-one, feed-forward model for versatile 3D geometric prediction tasks. Unlike existing methods constrained to image-only inputs or customized for a specific task, our framework flexibly integrates diverse geometric priors, including camera poses, intrinsics, and depth maps, while simultaneously generating multiple 3D representations: dense point clouds, multi-view depth maps, camera parameters, surface normals, and 3D Gaussians. This elegant and unified architecture leverages available prior information to resolve structural ambiguities and delivers geometrically consistent 3D outputs in a single forward pass. WorldMirror achieves state-of-the-art performance across diverse benchmarks from camera, point map, depth, and surface normal estimation to novel view synthesis, while maintaining the efficiency of feed-forward inference. Code and models will be publicly available soon.

Related papers

EA3D: Online Open-World 3D Object Extraction from Streaming Videos [55.48835711373918]
We present ExtractAnything3D (EA3D), a unified online framework for open-world 3D object extraction.<n>Given a streaming video, EA3D dynamically interprets each frame using vision-language and 2D vision foundation encoders to extract object-level knowledge.<n>A recurrent joint optimization module directs the model's attention to regions of interest, simultaneously enhancing both geometric reconstruction and semantic understanding.
arXiv Detail & Related papers (2025-10-29T03:56:41Z)
MapAnything: Universal Feed-Forward Metric 3D Reconstruction [63.79151976126576]
MapAnything ingests one or more images along with optional geometric inputs such as camera intrinsics, poses, depth, or partial reconstructions.<n>It then directly regresses the metric 3D scene geometry and cameras.<n>MapAnything addresses a broad range of 3D vision tasks in a single feed-forward pass.
arXiv Detail & Related papers (2025-09-16T18:00:14Z)
Dens3R: A Foundation Model for 3D Geometry Prediction [44.13431776180547]
Dens3R is a 3D foundation model designed for joint geometric dense prediction.<n>By integrating image-pair matching features with intrinsic invariance modeling, Dens3R accurately regresses multiple geometric quantities.
arXiv Detail & Related papers (2025-07-22T07:22:30Z)
E3D-Bench: A Benchmark for End-to-End 3D Geometric Foundation Models [78.1674905950243]
We present the first comprehensive benchmark for 3D geometric foundation models (GFMs)<n>GFMs directly predict dense 3D representations in a single feed-forward pass, eliminating the need for slow or unavailable precomputed camera parameters.<n>We evaluate 16 state-of-the-art GFMs, revealing their strengths and limitations across tasks and domains.<n>All code, evaluation scripts, and processed data will be publicly released to accelerate research in 3D spatial intelligence.
arXiv Detail & Related papers (2025-06-02T17:53:09Z)
MUSt3R: Multi-view Network for Stereo 3D Reconstruction [11.61182864709518]
We propose an extension of DUSt3R from pairs to multiple views, that addresses all aforementioned concerns.<n>We entail the model with a multi-layer memory mechanism which allows to reduce the computational complexity.<n>The framework is designed to perform 3D reconstruction both offline and online, and hence can be seamlessly applied to SfM and visual SLAM scenarios.
arXiv Detail & Related papers (2025-03-03T15:36:07Z)
FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views [100.45129752375658]
We present FLARE, a feed-forward model designed to infer high-quality camera poses and 3D geometry from uncalibrated sparse-view images.<n>Our solution features a cascaded learning paradigm with camera pose serving as the critical bridge, recognizing its essential role in mapping 3D structures onto 2D image planes.
arXiv Detail & Related papers (2025-02-17T18:54:05Z)
Large Spatial Model: End-to-end Unposed Images to Semantic 3D [79.94479633598102]
Large Spatial Model (LSM) processes unposed RGB images directly into semantic radiance fields. LSM simultaneously estimates geometry, appearance, and semantics in a single feed-forward operation. It can generate versatile label maps by interacting with language at novel viewpoints.
arXiv Detail & Related papers (2024-10-24T17:54:42Z)
Zero-Shot Multi-Object Scene Completion [59.325611678171974]
We present a 3D scene completion method that recovers the complete geometry of multiple unseen objects in complex scenes from a single RGB-D image. Our method outperforms the current state-of-the-art on both synthetic and real-world datasets.
arXiv Detail & Related papers (2024-03-21T17:59:59Z)
Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields [29.573344213110172]
We propose a framework called Omni-Recon, which is capable of (1) generalizable 3D reconstruction and zero-shot multitask scene understanding, and (2) adaptability to diverse downstream 3D applications such as real-time rendering and scene editing. Specifically, our Omni-Recon features a general-purpose NeRF model using image-based rendering with two decoupled branches. This design achieves state-of-the-art (SOTA) generalizable 3D surface reconstruction quality with blending weights reusable across diverse tasks for zero-shot multitask scene understanding.
arXiv Detail & Related papers (2024-03-17T07:47:26Z)
DUSt3R: Geometric 3D Vision Made Easy [8.471330244002564]
We introduce DUSt3R, a novel paradigm for Dense and Unconstrained Stereo 3D Reconstruction of arbitrary image collections.<n>We show that this formulation smoothly unifies the monocular and binocular reconstruction cases.<n>Our formulation directly provides a 3D model of the scene as well as depth information, but interestingly, we can seamlessly recover from it, pixel matches, relative and absolute camera.
arXiv Detail & Related papers (2023-12-21T18:52:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.