FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models
- URL: http://arxiv.org/abs/2308.05733v1
- Date: Thu, 10 Aug 2023 17:55:02 GMT
- Title: FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models
- Authors: Guangkai Xu, Wei Yin, Hao Chen, Chunhua Shen, Kai Cheng, Feng Zhao
- Abstract summary: We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
- Score: 67.96827539201071
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D scene reconstruction is a long-standing vision task. Existing approaches
can be categorized into geometry-based and learning-based methods. The former
leverages multi-view geometry but can face catastrophic failures due to the
reliance on accurate pixel correspondence across views. The latter was
proposed to mitigate these issues by learning 2D or 3D representations
directly. However, without large-scale video or 3D training data, it can
hardly generalize to diverse real-world scenarios due to the presence of tens
of millions or even billions of optimization parameters in the deep network.
Recently, robust monocular depth estimation models trained with large-scale
datasets have been proven to possess a weak 3D geometry prior, but they are
insufficient for reconstruction due to the unknown camera parameters, the
affine-invariant property, and inter-frame inconsistency. Here, we propose a
novel test-time optimization approach that can transfer the robustness of
affine-invariant depth models such as LeReS to challenging diverse scenes while
ensuring inter-frame consistency, with only dozens of parameters to optimize
per video frame. Specifically, our approach involves freezing the pre-trained
affine-invariant depth model's depth predictions, rectifying them by optimizing
the unknown scale-shift values with a geometric consistency alignment module,
and employing the resulting scale-consistent depth maps to robustly obtain
camera poses and achieve dense scene reconstruction, even in low-texture
regions. Experiments show that our method achieves state-of-the-art
cross-dataset reconstruction on five zero-shot testing datasets.
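To make the "affine-invariant property" mentioned in the abstract concrete, the following is a minimal sketch, not the paper's geometric consistency alignment module. It assumes a sparse reference depth is available (the paper instead optimizes scale and shift from inter-frame consistency) and recovers a single scale `s` and shift `t` by ordinary least squares. The function name `fit_scale_shift` and the synthetic data are hypothetical, introduced only for illustration.

```python
import numpy as np

def fit_scale_shift(pred, ref):
    """Least-squares scale s and shift t such that s * pred + t ~= ref.

    Illustrative only: assumes reference depth values exist for the
    same pixels, which the paper's setting does not require.
    """
    pred = np.asarray(pred, dtype=np.float64).ravel()
    ref = np.asarray(ref, dtype=np.float64).ravel()
    # Design matrix with columns [pred, 1] for the model s * pred + t.
    A = np.stack([pred, np.ones_like(pred)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, ref, rcond=None)
    return s, t

# Synthetic example: a depth map known only up to an affine transform.
rng = np.random.default_rng(0)
true_depth = rng.uniform(1.0, 5.0, size=(4, 4))
pred = (true_depth - 0.5) / 2.0  # hidden scale 2.0 and shift 0.5

s, t = fit_scale_shift(pred, true_depth)
aligned = s * pred + t  # depth after undoing the affine ambiguity
```

With only two unknowns per frame in this simplified form, the fit is cheap and well-conditioned, which is in the spirit of the abstract's point that few parameters need to be optimized per frame while the depth network itself stays frozen.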
Related papers
- PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting [54.7468067660037]
PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices.
Our framework capitalizes on fast speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS.
arXiv Detail & Related papers (2024-10-29T15:28:15Z)
- Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering [93.94371335579321]
We propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.
Comprehensive experiments underscore our framework's superior generalization capabilities.
Our innovative loss functions empower the model to autonomously recover domain-specific scale-and-shift coefficients.
arXiv Detail & Related papers (2023-09-18T12:36:39Z)
- Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image [91.71077190961688]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then exploit 3D point cloud data to predict the depth shift and the camera's focal length, which allow us to recover 3D scene shapes.
We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation.
arXiv Detail & Related papers (2022-08-28T16:20:14Z)
- Towards 3D Scene Reconstruction from Locally Scale-Aligned Monocular Video Depth [90.33296913575818]
In video-based scenarios such as video depth estimation and 3D scene reconstruction from a video, the unknown scale and shift in each per-frame prediction may cause depth inconsistency.
We propose a locally weighted linear regression method to recover the scale and shift with very sparse anchor points.
Our method can boost the performance of existing state-of-the-art approaches by up to 50% on several zero-shot benchmarks.
arXiv Detail & Related papers (2022-02-03T08:52:54Z)
- Towards Non-Line-of-Sight Photography [48.491977359971855]
Non-line-of-sight (NLOS) imaging is based on capturing the multi-bounce indirect reflections from the hidden objects.
Active NLOS imaging systems rely on the capture of the time of flight of light through the scene.
We propose a new problem formulation, called NLOS photography, to specifically address this deficiency.
arXiv Detail & Related papers (2021-09-16T08:07:13Z)
- H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction [27.66008315400462]
Recent learning approaches that implicitly represent surface geometry have shown impressive results in the problem of multi-view 3D reconstruction.
We tackle these limitations for the specific problem of few-shot full 3D head reconstruction.
We learn a shape model of 3D heads from thousands of incomplete raw scans using implicit representations.
arXiv Detail & Related papers (2021-07-26T23:04:18Z)
- Learning to Recover 3D Scene Shape from a Single Image [98.20106822614392]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then use 3D point cloud encoders to predict the missing depth shift and focal length that allow us to recover a realistic 3D scene shape.
arXiv Detail & Related papers (2020-12-17T02:35:13Z)