MonoInstance: Enhancing Monocular Priors via Multi-view Instance Alignment for Neural Rendering and Reconstruction
- URL: http://arxiv.org/abs/2503.18363v2
- Date: Sun, 30 Mar 2025 04:42:06 GMT
- Title: MonoInstance: Enhancing Monocular Priors via Multi-view Instance Alignment for Neural Rendering and Reconstruction
- Authors: Wenyuan Zhang, Yixiao Yang, Han Huang, Liang Han, Kanle Shi, Yu-Shen Liu, Zhizhong Han
- Abstract summary: Monocular depth priors have been widely adopted by neural rendering in multi-view based tasks such as 3D reconstruction and novel view synthesis. Current methods treat the entire estimated depth map indiscriminately, and use it as ground truth supervision. We propose MonoInstance, a general approach that explores the uncertainty of monocular depths to provide enhanced geometric priors.
- Score: 45.70946415376022
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monocular depth priors have been widely adopted by neural rendering in multi-view tasks such as 3D reconstruction and novel view synthesis. However, due to inconsistent predictions across views, how to effectively leverage monocular cues in a multi-view context remains a challenge. Current methods treat the entire estimated depth map indiscriminately and use it as ground-truth supervision, ignoring the inherent inaccuracy and cross-view inconsistency of monocular priors. To resolve these issues, we propose MonoInstance, a general approach that exploits the uncertainty of monocular depths to provide enhanced geometric priors for neural rendering and reconstruction. Our key insight lies in aligning the depths of each segmented instance from multiple views within a common 3D space, thereby casting the uncertainty estimation of monocular depths as a density measure within noisy point clouds. For high-uncertainty areas where depth priors are unreliable, we further introduce a constraint term that encourages projected instances to align with corresponding instance masks on nearby views. MonoInstance is a versatile strategy that can be seamlessly integrated into various multi-view neural rendering frameworks. Our experiments demonstrate that MonoInstance significantly improves both reconstruction and novel view synthesis across various benchmarks.
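The density-based uncertainty at the heart of the method lends itself to a compact sketch. The snippet below is a minimal, hypothetical illustration, not the authors' implementation: it back-projects one instance's monocular depths from several views into a shared world frame and scores each point by its mean distance to its k nearest neighbors in the merged cloud, so sparse (low-density) regions, where per-view predictions disagree, receive high uncertainty. The pinhole camera convention, the choice of k, and k-NN distance as a density proxy are all assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def backproject(depth, mask, K, c2w):
    """Lift the masked pixels of one view's depth map to world space."""
    v, u = np.nonzero(mask)
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]   # pinhole model, assumed intrinsics K
    y = (v - K[1, 2]) * z / K[1, 1]
    pts = np.stack([x, y, z, np.ones_like(z)], axis=1)
    return (pts @ c2w.T)[:, :3]       # camera-to-world transform, 4x4

def instance_uncertainty(depths, masks, Ks, c2ws, k=8):
    """Merge one instance's back-projected depths from all views and use
    mean k-NN distance in the merged cloud as an uncertainty proxy:
    inconsistent monocular depths scatter, lowering local density."""
    clouds = [backproject(d, m, K, T)
              for d, m, K, T in zip(depths, masks, Ks, c2ws)]
    tree = cKDTree(np.concatenate(clouds, axis=0))
    scores = []
    for pts in clouds:
        dist, _ = tree.query(pts, k=k + 1)   # first hit is the point itself
        scores.append(dist[:, 1:].mean(axis=1))
    return scores   # one per-pixel score array per view
```

Pixels flagged as high-uncertainty would then be withheld from direct depth supervision and handled by the paper's mask-alignment constraint instead.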
Related papers
- Mono3R: Exploiting Monocular Cues for Geometric 3D Reconstruction [11.220655907305515]
We introduce a monocular-guided refinement module that integrates monocular geometric priors into multi-view reconstruction frameworks.
Our method achieves substantial improvements in both multi-view camera pose estimation and point cloud accuracy.
arXiv Detail & Related papers (2025-04-18T02:33:12Z)
- Multi-view Reconstruction via SfM-guided Monocular Depth Estimation [92.89227629434316]
We present a new method for multi-view geometric reconstruction. We incorporate SfM information, a strong multi-view prior, into the depth estimation process. Our method significantly improves the quality of depth estimation compared to previous monocular depth estimation works.
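The summary does not spell out how SfM enters the pipeline. One common coupling, shown below purely as an assumed illustration rather than this paper's mechanism, is to align a monocular depth map to the sparse but multi-view-consistent SfM depths with a per-image least-squares scale and shift.

```python
import numpy as np

def align_depth_to_sfm(mono_depth, sfm_uv, sfm_depth):
    """Fit scale s and shift t so that s * mono + t matches the sparse
    SfM depths at the triangulated pixels (illustrative assumption only)."""
    d = mono_depth[sfm_uv[:, 1], sfm_uv[:, 0]]   # predicted depth at SfM pixels (u, v)
    A = np.stack([d, np.ones_like(d)], axis=1)   # [d, 1] design matrix
    (s, t), *_ = np.linalg.lstsq(A, sfm_depth, rcond=None)
    return s * mono_depth + t
```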
arXiv Detail & Related papers (2025-03-18T17:54:06Z)
- Fine-detailed Neural Indoor Scene Reconstruction using multi-level importance sampling and multi-view consistency [1.912429179274357]
We propose a novel neural implicit surface reconstruction method, named FD-NeuS, to learn fine-detailed 3D models.
Specifically, we leverage segmentation priors to guide region-based ray sampling, and use piecewise exponential functions as weights to guide the sampling of 3D points.
In addition, we introduce multi-view feature consistency and multi-view normal consistency as supervision and uncertainty respectively, which further improve the reconstruction of details.
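"Piecewise exponential functions as weights" is terse, so the sketch below is one possible reading, not the paper's definition: weight candidate samples along a ray by an exponential that decays at different rates in front of and behind a coarse surface estimate, then resample proportionally. The surface estimate t_surf and the two decay rates are invented for illustration.

```python
import numpy as np

def piecewise_exp_weights(t, t_surf, sigma_near=0.02, sigma_far=0.1):
    """Hypothetical piecewise-exponential sample weights: decay fast on
    the near side of a coarse surface estimate t_surf and more slowly
    behind it, concentrating samples around the surface."""
    d = t - t_surf
    w = np.where(d < 0, np.exp(d / sigma_near), np.exp(-d / sigma_far))
    return w / w.sum()

# draw refined sample positions proportional to the weights
t = np.linspace(0.0, 2.0, 128)
w = piecewise_exp_weights(t, t_surf=1.3)
samples = np.random.choice(t, size=32, replace=True, p=w)
```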
arXiv Detail & Related papers (2024-10-10T04:08:06Z)
- GenS: Generalizable Neural Surface Reconstruction from Multi-View Images [20.184657468900852]
GenS is an end-to-end generalizable neural surface reconstruction model.
Our representation is more powerful, which can recover high-frequency details while maintaining global smoothness.
Experiments on popular benchmarks show that our model can generalize well to new scenes.
arXiv Detail & Related papers (2024-06-04T17:13:10Z)
- Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth Estimation in Dynamic Scenes [51.20150148066458]
We propose a novel method to learn to fuse the multi-view and monocular cues encoded as volumes without needing heuristically crafted masks.
Experiments on real-world datasets prove the significant effectiveness and generalization ability of the proposed method.
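As a rough illustration of fusing cue volumes without hand-crafted masks, the snippet below blends a multi-view cost volume with a monocular one through a per-voxel learned gate; the shapes and the gating design are assumptions, not the paper's architecture.

```python
import numpy as np

def fuse_volumes(multiview_vol, mono_vol, weight_logits):
    """Per-voxel soft fusion: a learned logit volume (produced elsewhere,
    e.g. by a small 3D CNN) gates between the two cues without any
    hand-crafted validity mask. Matching (D, H, W) shapes are assumed."""
    w = 1.0 / (1.0 + np.exp(-weight_logits))   # sigmoid gate in [0, 1]
    return w * multiview_vol + (1.0 - w) * mono_vol
```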
arXiv Detail & Related papers (2023-04-18T13:55:24Z)
- Progressive Multi-view Human Mesh Recovery with Self-Supervision [68.60019434498703]
Existing solutions typically suffer from poor generalization performance to new settings.
We propose a novel simulation-based training pipeline for multi-view human mesh recovery.
arXiv Detail & Related papers (2022-12-10T06:28:29Z)
- MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction [72.05649682685197]
State-of-the-art neural implicit methods allow for high-quality reconstructions of simple scenes from many input views, but their performance drops significantly for larger, more complex scenes and for scenes captured from sparse viewpoints.
This is caused primarily by the inherent ambiguity in the RGB reconstruction loss, which does not provide enough constraints.
Motivated by recent advances in the area of monocular geometry prediction, we explore the utility these cues provide for improving neural implicit surface reconstruction.
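For context, monocular-cue supervision in this line of work typically pairs a scale/shift-invariant depth loss with a normal consistency loss. The sketch below follows that pattern but is a simplified assumption, not MonoSDF's exact objective.

```python
import numpy as np

def monocular_cue_losses(depth_pred, depth_mono, normal_pred, normal_mono):
    """Sketch of monocular-cue supervision: compare depths after a
    least-squares scale/shift (monocular depth is only defined up to an
    affine transform) and compare unit normals by angle."""
    d = depth_mono.ravel()
    A = np.stack([d, np.ones_like(d)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, depth_pred.ravel(), rcond=None)
    depth_loss = np.mean((s * d + t - depth_pred.ravel()) ** 2)
    cos = np.sum(normal_pred * normal_mono, axis=-1)   # unit normals assumed
    normal_loss = np.mean(1.0 - cos)
    return depth_loss, normal_loss
```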
arXiv Detail & Related papers (2022-06-01T17:58:15Z)
- Self-Supervised Visibility Learning for Novel View Synthesis [79.53158728483375]
Conventional rendering methods estimate scene geometry and synthesize novel views in two separate steps.
We propose an end-to-end NVS framework to eliminate the error propagation issue.
Our network is trained in an end-to-end self-supervised fashion, thus significantly alleviating error accumulation in view synthesis.
arXiv Detail & Related papers (2021-03-29T08:11:25Z)
- Monocular Depth Estimation with Self-supervised Instance Adaptation [138.0231868286184]
In robotics applications, multiple views of a scene may or may not be available, depending on the actions of the robot.
We propose a new approach that extends any off-the-shelf self-supervised monocular depth reconstruction system to use more than one image at test time.
arXiv Detail & Related papers (2020-04-13T08:32:03Z)