MonoSelfRecon: Purely Self-Supervised Explicit Generalizable 3D Reconstruction of Indoor Scenes from Monocular RGB Views
- URL: http://arxiv.org/abs/2404.06753v1
- Date: Wed, 10 Apr 2024 05:41:05 GMT
- Title: MonoSelfRecon: Purely Self-Supervised Explicit Generalizable 3D Reconstruction of Indoor Scenes from Monocular RGB Views
- Authors: Runfa Li, Upal Mahbub, Vasudev Bhaskaran, Truong Nguyen
- Abstract summary: MonoSelfRecon achieves explicit 3D mesh reconstruction for generalizable indoor scenes with monocular RGB views, purely by self-supervision on voxel-SDF.
We propose novel self-supervised losses, which not only support pure self-supervision, but can be used together with supervised signals to further boost supervised training.
- Score: 4.570455747723325
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Current monocular 3D scene reconstruction (3DR) works are either fully supervised, not generalizable, or implicit in their 3D representation. We propose a novel framework, MonoSelfRecon, that for the first time achieves explicit 3D mesh reconstruction of generalizable indoor scenes from monocular RGB views, purely by self-supervision on voxel-SDF (signed distance function). MonoSelfRecon follows an autoencoder-based architecture and decodes both a voxel-SDF and a generalizable Neural Radiance Field (NeRF), where the NeRF guides the voxel-SDF during self-supervision. We propose novel self-supervised losses that not only support pure self-supervision but can also be combined with supervised signals to further boost supervised training. Our experiments show that MonoSelfRecon, trained with pure self-supervision, outperforms the current best self-supervised indoor depth estimation models and is comparable to 3DR models trained with full supervision on depth annotations. MonoSelfRecon is not restricted to a specific model design and can be applied to any model with voxel-SDF for purely self-supervised training.
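To make the recipe concrete, below is a minimal PyTorch sketch of the kind of self-supervision the abstract describes: depth is extracted from the voxel-SDF along camera rays and tied to the NeRF branch's rendered depth, so neither branch needs depth annotations. This is an illustrative assumption of the mechanism, not the authors' implementation; the function names, the first-zero-crossing depth extraction, and the plain L1 consistency term are all hypothetical.

```python
import torch

def sdf_to_depth(sdf_along_ray: torch.Tensor, t_vals: torch.Tensor) -> torch.Tensor:
    """Approximate per-ray depth as the first positive-to-negative zero
    crossing of SDF values sampled along the ray.

    sdf_along_ray: (num_rays, num_samples) SDF values at increasing depths.
    t_vals:        (num_samples,) depths at which the SDF was sampled.
    """
    num_rays, num_samples = sdf_along_ray.shape
    sign_flip = (sdf_along_ray[:, :-1] > 0) & (sdf_along_ray[:, 1:] <= 0)
    has_flip = sign_flip.any(dim=1)
    # Index of the first crossing; rays without one point at the last interval.
    first = torch.where(
        has_flip,
        sign_flip.long().argmax(dim=1),
        torch.full((num_rays,), num_samples - 2, dtype=torch.long,
                   device=sdf_along_ray.device),
    )
    s0 = sdf_along_ray.gather(1, first[:, None]).squeeze(1)
    s1 = sdf_along_ray.gather(1, (first + 1)[:, None]).squeeze(1)
    t0, t1 = t_vals[first], t_vals[first + 1]
    # Linearly interpolate to the zero crossing between the two samples.
    w = s0 / (s0 - s1 + 1e-8)
    depth = t0 + w * (t1 - t0)
    # Rays that never cross a surface fall back to the far plane.
    return torch.where(has_flip, depth, t_vals[-1].expand(num_rays))

def nerf_guidance_loss(depth_from_sdf: torch.Tensor,
                       depth_from_nerf: torch.Tensor) -> torch.Tensor:
    """L1 consistency between the SDF branch's depth and the NeRF branch's
    rendered depth: the NeRF guides the voxel-SDF with no depth annotation."""
    return (depth_from_sdf - depth_from_nerf).abs().mean()
```

A toy call such as `sdf_to_depth(torch.randn(1024, 64), torch.linspace(0.1, 5.0, 64))` returns one depth per ray; in an actual pipeline both `depth_from_sdf` and `depth_from_nerf` would come from the two decoder branches of the same autoencoder.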
Related papers
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z) - SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction [77.15924044466976]
We propose SelfOcc to explore a self-supervised way to learn 3D occupancy using only video sequences.
We first transform the images into the 3D space (e.g., bird's eye view) to obtain a 3D representation of the scene.
We can then render 2D images of previous and future frames as self-supervision signals to learn the 3D representations (a minimal sketch of this reprojection-style loss appears after this list).
arXiv Detail & Related papers (2023-11-21T17:59:14Z) - MOHO: Learning Single-view Hand-held Object Reconstruction with Multi-view Occlusion-Aware Supervision [75.38953287579616]
We present a novel framework to exploit Multi-view Occlusion-aware supervision from hand-object videos for Hand-held Object reconstruction.
We tackle two predominant challenges in such a setting: hand-induced occlusion and the object's self-occlusion.
Experiments on the HO3D and DexYCB datasets demonstrate that 2D-supervised MOHO outperforms 3D-supervised methods by a large margin.
arXiv Detail & Related papers (2023-10-18T03:57:06Z) - AutoRecon: Automated 3D Object Discovery and Reconstruction [41.60050228813979]
We propose a novel framework named AutoRecon for the automated discovery and reconstruction of an object from multi-view images.
We demonstrate that foreground objects can be robustly located and segmented from SfM point clouds by leveraging self-supervised 2D vision transformer features.
Experiments on the DTU, BlendedMVS and CO3D-V2 datasets demonstrate the effectiveness and robustness of AutoRecon.
arXiv Detail & Related papers (2023-05-15T17:16:46Z) - Self-Supervised Object Goal Navigation with In-Situ Finetuning [110.6053241629366]
This work presents an agent that builds self-supervised models of the world through exploration.
We identify a strong source of self-supervision that can train all components of an ObjectNav agent.
We show that our agent can perform competitively in both the real world and simulation.
arXiv Detail & Related papers (2022-12-09T03:41:40Z) - MonoViT: Self-Supervised Monocular Depth Estimation with a Vision Transformer [52.0699787446221]
We propose MonoViT, a framework combining the global reasoning enabled by ViT models with the flexibility of self-supervised monocular depth estimation.
By combining plain convolutions with Transformer blocks, our model can reason locally and globally, yielding depth prediction at a higher level of detail and accuracy.
arXiv Detail & Related papers (2022-08-06T16:54:45Z) - Monocular Depth Estimation through Virtual-world Supervision and Real-world SfM Self-Supervision [0.0]
We perform monocular depth estimation by virtual-world supervision (MonoDEVS) and real-world SfM self-supervision.
Our MonoDEVSNet outperforms previous MDE CNNs trained on monocular and even stereo sequences.
arXiv Detail & Related papers (2021-03-22T22:33:49Z) - Monocular Depth Estimation with Self-supervised Instance Adaptation [138.0231868286184]
In robotics applications, multiple views of a scene may or may not be available, depending on the actions of the robot.
We propose a new approach that extends any off-the-shelf self-supervised monocular depth reconstruction system to use more than one image at test time.
arXiv Detail & Related papers (2020-04-13T08:32:03Z)
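As referenced in the SelfOcc entry above, a common way to turn "render previous and future frames" into a training signal is differentiable reprojection: depth rendered from the learned 3D representation back-projects reference pixels, the relative pose moves them into a neighboring frame, and the photometric error between the warped and observed images supervises the representation. The sketch below is a hedged illustration assuming known intrinsics `K` and relative pose `T_src_from_ref`; all names are hypothetical, and this is not the SelfOcc code.

```python
import torch
import torch.nn.functional as F

def reproject(depth, K, K_inv, T_src_from_ref):
    """Back-project reference pixels with predicted depth, transform them into
    the source frame, and return normalized coordinates for grid_sample.

    depth: (B, 1, H, W) depth of the reference frame.
    K, K_inv: (B, 3, 3) camera intrinsics and their inverse.
    T_src_from_ref: (B, 4, 4) relative pose mapping reference -> source camera.
    """
    B, _, H, W = depth.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype, device=depth.device),
        torch.arange(W, dtype=depth.dtype, device=depth.device),
        indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(1, 3, -1)
    cam = (K_inv @ pix) * depth.reshape(B, 1, -1)            # (B, 3, HW)
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W, dtype=depth.dtype,
                                       device=depth.device)], dim=1)
    src = (T_src_from_ref @ cam_h)[:, :3]                    # (B, 3, HW)
    proj = K @ src
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)          # (B, 2, HW)
    # Normalize pixel coordinates to [-1, 1] as grid_sample expects.
    u = 2.0 * uv[:, 0] / (W - 1) - 1.0
    v = 2.0 * uv[:, 1] / (H - 1) - 1.0
    return torch.stack([u, v], dim=-1).reshape(B, H, W, 2)

def view_synthesis_loss(ref_rgb, src_rgb, depth, K, K_inv, T_src_from_ref):
    """Photometric self-supervision: warp the source frame into the reference
    view using predicted depth and pose, then penalize the L1 difference."""
    grid = reproject(depth, K, K_inv, T_src_from_ref)
    warped = F.grid_sample(src_rgb, grid, align_corners=True,
                           padding_mode="border")
    return (ref_rgb - warped).abs().mean()
```

In practice such losses are usually combined with an SSIM term and per-pixel masking of occluded or static regions; the plain L1 shown here keeps the sketch minimal.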