Towards 3D Scene Reconstruction from Locally Scale-Aligned Monocular
Video Depth
- URL: http://arxiv.org/abs/2202.01470v4
- Date: Thu, 6 Apr 2023 03:08:03 GMT
- Title: Towards 3D Scene Reconstruction from Locally Scale-Aligned Monocular
Video Depth
- Authors: Guangkai Xu, Wei Yin, Hao Chen, Chunhua Shen, Kai Cheng, Feng Wu, Feng
Zhao
- Abstract summary: In video-based scenarios such as video depth estimation and 3D scene reconstruction from a video, the unknown scale and shift residing in per-frame predictions may cause depth inconsistency.
We propose a locally weighted linear regression method to recover the scale and shift from very sparse anchor points.
Our method can boost the performance of existing state-of-the-art approaches by up to 50% on several zero-shot benchmarks.
- Score: 90.33296913575818
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Existing monocular depth estimation methods have achieved excellent
robustness in diverse scenes, but they can only retrieve affine-invariant
depth, up to an unknown scale and shift. However, in some video-based scenarios
such as video depth estimation and 3D scene reconstruction from a video, the
unknown scale and shift residing in per-frame predictions may cause depth
inconsistency. To solve this problem, we propose a locally weighted linear
regression method to recover the scale and shift from very sparse anchor
points, which ensures scale consistency across consecutive frames. Extensive
experiments show that our method can boost the performance of existing
state-of-the-art approaches by up to 50% on several zero-shot benchmarks.
In addition, we merge over 6.3 million RGBD images to train strong and robust
depth models. Our resulting ResNet50-backbone model even outperforms the
state-of-the-art DPT ViT-Large model. Combined with geometry-based
reconstruction methods, we formulate a new dense 3D scene reconstruction
pipeline, which benefits from both the scale consistency of sparse points and
the robustness of monocular methods. By performing simple per-frame
prediction over a video, an accurate 3D scene shape can be recovered.
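The alignment step lends itself to a short illustration. The following is a minimal NumPy sketch of how a locally weighted linear regression could recover a per-pixel scale and shift from sparse metric anchor points; the Gaussian weighting, the bandwidth `sigma`, and the function name are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def locally_weighted_scale_shift(pred_depth, anchor_uv, anchor_depth, sigma=32.0):
    """Align an affine-invariant depth map to sparse metric anchors.

    pred_depth:   (h, w) affine-invariant depth prediction.
    anchor_uv:    (n, 2) integer pixel coordinates (u, v) of the anchors.
    anchor_depth: (n,) metric depth values at the anchors.

    For every pixel, fit metric ~ s * pred + t by weighted least squares
    over the anchors, with Gaussian weights that decay with image-space
    distance. A sketch only: the bandwidth and the dense per-pixel loop
    are illustrative choices, not the paper's implementation.
    """
    h, w = pred_depth.shape
    aligned = np.empty_like(pred_depth)
    # Affine-invariant predictions sampled at the anchor pixels.
    d_pred = pred_depth[anchor_uv[:, 1], anchor_uv[:, 0]]
    A = np.stack([d_pred, np.ones_like(d_pred)], axis=1)
    for v in range(h):
        for u in range(w):
            # Gaussian weight of each anchor for the current pixel.
            dist2 = (anchor_uv[:, 0] - u) ** 2 + (anchor_uv[:, 1] - v) ** 2
            sw = np.sqrt(np.exp(-dist2 / (2.0 * sigma ** 2)))
            # Weighted least squares: minimize sum_i w_i (s*d_i + t - z_i)^2.
            s, t = np.linalg.lstsq(A * sw[:, None], anchor_depth * sw,
                                   rcond=None)[0]
            aligned[v, u] = s * pred_depth[v, u] + t
    return aligned
```

In the video setting, the anchors for each frame could come from the sparse points of a geometry-based reconstruction, so every aligned depth map inherits the same metric scale and consecutive frames stay consistent.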
Related papers
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image [91.71077190961688]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then exploit 3D point cloud data to predict the depth shift and the camera's focal length, which allow us to recover 3D scene shapes (see the generic unprojection sketch after this list).
We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation.
arXiv Detail & Related papers (2022-08-28T16:20:14Z)
- Unbiased 4D: Monocular 4D Reconstruction with a Neural Deformation Model [76.64071133839862]
Capturing general deforming scenes from monocular RGB video is crucial for many computer graphics and vision applications.
Our method, Ub4D, handles large deformations, performs shape completion in occluded regions, and can operate on monocular RGB videos directly by using differentiable volume rendering.
Results on our new dataset, which will be made publicly available, demonstrate a clear improvement over the state of the art in terms of surface reconstruction accuracy and robustness to large deformations.
arXiv Detail & Related papers (2022-06-16T17:59:54Z)
- Real-time dense 3D Reconstruction from monocular video data captured by low-cost UAVs [0.3867363075280543]
Real-time 3D reconstruction enables fast dense mapping of the environment which benefits numerous applications, such as navigation or live evaluation of an emergency.
In contrast to most real-time capable approaches, our approach does not need an explicit depth sensor.
By exploiting the self-motion of the unmanned aerial vehicle (UAV) flying with oblique view around buildings, we estimate both camera trajectory and depth for selected images with enough novel content.
arXiv Detail & Related papers (2021-04-21T13:12:17Z)
- Learning monocular 3D reconstruction of articulated categories from motion [39.811816510186475]
Video self-supervision forces the consistency of consecutive 3D reconstructions by a motion-based cycle loss.
We introduce an interpretable model of 3D template deformations that controls a 3D surface through the displacement of a small number of local, learnable handles.
We obtain state-of-the-art reconstructions with diverse shapes, viewpoints and textures for multiple articulated object categories.
arXiv Detail & Related papers (2021-03-30T13:50:27Z)
- Learning to Recover 3D Scene Shape from a Single Image [98.20106822614392]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then use 3D point cloud encoders to predict the missing depth shift and focal length that allow us to recover a realistic 3D scene shape.
arXiv Detail & Related papers (2020-12-17T02:35:13Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
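Both two-stage scene-shape entries above recover a 3D shape once a shift-corrected depth map and a focal length are known. That last step is a standard pinhole-camera unprojection, sketched below under generic assumptions (square pixels, principal point at the image center); the function name is hypothetical and this is not the papers' code.

```python
import numpy as np

def unproject_depth(depth, focal_length):
    """Back-project a metric depth map (h, w) into an (h*w, 3) point cloud.

    Generic pinhole-camera sketch assuming square pixels and the
    principal point at the image center; not the papers' implementation.
    """
    h, w = depth.shape
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Pinhole model: X = (u - cx) * Z / f, Y = (v - cy) * Z / f, Z = depth.
    x = (u - cx) * depth / focal_length
    y = (v - cy) * depth / focal_length
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)
```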