Real-time dense 3D Reconstruction from monocular video data captured by
low-cost UAVs
- URL: http://arxiv.org/abs/2104.10515v1
- Date: Wed, 21 Apr 2021 13:12:17 GMT
- Title: Real-time dense 3D Reconstruction from monocular video data captured by
low-cost UAVs
- Authors: Max Hermann, Boitumelo Ruf, Martin Weinmann
- Abstract summary: Real-time 3D reconstruction enables fast dense mapping of the environment which benefits numerous applications, such as navigation or live evaluation of an emergency.
In contrast to most real-time capable approaches, our approach does not need an explicit depth sensor.
By exploiting the self-motion of the unmanned aerial vehicle (UAV) flying with oblique view around buildings, we estimate both camera trajectory and depth for selected images with enough novel content.
- Score: 0.3867363075280543
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-time 3D reconstruction enables fast dense mapping of the environment
which benefits numerous applications, such as navigation or live evaluation of
an emergency. In contrast to most real-time capable approaches, our approach
does not need an explicit depth sensor. Instead, we only rely on a video stream
from a camera and its intrinsic calibration. By exploiting the self-motion of
the unmanned aerial vehicle (UAV) flying with oblique view around buildings, we
estimate both camera trajectory and depth for selected images with enough novel
content. To create a 3D model of the scene, we rely on a three-stage processing
chain. First, we estimate the rough camera trajectory using a simultaneous
localization and mapping (SLAM) algorithm. Once a suitable constellation is
found, we estimate depth for local bundles of images using a Multi-View Stereo
(MVS) approach and then fuse this depth into a global surfel-based model. For
our evaluation, we use 55 video sequences with diverse settings, consisting of
both synthetic and real scenes. We evaluate not only the generated
reconstruction but also the intermediate products and achieve competitive
results both qualitatively and quantitatively. At the same time, our method can
keep up with a 30 fps video for a resolution of 768x448 pixels.
Related papers
- DUSt3R: Geometric 3D Vision Made Easy [8.471330244002564]
We introduce DUSt3R, a novel paradigm for Dense and Unconstrained Stereo 3D Reconstruction of arbitrary image collections.
We show that this formulation smoothly unifies the monocular and binocular reconstruction cases.
Our formulation directly provides a 3D model of the scene as well as depth information, but interestingly, we can seamlessly recover from it, pixel matches, relative and absolute camera.
arXiv Detail & Related papers (2023-12-21T18:52:14Z) - R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras [106.52409577316389]
R3D3 is a multi-camera system for dense 3D reconstruction and ego-motion estimation.
Our approach exploits spatial-temporal information from multiple cameras, and monocular depth refinement.
We show that this design enables a dense, consistent 3D reconstruction of challenging, dynamic outdoor environments.
arXiv Detail & Related papers (2023-08-28T17:13:49Z) - FlowCam: Training Generalizable 3D Radiance Fields without Camera Poses
via Pixel-Aligned Scene Flow [26.528667940013598]
Reconstruction of 3D neural fields from posed images has emerged as a promising method for self-supervised representation learning.
Key challenge preventing the deployment of these 3D scene learners on large-scale video data is their dependence on precise camera poses from structure-from-motion.
We propose a method that jointly reconstructs camera poses and 3D neural scene representations online and in a single forward pass.
arXiv Detail & Related papers (2023-05-31T20:58:46Z) - Scene-Aware 3D Multi-Human Motion Capture from a Single Camera [83.06768487435818]
We consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera.
We leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks.
In particular, we estimate the scene depth and unique person scale from normalized disparity predictions using the 2D body joints and joint angles.
arXiv Detail & Related papers (2023-01-12T18:01:28Z) - Shakes on a Plane: Unsupervised Depth Estimation from Unstabilized
Photography [54.36608424943729]
We show that in a ''long-burst'', forty-two 12-megapixel RAW frames captured in a two-second sequence, there is enough parallax information from natural hand tremor alone to recover high-quality scene depth.
We devise a test-time optimization approach that fits a neural RGB-D representation to long-burst data and simultaneously estimates scene depth and camera motion.
arXiv Detail & Related papers (2022-12-22T18:54:34Z) - Towards 3D Scene Reconstruction from Locally Scale-Aligned Monocular
Video Depth [90.33296913575818]
In some video-based scenarios such as video depth estimation and 3D scene reconstruction from a video, the unknown scale and shift residing in per-frame prediction may cause the depth inconsistency.
We propose a locally weighted linear regression method to recover the scale and shift with very sparse anchor points.
Our method can boost the performance of existing state-of-the-art approaches by 50% at most over several zero-shot benchmarks.
arXiv Detail & Related papers (2022-02-03T08:52:54Z) - Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
arXiv Detail & Related papers (2021-04-06T03:49:35Z) - Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled
Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2d detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.