The Implicit Values of A Good Hand Shake: Handheld Multi-Frame Neural
Depth Refinement
- URL: http://arxiv.org/abs/2111.13738v1
- Date: Fri, 26 Nov 2021 20:24:07 GMT
- Title: The Implicit Values of A Good Hand Shake: Handheld Multi-Frame Neural
Depth Refinement
- Authors: Ilya Chugunov, Yuxuan Zhang, Zhihao Xia, Cecilia Zhang, Jiawen Chen,
and Felix Heide
- Abstract summary: We show how we can combine dense micro-baseline parallax cues with kilopixel LiDAR depth estimates during viewfinding.
The proposed method brings high-resolution depth estimates to 'point-and-shoot' tabletop photography and requires no additional hardware, artificial hand motion, or user interaction beyond the press of a button.
- Score: 25.637162990928676
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern smartphones can continuously stream multi-megapixel RGB images at
60 Hz, synchronized with high-quality 3D pose information and low-resolution
LiDAR-driven depth estimates. During a snapshot photograph, the natural
unsteadiness of the photographer's hands offers millimeter-scale variation in
camera pose, which we can capture along with RGB and depth in a circular
buffer. In this work we explore how, from a bundle of these measurements
acquired during viewfinding, we can combine dense micro-baseline parallax cues
with kilopixel LiDAR depth to distill a high-fidelity depth map. We take a
test-time optimization approach and train a coordinate MLP to output
photometrically and geometrically consistent depth estimates at the continuous
coordinates along the path traced by the photographer's natural hand shake. The
proposed method brings high-resolution depth estimates to 'point-and-shoot'
tabletop photography and requires no additional hardware, artificial hand
motion, or user interaction beyond the press of a button.
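To make the test-time optimization concrete, the sketch below fits a coordinate MLP that maps continuous image coordinates to depth, supervised by photometric consistency across the micro-baseline frames and an L1 prior against the low-resolution LiDAR depth. This is a minimal illustration assuming normalized pixel coordinates, known per-frame poses and intrinsics, and made-up network sizes, loss weights, and sampling; it is not the authors' implementation.

```python
# Minimal sketch of test-time depth refinement with a coordinate MLP
# (illustrative only; sizes, weights, and sampling are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthMLP(nn.Module):
    """Maps normalized (u, v) image coordinates in [0, 1]^2 to a positive depth."""
    def __init__(self, hidden=256, layers=4):
        super().__init__()
        dims = [2] + [hidden] * layers + [1]
        blocks = []
        for i in range(len(dims) - 1):
            blocks += [nn.Linear(dims[i], dims[i + 1]), nn.ReLU()]
        self.net = nn.Sequential(*blocks[:-1])  # linear head, no final ReLU

    def forward(self, uv):
        return F.softplus(self.net(uv))  # keep depth strictly positive


def photometric_loss(depth, uv, rgb_ref, rgb_src, K, T_src):
    """Back-project reference pixels with the predicted depth, reproject them into a
    source frame, and compare sampled colors (assumes K expressed in normalized
    coordinates and T_src as a 4x4 reference-to-source transform)."""
    ones = torch.ones_like(uv[:, :1])
    rays = (torch.inverse(K) @ torch.cat([uv, ones], -1).T).T   # (N, 3) camera rays
    pts = torch.cat([rays * depth, ones], -1)                   # (N, 4) 3D points
    proj = (K @ (T_src @ pts.T)[:3]).T                          # project into source view
    uv_src = proj[:, :2] / proj[:, 2:3]
    grid_ref = uv.view(1, -1, 1, 2) * 2 - 1                     # grid_sample expects [-1, 1]
    grid_src = uv_src.view(1, -1, 1, 2) * 2 - 1
    c_ref = F.grid_sample(rgb_ref, grid_ref, align_corners=True)
    c_src = F.grid_sample(rgb_src, grid_src, align_corners=True)
    return (c_ref - c_src).abs().mean()


def refine_depth(frames, poses, K, lidar_depth, steps=2000, w_lidar=1.0):
    """frames: list of (3, H, W) RGB tensors from the viewfinder buffer (frames[0] is
    the reference); poses: matching 4x4 reference-to-frame transforms; lidar_depth:
    (h, w) kilopixel depth map. Returns the fitted MLP."""
    mlp = DepthMLP()
    opt = torch.optim.Adam(mlp.parameters(), lr=1e-4)
    rgb_ref = frames[0].unsqueeze(0)
    h, w = lidar_depth.shape
    for _ in range(steps):
        uv = torch.rand(4096, 2)                                # continuous coordinates
        depth = mlp(uv)
        loss = sum(photometric_loss(depth, uv, rgb_ref, f.unsqueeze(0), K, T)
                   for f, T in zip(frames[1:], poses[1:]))
        # low-resolution LiDAR prior, sampled bilinearly at the queried coordinates
        grid = uv.view(1, -1, 1, 2) * 2 - 1
        d_lidar = F.grid_sample(lidar_depth.view(1, 1, h, w), grid, align_corners=True)
        loss = loss + w_lidar * (depth.view(-1) - d_lidar.view(-1)).abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return mlp
```

After fitting, the MLP can be queried on a dense pixel grid at the target resolution to produce the refined depth map, which is what allows the output to exceed the kilopixel resolution of the LiDAR input.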
Related papers
- Cross-spectral Gated-RGB Stereo Depth Estimation [34.31592077757453]
Gated cameras flood-illuminate a scene and capture its time-gated impulse response.
We propose a novel stereo-depth estimation method that is capable of exploiting these multi-modal multi-view depth cues.
The proposed method achieves accurate depth at long ranges, outperforming the next best existing method by 39% in MAE for ranges of 100 to 220 m on accumulated LiDAR ground truth.
arXiv Detail & Related papers (2024-05-21T13:10:43Z) - Shakes on a Plane: Unsupervised Depth Estimation from Unstabilized
Photography [54.36608424943729]
We show that in a 'long-burst', forty-two 12-megapixel RAW frames captured in a two-second sequence, there is enough parallax information from natural hand tremor alone to recover high-quality scene depth.
We devise a test-time optimization approach that fits a neural RGB-D representation to long-burst data and simultaneously estimates scene depth and camera motion.
arXiv Detail & Related papers (2022-12-22T18:54:34Z) - FloatingFusion: Depth from ToF and Image-stabilized Stereo Cameras [37.812681878193914]
Smartphones now have multimodal camera systems with time-of-flight (ToF) depth sensors and multiple color cameras.
However, producing accurate high-resolution depth remains challenging due to the low resolution and limited active illumination power of ToF sensors.
We propose an automatic calibration technique based on dense 2D/3D matching that can estimate camera parameters from a single snapshot.
arXiv Detail & Related papers (2022-10-06T09:57:09Z) - Joint Learning of Salient Object Detection, Depth Estimation and Contour
Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD).
Specifically, we unify three complementary tasks: depth estimation, salient object detection, and contour estimation. The multi-task mechanism encourages the model to learn task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
arXiv Detail & Related papers (2022-03-09T17:20:18Z) - DEVO: Depth-Event Camera Visual Odometry in Challenging Conditions [30.892930944644853]
We present a novel real-time visual odometry framework for a stereo setup of a depth and high-resolution event camera.
Our framework balances accuracy and robustness against computational efficiency to achieve strong performance in challenging scenarios.
arXiv Detail & Related papers (2022-02-05T13:46:47Z) - Robust Consistent Video Depth Estimation [65.53308117778361]
We present an algorithm for estimating consistent dense depth maps and camera poses from a monocular video.
Our algorithm combines two complementary techniques: (1) flexible deformation-splines for low-frequency large-scale alignment and (2) geometry-aware depth filtering for high-frequency alignment of fine depth details.
In contrast to prior approaches, our method does not require camera poses as input and achieves robust reconstruction for challenging hand-held cell phone captures containing a significant amount of noise, shake, motion blur, and rolling shutter deformations.
arXiv Detail & Related papers (2020-12-10T18:59:48Z) - Deep 3D Capture: Geometry and Reflectance from Sparse Multi-View Images [59.906948203578544]
We introduce a novel learning-based method to reconstruct the high-quality geometry and complex, spatially-varying BRDF of an arbitrary object.
We first estimate per-view depth maps using a deep multi-view stereo network.
These depth maps are used to coarsely align the different views.
We propose a novel multi-view reflectance estimation network architecture.
arXiv Detail & Related papers (2020-03-27T21:28:54Z) - Multi-View Photometric Stereo: A Robust Solution and Benchmark Dataset
for Spatially Varying Isotropic Materials [65.95928593628128]
We present a method to capture both 3D shape and spatially varying reflectance with a multi-view photometric stereo technique.
Our algorithm is suitable for perspective cameras and nearby point light sources.
arXiv Detail & Related papers (2020-01-18T12:26:22Z) - Video Depth Estimation by Fusing Flow-to-Depth Proposals [65.24533384679657]
We present an approach with a differentiable flow-to-depth layer for video depth estimation.
The model consists of a flow-to-depth layer, a camera pose refinement module, and a depth fusion network.
Our approach outperforms state-of-the-art depth estimation methods and has reasonable cross-dataset generalization capability.
arXiv Detail & Related papers (2019-12-30T10:45:57Z)