Consistent Direct Time-of-Flight Video Depth Super-Resolution
- URL: http://arxiv.org/abs/2211.08658v2
- Date: Wed, 3 May 2023 21:50:14 GMT
- Title: Consistent Direct Time-of-Flight Video Depth Super-Resolution
- Authors: Zhanghao Sun, Wei Ye, Jinhui Xiong, Gyeongmin Choe, Jialiang Wang,
Shuochen Su, Rakesh Ranjan
- Abstract summary: Direct time-of-flight (dToF) sensors are promising for next-generation on-device 3D sensing.
We propose the first multi-frame fusion scheme to mitigate the spatial ambiguity resulting from the low-resolution dToF imaging.
We introduce DyDToF, the first synthetic RGB-dToF video dataset that features dynamic objects and a realistic dToF simulator.
- Score: 9.173767380836852
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Direct time-of-flight (dToF) sensors are promising for next-generation
on-device 3D sensing. However, limited by manufacturing capabilities in a
compact module, the dToF data has a low spatial resolution (e.g., $\sim
20\times30$ for iPhone dToF), and it requires a super-resolution step before
being passed to downstream tasks. In this paper, we solve this super-resolution
problem by fusing the low-resolution dToF data with the corresponding
high-resolution RGB guidance. Unlike the conventional RGB-guided depth
enhancement approaches, which perform the fusion in a per-frame manner, we
propose the first multi-frame fusion scheme to mitigate the spatial ambiguity
resulting from the low-resolution dToF imaging. In addition, dToF sensors
provide unique depth histogram information for each local patch, and we
incorporate this dToF-specific feature in our network design to further
alleviate spatial ambiguity. To evaluate our models on complex dynamic indoor
environments and to provide a large-scale dToF sensor dataset, we introduce
DyDToF, the first synthetic RGB-dToF video dataset that features dynamic
objects and a realistic dToF simulator following the physical imaging process.
We believe the methods and dataset are beneficial to a broad community as dToF
depth sensing is becoming mainstream on mobile devices. Our code and data are
publicly available: https://github.com/facebookresearch/DVSR/
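For concreteness, the patch-wise histogram cue mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the paper's DyDToF simulator; the patch size, bin count, and depth range are illustrative assumptions showing how one low-resolution dToF pixel aggregates a depth distribution over its field of view.

```python
import numpy as np

def dtof_histograms(depth_hr, patch=8, n_bins=64, d_max=10.0):
    """Bin a high-resolution depth map into per-patch histograms,
    mimicking how a low-resolution dToF pixel collects returns from
    every surface inside its field of view."""
    h, w = depth_hr.shape
    H, W = h // patch, w // patch
    edges = np.linspace(0.0, d_max, n_bins + 1)
    hist = np.zeros((H, W, n_bins), dtype=np.float32)
    for i in range(H):
        for j in range(W):
            d = depth_hr[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
            counts, _ = np.histogram(d, bins=edges)
            hist[i, j] = counts / max(counts.sum(), 1)
    return hist  # (H, W, n_bins): depth distribution per low-res pixel
```

Collapsing each histogram to its peak bin yields a single per-pixel depth (e.g., a ~20x30 map) and discards the multi-surface structure; that loss is exactly the spatial ambiguity the multi-frame, histogram-aware fusion is meant to resolve.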
Related papers
- RGB Guided ToF Imaging System: A Survey of Deep Learning-based Methods [30.34690112905212]
Integrating an RGB camera into a ToF imaging system has become an important way to perceive the real world.
This paper comprehensively reviews the works related to RGB guided ToF imaging, including network structures, learning strategies, evaluation metrics, benchmark datasets, and objective functions.
arXiv Detail & Related papers (2024-05-16T17:59:58Z)
- FloatingFusion: Depth from ToF and Image-stabilized Stereo Cameras [37.812681878193914]
Smartphones now have multimodal camera systems with time-of-flight (ToF) depth sensors and multiple color cameras.
However, producing accurate high-resolution depth remains challenging due to the low resolution and limited active illumination power of ToF sensors.
We propose an automatic calibration technique based on dense 2D/3D matching that can estimate camera parameters from a single snapshot.
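The general recipe behind such single-snapshot calibration can be sketched with robust PnP. This is a minimal sketch, not the paper's method: it assumes known RGB intrinsics K and precomputed dense 2D/3D matches (3D points from ToF depth, 2D pixels in the color image).

```python
import cv2
import numpy as np

def calibrate_from_matches(pts3d, pts2d, K):
    """Estimate the color camera's pose relative to the ToF frame from
    dense 2D/3D correspondences using RANSAC-robust PnP."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float32),  # (N, 3) points in the ToF frame
        pts2d.astype(np.float32),  # (N, 2) matched RGB pixel locations
        K, None,                   # intrinsics; no lens distortion assumed
        reprojectionError=2.0)     # inlier threshold in pixels
    assert ok, "PnP failed; more or better matches are needed"
    R, _ = cv2.Rodrigues(rvec)     # rotation vector -> 3x3 matrix
    return R, tvec, inliers
```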
arXiv Detail & Related papers (2022-10-06T09:57:09Z) - DELTAR: Depth Estimation from a Light-weight ToF Sensor and RGB Image [39.389538555506256]
We propose DELTAR, a novel method to empower light-weight ToF sensors with the capability of measuring high resolution and accurate depth.
As the core of DELTAR, a feature extractor customized for depth distributions and an attention-based neural architecture are proposed to efficiently fuse information from the color and ToF domains.
Experiments show that our method produces more accurate depth than existing frameworks designed for depth completion and depth super-resolution, and achieves performance on par with a commodity-level RGB-D sensor.
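A minimal PyTorch sketch of attention-based cross-modal fusion in this spirit (the shapes, the single layer, and the residual design are assumptions, not DELTAR's actual architecture):

```python
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Image features (queries) attend to per-zone ToF depth-distribution
    features (keys/values); a residual keeps the color stream intact."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_feat, tof_feat):
        # img_feat: (B, N_pixels, C) flattened color features
        # tof_feat: (B, N_zones, C) embeddings of each ToF zone's histogram
        fused, _ = self.attn(query=img_feat, key=tof_feat, value=tof_feat)
        return self.norm(img_feat + fused)
```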
arXiv Detail & Related papers (2022-09-27T13:11:37Z)
- Learning an Efficient Multimodal Depth Completion Model [11.740546882538142]
RGB image-guided sparse depth completion has attracted extensive attention recently, but still faces some problems.
The proposed method can outperform some state-of-the-art methods with a lightweight architecture.
The method also won the MIPI 2022 RGB+ToF depth completion challenge.
arXiv Detail & Related papers (2022-08-23T07:03:14Z)
- Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD).
Specifically, we unify three complementary tasks: depth estimation, salient object detection, and contour estimation. The multi-task mechanism encourages the model to learn task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contours at the same time.
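A minimal sketch of such a joint objective (the loss weights are illustrative hyper-parameters, not the paper's values):

```python
import torch.nn.functional as F

def multitask_loss(pred_sal, pred_depth, pred_contour,
                   gt_sal, gt_depth, gt_contour, w=(1.0, 0.5, 0.5)):
    # Saliency and contours are binary maps; depth is regressed with L1.
    return (w[0] * F.binary_cross_entropy_with_logits(pred_sal, gt_sal)
            + w[1] * F.l1_loss(pred_depth, gt_depth)
            + w[2] * F.binary_cross_entropy_with_logits(pred_contour, gt_contour))
```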
arXiv Detail & Related papers (2022-03-09T17:20:18Z)
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation, which matters for many applications such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
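A minimal sketch of a residual-based fusion module in this style (the channel sizes and two-conv design are assumptions, not PMF's exact block):

```python
import torch
import torch.nn as nn

class ResidualFusion(nn.Module):
    """Predict a residual from both modalities and add it to one stream."""
    def __init__(self, c_cam, c_lidar):
        super().__init__()
        self.res = nn.Sequential(
            nn.Conv2d(c_cam + c_lidar, c_lidar, 3, padding=1),
            nn.BatchNorm2d(c_lidar),
            nn.ReLU(inplace=True),
            nn.Conv2d(c_lidar, c_lidar, 3, padding=1))

    def forward(self, cam_feat, lidar_feat):
        # Feature maps share spatial size; concatenate, predict a residual,
        # and refine the LiDAR stream with it.
        return lidar_feat + self.res(torch.cat([cam_feat, lidar_feat], 1))
```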
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
- Middle-level Fusion for Lightweight RGB-D Salient Object Detection [81.43951906434175]
A novel lightweight RGB-D SOD model is presented in this paper.
With IMFF and L modules incorporated in the middle-level fusion structure, our proposed model has only 3.9M parameters and runs at 33 FPS.
The experimental results on several benchmark datasets verify the effectiveness and superiority of the proposed method over some state-of-the-art methods.
arXiv Detail & Related papers (2021-04-23T11:37:15Z)
- RGB-D Local Implicit Function for Depth Completion of Transparent Objects [43.238923881620494]
The majority of perception methods in robotics require depth information provided by RGB-D cameras.
Standard 3D sensors fail to capture depth of transparent objects due to refraction and absorption of light.
We present a novel framework that can complete missing depth given noisy RGB-D input.
arXiv Detail & Related papers (2021-04-01T17:00:04Z)
- Self-Supervised Representation Learning for RGB-D Salient Object Detection [93.17479956795862]
We use Self-Supervised Representation Learning to design two pretext tasks: the cross-modal auto-encoder and the depth-contour estimation.
Our pretext tasks require only a few unlabeled RGB-D datasets for pre-training, which makes the network capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
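The first pretext task above can be sketched as a cross-modal auto-encoder that predicts the paired depth map from RGB, so pre-training needs no manual labels (the layer sizes are illustrative, not the paper's design):

```python
import torch.nn as nn

class CrossModalAE(nn.Module):
    """Encode RGB, decode the paired sensor depth as the pretext target."""
    def __init__(self, c=32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, c, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(c, c, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(c, 1, 4, stride=2, padding=1))

    def forward(self, rgb):             # rgb: (B, 3, H, W)
        return self.dec(self.enc(rgb))  # predicted depth: (B, 1, H, W)

# Pre-training loss, e.g.: (CrossModalAE()(rgb) - depth).abs().mean()
```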
arXiv Detail & Related papers (2021-01-29T09:16:06Z)
- MobileSal: Extremely Efficient RGB-D Salient Object Detection [62.04876251927581]
This paper introduces a novel network, MobileSal, which focuses on efficient RGB-D salient object detection (SOD).
We propose an implicit depth restoration (IDR) technique to strengthen the feature representation capability of mobile networks for RGB-D SOD.
With IDR and CPR incorporated, MobileSal performs favorably against state-of-the-art methods on seven challenging RGB-D SOD datasets.
arXiv Detail & Related papers (2020-12-24T04:36:42Z)
- A Single Stream Network for Robust and Real-time RGB-D Salient Object Detection [89.88222217065858]
We design a single stream network to use the depth map to guide early fusion and middle fusion between RGB and depth.
This model is 55.5% lighter than the current lightest model and runs at a real-time speed of 32 FPS when processing a $384\times384$ image.
arXiv Detail & Related papers (2020-07-14T04:40:14Z)