SVDC: Consistent Direct Time-of-Flight Video Depth Completion with Frequency Selective Fusion
- URL: http://arxiv.org/abs/2503.01257v1
- Date: Mon, 03 Mar 2025 07:32:25 GMT
- Title: SVDC: Consistent Direct Time-of-Flight Video Depth Completion with Frequency Selective Fusion
- Authors: Xuan Zhu, Jijun Xiang, Xianqi Wang, Longliang Liu, Yu Wang, Hong Zhang, Fei Guo, Xin Yang
- Abstract summary: Lightweight direct Time-of-Flight (dToF) sensors are ideal for 3D sensing on mobile devices. In this paper, we propose a novel video depth completion method, called SVDC, by fusing the sparse dToF data with the corresponding RGB guidance. Our method employs a multi-frame fusion scheme to mitigate the spatial ambiguity resulting from the sparse dToF imaging.
- Score: 8.23422182642083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lightweight direct Time-of-Flight (dToF) sensors are ideal for 3D sensing on mobile devices. However, due to the manufacturing constraints of compact devices and the inherent physical principles of imaging, dToF depth maps are sparse and noisy. In this paper, we propose a novel video depth completion method, called SVDC, by fusing the sparse dToF data with the corresponding RGB guidance. Our method employs a multi-frame fusion scheme to mitigate the spatial ambiguity resulting from the sparse dToF imaging. Misalignment between consecutive frames during multi-frame fusion could cause blending between object edges and the background, which results in a loss of detail. To address this, we introduce an adaptive frequency selective fusion (AFSF) module, which automatically selects convolution kernel sizes to fuse multi-frame features. Our AFSF utilizes a channel-spatial enhancement attention (CSEA) module to enhance features and generates an attention map as fusion weights. The AFSF ensures edge detail recovery while suppressing high-frequency noise in smooth regions. To further enhance temporal consistency, we propose a cross-window consistency loss to ensure consistent predictions across different windows, effectively reducing flickering. Our proposed SVDC achieves optimal accuracy and consistency on the TartanAir and Dynamic Replica datasets. Code is available at https://github.com/Lan1eve/SVDC.
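The two key mechanisms described above can be sketched concretely. Below is a minimal PyTorch sketch of an AFSF-style block, assuming aligned features from the current and the (warped) previous frame: per-pixel softmax weights over several kernel sizes let edge regions favor small high-frequency kernels while smooth regions favor large low-pass ones. All module names, layer sizes, and the lightweight stand-in for the paper's CSEA attention are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFrequencySelectiveFusion(nn.Module):
    """Sketch of an AFSF-style block: fuse the current-frame feature with an
    aligned previous-frame feature using per-pixel softmax weights over several
    kernel sizes. Layer names and sizes are assumptions, not the paper's code."""

    def __init__(self, channels: int, kernel_sizes=(1, 3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(2 * channels, channels, k, padding=k // 2)
            for k in kernel_sizes
        ])
        # Lightweight stand-in for the paper's CSEA attention: predicts one
        # fusion-weight map per kernel branch from the concatenated features.
        self.attn = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, len(kernel_sizes), 1),
        )

    def forward(self, cur_feat, warped_prev_feat):
        x = torch.cat([cur_feat, warped_prev_feat], dim=1)
        weights = F.softmax(self.attn(x), dim=1)  # (B, K, H, W), sums to 1 per pixel
        fused = torch.zeros_like(cur_feat)
        for i, branch in enumerate(self.branches):
            fused = fused + weights[:, i:i + 1] * branch(x)
        return fused
```

The cross-window consistency loss can likewise be sketched; an L1 penalty over the frames shared by two sliding windows is one plausible form, assumed here rather than taken from the paper:

```python
def cross_window_consistency_loss(pred_a, pred_b, overlap_mask):
    """Penalize disagreement between the depth predictions of two sliding
    windows on their overlapping frames; the L1 form is an assumption."""
    diff = (pred_a - pred_b).abs() * overlap_mask
    return diff.sum() / overlap_mask.sum().clamp(min=1)
```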
Related papers
- ToF-Splatting: Dense SLAM using Sparse Time-of-Flight Depth and Multi-Frame Integration [40.16200204154956]
We propose ToF-Splatting, the first 3D Gaussian Splatting-based SLAM pipeline tailored to make effective use of very sparse ToF input data.
Our approach improves upon the state of the art by introducing a multi-frame integration module, which produces dense depth maps by merging cues from extremely sparse ToF depth, monocular color, and multi-view geometry.
arXiv Detail & Related papers (2025-04-23T09:19:43Z)
- FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion [63.87313550399871]
Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability.
We propose Self-supervised Transfer (PST) and a Frequency-Decoupled Fusion module (FreDF).
PST establishes cross-modal knowledge transfer through latent space alignment with image foundation models.
FreDF explicitly decouples high-frequency edge features from low-frequency structural components, resolving modality-specific frequency mismatches.
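As a rough illustration of the frequency decoupling, the sketch below splits a feature map into a low-frequency structural part via a fixed depthwise Gaussian low-pass and takes the residual as the high-frequency edge part. FreDF itself is a learned module, so this fixed-filter version is only an assumption-laden stand-in:

```python
import torch
import torch.nn.functional as F

def decouple_frequencies(feat, kernel_size: int = 5, sigma: float = 1.0):
    """Split a feature map (B, C, H, W) into a low-frequency structural part
    (depthwise Gaussian low-pass) and a high-frequency detail residual."""
    c = feat.shape[1]
    coords = torch.arange(kernel_size, dtype=feat.dtype, device=feat.device)
    coords = coords - kernel_size // 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    g = g / g.sum()
    kernel = (g[:, None] * g[None, :]).expand(c, 1, kernel_size, kernel_size)
    low = F.conv2d(feat, kernel.contiguous(), padding=kernel_size // 2, groups=c)
    return low, feat - low  # low-frequency structure, high-frequency detail
```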
arXiv Detail & Related papers (2025-03-25T15:04:53Z)
- Frequency-aware Feature Fusion for Dense Image Prediction [99.85757278772262]
We propose Frequency-Aware Feature Fusion (FreqFusion) for dense image prediction tasks.
FreqFusion integrates an Adaptive Low-Pass Filter (ALPF) generator, an offset generator, and an Adaptive High-Pass Filter (AHPF) generator.
Comprehensive visualization and quantitative analysis demonstrate that FreqFusion effectively improves feature consistency and sharpens object boundaries.
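A minimal sketch of the adaptive low-pass idea, under the assumption that the ALPF generator predicts a softmax-normalized k×k kernel per pixel (the names, shapes, and the omitted offset generator are my simplifications, not the paper's code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveLowPass(nn.Module):
    """Sketch of an ALPF-style generator: a small head predicts a
    softmax-normalized k*k kernel per pixel, applied via unfold."""

    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        self.k = k
        self.kernel_pred = nn.Conv2d(channels, k * k, 3, padding=1)

    def forward(self, x):
        b, c, h, w = x.shape
        kernels = F.softmax(self.kernel_pred(x), dim=1)     # (B, k*k, H, W)
        patches = F.unfold(x, self.k, padding=self.k // 2)  # (B, C*k*k, H*W)
        patches = patches.view(b, c, self.k * self.k, h, w)
        return (patches * kernels.unsqueeze(1)).sum(dim=2)  # per-pixel smoothing

# An AHPF-style counterpart could re-inject detail as x minus its low-passed
# version; the offset generator (feature resampling) is omitted here.
```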
arXiv Detail & Related papers (2024-08-23T07:30:34Z)
- FlatFusion: Delving into Details of Sparse Transformer-based Camera-LiDAR Fusion for Autonomous Driving [63.96049803915402]
Integrating data from diverse sensor modalities is a prevalent methodology in autonomous driving scenarios.
Recent advancements in efficient point cloud transformers have underscored the efficacy of integrating information in sparse formats.
In this paper, we conduct a comprehensive exploration of design choices for Transformer-based sparse camera-LiDAR fusion.
arXiv Detail & Related papers (2024-08-13T11:46:32Z)
- Deformable Feature Alignment and Refinement for Moving Infrared Dim-small Target Detection [17.765101100010224]
We propose a Deformable Feature Alignment and Refinement (DFAR) method based on deformable convolution to explicitly use motion context in both the training and inference stages.
The proposed DFAR method achieves state-of-the-art performance on two benchmark datasets, DAUB and IRDST.
arXiv Detail & Related papers (2024-07-10T00:42:25Z)
- SSDiff: Spatial-spectral Integrated Diffusion Model for Remote Sensing Pansharpening [14.293042131263924]
We introduce a spatial-spectral integrated diffusion model for the remote sensing pansharpening task, called SSDiff.
SSDiff considers the pansharpening process as the fusion process of spatial and spectral components from the perspective of subspace decomposition.
arXiv Detail & Related papers (2024-04-17T16:30:56Z)
- CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
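One way to read "correlation-driven decomposition" is as an objective that correlates the two modalities' base (low-frequency) features while decorrelating their detail (high-frequency) features. The sketch below assumes this reading and an illustrative ratio-form loss; it is not CDDFuse's actual implementation:

```python
import torch

def pearson_cc(a, b, eps: float = 1e-6):
    """Per-sample Pearson correlation between feature tensors of shape (B, C, H, W)."""
    a = a.flatten(1) - a.flatten(1).mean(dim=1, keepdim=True)
    b = b.flatten(1) - b.flatten(1).mean(dim=1, keepdim=True)
    return (a * b).sum(dim=1) / (a.norm(dim=1) * b.norm(dim=1) + eps)

def decomposition_loss(base_a, base_b, detail_a, detail_b):
    """Correlation-driven decomposition objective in the spirit of CDDFuse:
    base features of the two modalities should correlate, detail features
    should not. The exact form here is an assumption."""
    cc_base = pearson_cc(base_a, base_b).mean()
    cc_detail = pearson_cc(detail_a, detail_b).mean()
    return cc_detail ** 2 / (cc_base + 1.01)  # constant keeps denominator positive
```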
arXiv Detail & Related papers (2022-11-26T02:40:28Z)
- Consistent Direct Time-of-Flight Video Depth Super-Resolution [9.173767380836852]
Direct time-of-flight (dToF) sensors are promising for next-generation on-device 3D sensing.
We propose the first multi-frame fusion scheme to mitigate the spatial ambiguity resulting from the low-resolution dToF imaging.
We introduce DyDToF, the first synthetic RGB-dToF video dataset that features dynamic objects and a realistic dToF simulator.
arXiv Detail & Related papers (2022-11-16T04:16:20Z)
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation, which serves many applications such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
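A hedged sketch of what a residual-based fusion block in such a two-stream network might look like: camera features are merged into the LiDAR stream as a gated residual, preserving the LiDAR identity path. Names and layout are my assumptions; EPMF's actual modules differ in detail.

```python
import torch
import torch.nn as nn

class ResidualFusion(nn.Module):
    """Sketch of a residual-based fusion block for a two-stream network:
    camera features are merged into the LiDAR stream as a gated residual."""

    def __init__(self, channels: int):
        super().__init__()
        self.project = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, lidar_feat, cam_feat):
        residual = self.project(torch.cat([lidar_feat, cam_feat], dim=1))
        return lidar_feat + self.gate(residual) * residual
```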
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
- Temp-Frustum Net: 3D Object Detection with Temporal Fusion [0.0]
3D object detection is a core component of automated driving systems.
Frame-by-frame 3D object detection suffers from noise, field-of-view obstruction, and sparsity.
We propose a novel Temporal Fusion Module to mitigate these problems.
arXiv Detail & Related papers (2021-04-25T09:08:14Z)
- Volumetric Propagation Network: Stereo-LiDAR Fusion for Long-Range Depth Estimation [81.08111209632501]
We propose a geometry-aware stereo-LiDAR fusion network for long-range depth estimation.
We exploit sparse and accurate point clouds as a cue for guiding correspondences of stereo images in a unified 3D volume space.
Our network achieves state-of-the-art performance on the KITTI and Virtual KITTI datasets.
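One simple way to use sparse LiDAR points as a cue in a stereo volume, sketched below under the assumption of a matching-score volume (higher is better): add a Gaussian prior centered on each measured disparity and leave pixels without LiDAR untouched. This illustrates the idea only, not the paper's propagation network.

```python
import torch

def lidar_guided_cost_volume(cost, lidar_disp, valid, sigma: float = 1.0):
    """Bias a stereo matching-score volume (B, D, H, W) toward sparse LiDAR
    disparities (B, H, W) with a Gaussian prior; valid (B, H, W) marks
    pixels that actually carry a LiDAR measurement."""
    d = cost.shape[1]
    levels = torch.arange(d, device=cost.device, dtype=cost.dtype).view(1, d, 1, 1)
    prior = torch.exp(-(levels - lidar_disp.unsqueeze(1)) ** 2 / (2 * sigma ** 2))
    return cost + prior * valid.unsqueeze(1)  # higher score = better match
```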
arXiv Detail & Related papers (2021-03-24T03:24:46Z)