Decomposed Guided Dynamic Filters for Efficient RGB-Guided Depth Completion
- URL: http://arxiv.org/abs/2309.02043v1
- Date: Tue, 5 Sep 2023 08:37:58 GMT
- Title: Decomposed Guided Dynamic Filters for Efficient RGB-Guided Depth Completion
- Authors: Yufei Wang, Yuxin Mao, Qi Liu, Yuchao Dai
- Abstract summary: RGB-guided depth completion aims at predicting dense depth maps from sparse depth measurements and corresponding RGB images.
Guided dynamic filters generate spatially-variant depth-wise separable convolutional filters from RGB features to guide depth features.
We propose to decompose the guided dynamic filters into a spatially-shared component multiplied by content-adaptive adaptors at each spatial location.
- Score: 46.04264366475848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: RGB-guided depth completion aims at predicting dense depth maps from sparse
depth measurements and corresponding RGB images, where how to effectively and
efficiently exploit the multi-modal information is a key issue. Guided dynamic
filters, which generate spatially-variant depth-wise separable convolutional
filters from RGB features to guide depth features, have been proven to be
effective in this task. However, the dynamically generated filters require
massive model parameters, computational costs and memory footprints when the
number of feature channels is large. In this paper, we propose to decompose the
guided dynamic filters into a spatially-shared component multiplied by
content-adaptive adaptors at each spatial location. Based on the proposed idea,
we introduce two decomposition schemes A and B, which decompose the filters by
splitting the filter structure and using spatial-wise attention, respectively.
The decomposed filters not only maintain the favorable properties of guided
dynamic filters as being content-dependent and spatially-variant, but also
reduce model parameters and hardware costs, as the learned adaptors are
decoupled from the number of feature channels. Extensive experimental results
demonstrate that the methods using our schemes outperform state-of-the-art
methods on the KITTI dataset, and rank 1st and 2nd on the KITTI benchmark at
the time of submission. Meanwhile, they also achieve comparable performance on
the NYUv2 dataset. In addition, our proposed methods are general and could be
employed as plug-and-play feature fusion blocks in other multi-modal fusion
tasks such as RGB-D salient object detection.
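As a concrete illustration of the decomposition described in the abstract, the following is a minimal PyTorch sketch: a spatially-shared depth-wise kernel is modulated by a per-pixel adaptor predicted from the RGB features, so the dynamically generated part no longer scales with the number of feature channels. The module name, the adaptor head, and all shapes are illustrative assumptions, not the paper's exact schemes A or B.

```python
# Minimal sketch of a decomposed guided dynamic depth-wise filter:
# effective filter at pixel p, channel c = shared_weight[c] * adaptor[p].
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecomposedGuidedDWConv(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.k = kernel_size
        self.channels = channels
        # Spatially-shared component: one K x K kernel per channel.
        self.shared_weight = nn.Parameter(torch.randn(channels, kernel_size ** 2) * 0.1)
        # Adaptor head (assumed): predicts a K*K adaptor per pixel from RGB
        # features; its output size is independent of the depth channel count.
        self.adaptor_head = nn.Conv2d(channels, kernel_size ** 2, 1)

    def forward(self, depth_feat: torch.Tensor, rgb_feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = depth_feat.shape
        # Per-pixel, channel-shared adaptor: (B, K*K, H, W)
        adaptor = self.adaptor_head(rgb_feat)
        # Unfold depth features into K x K neighborhoods: (B, C, K*K, H*W)
        patches = F.unfold(depth_feat, self.k, padding=self.k // 2)
        patches = patches.view(b, c, self.k ** 2, h * w)
        # Spatially-variant, content-dependent filter from the two factors.
        weight = self.shared_weight.view(1, c, self.k ** 2, 1) \
            * adaptor.reshape(b, 1, self.k ** 2, h * w)
        out = (patches * weight).sum(dim=2)  # depth-wise aggregation
        return out.view(b, c, h, w)


# Usage: fuse RGB guidance into depth features of matching shape.
fuse = DecomposedGuidedDWConv(channels=64)
out = fuse(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))  # (1, 64, 32, 32)
```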
Related papers
- CasDyF-Net: Image Dehazing via Cascaded Dynamic Filters [0.0]
Image dehazing aims to restore image clarity and visual quality by reducing atmospheric scattering and absorption effects.
Inspired by dynamic filtering, we propose using cascaded dynamic filters to create a multi-branch network.
Experiments on RESIDE, Haze4K, and O-Haze datasets validate our method's effectiveness.
arXiv Detail & Related papers (2024-09-13T03:20:38Z)
- Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD).
Specifically, we unify three complementary tasks: depth estimation, salient object detection and contour estimation. The multi-task mechanism encourages the model to learn task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
arXiv Detail & Related papers (2022-03-09T17:20:18Z)
- Adaptive Convolutions with Per-pixel Dynamic Filter Atom [24.691793951360914]
We introduce scalable dynamic convolutions with per-pixel adapted filters.
As plug-and-play replacements to convolutional layers, the introduced adaptive convolutions with per-pixel dynamic atoms enable explicit modeling of intra-image variance.
We present experiments to show that, the proposed method delivers comparable or even better performance across tasks.
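A hedged sketch of the per-pixel filter-atom idea as summarized above: a small bank of shared K x K filter atoms is mixed by per-pixel coefficients predicted from the input, so the dynamic part scales with the number of atoms rather than with the full filter size. Layer names and shapes are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PerPixelAtomConv(nn.Module):
    def __init__(self, channels: int, num_atoms: int = 6, kernel_size: int = 3):
        super().__init__()
        self.k = kernel_size
        # Shared dictionary of filter atoms: (num_atoms, K*K).
        self.atoms = nn.Parameter(torch.randn(num_atoms, kernel_size ** 2) * 0.1)
        # Predicts per-pixel mixing coefficients over the atoms (assumed head).
        self.coef_head = nn.Conv2d(channels, num_atoms, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        coef = self.coef_head(x)                                   # (B, A, H, W)
        # Per-pixel kernel = coefficient-weighted sum of shared atoms.
        kernel = torch.einsum("bahw,ak->bkhw", coef, self.atoms)   # (B, K*K, H, W)
        patches = F.unfold(x, self.k, padding=self.k // 2)         # (B, C*K*K, H*W)
        patches = patches.view(b, c, self.k ** 2, h * w)
        out = (patches * kernel.reshape(b, 1, self.k ** 2, h * w)).sum(dim=2)
        return out.view(b, c, h, w)


y = PerPixelAtomConv(32)(torch.randn(1, 32, 16, 16))  # (1, 32, 16, 16)
```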
arXiv Detail & Related papers (2021-08-17T22:04:10Z)
- MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking [72.65494220685525]
We propose a new dynamic modality-aware filter generation module (named MFGNet) to boost the message communication between visible and thermal data.
We generate dynamic modality-aware filters with two independent networks. The visible and thermal filters are then used to perform dynamic convolution on their corresponding input feature maps.
To address issues caused by heavy occlusion, fast motion, and out-of-view, we propose to conduct a joint local and global search by exploiting a new direction-aware target-driven attention mechanism.
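A hedged sketch of the dynamic modality-aware filtering idea: a small generator head produces filters from one modality's features, and the generated filters are applied to that same modality via a per-sample, depth-wise dynamic convolution; the visible and thermal branches simply use two independent generators. All module names and shapes are assumptions based only on the summary above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicModalityFilter(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.c, self.k = channels, kernel_size
        # Generates one K x K kernel per channel from globally pooled features.
        self.gen = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels * kernel_size ** 2, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b = x.shape[0]
        weight = self.gen(x).view(b * self.c, 1, self.k, self.k)
        # Per-sample depth-wise convolution via the grouped-convolution trick.
        out = F.conv2d(x.reshape(1, b * self.c, *x.shape[2:]), weight,
                       padding=self.k // 2, groups=b * self.c)
        return out.view(b, self.c, *x.shape[2:])


vis_filter = DynamicModalityFilter(channels=64)
thermal_filter = DynamicModalityFilter(channels=64)   # independent generator
vis_out = vis_filter(torch.randn(2, 64, 24, 24))
thermal_out = thermal_filter(torch.randn(2, 64, 24, 24))
```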
arXiv Detail & Related papers (2021-07-22T03:10:51Z)
- Decoupled Dynamic Filter Networks [85.38058820176047]
We propose the Decoupled Dynamic Filter (DDF) that can simultaneously tackle both of these shortcomings.
Inspired by recent advances in attention, DDF decouples a depth-wise dynamic filter into spatial and channel dynamic filters.
We observe a significant boost in performance when replacing standard convolution with DDF in classification networks.
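A hedged sketch of the decoupling idea stated above: the per-pixel, per-channel depth-wise dynamic filter is expressed as the product of a spatial branch (one K x K filter per pixel, shared across channels) and a channel branch (one K x K filter per channel, shared across locations), both predicted from the input. The branch designs here are simplified assumptions, not the DDF paper's exact layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecoupledDynamicFilter(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.k = kernel_size
        # Spatial branch: K*K values per pixel, shared across channels.
        self.spatial = nn.Conv2d(channels, kernel_size ** 2, 1)
        # Channel branch: K*K values per channel, shared across locations.
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels * kernel_size ** 2, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        sp = self.spatial(x).reshape(b, 1, self.k ** 2, h * w)   # per-pixel factor
        ch = self.channel(x).reshape(b, c, self.k ** 2, 1)       # per-channel factor
        patches = F.unfold(x, self.k, padding=self.k // 2).view(b, c, self.k ** 2, h * w)
        out = (patches * sp * ch).sum(dim=2)                     # recombined dynamic filter
        return out.view(b, c, h, w)


y = DecoupledDynamicFilter(64)(torch.randn(1, 64, 32, 32))  # (1, 64, 32, 32)
```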
arXiv Detail & Related papers (2021-04-29T04:55:33Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels, and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternately.
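A heavily hedged sketch of bi-directional cross-modality recalibration as described in the summary: each modality is re-weighted by a gate computed from both modalities, and the recalibrated features are then aggregated. This is a generic gating scheme written from the text above, not the paper's actual Separation-and-Aggregation Gate.

```python
import torch
import torch.nn as nn


class CrossModalRecalibration(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # One gate per modality, conditioned on both modalities (assumed design).
        self.gate_rgb = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())
        self.gate_depth = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())
        self.merge = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        both = torch.cat([rgb, depth], dim=1)
        rgb_recal = rgb * self.gate_rgb(both)        # recalibrate RGB responses
        depth_recal = depth * self.gate_depth(both)  # re-weight depth information
        return self.merge(torch.cat([rgb_recal, depth_recal], dim=1))


fused = CrossModalRecalibration(64)(torch.randn(1, 64, 40, 40), torch.randn(1, 64, 40, 40))
```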
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- Hierarchical Dynamic Filtering Network for RGB-D Salient Object Detection [91.43066633305662]
A key issue in RGB-D salient object detection (SOD) is how to better integrate and utilize cross-modal fusion information.
In this paper, we explore these issues from a new perspective.
We implement a more flexible and efficient form of multi-scale cross-modal feature processing.
arXiv Detail & Related papers (2020-07-13T07:59:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.