MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking
- URL: http://arxiv.org/abs/2107.10433v1
- Date: Thu, 22 Jul 2021 03:10:51 GMT
- Title: MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking
- Authors: Xiao Wang, Xiujun Shu, Shiliang Zhang, Bo Jiang, Yaowei Wang, Yonghong
Tian, Feng Wu
- Abstract summary: We propose a new dynamic modality-aware filter generation module (named MFGNet) to boost the message communication between visible and thermal data.
We generate dynamic modality-aware filters with two independent networks. The visible and thermal filters will be used to conduct a dynamic convolutional operation on their corresponding input feature maps respectively.
To address issues caused by heavy occlusion, fast motion, and out-of-view, we propose to conduct a joint local and global search by exploiting a new direction-aware target-driven attention mechanism.
- Score: 72.65494220685525
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many RGB-T trackers attempt to attain robust feature representation by
utilizing an adaptive weighting scheme (or attention mechanism). Different from
these works, we propose a new dynamic modality-aware filter generation module
(named MFGNet) to boost the message communication between visible and thermal
data by adaptively adjusting the convolutional kernels for various input images
in practical tracking. Given the image pairs as input, we first encode their
features with the backbone network. Then, we concatenate these feature maps and
generate dynamic modality-aware filters with two independent networks. The
visible and thermal filters will be used to conduct a dynamic convolutional
operation on their corresponding input feature maps, respectively. Inspired by
residual connections, both the generated visible and thermal feature maps are
summed with the input feature maps. The augmented feature maps are then fed
into the RoI align module to generate instance-level features for subsequent
classification. To address issues caused by heavy occlusion, fast motion, and
out-of-view, we propose to conduct a joint local and global search by
exploiting a new direction-aware target-driven attention mechanism. A spatial
and temporal recurrent neural network is used to capture the direction-aware
context for accurate global attention prediction. Extensive experiments on
three large-scale RGB-T tracking benchmark datasets validate the effectiveness
of our proposed algorithm. The project page of this paper is available at
https://sites.google.com/view/mfgrgbttrack/.
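As a rough illustration of the pipeline described above, the following PyTorch-style sketch generates per-sample modality-aware kernels from the concatenated RGB-T features, applies them as a dynamic convolution to each modality, and adds the result back to the input features. The pooling-based generator head, the channel sizes, and the grouped-convolution trick for per-sample kernels are assumptions made for this example, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityFilterGenerator(nn.Module):
    """Predicts per-sample dynamic convolution kernels for one modality
    from the concatenated visible + thermal feature maps (illustrative head)."""
    def __init__(self, in_channels, out_channels, kernel_size=3):
        super().__init__()
        self.out_channels, self.kernel_size = out_channels, kernel_size
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(in_channels, out_channels * out_channels * kernel_size ** 2),
        )

    def forward(self, fused):                        # fused: (B, 2C, H, W)
        k = self.head(fused)
        return k.view(fused.size(0), self.out_channels, self.out_channels,
                      self.kernel_size, self.kernel_size)


class MFGBlock(nn.Module):
    """Dynamic convolution on each modality with the generated filters,
    followed by a residual sum with the input feature maps."""
    def __init__(self, channels):
        super().__init__()
        self.gen_rgb = ModalityFilterGenerator(2 * channels, channels)
        self.gen_t = ModalityFilterGenerator(2 * channels, channels)

    @staticmethod
    def dynamic_conv(x, kernels):
        # Per-sample convolution implemented as a grouped convolution:
        # the batch dimension is folded into the group dimension.
        b, c, h, w = x.shape
        weight = kernels.reshape(b * c, c, kernels.size(-2), kernels.size(-1))
        out = F.conv2d(x.reshape(1, b * c, h, w), weight,
                       padding=kernels.size(-1) // 2, groups=b)
        return out.reshape(b, c, h, w)

    def forward(self, feat_rgb, feat_t):             # each (B, C, H, W)
        fused = torch.cat([feat_rgb, feat_t], dim=1)
        out_rgb = feat_rgb + self.dynamic_conv(feat_rgb, self.gen_rgb(fused))
        out_t = feat_t + self.dynamic_conv(feat_t, self.gen_t(fused))
        return out_rgb, out_t                         # then RoI Align + classification


# Toy usage with two 64-channel backbone feature maps.
rgb, t = torch.randn(2, 64, 20, 20), torch.randn(2, 64, 20, 20)
aug_rgb, aug_t = MFGBlock(64)(rgb, t)
```

Because the kernels are regenerated for every input pair, the convolution applied to each modality adapts to the current frame, which is the adaptive behavior the abstract emphasizes.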
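The abstract only outlines the direction-aware target-driven attention, so the sketch below is speculative: it gathers direction-aware spatial context with bidirectional recurrent scans over rows and columns, modulates it with a target embedding, and predicts a global attention map; the temporal recurrence over consecutive frames is omitted. Every name and design choice here is an assumption for illustration, not the paper's exact mechanism.

```python
import torch
import torch.nn as nn


class DirectionAwareAttention(nn.Module):
    """Speculative sketch: row/column GRU scans capture direction-aware
    context, a target embedding gates the context, and a 1x1 conv predicts
    a global attention map over the search region."""
    def __init__(self, channels, hidden=64):
        super().__init__()
        self.row_rnn = nn.GRU(channels, hidden, batch_first=True, bidirectional=True)
        self.col_rnn = nn.GRU(channels, hidden, batch_first=True, bidirectional=True)
        self.target_fc = nn.Linear(channels, 4 * hidden)
        self.predict = nn.Conv2d(4 * hidden, 1, kernel_size=1)

    def forward(self, feat, target_vec):              # feat: (B, C, H, W), target_vec: (B, C)
        b, c, h, w = feat.shape
        rows = feat.permute(0, 2, 3, 1).reshape(b * h, w, c)   # left<->right scans
        cols = feat.permute(0, 3, 2, 1).reshape(b * w, h, c)   # top<->bottom scans
        row_ctx, _ = self.row_rnn(rows)
        col_ctx, _ = self.col_rnn(cols)
        row_ctx = row_ctx.reshape(b, h, w, -1).permute(0, 3, 1, 2)
        col_ctx = col_ctx.reshape(b, w, h, -1).permute(0, 3, 2, 1)
        ctx = torch.cat([row_ctx, col_ctx], dim=1)              # (B, 4*hidden, H, W)
        gate = torch.sigmoid(self.target_fc(target_vec))[..., None, None]
        return torch.sigmoid(self.predict(ctx * gate))          # (B, 1, H, W)
```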
Related papers
- CasDyF-Net: Image Dehazing via Cascaded Dynamic Filters [0.0]
Image dehazing aims to restore image clarity and visual quality by reducing atmospheric scattering and absorption effects.
Inspired by dynamic filtering, we propose using cascaded dynamic filters to create a multi-branch network.
Experiments on RESIDE, Haze4K, and O-Haze datasets validate our method's effectiveness.
arXiv Detail & Related papers (2024-09-13T03:20:38Z)
- Coordinate-Aware Thermal Infrared Tracking Via Natural Language Modeling [16.873697155916997]
NLMTrack is a coordinate-aware thermal infrared tracking model.
NLMTrack applies an encoder that unifies feature extraction and feature fusion.
Experiments show that NLMTrack achieves state-of-the-art performance on multiple benchmarks.
arXiv Detail & Related papers (2024-07-11T08:06:31Z)
- Decomposed Guided Dynamic Filters for Efficient RGB-Guided Depth Completion [46.04264366475848]
RGB-guided depth completion aims at predicting dense depth maps from sparse depth measurements and corresponding RGB images.
Guided dynamic filters generate spatially-variant depth-wise separable convolutional filters from RGB features to guide depth features.
We propose to decompose the guided dynamic filters into a spatially-shared component multiplied by content-adaptive adaptors at each spatial location.
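The decomposition can be pictured with a short sketch: a spatially-shared depthwise kernel is scaled by per-pixel adaptors predicted from the RGB features and then applied to the depth features. The specific factorization below (one scalar adaptor per kernel tap, shared across channels) is an assumption for illustration and may differ from that paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecomposedGuidedFilter(nn.Module):
    """Illustrative decomposition: a spatially-shared depthwise kernel
    multiplied by content-adaptive, per-location adaptors from RGB features."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.k = kernel_size
        self.shared = nn.Parameter(torch.randn(channels, kernel_size ** 2))  # shared component
        self.adaptor = nn.Conv2d(channels, kernel_size ** 2, 1)              # per-pixel adaptors

    def forward(self, depth_feat, rgb_feat):          # both (B, C, H, W)
        b, c, h, w = depth_feat.shape
        a = self.adaptor(rgb_feat)                     # (B, k*k, H, W)
        filters = self.shared[None, :, :, None, None] * a[:, None]       # (B, C, k*k, H, W)
        patches = F.unfold(depth_feat, self.k, padding=self.k // 2)      # (B, C*k*k, H*W)
        patches = patches.view(b, c, self.k ** 2, h, w)
        return (filters * patches).sum(dim=2)          # filtered depth features (B, C, H, W)
```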
arXiv Detail & Related papers (2023-09-05T08:37:58Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose Dual Swin-Transformer based Mutual Interactive Network.
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- Correlation-Aware Deep Tracking [83.51092789908677]
We propose a novel target-dependent feature network inspired by the self-/cross-attention scheme.
Our network deeply embeds cross-image feature correlation in multiple layers of the feature network.
Our model can be flexibly pre-trained on abundant unpaired images, leading to notably faster convergence than the existing methods.
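As a rough sketch of embedding cross-image correlation inside the feature network, the snippet below lets search-region features attend to template features with standard cross-attention; the module name, head count, and residual normalization are assumptions, not that paper's exact design.

```python
import torch
import torch.nn as nn


class CrossImageCorrelation(nn.Module):
    """Sketch: search features attend to template features so that
    target-relevant information is injected before the tracking head."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, search, template):               # (B, C, Hs, Ws), (B, C, Ht, Wt)
        b, c, hs, ws = search.shape
        q = search.flatten(2).transpose(1, 2)           # (B, Hs*Ws, C) queries
        kv = template.flatten(2).transpose(1, 2)        # (B, Ht*Wt, C) keys/values
        out, _ = self.attn(q, kv, kv)                   # cross-image correlation
        out = self.norm(q + out)                        # residual + layer norm
        return out.transpose(1, 2).reshape(b, c, hs, ws)
```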
arXiv Detail & Related papers (2022-03-03T11:53:54Z)
- Dynamic Graph Convolutional Recurrent Network for Traffic Prediction: Benchmark and Solution [18.309299822858243]
We propose a novel traffic prediction framework, named Dynamic Graph Convolutional Recurrent Network (DGCRN).
In DGCRN, hyper-networks are designed to leverage and extract dynamic characteristics from node attributes.
We are the first to employ a generation method to model the fine-grained evolution of the dynamic graph at each time step.
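A hyper-network of this kind can be sketched in a few lines: node attributes at one time step are embedded, and their pairwise affinities form a row-normalized dynamic adjacency matrix that a graph-convolutional recurrent cell would then consume. The layer sizes and normalization below are assumptions for illustration, not DGCRN's exact formulation.

```python
import torch
import torch.nn as nn


class DynamicGraphGenerator(nn.Module):
    """Sketch of a hyper-network that infers a dynamic adjacency matrix
    from node attributes at a single time step."""
    def __init__(self, attr_dim, embed_dim=32):
        super().__init__()
        self.hyper = nn.Sequential(nn.Linear(attr_dim, embed_dim), nn.ReLU(),
                                   nn.Linear(embed_dim, embed_dim))

    def forward(self, node_attrs):                     # (B, N, attr_dim)
        e = self.hyper(node_attrs)                     # node embeddings (B, N, embed_dim)
        scores = torch.relu(torch.bmm(e, e.transpose(1, 2)))   # pairwise affinities (B, N, N)
        return torch.softmax(scores, dim=-1)           # row-normalized dynamic graph
```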
arXiv Detail & Related papers (2021-04-30T11:25:43Z)
- DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of temporal and spatial information.
We show that the proposed method outperforms state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z)
- Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt graph propagation to capture the observed spatial contexts.
We then apply an attention mechanism to the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
- Hierarchical Dynamic Filtering Network for RGB-D Salient Object Detection [91.43066633305662]
The central question in RGB-D salient object detection (SOD) is how to better integrate and utilize cross-modal fusion information.
In this paper, we explore these issues from a new perspective.
We implement a kind of more flexible and efficient multi-scale cross-modal feature processing.
arXiv Detail & Related papers (2020-07-13T07:59:55Z)