RGBT Tracking via Multi-Adapter Network with Hierarchical Divergence Loss
- URL: http://arxiv.org/abs/2011.07189v3
- Date: Fri, 4 Jun 2021 06:53:08 GMT
- Title: RGBT Tracking via Multi-Adapter Network with Hierarchical Divergence Loss
- Authors: Andong Lu, Chenglong Li, Yuqing Yan, Jin Tang, and Bin Luo
- Abstract summary: We propose a novel multi-adapter network to jointly perform modality-shared, modality-specific and instance-aware target representation learning.
Experiments on two RGBT tracking benchmark datasets demonstrate the outstanding performance of the proposed tracker.
- Score: 37.99375824040946
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: RGBT tracking has attracted increasing attention since RGB and thermal
infrared data have strong complementary advantages, which could enable trackers to
work all day and in all weather. However, how to effectively represent RGBT data
for visual tracking has not been well studied. Existing works usually focus on
extracting modality-shared or modality-specific information, but the potential of
these two cues is not fully explored and exploited in RGBT tracking. In this
paper, we propose a novel multi-adapter network to jointly perform
modality-shared, modality-specific and instance-aware target representation
learning for RGBT tracking. To this end, we design three kinds of adapters
within an end-to-end deep learning framework. Specifically, we use a modified
VGG-M as the generality adapter to extract modality-shared target
representations. To extract modality-specific features while reducing the
computational complexity, we design a modality adapter, which adds a small
block to the generality adapter in each layer and for each modality in a
parallel manner. Such a design can learn multilevel modality-specific
representations with a modest number of parameters, since the vast majority of
parameters are shared with the generality adapter. We also design an instance
adapter to capture the appearance properties and temporal variations of a
specific target. Moreover, to enhance the shared and specific features, we
employ a multiple-kernel maximum mean discrepancy (MK-MMD) loss to measure the
distribution divergence of different modal features and integrate it into each
layer for more robust representation learning. Extensive experiments on two
RGBT tracking benchmark datasets demonstrate the outstanding performance of the
proposed tracker against state-of-the-art methods.
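
The abstract describes two mechanisms in prose: a lightweight modality-specific
block attached in parallel to each shared (generality) layer, and an MK-MMD term
computed between the RGB and thermal feature distributions of each layer. The
PyTorch-style sketch below is only an illustration of these two ideas under
stated assumptions, not the authors' implementation; the channel counts, kernel
sizes and RBF bandwidths are placeholders chosen for readability.

```python
# Minimal sketch (not the authors' code): a shared conv plus small per-modality
# parallel branches, and a multi-kernel MMD between RGB and thermal features.
import torch
import torch.nn as nn


class SharedPlusModalityConv(nn.Module):
    """Shared (generality) conv with a lightweight per-modality parallel branch."""

    def __init__(self, c_in, c_out):
        super().__init__()
        self.shared = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)  # generality adapter
        self.rgb_branch = nn.Conv2d(c_in, c_out, kernel_size=1)         # small modality adapter (RGB)
        self.tir_branch = nn.Conv2d(c_in, c_out, kernel_size=1)         # small modality adapter (thermal)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x_rgb, x_tir):
        f_rgb = self.act(self.shared(x_rgb) + self.rgb_branch(x_rgb))
        f_tir = self.act(self.shared(x_tir) + self.tir_branch(x_tir))
        return f_rgb, f_tir


def mk_mmd(f_a, f_b, bandwidths=(0.5, 1.0, 2.0, 4.0)):
    """Multiple-kernel MMD between two batches of (flattened) features,
    using a sum of Gaussian RBF kernels; the bandwidths are assumptions."""
    a, b = f_a.flatten(1), f_b.flatten(1)

    def kernel(x, y):
        d2 = torch.cdist(x, y).pow(2)
        return sum(torch.exp(-d2 / (2 * s ** 2)) for s in bandwidths)

    return kernel(a, a).mean() + kernel(b, b).mean() - 2 * kernel(a, b).mean()
```

In a hierarchical scheme, a term like `mk_mmd(f_rgb, f_tir)` would be evaluated
on the outputs of several such layers and combined with the tracking loss; how
the divergence terms are weighted across layers is not specified in the abstract.
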
Related papers
- Bi-directional Adapter for Multi-modal Tracking [67.01179868400229]
We propose a novel multi-modal visual prompt tracking model based on a universal bi-directional adapter.
We develop a simple but effective lightweight feature adapter to transfer modality-specific information from one modality to the other.
Our model achieves superior tracking performance in comparison with both the full fine-tuning methods and the prompt learning-based methods.
arXiv Detail & Related papers (2023-12-17T05:27:31Z)
- Single-Model and Any-Modality for Video Object Tracking [85.83753760853142]
We introduce Un-Track, a Unified Tracker with a single set of parameters for any modality.
To handle any modality, our method learns their common latent space through low-rank factorization and reconstruction techniques.
Our Un-Track achieves a +8.1 absolute F-score gain on the DepthTrack dataset while introducing only +2.14 GFLOPs (over 21.50) and +6.6M parameters (over 93M).
arXiv Detail & Related papers (2023-11-27T14:17:41Z)
- General-Purpose Multimodal Transformer meets Remote Sensing Semantic Segmentation [35.100738362291416]
Multimodal AI seeks to exploit complementary data sources, particularly for complex tasks like semantic segmentation.
Recent trends in general-purpose multimodal networks have shown great potential to achieve state-of-the-art performance.
We propose a UNet-inspired module that employs 3D convolution to encode vital local information and learn cross-modal features simultaneously.
arXiv Detail & Related papers (2023-07-07T04:58:34Z)
- RGB-T Tracking Based on Mixed Attention [5.151994214135177]
RGB-T tracking involves the use of images from both visible and thermal modalities.
This paper proposes an RGB-T tracker based on a mixed attention mechanism to achieve complementary fusion of the modalities.
arXiv Detail & Related papers (2023-04-09T15:59:41Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose a Dual Swin-Transformer based Mutual Interactive Network.
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- Temporal Aggregation for Adaptive RGBT Tracking [14.00078027541162]
We propose an RGBT tracker that takes temporal clues into account for robust appearance model learning.
Unlike most existing RGBT trackers, which perform tracking using only spatial information, this method further considers temporal information.
arXiv Detail & Related papers (2022-01-22T02:31:56Z)
- Specificity-preserving RGB-D Saliency Detection [103.3722116992476]
We propose a specificity-preserving network (SP-Net) for RGB-D saliency detection.
Two modality-specific networks and a shared learning network are adopted to generate individual and shared saliency maps.
Experiments on six benchmark datasets demonstrate that our SP-Net outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2021-08-18T14:14:22Z)
- MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking [72.65494220685525]
We propose a new dynamic modality-aware filter generation module (named MFGNet) to boost the message communication between visible and thermal data.
We generate dynamic modality-aware filters with two independent networks; the visible and thermal filters are then used to perform a dynamic convolution on their corresponding input feature maps (a minimal sketch of this idea appears after this list).
To address issues caused by heavy occlusion, fast motion, and out-of-view targets, we propose a joint local and global search that exploits a new direction-aware target-driven attention mechanism.
arXiv Detail & Related papers (2021-07-22T03:10:51Z)
- Challenge-Aware RGBT Tracking [32.88141817679821]
We propose a novel challenge-aware neural network to handle the modality-shared challenges and the modality-specific ones.
We show that our method operates at a real-time speed while performing well against the state-of-the-art methods on three benchmark datasets.
arXiv Detail & Related papers (2020-07-26T15:11:44Z)
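
As referenced in the MFGNet entry above, the following is a hedged, hypothetical
sketch of the dynamic modality-aware filter idea: a small predictor derives
per-sample depthwise convolution kernels from a feature map's global context and
applies them back to that map. One such branch would be instantiated per
modality. All names and sizes are assumptions for illustration, not the MFGNet
implementation.

```python
# Hypothetical sketch of per-sample (dynamic) depthwise filter generation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicFilterBranch(nn.Module):
    def __init__(self, channels=64, k=3):
        super().__init__()
        self.channels, self.k = channels, k
        # Predict one depthwise k x k kernel per channel from pooled features.
        self.predict = nn.Linear(channels, channels * k * k)

    def forward(self, feat):                       # feat: (B, C, H, W)
        b, c, _, _ = feat.shape
        ctx = feat.mean(dim=(2, 3))                # global context, (B, C)
        w = self.predict(ctx).view(b * c, 1, self.k, self.k)
        # Depthwise dynamic convolution: fold the batch into the group dimension.
        out = F.conv2d(feat.reshape(1, b * c, *feat.shape[2:]), w,
                       padding=self.k // 2, groups=b * c)
        return out.view(b, c, *feat.shape[2:])
```

For example, `DynamicFilterBranch(64)(torch.randn(2, 64, 25, 25))` returns a
tensor of the same shape, with each sample filtered by its own predicted kernels.
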