RGB-T Tracking Based on Mixed Attention
- URL: http://arxiv.org/abs/2304.04264v4
- Date: Tue, 18 Apr 2023 02:00:25 GMT
- Title: RGB-T Tracking Based on Mixed Attention
- Authors: Yang Luo, Xiqing Guo, Mingtao Dong, Jin Yu
- Abstract summary: RGB-T tracking involves the use of images from both visible and thermal modalities.
This paper proposes an RGB-T tracker based on a mixed attention mechanism to achieve complementary fusion of the visible and thermal modalities.
- Score: 5.151994214135177
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: RGB-T tracking involves the use of images from both visible and thermal
modalities. The primary objective is to adaptively leverage the relatively
dominant modality in varying conditions to achieve more robust tracking than
single-modality tracking. This paper proposes an RGB-T tracker based on a mixed
attention mechanism, referred to as MACFT, to achieve complementary fusion of
the modalities. In the feature extraction stage, different transformer backbone
branches extract modality-specific and modality-shared information. Mixed
attention operations in the backbone enable information interaction and
self-enhancement between the template and search images, constructing a robust
feature representation that better captures the high-level semantic features of
the target. In the feature fusion stage, modality-adaptive fusion is achieved
through a mixed attention-based modality fusion network, which suppresses noise
from the low-quality modality while enhancing the information of the dominant
one. Evaluation on multiple public RGB-T datasets demonstrates that the
proposed tracker outperforms other RGB-T trackers on general evaluation metrics
while also adapting to long-term tracking scenarios.
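The abstract describes two mechanisms: mixed attention over concatenated template and search tokens in the backbone, and a modality-adaptive fusion network that weighs the dominant modality over the noisy one. The sketch below, written in standard PyTorch, illustrates both ideas in minimal form. It is not the authors' MACFT implementation; all module names, dimensions, and the sigmoid-gate fusion are illustrative assumptions standing in for the paper's attention-based fusion network.

```python
# Minimal sketch (not the authors' code) of the two ideas in the MACFT abstract.
import torch
import torch.nn as nn

class MixedAttentionBlock(nn.Module):
    """Joint attention over concatenated template and search tokens.

    Attending over the concatenation lets every token interact with both the
    template and the search region in one operation, approximating the
    "information interaction and self-enhancement" described in the abstract.
    """
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, template: torch.Tensor, search: torch.Tensor):
        # template: (B, Nt, C), search: (B, Ns, C)
        tokens = torch.cat([template, search], dim=1)   # (B, Nt+Ns, C)
        out, _ = self.attn(tokens, tokens, tokens)      # mixed attention
        tokens = self.norm(tokens + out)                # residual + norm
        nt = template.shape[1]
        return tokens[:, :nt], tokens[:, nt:]           # split back

class ModalityAdaptiveFusion(nn.Module):
    """Gated fusion that weights RGB vs. thermal features per token.

    A learned sigmoid gate is a simple stand-in for "suppress the low-quality
    modality while enhancing the dominant one"; the actual fusion network in
    the paper is attention-based and more elaborate.
    """
    def __init__(self, dim: int = 256):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 1)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor):
        w = torch.sigmoid(self.gate(torch.cat([rgb, thermal], dim=-1)))
        return w * rgb + (1.0 - w) * thermal            # (B, N, C)

# Toy usage with random token embeddings.
if __name__ == "__main__":
    B, Nt, Ns, C = 2, 64, 256, 256
    block, fuse = MixedAttentionBlock(C), ModalityAdaptiveFusion(C)
    t_rgb, s_rgb = block(torch.randn(B, Nt, C), torch.randn(B, Ns, C))
    t_tir, s_tir = block(torch.randn(B, Nt, C), torch.randn(B, Ns, C))
    fused_search = fuse(s_rgb, s_tir)
    print(fused_search.shape)  # torch.Size([2, 256, 256])
```

In this sketch the concatenation-based attention plays the role of the template-search interaction within each modality branch, and the per-token gate approximates modality-adaptive weighting of the fused search features.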
Related papers
- Coordinate-Aware Thermal Infrared Tracking Via Natural Language Modeling [16.873697155916997]
NLMTrack is a coordinate-aware thermal infrared tracking model.
NLMTrack applies an encoder that unifies feature extraction and feature fusion.
Experiments show that NLMTrack achieves state-of-the-art performance on multiple benchmarks.
arXiv Detail & Related papers (2024-07-11T08:06:31Z)
- TENet: Targetness Entanglement Incorporating with Multi-Scale Pooling and Mutually-Guided Fusion for RGB-E Object Tracking [30.89375068036783]
Existing approaches perform event feature extraction for RGB-E tracking using traditional appearance models.
We propose an Event backbone (Pooler) to obtain a high-quality feature representation that is cognisant of the intrinsic characteristics of the event data.
Our method significantly outperforms state-of-the-art trackers on two widely used RGB-E tracking datasets.
arXiv Detail & Related papers (2024-05-08T12:19:08Z)
- Unified Single-Stage Transformer Network for Efficient RGB-T Tracking [47.88113335927079]
We propose a single-stage Transformer RGB-T tracking network, namely USTrack, which unifies the typically separate stages of the tracking pipeline into a single ViT (Vision Transformer) backbone.
With this structure, the network can extract fusion features of the template and search region under the mutual interaction of modalities.
Experiments on three popular RGB-T tracking benchmarks demonstrate that our method achieves new state-of-the-art performance while maintaining the fastest inference speed of 84.2 FPS.
arXiv Detail & Related papers (2023-08-26T05:09:57Z)
- Learning Dual-Fused Modality-Aware Representations for RGBD Tracking [67.14537242378988]
Compared with traditional RGB object tracking, the added depth modality can effectively mitigate interference between the target and the background.
Some existing RGBD trackers process the two modalities separately, ignoring particularly useful shared information between them.
We propose a novel Dual-fused Modality-aware Tracker (termed DMTracker) which aims to learn informative and discriminative representations of the target objects for robust RGBD tracking.
arXiv Detail & Related papers (2022-11-06T07:59:07Z)
- CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient Object Detection [144.66411561224507]
We present a convolutional neural network (CNN) model, named CIR-Net, based on the novel cross-modality interaction and refinement.
Our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-10-06T11:59:19Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose the Dual Swin-Transformer based Mutual Interactive Network (DTMINet).
We adopt Swin-Transformer as the feature extractor for both the RGB and depth modalities to model long-range dependencies in the visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- Temporal Aggregation for Adaptive RGBT Tracking [14.00078027541162]
We propose an RGBT tracker that takes temporal clues into account for robust appearance model learning.
Unlike most existing RGBT trackers, which rely only on spatial information, this method further incorporates temporal information.
arXiv Detail & Related papers (2022-01-22T02:31:56Z)
- Transformer-based Network for RGB-D Saliency Detection [82.6665619584628]
Key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities.
We show that the transformer is a uniform operation that is highly effective for both feature fusion and feature enhancement.
Our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
arXiv Detail & Related papers (2021-12-01T15:53:58Z)
- Jointly Modeling Motion and Appearance Cues for Robust RGB-T Tracking [85.333260415532]
We develop a novel late fusion method to infer the fusion weight maps of both RGB and thermal (T) modalities.
When the appearance cue is unreliable, we take motion cues into account to make the tracker robust.
Numerous results on three recent RGB-T tracking datasets show that the proposed tracker performs significantly better than other state-of-the-art algorithms.
arXiv Detail & Related papers (2020-07-04T08:11:33Z)