Temporal Aggregation for Adaptive RGBT Tracking
- URL: http://arxiv.org/abs/2201.08949v1
- Date: Sat, 22 Jan 2022 02:31:56 GMT
- Title: Temporal Aggregation for Adaptive RGBT Tracking
- Authors: Zhangyong Tang, Tianyang Xu, and Xiao-Jun Wu
- Abstract summary: We propose an RGBT tracker which takes spatio-temporal clues into account for robust appearance model learning.
Unlike most existing RGBT trackers, which rely only on spatial information, this method further considers temporal information.
- Score: 14.00078027541162
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual object tracking with both RGB and thermal infrared (TIR) spectra available,
referred to as RGBT tracking for short, is a challenging research topic that is attracting
increasing attention. In this paper, we propose an RGBT tracker which takes spatio-temporal
clues into account for robust appearance model learning and, at the same time, constructs an
adaptive fusion sub-network for cross-modal interactions. Unlike most existing RGBT trackers,
which perform tracking with spatial information only, temporal information is further
considered in this method. Specifically, whereas traditional Siamese trackers obtain only one
search image when picking template-search image pairs, an extra search sample adjacent to the
original one is selected to predict the temporal transformation, resulting in improved
tracking robustness. As for multi-modal tracking, constrained by the limited scale of existing
RGBT datasets, the adaptive fusion sub-network is appended to our method at the decision level
to exploit the complementary characteristics of the two modalities. To design a
thermal-infrared-assisted RGB tracker, the outputs of the classification head from the TIR
modality are taken into consideration before the residual connection from the RGB modality.
Extensive experimental results on three challenging datasets, i.e., VOT-RGBT2019, GTOT and
RGBT210, verify the effectiveness of our method. Code will be shared at
https://github.com/Zhangyong-Tang/TAAT.
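To make the decision-level fusion described above more concrete, the snippet below is a minimal PyTorch sketch of one way such a fusion head could look: the TIR classification response is adaptively weighted against the RGB response and then combined through a residual connection from the RGB branch. This is an illustrative interpretation of the abstract, not the released TAAT code; the module name, the small weighting branch and all hyper-parameters are assumptions.

```python
# Hedged sketch of a decision-level RGB/TIR fusion head with a residual
# connection from the RGB modality. Illustrative only; not the authors' code.
import torch
import torch.nn as nn

class DecisionLevelFusion(nn.Module):
    def __init__(self, channels: int = 1):
        super().__init__()
        # Small adaptive branch predicting, per location, how much to trust TIR.
        self.weight_net = nn.Sequential(
            nn.Conv2d(2 * channels, 8, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(8, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, cls_rgb: torch.Tensor, cls_tir: torch.Tensor) -> torch.Tensor:
        # cls_rgb, cls_tir: classification score maps of shape (B, C, H, W).
        w = self.weight_net(torch.cat([cls_rgb, cls_tir], dim=1))
        fused = w * cls_tir + (1.0 - w) * cls_rgb
        # Residual connection from the RGB branch keeps RGB as the primary cue
        # while the TIR response acts as an assist.
        return cls_rgb + fused

if __name__ == "__main__":
    head = DecisionLevelFusion()
    rgb_scores = torch.randn(1, 1, 25, 25)
    tir_scores = torch.randn(1, 1, 25, 25)
    print(head(rgb_scores, tir_scores).shape)  # torch.Size([1, 1, 25, 25])
```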
Related papers
- Cross Fusion RGB-T Tracking with Bi-directional Adapter [8.425592063392857]
We propose a novel Cross Fusion RGB-T Tracking architecture (CFBT).
The effectiveness of CFBT relies on three newly designed cross-temporal information fusion modules.
Experiments on three popular RGB-T tracking benchmarks demonstrate that our method achieves new state-of-the-art performance.
arXiv Detail & Related papers (2024-08-30T02:45:56Z)
- Unified Single-Stage Transformer Network for Efficient RGB-T Tracking [47.88113335927079]
We propose a single-stage Transformer RGB-T tracking network, namely USTrack, which unifies the three stages of the typical RGB-T tracking pipeline into a single ViT (Vision Transformer) backbone.
With this structure, the network can extract fusion features of the template and search region under the mutual interaction of modalities.
Experiments on three popular RGB-T tracking benchmarks demonstrate that our method achieves new state-of-the-art performance while maintaining the fastest inference speed of 84.2 FPS.
arXiv Detail & Related papers (2023-08-26T05:09:57Z)
- RGB-T Tracking Based on Mixed Attention [5.151994214135177]
RGB-T tracking involves the use of images from both visible and thermal modalities.
An RGB-T tracker based on a mixed attention mechanism is proposed in this paper to achieve complementary fusion of the two modalities.
arXiv Detail & Related papers (2023-04-09T15:59:41Z)
- Learning Dual-Fused Modality-Aware Representations for RGBD Tracking [67.14537242378988]
Compared with traditional RGB object tracking, the addition of the depth modality can effectively resolve interference between the target and the background.
Some existing RGBD trackers use the two modalities separately and thus some particularly useful shared information between them is ignored.
We propose a novel Dual-fused Modality-aware Tracker (termed DMTracker) which aims to learn informative and discriminative representations of the target objects for robust RGBD tracking.
arXiv Detail & Related papers (2022-11-06T07:59:07Z)
- Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient Object Detection [67.33924278729903]
In this work, we propose a Dual Swin-Transformer based Mutual Interactive Network.
We adopt the Swin-Transformer as the feature extractor for both the RGB and depth modalities to model long-range dependencies in the visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z)
- Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline [80.13652104204691]
In this paper, we construct a large-scale benchmark with high diversity for visible-thermal UAV tracking (VTUAV).
We provide a coarse-to-fine attribute annotation, where frame-level attributes are provided to exploit the potential of challenge-specific trackers.
In addition, we design a new RGB-T baseline, named Hierarchical Multi-modal Fusion Tracker (HMFT), which fuses RGB-T data in various levels.
arXiv Detail & Related papers (2022-04-08T15:22:33Z)
- RGBT Tracking via Multi-Adapter Network with Hierarchical Divergence Loss [37.99375824040946]
We propose a novel multi-adapter network to jointly perform modality-shared, modality-specific and instance-aware target representation learning.
Experiments on two RGBT tracking benchmark datasets demonstrate the outstanding performance of the proposed tracker.
arXiv Detail & Related papers (2020-11-14T01:50:46Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels, and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient cross-modality guided encoder which not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- Jointly Modeling Motion and Appearance Cues for Robust RGB-T Tracking [85.333260415532]
We develop a novel late fusion method to infer the fusion weight maps of both RGB and thermal (T) modalities.
When the appearance cue is unreliable, we take motion cues into account to make the tracker robust (a minimal sketch of this fallback idea follows this entry).
Numerous results on three recent RGB-T tracking datasets show that the proposed tracker performs significantly better than other state-of-the-art algorithms.
arXiv Detail & Related papers (2020-07-04T08:11:33Z)
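The late-fusion entry above switches between appearance and motion cues depending on how reliable the appearance response is. The snippet below is a minimal sketch of that fallback idea, using the average peak-to-correlation energy (APCE) of the response map as a reliability score and a constant-velocity prediction as the motion cue; the reliability measure, the threshold and the function names are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: fall back from an appearance response map to a simple
# constant-velocity motion prediction when the response looks unreliable.
# The APCE measure and the threshold are illustrative choices, not from the paper.
import numpy as np

def apce(response: np.ndarray) -> float:
    """Average peak-to-correlation energy: higher means a sharper, more reliable peak."""
    peak, low = response.max(), response.min()
    return float((peak - low) ** 2 / (np.mean((response - low) ** 2) + 1e-12))

def predict_center(response: np.ndarray, prev_centers: list, apce_thresh: float = 10.0) -> np.ndarray:
    """Return the new target center from appearance if reliable, else from motion."""
    if apce(response) >= apce_thresh or len(prev_centers) < 2:
        # Appearance cue: location of the response-map peak (x, y order).
        y, x = np.unravel_index(np.argmax(response), response.shape)
        return np.array([x, y], dtype=float)
    # Motion cue: constant-velocity extrapolation from the last two centers.
    return prev_centers[-1] + (prev_centers[-1] - prev_centers[-2])

if __name__ == "__main__":
    history = [np.array([10.0, 10.0]), np.array([12.0, 11.0])]
    flat_response = np.ones((25, 25)) * 0.5  # unreliable, nearly flat map
    print(predict_center(flat_response, history))  # falls back to [14. 12.]
```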