Unsupervised Cross-Modal Distillation for Thermal Infrared Tracking
- URL: http://arxiv.org/abs/2108.00187v1
- Date: Sat, 31 Jul 2021 09:19:59 GMT
- Title: Unsupervised Cross-Modal Distillation for Thermal Infrared Tracking
- Authors: Jingxian Sun, Lichao Zhang, Yufei Zha, Abel Gonzalez-Garcia, Peng
Zhang, Wei Huang, and Yanning Zhang
- Abstract summary: The target representation learned by convolutional neural networks plays an important role in Thermal Infrared (TIR) tracking.
We propose to distill representations of the TIR modality from the RGB modality with Cross-Modal Distillation (CMD)
- Our tracker outperforms the baseline tracker, with absolute gains of 2.3% in Success, 2.7% in Precision, and 2.5% in Normalized Precision.
- Score: 39.505507508776404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The target representation learned by convolutional neural networks plays an
important role in Thermal Infrared (TIR) tracking. Currently, most of the
top-performing TIR trackers are still employing representations learned by the
model trained on the RGB data. However, this representation does not take into
account the information in the TIR modality itself, limiting the performance of
TIR tracking. To solve this problem, we propose to distill representations of
the TIR modality from the RGB modality with Cross-Modal Distillation (CMD) on a
large amount of unlabeled paired RGB-TIR data. We take advantage of the
two-branch architecture of the baseline tracker, i.e. DiMP, for cross-modal
distillation working on two components of the tracker. Specifically, we use one
branch as a teacher module to distill the representation learned by the model
into the other branch. Benefiting from the powerful model in the RGB modality,
the cross-modal distillation can learn the TIR-specific representation for
promoting TIR tracking. The proposed approach can be incorporated into
different baseline trackers conveniently as a generic and independent
component. Furthermore, the semantic coherence of paired RGB and TIR images is
utilized as a supervised signal in the distillation loss for cross-modal
knowledge transfer. In practice, three different approaches are explored to
generate paired RGB-TIR patches with the same semantics for training in an
unsupervised way. It is easy to extend to an even larger scale of unlabeled
training data. Extensive experiments on the LSOTB-TIR dataset and PTB-TIR
dataset demonstrate that our proposed cross-modal distillation method
effectively learns TIR-specific target representations transferred from the RGB
modality. Our tracker outperforms the baseline tracker, with absolute gains of
2.3% in Success, 2.7% in Precision, and 2.5% in Normalized Precision.
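As a hedged illustration of the idea (not the authors' code), the core of such a cross-modal distillation objective can be sketched as a mean-squared error between features a frozen RGB teacher branch extracts from an RGB patch and features a TIR student branch extracts from the paired TIR patch with the same semantics. The feature extractor below is a toy stand-in for a CNN branch, and all names are hypothetical:

```python
import numpy as np

def extract_features(patch, weights):
    """Toy stand-in for a CNN branch: one linear layer plus ReLU."""
    return np.maximum(patch @ weights, 0.0)

def cmd_loss(rgb_patch, tir_patch, teacher_w, student_w):
    """Cross-modal distillation loss: features from the frozen RGB
    (teacher) branch supervise the TIR (student) branch's features
    on a semantically paired patch."""
    teacher_feat = extract_features(rgb_patch, teacher_w)  # held fixed in practice
    student_feat = extract_features(tir_patch, student_w)  # updated by the loss
    return float(np.mean((teacher_feat - student_feat) ** 2))

# Toy paired "RGB-TIR" patches (random data standing in for real pairs).
rng = np.random.default_rng(0)
rgb = rng.standard_normal((4, 16))       # 4 patches, 16-dim each
tir = rng.standard_normal((4, 16))
teacher_w = rng.standard_normal((16, 8))
student_w = teacher_w.copy()             # identical branches for the sanity check

# Identical branches on identical inputs give zero loss; paired but
# different-modality inputs give a positive loss to minimize.
print(cmd_loss(rgb, rgb, teacher_w, student_w))      # 0.0
print(cmd_loss(rgb, tir, teacher_w, student_w) > 0)  # True
```

In training, only `student_w` would be updated by gradients of this loss, so the student learns a TIR-specific representation aligned with the stronger RGB one; no TIR labels are needed, only the pairing.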
Related papers
- Progressive Domain Adaptation for Thermal Infrared Object Tracking [9.888266596236578]
In this work, we propose a Progressive Domain Adaptation framework for TIR Tracking.
The framework makes full use of large-scale labeled RGB datasets without requiring time-consuming and labor-intensive labeling of large-scale TIR data.
Experimental results on five TIR tracking benchmarks show that the proposed method achieves a gain of nearly 6% in success rate, demonstrating its effectiveness.
arXiv Detail & Related papers (2024-07-28T08:43:16Z)
- Thermal-Infrared Remote Target Detection System for Maritime Rescue based on Data Augmentation with 3D Synthetic Data [4.66313002591741]
This paper proposes a thermal-infrared (TIR) remote target detection system for maritime rescue using deep learning and data augmentation.
To address dataset scarcity and improve model robustness, a synthetic dataset collected from the 3D game ARMA3 is used to augment the training data.
The proposed segmentation model surpasses the performance of state-of-the-art segmentation methods.
arXiv Detail & Related papers (2023-10-31T12:37:49Z)
- Edge-guided Multi-domain RGB-to-TIR Image Translation for Training Vision Tasks with Challenging Labels [12.701191873813583]
The insufficient number of annotated thermal infrared (TIR) image datasets hinders TIR image-based deep learning networks from reaching performance comparable to that of their RGB counterparts.
We propose a modified multi-domain RGB-to-TIR image translation model focused on edge preservation, so that annotated RGB images with challenging labels can be employed.
This enables supervised learning of deep TIR image-based optical flow estimation and object detection, improving end-point error by 56.5% on average and achieving a best object detection mAP of 23.9%.
arXiv Detail & Related papers (2023-01-30T06:44:38Z)
- Self-Supervised RGB-T Tracking with Cross-Input Consistency [33.34113942544558]
In this paper, we propose a self-supervised RGB-T tracking method.
Our tracker is trained using unlabeled RGB-T video pairs in a self-supervised manner.
To the best of our knowledge, our tracker is the first self-supervised RGB-T tracker.
arXiv Detail & Related papers (2023-01-26T18:11:16Z)
- Learning Dual-Fused Modality-Aware Representations for RGBD Tracking [67.14537242378988]
Compared with traditional RGB object tracking, the added depth modality can effectively resolve interference between the target and the background.
Some existing RGBD trackers use the two modalities separately and thus some particularly useful shared information between them is ignored.
We propose a novel Dual-fused Modality-aware Tracker (termed DMTracker) which aims to learn informative and discriminative representations of the target objects for robust RGBD tracking.
arXiv Detail & Related papers (2022-11-06T07:59:07Z)
- Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline [80.13652104204691]
In this paper, we construct a large-scale benchmark with high diversity for visible-thermal UAV tracking (VTUAV).
We provide a coarse-to-fine attribute annotation, where frame-level attributes are provided to exploit the potential of challenge-specific trackers.
In addition, we design a new RGB-T baseline, named Hierarchical Multi-modal Fusion Tracker (HMFT), which fuses RGB-T data in various levels.
arXiv Detail & Related papers (2022-04-08T15:22:33Z)
- Temporal Aggregation for Adaptive RGBT Tracking [14.00078027541162]
We propose an RGBT tracker that takes temporal cues into account for robust appearance model learning.
Unlike most existing RGBT trackers, which perform tracking with only spatial information, this method further incorporates temporal information.
arXiv Detail & Related papers (2022-01-22T02:31:56Z)
- Data-Level Recombination and Lightweight Fusion Scheme for RGB-D Salient Object Detection [73.31632581915201]
We propose a novel data-level recombination strategy to fuse RGB with D (depth) before deep feature extraction.
A newly designed lightweight triple-stream network is applied to these newly formulated data to achieve optimal channel-wise complementary fusion between the RGB and D modalities.
arXiv Detail & Related papers (2020-08-07T10:13:05Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels, and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information over multiple stages and alternately aggregates the two recalibrated representations.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- Jointly Modeling Motion and Appearance Cues for Robust RGB-T Tracking [85.333260415532]
We develop a novel late fusion method to infer the fusion weight maps of both RGB and thermal (T) modalities.
When the appearance cue is unreliable, we take motion cues into account to make the tracker robust.
Numerous results on three recent RGB-T tracking datasets show that the proposed tracker performs significantly better than other state-of-the-art algorithms.
arXiv Detail & Related papers (2020-07-04T08:11:33Z)
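The late-fusion scheme described in the last entry can be illustrated with a minimal sketch: per-pixel weight maps (learned in the paper, fixed here for illustration) normalize to one at every location and blend the RGB and thermal response maps, so an unreliable modality is down-weighted. All names and values below are hypothetical:

```python
import numpy as np

def late_fuse(rgb_response, t_response, w_rgb, w_t):
    """Late fusion: blend two modality response maps with per-pixel
    weight maps normalized to sum to one at every location."""
    total = w_rgb + w_t + 1e-8           # guard against division by zero
    return (w_rgb / total) * rgb_response + (w_t / total) * t_response

# When the RGB appearance cue is unreliable (low weight), the fused
# response map follows the thermal modality instead.
rgb_map = np.zeros((3, 3))               # RGB branch sees nothing useful
t_map = np.ones((3, 3))                  # thermal branch responds strongly
fused = late_fuse(rgb_map, t_map,
                  w_rgb=np.full((3, 3), 0.1),
                  w_t=np.full((3, 3), 0.9))
print(fused[0, 0])                       # ~0.9: dominated by thermal
```

The same structure extends naturally to adding a motion-cue term when both appearance cues are weak, which is the robustness mechanism the summary describes.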
This list is automatically generated from the titles and abstracts of the papers in this site.