Self-Supervised RGB-T Tracking with Cross-Input Consistency
- URL: http://arxiv.org/abs/2301.11274v1
- Date: Thu, 26 Jan 2023 18:11:16 GMT
- Title: Self-Supervised RGB-T Tracking with Cross-Input Consistency
- Authors: Xingchen Zhang and Yiannis Demiris
- Abstract summary: In this paper, we propose a self-supervised RGB-T tracking method.
Our tracker is trained using unlabeled RGB-T video pairs in a self-supervised manner.
To the best of our knowledge, our tracker is the first self-supervised RGB-T tracker.
- Score: 33.34113942544558
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we propose a self-supervised RGB-T tracking method. Different
from existing deep RGB-T trackers that use a large number of annotated RGB-T
image pairs for training, our RGB-T tracker is trained using unlabeled RGB-T
video pairs in a self-supervised manner. We propose a novel cross-input
consistency-based self-supervised training strategy based on the idea that
tracking can be performed using different inputs. Specifically, we construct
two distinct inputs using unlabeled RGB-T video pairs. We then track objects
using these two inputs to generate results, based on which we construct our
cross-input consistency loss. Meanwhile, we propose a reweighting strategy to
make our loss function robust to low-quality training samples. We build our
tracker on a Siamese correlation filter network. To the best of our knowledge,
our tracker is the first self-supervised RGB-T tracker. Extensive experiments
on two public RGB-T tracking benchmarks demonstrate that the proposed training
strategy is effective. Remarkably, despite training only with a corpus of
unlabeled RGB-T video pairs, our tracker outperforms seven supervised RGB-T
trackers on the GTOT dataset.
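The abstract describes the training signal concretely enough to sketch it. Below is a minimal, hypothetical PyTorch sketch of a cross-input consistency loss with sample reweighting; the tensor shapes, the peak-sharpness quality score, and the softmax reweighting are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def cross_input_consistency_loss(resp_a: torch.Tensor,
                                 resp_b: torch.Tensor,
                                 eps: float = 1e-6) -> torch.Tensor:
    """Consistency loss between two response maps obtained by tracking the
    same target with two different inputs built from one unlabeled RGB-T pair.

    resp_a, resp_b: (B, H, W) response maps from a correlation filter head.
    """
    batch = resp_a.size(0)

    # Per-sample discrepancy between the two tracking results.
    per_sample = F.mse_loss(resp_a, resp_b, reduction="none").flatten(1).mean(dim=1)

    def peak_score(resp: torch.Tensor) -> torch.Tensor:
        # Peak sharpness as a stand-in quality measure (assumption):
        # confident, unimodal response maps score higher.
        flat = resp.flatten(1)
        peak = flat.max(dim=1).values
        return (peak - flat.mean(dim=1)) / (flat.std(dim=1) + eps)

    # Reweighting: down-weight samples where either response map looks
    # unreliable, making the loss robust to low-quality training samples.
    quality = torch.minimum(peak_score(resp_a), peak_score(resp_b))
    weights = torch.softmax(quality, dim=0) * batch  # mean weight ~ 1

    return (weights.detach() * per_sample).mean()
```

In training, `resp_a` and `resp_b` would come from running the same Siamese correlation filter head on the two inputs constructed from an unlabeled RGB-T video pair; the weights are detached so the reweighting only scales, and does not steer, the consistency signal.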
Related papers
- Learning Dual-Fused Modality-Aware Representations for RGBD Tracking [67.14537242378988]
Compared with traditional RGB object tracking, the addition of the depth modality can effectively mitigate interference between the target and the background.
Some existing RGBD trackers use the two modalities separately, so particularly useful information shared between them is ignored.
We propose a novel Dual-fused Modality-aware Tracker (termed DMTracker) which aims to learn informative and discriminative representations of the target objects for robust RGBD tracking.
arXiv Detail & Related papers (2022-11-06T07:59:07Z)
- RGBD1K: A Large-scale Dataset and Benchmark for RGB-D Object Tracking [30.448658049744775]
Given a limited amount of annotated RGB-D tracking data, most state-of-the-art RGB-D trackers are simple extensions of high-performance RGB-only trackers.
To address the dataset deficiency issue, a new RGB-D dataset named RGBD1K is released in this paper.
arXiv Detail & Related papers (2022-08-21T03:07:36Z)
- Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline [80.13652104204691]
In this paper, we construct a large-scale benchmark with high diversity for visible-thermal UAV tracking (VTUAV).
We provide coarse-to-fine attribute annotations, with frame-level attributes supplied to exploit the potential of challenge-specific trackers.
In addition, we design a new RGB-T baseline, named Hierarchical Multi-modal Fusion Tracker (HMFT), which fuses RGB-T data at various levels.
arXiv Detail & Related papers (2022-04-08T15:22:33Z)
- RGBD Object Tracking: An In-depth Review [89.96221353160831]
We first review RGBD object trackers from different perspectives, including RGBD fusion, depth usage, and tracking framework.
We then benchmark a representative set of RGBD trackers and give detailed analyses based on their performance.
arXiv Detail & Related papers (2022-03-26T18:53:51Z)
- DepthTrack: Unveiling the Power of RGBD Tracking [29.457114656913944]
This work introduces a new RGBD tracking dataset, DepthTrack.
It has twice as many sequences (200) and scene types (40) as the largest existing dataset.
The average sequence length (1473), the number of deformable objects (16), and the number of tracking attributes (15) have also been increased.
arXiv Detail & Related papers (2021-08-31T16:42:38Z)
- Unsupervised Cross-Modal Distillation for Thermal Infrared Tracking [39.505507508776404]
The target representation learned by convolutional neural networks plays an important role in Thermal Infrared (TIR) tracking.
We propose to distill representations of the TIR modality from the RGB modality with Cross-Modal Distillation (CMD); a minimal sketch of this idea appears after this list.
Our tracker outperforms the baseline tracker with absolute gains of 2.3% in Success, 2.7% in Precision, and 2.5% in Normalized Precision.
arXiv Detail & Related papers (2021-07-31T09:19:59Z)
- Jointly Modeling Motion and Appearance Cues for Robust RGB-T Tracking [85.333260415532]
We develop a novel late fusion method to infer the fusion weight maps of both RGB and thermal (T) modalities.
When the appearance cue is unreliable, we take motion cues into account to make the tracker robust.
Extensive results on three recent RGB-T tracking datasets show that the proposed tracker performs significantly better than other state-of-the-art algorithms.
arXiv Detail & Related papers (2020-07-04T08:11:33Z)
- Synergistic saliency and depth prediction for RGB-D saliency detection [76.27406945671379]
Existing RGB-D saliency datasets are small, which may lead to overfitting and limited generalization for diverse scenarios.
We propose a semi-supervised system for RGB-D saliency detection that can be trained on smaller RGB-D saliency datasets without saliency ground truth.
arXiv Detail & Related papers (2020-07-03T14:24:41Z)
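Several of the papers above share a recurring mechanism: learning one modality's representation from another without labels. As a concrete reference point for the CMD entry above, here is a minimal, hypothetical PyTorch sketch of cross-modal feature distillation; the module structure and the plain MSE objective are assumptions, not that paper's exact method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalDistiller(nn.Module):
    """Trains a TIR student backbone to match features from a frozen,
    pretrained RGB teacher on aligned RGB-TIR frame pairs."""

    def __init__(self, teacher: nn.Module, student: nn.Module):
        super().__init__()
        self.teacher = teacher.eval()  # pretrained RGB backbone, frozen
        for p in self.teacher.parameters():
            p.requires_grad_(False)
        self.student = student         # TIR backbone being trained

    def forward(self, rgb: torch.Tensor, tir: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            target_feat = self.teacher(rgb)  # teacher features on RGB input
        student_feat = self.student(tir)     # student features on TIR input
        # Plain feature-matching objective (assumption); the CMD paper's
        # exact losses may differ.
        return F.mse_loss(student_feat, target_feat)
```

A training loop would feed aligned RGB-TIR frame pairs and backpropagate only into the student; the distilled TIR backbone can then stand in for the RGB backbone in a TIR tracker.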
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.