Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline
- URL: http://arxiv.org/abs/2204.04120v1
- Date: Fri, 8 Apr 2022 15:22:33 GMT
- Title: Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline
- Authors: Pengyu Zhang, Jie Zhao, Dong Wang, Huchuan Lu, Xiang Ruan
- Abstract summary: In this paper, we construct a large-scale benchmark with high diversity for visible-thermal UAV tracking (VTUAV).
We provide a coarse-to-fine attribute annotation, where frame-level attributes are provided to exploit the potential of challenge-specific trackers.
In addition, we design a new RGB-T baseline, named Hierarchical Multi-modal Fusion Tracker (HMFT), which fuses RGB-T data at multiple levels.
- Score: 80.13652104204691
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: With the growing popularity of multi-modal sensors, visible-thermal
(RGB-T) object tracking aims to achieve robust performance and broader
application scenarios under the guidance of objects' temperature information.
However, the lack of paired
training samples is the main bottleneck for unlocking the power of RGB-T
tracking. Since it is laborious to collect high-quality RGB-T sequences, recent
benchmarks only provide test sequences. In this paper, we construct a
large-scale benchmark with high diversity for visible-thermal UAV tracking
(VTUAV), including 500 sequences with 1.7 million high-resolution (1920 × 1080
pixels) frame pairs. In addition, comprehensive applications
(short-term tracking, long-term tracking and segmentation mask prediction) with
diverse categories and scenes are considered for exhaustive evaluation.
Moreover, we provide a coarse-to-fine attribute annotation, where frame-level
attributes are provided to exploit the potential of challenge-specific
trackers. Furthermore, we design a new RGB-T baseline, named Hierarchical
Multi-modal Fusion Tracker (HMFT), which fuses RGB-T data at multiple levels.
Extensive experiments on several datasets are conducted to demonstrate the
effectiveness of HMFT and the complementarity of different fusion types. The
project is available here.
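The abstract describes HMFT only at a high level; as a concrete illustration, here is a minimal PyTorch sketch of hierarchical RGB-T fusion at three levels (image, feature, and decision). Every module, layer size, and tensor shape below is an assumption chosen for brevity, not the released HMFT implementation.

```python
import torch
import torch.nn as nn

class HierarchicalFusionSketch(nn.Module):
    """Toy multi-level RGB-T fusion (illustrative, not the official HMFT).

    - image level:    stack RGB and thermal inputs channel-wise
    - feature level:  merge per-modality CNN features with a 1x1 conv
    - decision level: average the per-branch response maps
    """

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.rgb_backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU())
        self.tir_backbone = nn.Sequential(
            nn.Conv2d(1, feat_dim, 3, padding=1), nn.ReLU())
        # Image-level branch consumes the stacked 4-channel input.
        self.early_backbone = nn.Sequential(
            nn.Conv2d(4, feat_dim, 3, padding=1), nn.ReLU())
        # Feature-level fusion over the three concatenated feature maps.
        self.feat_fusion = nn.Conv2d(3 * feat_dim, feat_dim, 1)
        # Shared head turning features into a single-channel response map.
        self.head = nn.Conv2d(feat_dim, 1, 1)

    def forward(self, rgb: torch.Tensor, tir: torch.Tensor) -> torch.Tensor:
        early = self.early_backbone(torch.cat([rgb, tir], dim=1))  # image level
        f_rgb = self.rgb_backbone(rgb)
        f_tir = self.tir_backbone(tir)
        fused = self.feat_fusion(
            torch.cat([early, f_rgb, f_tir], dim=1))               # feature level
        responses = torch.stack(
            [self.head(fused), self.head(f_rgb), self.head(f_tir)])
        return responses.mean(dim=0)                               # decision level

if __name__ == "__main__":
    model = HierarchicalFusionSketch()
    rgb = torch.randn(1, 3, 128, 128)  # visible search-region crop
    tir = torch.randn(1, 1, 128, 128)  # aligned thermal crop
    print(model(rgb, tir).shape)       # torch.Size([1, 1, 128, 128])
```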
Related papers
- Visible-Thermal Multiple Object Tracking: Large-scale Video Dataset and Progressive Fusion Approach [17.286142856787222]
We contribute a large-scale Visible-Thermal video benchmark for Multiple Object Tracking (MOT) called VT-MOT.
VT-MOT includes 582 video sequence pairs with 401k frame pairs, collected from surveillance, drone, and handheld platforms.
Comprehensive experiments are conducted on VT-MOT, and the results prove the superiority and effectiveness of the proposed method.
arXiv Detail & Related papers (2024-08-02T01:29:43Z)
- SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking [19.50096632818305]
Multimodal Visual Object Tracking (VOT) has recently gained significant attention due to its robustness.
Recent studies have utilized prompt tuning to transfer pre-trained RGB-based trackers to multimodal data.
We propose a novel symmetric multimodal tracking framework called SDSTrack.
arXiv Detail & Related papers (2024-03-24T04:15:50Z)
- Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline [37.06330707742272]
We first propose a new long-term and large-scale frame-event single object tracking dataset, termed FELT.
It contains 742 videos with 1,594,474 RGB frame and event stream pairs, making it the largest frame-event tracking dataset to date.
We propose a novel associative memory Transformer network as a unified backbone, introducing modern Hopfield layers into multi-head self-attention blocks to fuse RGB and event data (a toy sketch of this retrieval step appears after this list).
arXiv Detail & Related papers (2024-03-09T08:49:50Z)
- EANet: Enhanced Attribute-based RGBT Tracker Network [0.0]
We propose a deep learning-based image tracking approach that fuses RGB and thermal images (RGBT).
The proposed model consists of two main components: a feature extractor and a tracker.
The proposed methods are evaluated on the RGBT234 (Li et al., 2018) and LasHeR (Li et al., 2021) datasets.
arXiv Detail & Related papers (2023-07-04T19:34:53Z)
- Revisiting Color-Event based Tracking: A Unified Network, Dataset, and Metric [53.88188265943762]
We propose a single-stage backbone network for Color-Event Unified Tracking (CEUTrack), which achieves the above functions simultaneously.
Our proposed CEUTrack is simple, effective, and efficient, which achieves over 75 FPS and new SOTA performance.
arXiv Detail & Related papers (2022-11-20T16:01:31Z)
- AVisT: A Benchmark for Visual Object Tracking in Adverse Visibility [125.77396380698639]
AVisT is a benchmark for visual tracking in diverse scenarios with adverse visibility.
AVisT comprises 120 challenging sequences with 80k annotated frames, spanning 18 diverse scenarios.
We benchmark 17 popular and recent trackers on AVisT with detailed analysis of their tracking performance across attributes.
arXiv Detail & Related papers (2022-08-14T17:49:37Z)
- Temporal Aggregation for Adaptive RGBT Tracking [14.00078027541162]
We propose an RGBT tracker that takes temporal clues into account for robust appearance model learning.
Unlike most existing RGBT trackers, which perform tracking with only spatial information, this method further exploits temporal information.
arXiv Detail & Related papers (2022-01-22T02:31:56Z)
- Multi-modal Visual Tracking: Review and Experimental Comparison [85.20414397784937]
We summarize the multi-modal tracking algorithms, especially visible-depth (RGB-D) tracking and visible-thermal (RGB-T) tracking.
We conduct experiments to analyze the effectiveness of trackers on five datasets.
arXiv Detail & Related papers (2020-12-08T02:39:38Z)
- LSOTB-TIR: A Large-Scale High-Diversity Thermal Infrared Object Tracking Benchmark [51.1506855334948]
This paper presents a Large-Scale and high-diversity general Thermal InfraRed (TIR) Object Tracking Benchmark, called LSOTB-TIR.
We annotate the bounding box of objects in every frame of all sequences and generate over 730K bounding boxes in total.
We evaluate and analyze more than 30 trackers on LSOTB-TIR to provide a series of baselines, and the results show that deep trackers achieve promising performance.
arXiv Detail & Related papers (2020-08-03T12:36:06Z)
- Jointly Modeling Motion and Appearance Cues for Robust RGB-T Tracking [85.333260415532]
We develop a novel late fusion method to infer the fusion weight maps of both RGB and thermal (T) modalities.
When the appearance cue is unreliable, we take motion cues into account to make the tracker robust.
Extensive results on three recent RGB-T tracking datasets show that the proposed tracker performs significantly better than other state-of-the-art algorithms; a hedged sketch of this weight-map late fusion appears after this list.
arXiv Detail & Related papers (2020-07-04T08:11:33Z)
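Following up the forward reference in the frame-event (FELT) entry above: a modern Hopfield layer retrieves stored patterns by iterated softmax attention, which is why it drops into multi-head self-attention blocks so naturally. The toy sketch below shows only that generic retrieval step plus a hypothetical joint RGB-plus-event token memory; the function name, shapes, and single-head form are assumptions, not the cited paper's network.

```python
import torch

def hopfield_retrieval(queries: torch.Tensor,
                       memories: torch.Tensor,
                       beta: float = 1.0,
                       steps: int = 3) -> torch.Tensor:
    """Modern-Hopfield-style retrieval: repeatedly pull each query toward
    the stored patterns via softmax attention over the memory rows.
    A single step is ordinary (single-head, unprojected) attention.
    """
    xi = queries
    for _ in range(steps):
        attn = torch.softmax(beta * xi @ memories.T, dim=-1)  # [Nq, Nm]
        xi = attn @ memories                                  # [Nq, D]
    return xi

# Hypothetical fusion use: RGB tokens query a joint memory stacking both
# modalities, so each retrieved token mixes RGB and event information.
rgb_tokens = torch.randn(196, 64)    # e.g. 14x14 patch tokens
event_tokens = torch.randn(196, 64)
joint_memory = torch.cat([rgb_tokens, event_tokens], dim=0)  # [392, 64]
fused = hopfield_retrieval(rgb_tokens, joint_memory, beta=0.125)
print(fused.shape)                   # torch.Size([196, 64])
```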
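And for the late-fusion entry that closes the list: the general idea is to predict a per-pixel reliability weight for each modality's response map and blend them. The small network below, its layer sizes, and the response-map shapes are assumptions for illustration, not the cited paper's code.

```python
import torch
import torch.nn as nn

class LateFusionWeights(nn.Module):
    """Toy decision-level RGB-T fusion via predicted weight maps."""

    def __init__(self):
        super().__init__()
        # Maps the two stacked response maps to two per-pixel logits,
        # one per modality.
        self.weight_net = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 2, 1))

    def forward(self, resp_rgb: torch.Tensor,
                resp_tir: torch.Tensor) -> torch.Tensor:
        stacked = torch.cat([resp_rgb, resp_tir], dim=1)          # [B, 2, H, W]
        weights = torch.softmax(self.weight_net(stacked), dim=1)  # sum to 1
        return (weights * stacked).sum(dim=1, keepdim=True)       # [B, 1, H, W]

if __name__ == "__main__":
    fuse = LateFusionWeights()
    r = torch.randn(2, 1, 31, 31)  # RGB correlation response
    t = torch.randn(2, 1, 31, 31)  # thermal correlation response
    print(fuse(r, t).shape)        # torch.Size([2, 1, 31, 31])
```

When the appearance cue degrades (e.g., a washed-out thermal response at thermal crossover), the softmax pushes weight toward the more discriminative modality; per its abstract, the cited tracker additionally falls back on motion cues in that regime.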