MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking
- URL: http://arxiv.org/abs/2503.17699v1
- Date: Sat, 22 Mar 2025 08:47:28 GMT
- Title: MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking
- Authors: Haolin Qin, Tingfa Xu, Tianhao Li, Zhenxiang Chen, Tao Feng, Jianan Li,
- Abstract summary: We introduce the first large-scale Multispectral UAV Single Object Tracking dataset (MUST)<n>MUST includes 250 video sequences spanning diverse environments and challenges.<n>We also propose a novel tracking framework, UNTrack, which encodes unified spectral, spatial, and temporal features from spectrum prompts.
- Score: 17.96400810834486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: UAV tracking faces significant challenges in real-world scenarios, such as small-size targets and occlusions, which limit the performance of RGB-based trackers. Multispectral images (MSI), which capture additional spectral information, offer a promising solution to these challenges. However, progress in this field has been hindered by the lack of relevant datasets. To address this gap, we introduce the first large-scale Multispectral UAV Single Object Tracking dataset (MUST), which includes 250 video sequences spanning diverse environments and challenges, providing a comprehensive data foundation for multispectral UAV tracking. We also propose a novel tracking framework, UNTrack, which encodes unified spectral, spatial, and temporal features from spectrum prompts, initial templates, and sequential searches. UNTrack employs an asymmetric transformer with a spectral background eliminate mechanism for optimal relationship modeling and an encoder that continuously updates the spectrum prompt to refine tracking, improving both accuracy and efficiency. Extensive experiments show that our proposed UNTrack outperforms state-of-the-art UAV trackers. We believe our dataset and framework will drive future research in this area. The dataset is available on https://github.com/q2479036243/MUST-Multispectral-UAV-Single-Object-Tracking.
Related papers
- COST: Contrastive One-Stage Transformer for Vision-Language Small Object Tracking [52.62149024881728]
We propose a contrastive one-stage transformer fusion framework for vision-language (VL) tracking.
We introduce a contrastive alignment strategy that maximizes mutual information between a video and its corresponding language description.
By leveraging a visual-linguistic transformer, we establish an efficient multi-modal fusion and reasoning mechanism.
arXiv Detail & Related papers (2025-04-02T03:12:38Z) - SFTrack: A Robust Scale and Motion Adaptive Algorithm for Tracking Small and Fast Moving Objects [2.9803250365852443]
This paper addresses the problem of multi-object tracking in Unmanned Aerial Vehicle (UAV) footage.
It plays a critical role in various UAV applications, including traffic monitoring systems and real-time suspect tracking by the police.
We propose a new tracking strategy, which initiates the tracking of target objects from low-confidence detections.
arXiv Detail & Related papers (2024-10-26T05:09:20Z) - STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking [13.269416985959404]
Multiple object tracking (MOT) in Unmanned Aerial Vehicle (UAV) videos is important for diverse applications in computer vision.
We propose a novel Spatio-Temporal Cohesion Multiple Object Tracking framework (STCMOT)
We use historical embedding features to model the representation of ReID and detection features in a sequential order.
Our framework sets a new state-of-the-art performance in MOTA and IDF1 metrics.
arXiv Detail & Related papers (2024-09-17T14:34:18Z) - AViTMP: A Tracking-Specific Transformer for Single-Branch Visual Tracking [17.133735660335343]
We propose an Adaptive ViT Model Prediction tracker (AViTMP) to design a customised tracking method.
This method bridges the single-branch network with discriminative models for the first time.
We show that AViTMP achieves state-of-the-art performance, especially in terms of long-term tracking and robustness.
arXiv Detail & Related papers (2023-10-30T13:48:04Z) - You Only Need Two Detectors to Achieve Multi-Modal 3D Multi-Object Tracking [9.20064374262956]
The proposed framework can achieve robust tracking by using only a 2D detector and a 3D detector.
It is proven more accurate than many of the state-of-the-art TBD-based multi-modal tracking methods.
arXiv Detail & Related papers (2023-04-18T02:45:18Z) - OmniTracker: Unifying Object Tracking by Tracking-with-Detection [119.51012668709502]
OmniTracker is presented to resolve all the tracking tasks with a fully shared network architecture, model weights, and inference pipeline.
Experiments on 7 tracking datasets, including LaSOT, TrackingNet, DAVIS16-17, MOT17, MOTS20, and YTVIS19, demonstrate that OmniTracker achieves on-par or even better results than both task-specific and unified tracking models.
arXiv Detail & Related papers (2023-03-21T17:59:57Z) - DIVOTrack: A Novel Dataset and Baseline Method for Cross-View
Multi-Object Tracking in DIVerse Open Scenes [74.64897845999677]
We introduce a new cross-view multi-object tracking dataset for DIVerse Open scenes with dense tracking pedestrians.
Our DIVOTrack has fifteen distinct scenarios and 953 cross-view tracks, surpassing all cross-view multi-object tracking datasets currently available.
Furthermore, we provide a novel baseline cross-view tracking method with a unified joint detection and cross-view tracking framework named CrossMOT.
arXiv Detail & Related papers (2023-02-15T14:10:42Z) - SOMPT22: A Surveillance Oriented Multi-Pedestrian Tracking Dataset [5.962184741057505]
We introduce SOMPT22 dataset; a new set for multi person tracking with annotated short videos captured from static cameras located on poles with 6-8 meters in height positioned for city surveillance.
We analyze MOT trackers classified as one-shot and two-stage with respect to the way of use of detection and reID networks on this new dataset.
The experimental results of our new dataset indicate that SOTA is still far from high efficiency, and single-shot trackers are good candidates to unify fast execution and accuracy with competitive performance.
arXiv Detail & Related papers (2022-08-04T11:09:19Z) - Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline [80.13652104204691]
In this paper, we construct a large-scale benchmark with high diversity for visible-thermal UAV tracking (VTUAV)
We provide a coarse-to-fine attribute annotation, where frame-level attributes are provided to exploit the potential of challenge-specific trackers.
In addition, we design a new RGB-T baseline, named Hierarchical Multi-modal Fusion Tracker (HMFT), which fuses RGB-T data in various levels.
arXiv Detail & Related papers (2022-04-08T15:22:33Z) - Unified Transformer Tracker for Object Tracking [58.65901124158068]
We present the Unified Transformer Tracker (UTT) to address tracking problems in different scenarios with one paradigm.
A track transformer is developed in our UTT to track the target in both Single Object Tracking (SOT) and Multiple Object Tracking (MOT)
arXiv Detail & Related papers (2022-03-29T01:38:49Z) - TAO: A Large-Scale Benchmark for Tracking Any Object [95.87310116010185]
Tracking Any Object dataset consists of 2,907 high resolution videos, captured in diverse environments, which are half a minute long on average.
We ask annotators to label objects that move at any point in the video, and give names to them post factum.
Our vocabulary is both significantly larger and qualitatively different from existing tracking datasets.
arXiv Detail & Related papers (2020-05-20T21:07:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.