A Tri-Modal Dataset and a Baseline System for Tracking Unmanned Aerial Vehicles
- URL: http://arxiv.org/abs/2511.18344v1
- Date: Sun, 23 Nov 2025 08:42:17 GMT
- Title: A Tri-Modal Dataset and a Baseline System for Tracking Unmanned Aerial Vehicles
- Authors: Tianyang Xu, Jinjie Gu, Xuefeng Zhu, XiaoJun Wu, Josef Kittler
- Abstract summary: MM-UAV is the first large-scale benchmark for Multi-Modal UAV Tracking. The dataset spans over 30 challenging scenarios, with 1,321 synchronised multi-modal sequences and more than 2.8 million annotated frames. Accompanying the dataset, we provide a novel multi-modal multi-UAV tracking framework.
- Score: 74.8162337823142
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: With the proliferation of low-altitude unmanned aerial vehicles (UAVs), visual multi-object tracking is becoming a critical security technology, demanding significant robustness even in complex environmental conditions. However, tracking UAVs with a single visual modality often fails in challenging scenarios such as low illumination, cluttered backgrounds, and rapid motion. Although multi-modal multi-object UAV tracking is more resilient, the development of effective solutions has been hindered by the absence of dedicated public datasets. To bridge this gap, we release MM-UAV, the first large-scale benchmark for Multi-Modal UAV Tracking, integrating three key sensing modalities: RGB, infrared (IR), and event signals. The dataset spans over 30 challenging scenarios, with 1,321 synchronised multi-modal sequences and more than 2.8 million annotated frames. Accompanying the dataset, we provide a novel multi-modal multi-UAV tracking framework, designed specifically for UAV tracking applications and serving as a baseline for future research. Our framework incorporates two key technical innovations: an offset-guided adaptive alignment module to resolve spatial mismatches across sensors, and an adaptive dynamic fusion module to balance the complementary information conveyed by different modalities. Furthermore, to overcome the limitations of conventional appearance modelling in multi-object tracking, we introduce an event-enhanced association mechanism that leverages motion cues from the event modality for more reliable identity maintenance. Comprehensive experiments demonstrate that the proposed framework consistently outperforms state-of-the-art methods. To foster further research in multi-modal UAV tracking, both the dataset and source code will be made publicly available at https://xuefeng-zhu5.github.io/MM-UAV/.
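The abstract describes the alignment and fusion modules only at a high level. As a rough illustration, below is a minimal PyTorch sketch of what an offset-guided alignment step and a per-location fusion gate could look like; all class names, shapes, and design choices are assumptions inferred from the abstract, not the authors' released implementation.

```python
# Hypothetical sketch only: names and design choices are assumptions
# inferred from the abstract, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class OffsetGuidedAlignment(nn.Module):
    """Warp an auxiliary modality (IR or event features) onto the RGB
    feature grid using a predicted per-pixel offset field."""

    def __init__(self, channels: int):
        super().__init__()
        # Predict a 2-channel (dx, dy) offset map from both feature maps.
        self.offset_head = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)

    def forward(self, rgb_feat: torch.Tensor, aux_feat: torch.Tensor) -> torch.Tensor:
        b, _, h, w = rgb_feat.shape
        offsets = self.offset_head(torch.cat([rgb_feat, aux_feat], dim=1))
        # Base sampling grid in normalised [-1, 1] coordinates.
        ys, xs = torch.meshgrid(
            torch.linspace(-1.0, 1.0, h, device=rgb_feat.device),
            torch.linspace(-1.0, 1.0, w, device=rgb_feat.device),
            indexing="ij",
        )
        base = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(b, h, w, 2)
        # Displace the grid by the predicted offsets and resample.
        grid = base + offsets.permute(0, 2, 3, 1)
        return F.grid_sample(aux_feat, grid, align_corners=True)


class AdaptiveDynamicFusion(nn.Module):
    """Per-location softmax gate that balances the (aligned) modalities."""

    def __init__(self, channels: int, num_modalities: int = 3):
        super().__init__()
        self.gate = nn.Conv2d(num_modalities * channels, num_modalities, kernel_size=1)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # feats: list of (B, C, H, W) maps, one per modality, already aligned.
        weights = torch.softmax(self.gate(torch.cat(feats, dim=1)), dim=1)
        return sum(weights[:, i : i + 1] * f for i, f in enumerate(feats))
```

The intuition in both pieces is spatial: the alignment module warps the auxiliary features onto the RGB grid before fusion, while the softmax gate lets the network downweight a modality wherever it is uninformative (e.g. RGB at night, or IR against a warm background).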
Related papers
- UAUTrack: Towards Unified Multimodal Anti-UAV Visual Tracking [4.161228439649909]
UAUTrack is a unified single-target tracking framework built upon a single-stream, single-stage, end-to-end architecture. Results show that UAUTrack achieves state-of-the-art performance on the Anti-UAV and DUT Anti-UAV datasets.
arXiv Detail & Related papers (2025-12-02T11:47:13Z)
- Tracking the Unstable: Appearance-Guided Motion Modeling for Robust Multi-Object Tracking in UAV-Captured Videos [58.156141601478794]
Multi-object tracking in UAV-captured videos (UAVT) aims to track multiple objects while maintaining consistent identities across the frames of a given video. Existing methods typically model motion and appearance cues separately, overlooking their interplay, which results in suboptimal tracking performance. We propose AMOT, which exploits appearance and motion cues jointly through two key components: an Appearance-Motion Consistency (AMC) matrix and a Motion-aware Track Continuation (MTC) module.
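The summary gives no formula for the AMC matrix. One plausible, hedged reading is sketched below in NumPy: appearance (cosine) similarity and motion (IoU) affinity are coupled multiplicatively, so neither cue alone can force a match; the multiplicative coupling and the box/embedding interfaces are assumptions for illustration only.

```python
import numpy as np

def iou(a, b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def amc_matrix(track_boxes, det_boxes, track_embs, det_embs):
    """Joint appearance-motion consistency scores, shape (tracks, dets)."""
    t = track_embs / np.linalg.norm(track_embs, axis=1, keepdims=True)
    d = det_embs / np.linalg.norm(det_embs, axis=1, keepdims=True)
    appearance = t @ d.T  # cosine similarity of re-ID embeddings
    motion = np.array([[iou(tb, db) for db in det_boxes] for tb in track_boxes])
    return appearance * motion  # a match must be consistent under both cues
```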
arXiv Detail & Related papers (2025-08-03T12:06:47Z)
- DINO-CoDT: Multi-class Collaborative Detection and Tracking with Vision Foundation Models [11.34839442803445]
We propose a multi-class collaborative detection and tracking framework tailored for diverse road users. We first present a detector with a global spatial attention fusion (GSAF) module, enhancing multi-scale feature learning for objects of varying sizes. Next, we introduce a tracklet RE-IDentification (REID) module that leverages visual semantics with a vision foundation model to effectively reduce ID SWitch (IDSW) errors.
arXiv Detail & Related papers (2025-06-09T02:49:10Z)
- CAMELTrack: Context-Aware Multi-cue ExpLoitation for Online Multi-Object Tracking [68.24998698508344]
We introduce CAMEL, a novel association module for Context-Aware Multi-Cue ExpLoitation. Unlike end-to-end detection-by-tracking approaches, our method remains lightweight and fast to train while being able to leverage external off-the-shelf models. Our proposed online tracking pipeline, CAMELTrack, achieves state-of-the-art performance on multiple tracking benchmarks.
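No architecture details survive the summary; the toy module below only illustrates the general idea of replacing hand-tuned association heuristics with a small learned scorer over stacked cue affinities. The cue count and MLP shape are placeholders, not CAMEL's actual design.

```python
import torch
import torch.nn as nn

class AssociationHead(nn.Module):
    """Score tracklet-detection pairs from stacked affinity cues
    (e.g. appearance similarity, IoU, detection confidence)."""

    def __init__(self, num_cues: int = 3, hidden: int = 32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_cues, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, cues: torch.Tensor) -> torch.Tensor:
        # cues: (num_tracks, num_dets, num_cues) -> (num_tracks, num_dets) logits
        return self.mlp(cues).squeeze(-1)
```

The resulting logit matrix can then be fed to a standard Hungarian assignment step, keeping the detectors and re-ID models as frozen, off-the-shelf components.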
arXiv Detail & Related papers (2025-05-02T13:26:23Z)
- MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking [17.96400810834486]
We introduce the first large-scale Multispectral UAV Single Object Tracking dataset (MUST). MUST includes 250 video sequences spanning diverse environments and challenges. We also propose a novel tracking framework, UNTrack, which encodes unified spectral, spatial, and temporal features from spectrum prompts.
arXiv Detail & Related papers (2025-03-22T08:47:28Z)
- SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection [79.58755811919366]
This paper introduces a new task called Multi-Modal Datasets and Multi-Task Object Detection (M2Det) for remote sensing. It is designed to accurately detect horizontal or oriented objects from any sensor modality. This task poses challenges due to 1) the trade-offs involved in managing multi-modal modelling and 2) the complexities of multi-task optimization.
arXiv Detail & Related papers (2024-12-30T02:47:51Z)
- Bi-directional Adapter for Multi-modal Tracking [67.01179868400229]
We propose a novel multi-modal visual prompt tracking model based on a universal bi-directional adapter.
We develop a simple but effective light feature adapter to transfer modality-specific information from one modality to another.
Our model achieves superior tracking performance in comparison with both the full fine-tuning methods and the prompt learning-based methods.
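A "light feature adapter" is commonly realised as a low-rank bottleneck; the sketch below shows one such hypothetical design, with a residual adapter in each direction between the RGB and auxiliary branches. The reduction factor and placement are assumptions, not the paper's exact architecture.

```python
import torch.nn as nn

class LightAdapter(nn.Module):
    """Bottleneck (down-project, nonlinearity, up-project) so that only a
    small number of parameters is trained on top of frozen backbones."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.down = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.act = nn.GELU()
        self.up = nn.Conv2d(channels // reduction, channels, kernel_size=1)

    def forward(self, x):
        return self.up(self.act(self.down(x)))

class BiDirectionalAdapter(nn.Module):
    """One light adapter per direction: RGB -> auxiliary and back."""

    def __init__(self, channels: int):
        super().__init__()
        self.aux_to_rgb = LightAdapter(channels)
        self.rgb_to_aux = LightAdapter(channels)

    def forward(self, rgb_feat, aux_feat):
        # Each branch receives a residual hint from the other modality.
        return (rgb_feat + self.aux_to_rgb(aux_feat),
                aux_feat + self.rgb_to_aux(rgb_feat))
```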
arXiv Detail & Related papers (2023-12-17T05:27:31Z)
- You Only Need Two Detectors to Achieve Multi-Modal 3D Multi-Object Tracking [9.20064374262956]
The proposed framework can achieve robust tracking by using only a 2D detector and a 3D detector.
It proves more accurate than many state-of-the-art tracking-by-detection (TBD) based multi-modal tracking methods.
arXiv Detail & Related papers (2023-04-18T02:45:18Z)
- Prompting for Multi-Modal Tracking [70.0522146292258]
We propose a novel multi-modal prompt tracker (ProTrack) for multi-modal tracking.
ProTrack can transfer the multi-modal inputs to a single modality by the prompt paradigm.
Our ProTrack can achieve high-performance multi-modal tracking by only altering the inputs, even without any extra training on multi-modal data.
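In spirit, the prompt paradigm folds the auxiliary modality into the input so that a frozen RGB tracker can consume it unchanged. The toy function below blends two registered frames with a fixed weight; the weight and the simple linear blend are assumptions for illustration, not ProTrack's actual prompting scheme.

```python
import torch

def multimodal_prompt(rgb: torch.Tensor, aux: torch.Tensor, weight: float = 0.5) -> torch.Tensor:
    """Blend two spatially registered (C, H, W) frames into one pseudo-RGB
    input, so an unmodified single-modality tracker can process it."""
    return weight * rgb + (1.0 - weight) * aux
```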
arXiv Detail & Related papers (2022-07-29T09:35:02Z)