MMOT: The First Challenging Benchmark for Drone-based Multispectral Multi-Object Tracking
- URL: http://arxiv.org/abs/2510.12565v1
- Date: Tue, 14 Oct 2025 14:25:17 GMT
- Title: MMOT: The First Challenging Benchmark for Drone-based Multispectral Multi-Object Tracking
- Authors: Tianhao Li, Tingfa Xu, Ying Wang, Haolin Qin, Xu Lin, Jianan Li
- Abstract summary: MMOT is the first benchmark for drone-based multispectral multi-object tracking. It features 125 video sequences with over 488.8K annotations across eight categories. To better extract spectral features and leverage oriented annotations, we present a multispectral and orientation-aware MOT scheme.
- Score: 30.3437683353074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Drone-based multi-object tracking is essential yet highly challenging due to small targets, severe occlusions, and cluttered backgrounds. Existing RGB-based tracking algorithms heavily depend on spatial appearance cues such as color and texture, which often degrade in aerial views, compromising reliability. Multispectral imagery, capturing pixel-level spectral reflectance, provides crucial cues that enhance object discriminability under degraded spatial conditions. However, the lack of dedicated multispectral UAV datasets has hindered progress in this domain. To bridge this gap, we introduce MMOT, the first challenging benchmark for drone-based multispectral multi-object tracking. It features three key characteristics: (i) Large Scale - 125 video sequences with over 488.8K annotations across eight categories; (ii) Comprehensive Challenges - covering diverse conditions such as extreme small targets, high-density scenarios, severe occlusions, and complex motion; and (iii) Precise Oriented Annotations - enabling accurate localization and reduced ambiguity under aerial perspectives. To better extract spectral features and leverage oriented annotations, we further present a multispectral and orientation-aware MOT scheme adapting existing methods, featuring: (i) a lightweight Spectral 3D-Stem integrating spectral features while preserving compatibility with RGB pretraining; (ii) an orientation-aware Kalman filter for precise state estimation; and (iii) an end-to-end orientation-adaptive transformer. Extensive experiments across representative trackers consistently show that multispectral input markedly improves tracking performance over RGB baselines, particularly for small and densely packed objects. We believe our work will advance drone-based multispectral multi-object tracking research. Our MMOT, code, and benchmarks are publicly available at https://github.com/Annzstbl/MMOT.
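The abstract's orientation-aware Kalman filter can be illustrated with a minimal sketch. The paper's exact state design is not given in the abstract, so the constant-velocity state `[cx, cy, w, h, theta, vx, vy]` and the noise settings below are assumptions chosen for illustration; the one orientation-specific ingredient shown is wrapping the angle residual so box-angle periodicity does not distort the update.

```python
import numpy as np

class OrientedKalmanFilter:
    """Illustrative constant-velocity Kalman filter over an oriented box
    state [cx, cy, w, h, theta, vx, vy]; a sketch, not the paper's design."""

    def __init__(self, box):
        # box = (cx, cy, w, h, theta)
        self.x = np.array([*box, 0.0, 0.0])                      # state
        self.P = np.eye(7)                                       # covariance
        self.F = np.eye(7); self.F[0, 5] = self.F[1, 6] = 1.0    # motion model
        self.H = np.eye(5, 7)                                    # observe box only
        self.Q = np.eye(7) * 1e-2                                # process noise
        self.R = np.eye(5) * 1e-1                                # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:5]

    def update(self, z):
        y = np.asarray(z, dtype=float) - self.H @ self.x
        # wrap the angle residual into [-pi/2, pi/2) so nearly-aligned
        # oriented boxes are not penalised by angle periodicity
        y[4] = (y[4] + np.pi / 2) % np.pi - np.pi / 2
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(7) - K @ self.H) @ self.P
        return self.x[:5]
```

A per-frame tracking loop would call `predict()` before association and `update()` with the matched oriented detection afterwards.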
Related papers
- MODA: The First Challenging Benchmark for Multispectral Object Detection in Aerial Images [26.48439423478357]
We introduce the first large-scale dataset for Multispectral Object Detection in Aerial images (MODA). This dataset comprises 14,041 MSIs and 330,191 annotations across diverse, challenging scenarios. We also propose OSSDet, a framework that integrates spectral and spatial information with object-aware cues.
arXiv Detail & Related papers (2025-12-10T10:07:06Z)
- A Tri-Modal Dataset and a Baseline System for Tracking Unmanned Aerial Vehicles [74.8162337823142]
MM-UAV is the first large-scale benchmark for Multi-Modal UAV Tracking. The dataset spans over 30 challenging scenarios, with 1,321 synchronised multi-modal sequences and more than 2.8 million annotated frames. Accompanying the dataset, we provide a novel multi-modal multi-UAV tracking framework.
arXiv Detail & Related papers (2025-11-23T08:42:17Z)
- MSITrack: A Challenging Benchmark for Multispectral Single Object Tracking [28.794206918661818]
MSITrack is the largest and most diverse multispectral single object tracking dataset to date. It includes 55 object categories and 300 distinct natural scenes. MSITrack significantly improves performance over RGB-only baselines.
arXiv Detail & Related papers (2025-10-08T03:56:36Z)
- Collaborating Vision, Depth, and Thermal Signals for Multi-Modal Tracking: Dataset and Algorithm [103.36490810025752]
This work introduces a novel multi-modal tracking task that leverages three complementary modalities: visible RGB, Depth (D), and Thermal Infrared (TIR). We propose a novel multi-modal tracker, dubbed RDTTrack, which integrates tri-modal information for robust tracking by leveraging a pretrained RGB-only tracking model and prompt learning techniques. The experimental results demonstrate the effectiveness and advantages of the proposed method, showing significant improvements over existing dual-modal approaches.
arXiv Detail & Related papers (2025-09-29T13:05:15Z)
- MCOD: The First Challenging Benchmark for Multispectral Camouflaged Object Detection [26.760763912987795]
Camouflaged Object Detection (COD) aims to identify objects that blend seamlessly into natural scenes. Existing COD benchmark datasets are exclusively RGB-based, lacking essential support for multispectral approaches. We introduce MCOD, the first challenging benchmark dataset specifically designed for multispectral camouflaged object detection.
arXiv Detail & Related papers (2025-09-19T08:29:33Z)
- DEPFusion: Dual-Domain Enhancement and Priority-Guided Mamba Fusion for UAV Multispectral Object Detection [6.4402018224356015]
A framework named DEPFusion is proposed for UAV multispectral object detection. It consists of Dual-Domain Enhancement (DDE) and Priority-Guided Mamba Fusion (PGMF). Experiments on the DroneVehicle and VEDAI datasets demonstrate that DEPFusion achieves performance competitive with state-of-the-art methods.
arXiv Detail & Related papers (2025-09-09T01:51:57Z)
- GRASPTrack: Geometry-Reasoned Association via Segmentation and Projection for Multi-Object Tracking [11.436294975354556]
GRASPTrack is a novel MOT framework that integrates monocular depth estimation and instance segmentation into a standard tracking-by-detection (TBD) pipeline. The resulting 3D point clouds are then voxelized to enable a precise and robust voxel-based 3D Intersection-over-Union.
arXiv Detail & Related papers (2025-08-11T15:56:21Z)
- Tracking the Unstable: Appearance-Guided Motion Modeling for Robust Multi-Object Tracking in UAV-Captured Videos [58.156141601478794]
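The voxel-based 3D IoU idea can be sketched briefly: quantise each point cloud to integer voxel indices and take the IoU of the resulting index sets. The paper's exact voxelisation parameters are not given, so the `voxel` size below and the function itself are illustrative assumptions, not GRASPTrack's implementation.

```python
import numpy as np

def voxel_iou(points_a, points_b, voxel=0.2):
    """Voxel-based 3D IoU between two (N, 3) point clouds: quantise each
    cloud to voxel indices, then IoU the two index sets."""
    va = {tuple(v) for v in np.floor(points_a / voxel).astype(int)}
    vb = {tuple(v) for v in np.floor(points_b / voxel).astype(int)}
    union = len(va | vb)
    return len(va & vb) / union if union else 0.0
```

In a TBD pipeline such a score would serve as the association cost between a track's predicted cloud and a detection's cloud.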
Multi-object tracking in UAV-captured videos (UAVT) aims to track multiple objects while maintaining consistent identities across frames of a given video. Existing methods typically model motion and appearance cues separately, overlooking their interplay and resulting in suboptimal tracking performance. We propose AMOT, which exploits appearance and motion cues through two key components: an Appearance-Motion Consistency (AMC) matrix and a Motion-aware Track Continuation (MTC) module.
arXiv Detail & Related papers (2025-08-03T12:06:47Z)
- MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking [17.96400810834486]
We introduce the first large-scale Multispectral UAV Single Object Tracking dataset (MUST). MUST includes 250 video sequences spanning diverse environments and challenges. We also propose a novel tracking framework, UNTrack, which encodes unified spectral, spatial, and temporal features from spectrum prompts.
arXiv Detail & Related papers (2025-03-22T08:47:28Z)
- SpectralGPT: Spectral Remote Sensing Foundation Model [60.023956954916414]
A universal RS foundation model, named SpectralGPT, is purpose-built to handle spectral RS images using a novel 3D generative pretrained transformer (GPT).
Compared to existing foundation models, SpectralGPT accommodates input images with varying sizes, resolutions, time series, and regions in a progressive training fashion, enabling full utilization of extensive RS big data.
Our evaluation highlights significant performance improvements with pretrained SpectralGPT models, signifying substantial potential in advancing spectral RS big data applications within the field of geoscience.
arXiv Detail & Related papers (2023-11-13T07:09:30Z)
- Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline [80.13652104204691]
In this paper, we construct a large-scale benchmark with high diversity for visible-thermal UAV tracking (VTUAV).
We provide a coarse-to-fine attribute annotation, where frame-level attributes are provided to exploit the potential of challenge-specific trackers.
In addition, we design a new RGB-T baseline, named Hierarchical Multi-modal Fusion Tracker (HMFT), which fuses RGB-T data in various levels.
arXiv Detail & Related papers (2022-04-08T15:22:33Z)
- M3DSSD: Monocular 3D Single Stage Object Detector [82.25793227026443]
We propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention.
The proposed M3DSSD achieves significantly better performance than the monocular 3D object detection methods on the KITTI dataset.
arXiv Detail & Related papers (2021-03-24T13:09:11Z)
- Dense Scene Multiple Object Tracking with Box-Plane Matching [73.54369833671772]
Multiple Object Tracking (MOT) is an important task in computer vision.
We propose the Box-Plane Matching (BPM) method to improve MOT performance in dense scenes.
Owing to the effectiveness of its three modules, our team achieved 1st place on the Track-1 leaderboard of the ACM MM Grand Challenge HiEve 2020.
arXiv Detail & Related papers (2020-07-30T16:39:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.