Target-aware Bidirectional Fusion Transformer for Aerial Object Tracking
- URL: http://arxiv.org/abs/2503.09951v1
- Date: Thu, 13 Mar 2025 01:53:29 GMT
- Title: Target-aware Bidirectional Fusion Transformer for Aerial Object Tracking
- Authors: Xinglong Sun, Haijiang Sun, Shan Jiang, Jiacheng Wang, Jiasong Wang,
- Abstract summary: We propose a novel target-aware Bidirectional Fusion transformer (BFTrans) for UAV tracking.<n>Our approach can exceed other state-of-the-art trackers and run with an average speed of 30.5 FPS on embedded platform.
- Score: 4.199091332200661
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The trackers based on lightweight neural networks have achieved great success in the field of aerial remote sensing, most of which aggregate multi-stage deep features to lift the tracking quality. However, existing algorithms usually only generate single-stage fusion features for state decision, which ignore that diverse kinds of features are required for identifying and locating the object, limiting the robustness and precision of tracking. In this paper, we propose a novel target-aware Bidirectional Fusion transformer (BFTrans) for UAV tracking. Specifically, we first present a two-stream fusion network based on linear self and cross attentions, which can combine the shallow and the deep features from both forward and backward directions, providing the adjusted local details for location and global semantics for recognition. Besides, a target-aware positional encoding strategy is designed for the above fusion model, which is helpful to perceive the object-related attributes during the fusion phase. Finally, the proposed method is evaluated on several popular UAV benchmarks, including UAV-123, UAV20L and UAVTrack112. Massive experimental results demonstrate that our approach can exceed other state-of-the-art trackers and run with an average speed of 30.5 FPS on embedded platform, which is appropriate for practical drone deployments.
Related papers
- SFTrack: A Robust Scale and Motion Adaptive Algorithm for Tracking Small and Fast Moving Objects [2.9803250365852443]
This paper addresses the problem of multi-object tracking in Unmanned Aerial Vehicle (UAV) footage.
It plays a critical role in various UAV applications, including traffic monitoring systems and real-time suspect tracking by the police.
We propose a new tracking strategy, which initiates the tracking of target objects from low-confidence detections.
arXiv Detail & Related papers (2024-10-26T05:09:20Z) - SOOD++: Leveraging Unlabeled Data to Boost Oriented Object Detection [59.868772767818975]
We propose a simple yet effective Semi-supervised Oriented Object Detection method termed SOOD++.
Specifically, we observe that objects from aerial images are usually arbitrary orientations, small scales, and aggregation.
Extensive experiments conducted on various multi-oriented object datasets under various labeled settings demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2024-07-01T07:03:51Z) - Multi-Modal UAV Detection, Classification and Tracking Algorithm -- Technical Report for CVPR 2024 UG2 Challenge [20.459377705070043]
This report presents the 1st winning model for UG2+, a task in CVPR 2024 UAV Tracking and Pose-Estimation Challenge.
We propose a multi-modal UAV detection, classification, and 3D tracking method for accurate UAV classification and tracking.
Our system integrates cutting-edge classification techniques and sophisticated post-processing steps to boost accuracy and robustness.
arXiv Detail & Related papers (2024-05-26T07:21:18Z) - An Improved End-to-End Multi-Target Tracking Method Based on Transformer
Self-Attention [24.17627001939523]
This study proposes an improved end-to-end multi-target tracking algorithm.
It adapts to multi-view multi-scale scenes based on the self-attentive mechanism of the transformer's encoder-decoder structure.
arXiv Detail & Related papers (2022-11-11T04:58:46Z) - Joint Spatial-Temporal and Appearance Modeling with Transformer for
Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z) - Trajectory Design for UAV-Based Internet-of-Things Data Collection: A
Deep Reinforcement Learning Approach [93.67588414950656]
In this paper, we investigate an unmanned aerial vehicle (UAV)-assisted Internet-of-Things (IoT) system in a 3D environment.
We present a TD3-based trajectory design for completion time minimization (TD3-TDCTM) algorithm.
Our simulation results show the superiority of the proposed TD3-TDCTM algorithm over three conventional non-learning based baseline methods.
arXiv Detail & Related papers (2021-07-23T03:33:29Z) - CFTrack: Center-based Radar and Camera Fusion for 3D Multi-Object
Tracking [9.62721286522053]
We propose an end-to-end network for joint object detection and tracking based on radar and camera sensor fusion.
Our proposed method uses a center-based radar-camera fusion algorithm for object detection and utilizes a greedy algorithm for object association.
We evaluate our method on the challenging nuScenes dataset, where it achieves 20.0 AMOTA and outperforms all vision-based 3D tracking methods in the benchmark.
arXiv Detail & Related papers (2021-07-11T23:56:53Z) - Distractor-Aware Fast Tracking via Dynamic Convolutions and MOT
Philosophy [63.91005999481061]
A practical long-term tracker typically contains three key properties, i.e. an efficient model design, an effective global re-detection strategy and a robust distractor awareness mechanism.
We propose a two-task tracking frame work (named DMTrack) to achieve distractor-aware fast tracking via Dynamic convolutions (d-convs) and Multiple object tracking (MOT) philosophy.
Our tracker achieves state-of-the-art performance on the LaSOT, OxUvA, TLP, VOT2018LT and VOT 2019LT benchmarks and runs in real-time (3x faster
arXiv Detail & Related papers (2021-04-25T00:59:53Z) - MRDet: A Multi-Head Network for Accurate Oriented Object Detection in
Aerial Images [51.227489316673484]
We propose an arbitrary-oriented region proposal network (AO-RPN) to generate oriented proposals transformed from horizontal anchors.
To obtain accurate bounding boxes, we decouple the detection task into multiple subtasks and propose a multi-head network.
Each head is specially designed to learn the features optimal for the corresponding task, which allows our network to detect objects accurately.
arXiv Detail & Related papers (2020-12-24T06:36:48Z) - Tracking-by-Counting: Using Network Flows on Crowd Density Maps for
Tracking Multiple Targets [96.98888948518815]
State-of-the-art multi-object tracking(MOT) methods follow the tracking-by-detection paradigm.
We propose a new MOT paradigm, tracking-by-counting, tailored for crowded scenes.
arXiv Detail & Related papers (2020-07-18T19:51:53Z) - Autonomous and cooperative design of the monitor positions for a team of
UAVs to maximize the quantity and quality of detected objects [0.5801044612920815]
This paper tackles the problem of positioning a swarm of UAVs inside a completely unknown terrain.
YOLOv3 and a system to identify duplicate objects of interest were employed to assign a single score to each UAVs' configuration.
A novel navigation algorithm, capable of optimizing the previously defined score, is proposed.
arXiv Detail & Related papers (2020-07-02T16:52:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.