Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking
- URL: http://arxiv.org/abs/2407.05383v1
- Date: Sun, 7 Jul 2024 14:10:04 GMT
- Title: Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking
- Authors: You Wu, Xucheng Wang, Dan Zeng, Hengzhou Ye, Xiaolan Xie, Qijun Zhao, Shuiwang Li
- Abstract summary: Single-stream architectures utilizing pre-trained ViT backbones offer improved performance, efficiency, and robustness.
We boost the efficiency of this framework by tailoring it into an adaptive framework that dynamically exits Transformer blocks for real-time UAV tracking.
We also improve the effectiveness of ViTs in handling motion blur, a common issue in UAV tracking caused by the fast movements of either the UAV, the tracked objects, or both.
- Score: 14.382072224997074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the surge in the adoption of single-stream architectures utilizing pre-trained ViT backbones represents a promising advancement in the field of generic visual tracking. By integrating feature extraction and fusion into a cohesive framework, these architectures offer improved performance, efficiency, and robustness. However, there has been limited exploration into optimizing these frameworks for UAV tracking. In this paper, we boost the efficiency of this framework by tailoring it into an adaptive computation framework that dynamically exits Transformer blocks for real-time UAV tracking. The motivation is that less challenging tracking tasks can be adequately addressed with low-level feature representations, which allows the model to use computational resources more efficiently by reserving deeper computation for complex tasks and conserving it on easier ones. Another significant enhancement introduced in this paper is the improved effectiveness of ViTs in handling motion blur, a common issue in UAV tracking caused by the fast movements of the UAV, the tracked objects, or both. This is achieved by acquiring motion blur robust representations through enforcing invariance in the feature representation of the target with respect to simulated motion blur. The proposed approach is dubbed BDTrack. Extensive experiments conducted on five tracking benchmarks validate the effectiveness and versatility of our approach, establishing it as a cutting-edge solution in real-time UAV tracking. Code is released at: https://github.com/wuyou3474/BDTrack.
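To make the two mechanisms concrete, below is a minimal PyTorch-style sketch of (a) a ViT encoder whose blocks can be exited early once a per-block gate is confident, and (b) a consistency loss enforcing invariance of target features under simulated motion blur. The gate design, blur kernel, loss form, and all names (`EarlyExitViT`, `blur_invariance_loss`, etc.) are illustrative assumptions, not BDTrack's released implementation (see the linked repository for that).

```python
# A minimal sketch, assuming a plain ViT encoder: (1) per-block exit gates
# that let easy frames stop at shallow layers, and (2) a consistency loss
# enforcing invariance to simulated motion blur. All module names, the gate
# design, the blur kernel, and the loss form are illustrative assumptions,
# not BDTrack's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchEmbed(nn.Module):
    """Conv patchifier: (B, 3, H, W) -> (B, N, C) tokens."""
    def __init__(self, dim=256, patch=16):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)

    def forward(self, x):
        return self.proj(x).flatten(2).transpose(1, 2)

class EarlyExitViT(nn.Module):
    """ViT encoder that may exit after any block at inference time."""
    def __init__(self, dim=256, depth=12, heads=8, exit_threshold=0.9):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True)
            for _ in range(depth))
        self.gates = nn.ModuleList(nn.Linear(dim, 1) for _ in range(depth))
        self.exit_threshold = exit_threshold

    def forward(self, tokens):
        for block, gate in zip(self.blocks, self.gates):
            tokens = block(tokens)
            conf = torch.sigmoid(gate(tokens.mean(dim=1)))
            # Inference: stop as soon as low-level features look sufficient
            # (batch size 1 in tracking); gate training is omitted here.
            if not self.training and conf.item() > self.exit_threshold:
                break
        return tokens

def motion_blur(images, kernel_size=9):
    """Simulate horizontal linear motion blur with a depthwise convolution."""
    c = images.shape[1]
    kernel = torch.zeros(c, 1, kernel_size, kernel_size, device=images.device)
    kernel[:, :, kernel_size // 2, :] = 1.0 / kernel_size
    return F.conv2d(images, kernel, padding=kernel_size // 2, groups=c)

def blur_invariance_loss(embed, backbone, target_crops):
    """Pull features of clean and blurred views of the target together."""
    clean = backbone(embed(target_crops)).mean(dim=1)
    blurred = backbone(embed(motion_blur(target_crops))).mean(dim=1)
    return 1.0 - F.cosine_similarity(clean, blurred).mean()
```

At train time one would add a weighted `blur_invariance_loss` term to the usual tracking loss; at test time the encoder simply stops early on easy frames, which is where the real-time speedup would come from.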
Related papers
- SFTrack: A Robust Scale and Motion Adaptive Algorithm for Tracking Small and Fast Moving Objects [2.9803250365852443]
This paper addresses the problem of multi-object tracking in Unmanned Aerial Vehicle (UAV) footage.
It plays a critical role in various UAV applications, including traffic monitoring systems and real-time suspect tracking by the police.
We propose a new tracking strategy that initiates tracking of target objects from low-confidence detections (a minimal sketch of this idea follows this entry).
arXiv Detail & Related papers (2024-10-26T05:09:20Z)
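As a rough illustration of the strategy above, the sketch below starts tentative tracks from detections that fall below the usual confidence threshold instead of discarding them; the thresholds, the `Track` fields, and the greedy IoU matching are assumptions, not SFTrack's actual algorithm.

```python
# Hypothetical sketch: unmatched detections of ANY confidence may start a
# new (tentative) track instead of being dropped. Thresholds, the Track
# class, and the greedy IoU matching are assumptions, not SFTrack's method.
from dataclasses import dataclass

@dataclass
class Track:
    box: tuple              # (x1, y1, x2, y2)
    score: float
    tentative: bool = True  # low-confidence tracks start tentative
    hits: int = 1

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def step(tracks, detections, high=0.6, match_iou=0.3, confirm_hits=3):
    """detections: list of (box, score). Greedy association, then the key
    idea: leftovers below the 'high' threshold still seed tentative tracks."""
    unmatched = list(detections)
    for t in tracks:
        best = max(unmatched, key=lambda d: iou(t.box, d[0]), default=None)
        if best is not None and iou(t.box, best[0]) > match_iou:
            t.box, t.score = best
            t.hits += 1
            t.tentative = t.hits < confirm_hits
            unmatched.remove(best)
    tracks += [Track(box=b, score=s, tentative=s < high)
               for b, s in unmatched]
    return tracks
```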
- Adaptively Bypassing Vision Transformer Blocks for Efficient Visual Tracking [11.361394596302334]
ABTrack is an adaptive computation framework that adaptively bypasses transformer blocks for efficient visual tracking.
We propose a Bypass Decision Module (BDM) to determine whether a transformer block should be bypassed (a hedged sketch of such a gate follows this entry).
We introduce a novel ViT pruning method to reduce the dimension of the latent representation of tokens in each transformer block.
arXiv Detail & Related papers (2024-06-12T09:39:18Z)
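For contrast with early exiting (which stops all remaining computation), a bypass decision skips an individual block but still runs the later ones. Below is a minimal sketch of what such a per-block gate could look like, assuming a soft mix during training and a hard skip at inference; it is an illustration, not ABTrack's actual BDM.

```python
# Hypothetical per-block bypass gate in the spirit of a Bypass Decision
# Module: a small head decides whether to skip a block entirely. The gate
# design and hard-threshold inference rule are assumptions.
import torch
import torch.nn as nn

class BypassableBlock(nn.Module):
    def __init__(self, dim=256, heads=8, tau=0.5):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.gate = nn.Linear(dim, 1)   # bypass-probability head
        self.tau = tau

    def forward(self, tokens):           # tokens: (B, N, C)
        p_bypass = torch.sigmoid(self.gate(tokens.mean(dim=1)))  # (B, 1)
        if self.training:
            # Soft mix keeps the gate differentiable during training.
            out = self.block(tokens)
            p = p_bypass.unsqueeze(-1)
            return p * tokens + (1 - p) * out
        # Hard decision at inference: skip the block's compute entirely.
        if p_bypass.item() > self.tau:    # batch size 1 at test time
            return tokens
        return self.block(tokens)
```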
- Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking.
DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget.
Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z)
- BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View [56.77287041917277]
3D Single Object Tracking (SOT) is a fundamental task of computer vision, proving essential for applications like autonomous driving.
In this paper, we propose BEVTrack, a simple yet effective baseline method.
By estimating the target motion in Bird's-Eye View (BEV) to perform tracking, BEVTrack is surprisingly simple in its network design, training objectives, and tracking pipeline, while achieving superior performance (a hedged sketch of BEV motion estimation follows this entry).
arXiv Detail & Related papers (2023-09-05T12:42:26Z)
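A minimal sketch of tracking by regressing target motion in BEV, in the spirit of the summary above; the feature head, the (dx, dy, dyaw) parameterization, and the state update are illustrative assumptions, not BEVTrack's architecture.

```python
# Hypothetical sketch: concatenate previous/current BEV feature maps and
# regress the target's planar motion. All details here are assumptions.
import torch
import torch.nn as nn

class BEVMotionHead(nn.Module):
    """Regress (dx, dy, dyaw) from a pair of BEV feature maps."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(ch, 3),          # (dx, dy, dyaw) in BEV coordinates
        )

    def forward(self, bev_prev, bev_curr):     # each: (B, ch, H, W)
        return self.net(torch.cat([bev_prev, bev_curr], dim=1))

def update_state(state, motion):
    """state, motion: (B, 3) tensors of (x, y, yaw); apply predicted offset."""
    return state + motion
```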
- Learning Disentangled Representation with Mutual Information Maximization for Real-Time UAV Tracking [1.0541541376305243]
This paper exploits disentangled representation with mutual information (DR-MIM) to improve precision and efficiency for UAV tracking.
Our DR-MIM tracker significantly outperforms state-of-the-art UAV tracking methods.
arXiv Detail & Related papers (2023-08-20T13:16:15Z)
- MotionTrack: Learning Motion Predictor for Multiple Object Tracking [68.68339102749358]
We introduce a novel motion-based tracker, MotionTrack, centered around a learnable motion predictor.
Our experimental results demonstrate that MotionTrack yields state-of-the-art performance on datasets such as DanceTrack and SportsMOT.
arXiv Detail & Related papers (2023-06-05T04:24:11Z)
- SGDViT: Saliency-Guided Dynamic Vision Transformer for UAV Tracking [12.447854608181833]
This work presents a novel saliency-guided dynamic vision Transformer (SGDViT) for UAV tracking.
The proposed method designs a new task-specific object saliency mining network to refine the cross-correlation operation.
A lightweight saliency filtering Transformer further refines saliency information and increases the focus on appearance information (a hedged sketch of saliency-weighted cross-correlation follows this entry).
arXiv Detail & Related papers (2023-03-08T05:01:00Z)
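A minimal sketch of the general idea of saliency-guided correlation: a predicted saliency map re-weights search features before they are correlated with the template. The saliency head and the depthwise correlation are illustrative assumptions, not SGDViT's design.

```python
# Hypothetical saliency-guided cross-correlation: suppress background in
# the search region, then run a standard depthwise cross-correlation
# (one template kernel per sample). Details here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SaliencyGuidedXCorr(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.saliency = nn.Sequential(          # 1-channel saliency map
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 1), nn.Sigmoid(),
        )

    def forward(self, template, search):
        # template: (B, C, h, w), search: (B, C, H, W)
        search = search * self.saliency(search)   # down-weight background
        b, c, H, W = search.shape
        out = F.conv2d(
            search.reshape(1, b * c, H, W),
            template.reshape(b * c, 1, *template.shape[-2:]),
            groups=b * c,
        )
        return out.reshape(b, c, out.shape[-2], out.shape[-1])
```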
- Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z)
- Implicit Motion Handling for Video Camouflaged Object Detection [60.98467179649398]
We propose a new video camouflaged object detection (VCOD) framework.
It can exploit both short-term and long-term temporal consistency to detect camouflaged objects from video frames.
arXiv Detail & Related papers (2022-03-14T17:55:41Z)
- Distractor-Aware Fast Tracking via Dynamic Convolutions and MOT Philosophy [63.91005999481061]
A practical long-term tracker typically requires three key properties: an efficient model design, an effective global re-detection strategy, and a robust distractor-awareness mechanism.
We propose a two-task tracking framework (named DMTrack) to achieve distractor-aware fast tracking via Dynamic convolutions (d-convs) and Multiple object tracking (MOT) philosophy.
Our tracker achieves state-of-the-art performance on the LaSOT, OxUvA, TLP, VOT2018LT and VOT2019LT benchmarks and runs in real-time (3x faster).
arXiv Detail & Related papers (2021-04-25T00:59:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.