Multi-step Temporal Modeling for UAV Tracking
- URL: http://arxiv.org/abs/2403.04363v1
- Date: Thu, 7 Mar 2024 09:48:13 GMT
- Title: Multi-step Temporal Modeling for UAV Tracking
- Authors: Xiaoying Yuan, Tingfa Xu, Xincong Liu, Ying Wang, Haolin Qin, Yuqiang
Fang and Jianan Li
- Abstract summary: We introduce MT-Track, a streamlined and efficient multi-step temporal modeling framework for enhanced UAV tracking.
We unveil a unique temporal correlation module that dynamically assesses the interplay between the template and search region features.
We propose a mutual transformer module to refine the correlation maps of historical and current frames by modeling the temporal knowledge in the tracking sequence.
- Score: 14.687636301587045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the realm of unmanned aerial vehicle (UAV) tracking, Siamese-based
approaches have gained traction due to their optimal balance between efficiency
and precision. However, UAV scenarios often present challenges such as
insufficient sampling resolution, fast motion and small objects with limited
feature information. As a result, temporal context in UAV tracking tasks plays
a pivotal role in target location, overshadowing the target's precise features.
In this paper, we introduce MT-Track, a streamlined and efficient multi-step
temporal modeling framework designed to harness the temporal context from
historical frames for enhanced UAV tracking. This temporal integration occurs
in two steps: correlation map generation and correlation map refinement.
Specifically, we unveil a unique temporal correlation module that dynamically
assesses the interplay between the template and search region features. This
module leverages temporal information to refresh the template feature, yielding
a more precise correlation map. Subsequently, we propose a mutual transformer
module to refine the correlation maps of historical and current frames by
modeling the temporal knowledge in the tracking sequence. This method
significantly trims computational demands compared to the raw transformer. The
compact yet potent nature of our tracking framework ensures commendable
tracking outcomes, particularly in extended tracking scenarios.
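The two-step temporal modeling the abstract describes (template refresh, then correlation-map refinement) can be sketched in outline. This is a minimal numpy sketch, not the paper's implementation: the exponential-moving-average template update, the dot-product correlation, and the single cross-attention refinement pass are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def refresh_template(template, current_feat, alpha=0.9):
    """Step 1a: blend the historical template with the current frame's
    target feature (an EMA stands in for the temporal correlation module)."""
    return alpha * template + (1.0 - alpha) * current_feat

def correlation_map(template, search):
    """Step 1b: correlate a (C,) template vector with every location of
    an (H, W, C) search-region feature map."""
    return np.einsum("hwc,c->hw", search, template)

def mutual_refine(corr_hist, corr_curr, d=1.0):
    """Step 2: refine the current correlation map by attending to the
    historical one (one cross-attention pass over the flattened maps)."""
    q, k = corr_curr.ravel()[:, None], corr_hist.ravel()[:, None]
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)
    refined = attn @ k  # aggregate historical evidence per location
    return refined.reshape(corr_curr.shape)

rng = np.random.default_rng(0)
tmpl = rng.standard_normal(8)
tmpl = refresh_template(tmpl, rng.standard_normal(8))
c_hist = correlation_map(tmpl, rng.standard_normal((5, 5, 8)))
c_curr = correlation_map(tmpl, rng.standard_normal((5, 5, 8)))
print(mutual_refine(c_hist, c_curr).shape)  # (5, 5)
```

Operating on correlation maps rather than raw features is what keeps the refinement cheap relative to a full transformer: the attention here is over H*W scalar correlations, not C-dimensional tokens.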
Related papers
- STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking [13.269416985959404]
Multiple object tracking (MOT) in Unmanned Aerial Vehicle (UAV) videos is important for diverse applications in computer vision.
We propose a novel Spatio-Temporal Cohesion Multiple Object Tracking framework (STCMOT)
We use historical embedding features to model the representation of ReID and detection features in a sequential order.
Our framework sets a new state-of-the-art performance in MOTA and IDF1 metrics.
arXiv Detail & Related papers (2024-09-17T14:34:18Z) - DeTra: A Unified Model for Object Detection and Trajectory Forecasting [68.85128937305697]
Our approach formulates the union of the two tasks as a trajectory refinement problem.
To tackle this unified task, we design a refinement transformer that infers the presence, pose, and multi-modal future behaviors of objects.
In our experiments, we observe that our model outperforms the state of the art on the Argoverse 2 Sensor and Open datasets.
arXiv Detail & Related papers (2024-06-06T18:12:04Z) - Autoregressive Queries for Adaptive Tracking with Spatio-TemporalTransformers [55.46413719810273]
Rich temporal information is crucial for modeling the complicated target appearance in visual tracking.
Our method improves the tracker's performance on six popular tracking benchmarks.
arXiv Detail & Related papers (2024-03-15T02:39:26Z) - ACTrack: Adding Spatio-Temporal Condition for Visual Object Tracking [0.5371337604556311]
Efficiently modeling spatio-temporal relations of objects is a key challenge in visual object tracking (VOT).
Existing methods track by appearance-based similarity or long-term relation modeling, resulting in rich temporal contexts between consecutive frames being easily overlooked.
In this paper we present ACTrack, a new tracking framework with additive spatio-temporal conditions. It preserves the quality and capabilities of the pre-trained backbone by freezing its parameters, and adds a trainable lightweight additive net to model temporal relations in tracking.
We design an additive Siamese convolutional network to ensure the integrity of spatial features and the temporal sequence.
arXiv Detail & Related papers (2024-02-27T07:34:08Z) - ProContEXT: Exploring Progressive Context Transformer for Tracking [20.35886416084831]
Existing Visual Object Tracking (VOT) methods only take the target area in the first frame as a template.
This causes tracking to inevitably fail in fast-changing and crowded scenes, as it cannot account for changes in object appearance between frames.
We revamp the framework with the Progressive Context Transformer Tracker (ProContEXT), which coherently exploits spatial and temporal contexts to predict object motion trajectories.
arXiv Detail & Related papers (2022-10-27T14:47:19Z) - Ret3D: Rethinking Object Relations for Efficient 3D Object Detection in
Driving Scenes [82.4186966781934]
We introduce a simple, efficient, and effective two-stage detector, termed as Ret3D.
At the core of Ret3D is the utilization of novel intra-frame and inter-frame relation modules.
With negligible extra overhead, Ret3D achieves the state-of-the-art performance.
arXiv Detail & Related papers (2022-08-18T03:48:58Z) - AiATrack: Attention in Attention for Transformer Visual Tracking [89.94386868729332]
Transformer trackers have achieved impressive advancements recently, where the attention mechanism plays an important role.
We propose an attention in attention (AiA) module, which enhances appropriate correlations and suppresses erroneous ones by seeking consensus among all correlation vectors.
Our AiA module can be readily applied to both self-attention blocks and cross-attention blocks to facilitate feature aggregation and information propagation for visual tracking.
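The consensus idea behind the AiA module can be illustrated with a small sketch. The specific design here, a second softmax attention run over the raw correlation vectors themselves with a residual connection, is an assumption for illustration, not the published AiA architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def aia_refine(corr):
    """corr: (Nq, Nk) raw correlation map between query and key tokens.
    Each row is a correlation vector; a second attention over these rows
    lets mutually consistent correlations reinforce each other while
    outlier correlations get down-weighted."""
    nq, nk = corr.shape
    inner = softmax(corr @ corr.T / np.sqrt(nk), axis=-1)  # consensus among rows
    return corr + inner @ corr  # residual: enhanced correlations

# Two queries agree on the first key; the third query is an outlier.
corr = np.array([[0.9, 0.1],
                 [0.8, 0.2],
                 [0.1, 0.9]])
refined = aia_refine(corr)
print(refined.shape)  # (3, 2)
```

Because the refinement operates on the correlation map rather than the features, it drops into either self-attention or cross-attention blocks without changing their token interfaces, which is the portability the summary points to.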
arXiv Detail & Related papers (2022-07-20T00:44:03Z) - Joint Feature Learning and Relation Modeling for Tracking: A One-Stream
Framework [76.70603443624012]
We propose a novel one-stream tracking (OSTrack) framework that unifies feature learning and relation modeling.
In this way, discriminative target-oriented features can be dynamically extracted by mutual guidance.
OSTrack achieves state-of-the-art performance on multiple benchmarks, in particular, it shows impressive results on the one-shot tracking benchmark GOT-10k.
arXiv Detail & Related papers (2022-03-22T18:37:11Z) - DetectorNet: Transformer-enhanced Spatial Temporal Graph Neural Network
for Traffic Prediction [4.302265301004301]
Detectors with high coverage have direct and far-reaching benefits for road users in route planning and avoiding traffic congestion.
Utilizing these data presents unique challenges, including the dynamic temporal correlation and the dynamic spatial correlation caused by changes in road conditions.
We propose DetectorNet enhanced by Transformer to address these challenges.
arXiv Detail & Related papers (2021-10-19T03:47:38Z) - Forecast Network-Wide Traffic States for Multiple Steps Ahead: A Deep
Learning Approach Considering Dynamic Non-Local Spatial Correlation and
Non-Stationary Temporal Dependency [6.019104024723682]
This research studies two particular problems in traffic forecasting: (1) capture the dynamic and non-local spatial correlation between traffic links and (2) model the dynamics of temporal dependency for accurate multiple steps ahead predictions.
We propose a deep learning framework named Spatial-Temporal Sequence to Sequence model (STSeq2Seq) to address these issues.
This model builds on sequence to sequence (seq2seq) architecture to capture temporal feature and relies on graph convolution for aggregating spatial information.
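The graph-convolution step the summary mentions can be illustrated with the standard symmetric-normalization propagation rule, X' = D^{-1/2}(A + I)D^{-1/2} X W. This is a generic sketch of that rule, not the STSeq2Seq layer itself; the adjacency and weights below are random stand-ins.

```python
import numpy as np

def graph_conv(adj, feats, weight):
    """One graph-convolution layer: aggregate each node's neighbours
    with symmetric degree normalization, then apply a linear transform."""
    n = adj.shape[0]
    a_hat = adj + np.eye(n)                 # add self-loops
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    norm = d_inv_sqrt @ a_hat @ d_inv_sqrt  # D^-1/2 (A+I) D^-1/2
    return norm @ feats @ weight

adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)    # a 3-node chain of traffic links
rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 4))         # per-link traffic features
w = rng.standard_normal((4, 2))
print(graph_conv(adj, feats, w).shape)      # (3, 2)
```

In the seq2seq setting, a layer like this aggregates spatial information across linked road segments at each time step, while the encoder-decoder recurrence captures the temporal dependency.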
arXiv Detail & Related papers (2020-04-06T03:40:56Z) - Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.