Multi-step Temporal Modeling for UAV Tracking
- URL: http://arxiv.org/abs/2403.04363v1
- Date: Thu, 7 Mar 2024 09:48:13 GMT
- Title: Multi-step Temporal Modeling for UAV Tracking
- Authors: Xiaoying Yuan, Tingfa Xu, Xincong Liu, Ying Wang, Haolin Qin, Yuqiang
Fang and Jianan Li
- Abstract summary: We introduce MT-Track, a streamlined and efficient multi-step temporal modeling framework for enhanced UAV tracking.
We unveil a unique temporal correlation module that dynamically assesses the interplay between the template and search region features.
We propose a mutual transformer module to refine the correlation maps of historical and current frames by modeling the temporal knowledge in the tracking sequence.
- Score: 14.687636301587045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the realm of unmanned aerial vehicle (UAV) tracking, Siamese-based
approaches have gained traction due to their optimal balance between efficiency
and precision. However, UAV scenarios often present challenges such as
insufficient sampling resolution, fast motion and small objects with limited
feature information. As a result, temporal context in UAV tracking tasks plays
a pivotal role in target location, overshadowing the target's precise features.
In this paper, we introduce MT-Track, a streamlined and efficient multi-step
temporal modeling framework designed to harness the temporal context from
historical frames for enhanced UAV tracking. This temporal integration occurs
in two steps: correlation map generation and correlation map refinement.
Specifically, we unveil a unique temporal correlation module that dynamically
assesses the interplay between the template and search region features. This
module leverages temporal information to refresh the template feature, yielding
a more precise correlation map. Subsequently, we propose a mutual transformer
module to refine the correlation maps of historical and current frames by
modeling the temporal knowledge in the tracking sequence. This method
significantly trims computational demands compared to the raw transformer. The
compact yet potent nature of our tracking framework ensures commendable
tracking outcomes, particularly in extended tracking scenarios.
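
To make the two-step pipeline concrete, here is a minimal PyTorch sketch of the flow the abstract describes: a temporal correlation module that refreshes the template feature before computing the correlation map, followed by a mutual refinement step in which the current map attends to the historical one. The internals (GRU-based template update, a single cross-attention, the 1x1 projections) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TemporalCorrelation(nn.Module):
    """Step 1: refresh the template feature with temporal context,
    then correlate it against the search-region feature."""
    def __init__(self, dim):
        super().__init__()
        self.update = nn.GRUCell(dim, dim)  # assumed recurrent template update

    def forward(self, template, search):
        # template: (B, C); search: (B, C, H, W)
        ctx = search.flatten(2).mean(-1)               # crude per-frame context
        template = self.update(ctx, template)          # refreshed template
        corr = (search * template[:, :, None, None]).sum(1, keepdim=True)
        return template, corr                          # corr: (B, 1, H, W)

class MutualRefiner(nn.Module):
    """Step 2: let the current correlation map attend to the historical one."""
    def __init__(self, d=32, heads=4):
        super().__init__()
        self.proj = nn.Conv2d(1, d, 1)
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.out = nn.Conv2d(d, 1, 1)

    def forward(self, hist_corr, curr_corr):
        b, _, h, w = curr_corr.shape
        q = self.proj(curr_corr).flatten(2).transpose(1, 2)   # (B, HW, d)
        kv = self.proj(hist_corr).flatten(2).transpose(1, 2)
        refined, _ = self.attn(q, kv, kv)
        refined = refined.transpose(1, 2).reshape(b, -1, h, w)
        return curr_corr + self.out(refined)                  # residual refinement

step1, step2 = TemporalCorrelation(64), MutualRefiner()
template = torch.zeros(2, 64)
template, hist = step1(template, torch.rand(2, 64, 16, 16))  # frame t-1
template, curr = step1(template, torch.rand(2, 64, 16, 16))  # frame t
refined = step2(hist, curr)                                   # (2, 1, 16, 16)
```

Attending over single-channel correlation maps rather than full feature tokens keeps the sequence and embedding dimensions small, which is consistent with the abstract's claim of trimming computation relative to a raw transformer.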
Related papers
- Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking [53.33637391723555]
We propose a unified multimodal spatial-temporal tracking approach named STTrack.
In contrast to previous paradigms, we introduce a temporal state generator (TSG) that continuously generates a sequence of tokens containing multimodal temporal information.
These temporal information tokens are used to guide the localization of the target in the next time state, establish long-range contextual relationships between video frames, and capture the temporal trajectory of the target.
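
A rough sketch of what such a temporal state generator could look like under the description above: a small set of temporal tokens, updated once per frame from the fused multimodal features, then handed forward to guide the next frame. The token count, dimensions, and residual update rule are assumptions for illustration.

```python
import torch
import torch.nn as nn

class TemporalStateGenerator(nn.Module):
    """Maintains temporal tokens and refreshes them each frame from the
    fused multimodal features, so the next frame can attend to them."""
    def __init__(self, d=256, n_tokens=4, heads=8):
        super().__init__()
        self.tokens = nn.Parameter(torch.randn(1, n_tokens, d))
        self.update = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, state, frame_feats):
        # state: (B, N, d) temporal tokens; frame_feats: (B, L, d) fused RGB+X features
        if state is None:
            state = self.tokens.expand(frame_feats.size(0), -1, -1)
        new_state, _ = self.update(state, frame_feats, frame_feats)
        return state + new_state  # residual update carries long-range context

tsg = TemporalStateGenerator()
state = None
for t in range(3):
    feats = torch.rand(2, 196, 256)  # stand-in multimodal frame features
    state = tsg(state, feats)
# `state` would be injected into the localization head for the next frame.
```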
arXiv Detail & Related papers (2024-12-20T09:10:17Z) - STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking [13.269416985959404]
Multiple object tracking (MOT) in Unmanned Aerial Vehicle (UAV) videos is important for diverse applications in computer vision.
We propose a novel Spatio-Temporal Cohesion Multiple Object Tracking framework (STCMOT)
We use historical embedding features to model the representation of ReID and detection features in a sequential order.
Our framework sets a new state-of-the-art performance in MOTA and IDF1 metrics.
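
A minimal sketch of the sequential use of historical embeddings described above: each track keeps a running ReID embedding that is blended with every new detection embedding in temporal order, then matched against incoming detections. The momentum update and cosine matching are assumptions, not the paper's exact scheme.

```python
import torch

class TrackEmbeddingBank:
    """One ReID embedding per track, updated sequentially frame by frame."""
    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.bank = {}  # track_id -> (D,) unit-norm embedding

    def update(self, track_id, det_embed):
        det_embed = det_embed / det_embed.norm()
        prev = self.bank.get(track_id)
        if prev is None:
            self.bank[track_id] = det_embed
        else:
            e = self.momentum * prev + (1 - self.momentum) * det_embed
            self.bank[track_id] = e / e.norm()  # keep embeddings unit-norm

    def similarity(self, det_embeds):
        # cosine similarity between stored tracks and new detections
        ids = list(self.bank)
        tracks = torch.stack([self.bank[i] for i in ids])         # (num_tracks, D)
        dets = det_embeds / det_embeds.norm(dim=1, keepdim=True)  # (num_dets, D)
        return ids, tracks @ dets.T

bank = TrackEmbeddingBank()
bank.update(0, torch.rand(128))
ids, sim = bank.similarity(torch.rand(5, 128))  # match 5 new detections
```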
arXiv Detail & Related papers (2024-09-17T14:34:18Z)
- DeTra: A Unified Model for Object Detection and Trajectory Forecasting [68.85128937305697]
Our approach formulates the union of the two tasks as a trajectory refinement problem.
To tackle this unified task, we design a refinement transformer that infers the presence, pose, and multi-modal future behaviors of objects.
In our experiments, we observe that our model outperforms the state-of-the-art on the Argoverse 2 Sensor and Waymo Open datasets.
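
A hedged sketch of a refinement transformer with the three outputs the summary names: presence, pose, and multi-modal futures. Query counts, dimensions, and the head shapes (4-dim pose as x, y, cos, sin; K trajectories of 2D waypoints) are all assumptions.

```python
import torch
import torch.nn as nn

class RefinementTransformer(nn.Module):
    def __init__(self, d=256, n_queries=100, n_modes=6, horizon=30):
        super().__init__()
        self.n_modes, self.horizon = n_modes, horizon
        self.queries = nn.Embedding(n_queries, d)  # one query per object hypothesis
        layer = nn.TransformerDecoderLayer(d, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=3)
        self.presence = nn.Linear(d, 1)                     # objectness logit
        self.pose = nn.Linear(d, 4)                         # assumed: x, y, cos, sin
        self.futures = nn.Linear(d, n_modes * horizon * 2)  # K future trajectories

    def forward(self, scene_feats):
        # scene_feats: (B, L, d) encoded sensor features
        q = self.queries.weight.unsqueeze(0).expand(scene_feats.size(0), -1, -1)
        h = self.decoder(q, scene_feats)
        futures = self.futures(h).view(
            h.size(0), h.size(1), self.n_modes, self.horizon, 2)  # (B, Q, K, T, 2)
        return self.presence(h), self.pose(h), futures

model = RefinementTransformer()
presence, pose, futures = model(torch.rand(2, 400, 256))
```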
arXiv Detail & Related papers (2024-06-06T18:12:04Z)
- Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers [55.46413719810273]
Rich spatio-temporal information is crucial for modeling complicated target appearance variations in visual tracking.
Our method improves the tracker's performance on six popular tracking benchmarks.
arXiv Detail & Related papers (2024-03-15T02:39:26Z)
- ACTrack: Adding Spatio-Temporal Condition for Visual Object Tracking [0.5371337604556311]
Efficiently modeling spatio-temporal relations of objects is a key challenge in visual object tracking (VOT).
Existing methods track by appearance-based similarity or long-term relation modeling, resulting in rich temporal contexts between consecutive frames being easily overlooked.
In this paper we present ACTrack, a new tracking framework with additive spatio-temporal conditions. It preserves the quality and capabilities of the pre-trained backbone by freezing its parameters, and trains a lightweight additive net to model temporal relations in tracking.
We design an additive siamese convolutional network to ensure the integrity of spatial features and the temporal sequence.
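
The frozen-backbone-plus-additive-net idea can be sketched as follows; the backbone and the additive branch here are stand-ins, and the concatenation-based temporal fusion is an assumption.

```python
import torch
import torch.nn as nn

class ACStyleTracker(nn.Module):
    def __init__(self, backbone, c=64):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False          # freeze: preserves pre-trained quality
        # lightweight trainable additive branch for temporal relations
        self.additive = nn.Sequential(
            nn.Conv2d(2 * c, c, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c, c, 3, padding=1),
        )

    def forward(self, prev_feat, frame):
        f = self.backbone(frame)                   # frozen spatial features
        if prev_feat is None:
            prev_feat = torch.zeros_like(f)
        # additive temporal term on top of the frozen features
        return f + self.additive(torch.cat([f, prev_feat], 1))

backbone = nn.Sequential(nn.Conv2d(3, 64, 7, 2, 3), nn.ReLU())  # stand-in "pre-trained" net
tracker = ACStyleTracker(backbone)
feat = tracker(None, torch.rand(1, 3, 128, 128))
feat = tracker(feat.detach(), torch.rand(1, 3, 128, 128))       # next frame reuses features
```

Because only the additive branch receives gradients, training cost stays low while the backbone's representation is left intact.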
arXiv Detail & Related papers (2024-02-27T07:34:08Z)
- ProContEXT: Exploring Progressive Context Transformer for Tracking [20.35886416084831]
Existing Visual Object Tracking (VOT) methods only take the target area in the first frame as a template.
This causes tracking to inevitably fail in fast-changing and crowded scenes, as it cannot account for changes in object appearance between frames.
We revamp the framework with the Progressive Context Transformer Tracker (ProContEXT), which coherently exploits spatial and temporal contexts to predict object motion trajectories.
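
One way to read "progressive context" is tracking with a growing set of templates (the static first-frame crop plus progressively refreshed dynamic crops) instead of a single template; the sketch below assumes that reading, and the update policy shown is illustrative.

```python
import torch
import torch.nn as nn

class ProgressiveContextTracker(nn.Module):
    def __init__(self, d=256, heads=8, max_templates=3):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.max_templates = max_templates
        self.templates = []  # first-frame template plus recent dynamic ones

    def forward(self, template_tokens, search_tokens):
        if len(self.templates) < self.max_templates:
            self.templates.append(template_tokens)
        else:
            self.templates[-1] = template_tokens    # refresh newest dynamic template
        joint = torch.cat(self.templates + [search_tokens], dim=1)
        out = self.encoder(joint)
        return out[:, -search_tokens.size(1):]      # refined search tokens -> head

tracker = ProgressiveContextTracker()
for t in range(4):
    z = torch.rand(1, 49, 256)    # template crop tokens
    x = torch.rand(1, 196, 256)   # search region tokens
    refined = tracker(z, x)       # (1, 196, 256)
```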
arXiv Detail & Related papers (2022-10-27T14:47:19Z)
- Ret3D: Rethinking Object Relations for Efficient 3D Object Detection in Driving Scenes [82.4186966781934]
We introduce a simple, efficient, and effective two-stage detector, termed Ret3D.
At the core of Ret3D is the utilization of novel intra-frame and inter-frame relation modules.
With negligible extra overhead, Ret3D achieves state-of-the-art performance.
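
An illustrative second-stage refiner with the intra-frame and inter-frame relation modules the summary names, operating on per-box features from a first-stage detector; the attention-based design and score head are assumptions.

```python
import torch
import torch.nn as nn

class RelationRefiner(nn.Module):
    def __init__(self, d=128, heads=4):
        super().__init__()
        self.intra = nn.MultiheadAttention(d, heads, batch_first=True)
        self.inter = nn.MultiheadAttention(d, heads, batch_first=True)
        self.score = nn.Linear(d, 1)

    def forward(self, boxes_t, boxes_prev):
        # boxes_*: (B, N, d) features of first-stage detections
        h, _ = self.intra(boxes_t, boxes_t, boxes_t)    # relations within the frame
        h2, _ = self.inter(h, boxes_prev, boxes_prev)   # relations across frames
        return self.score(boxes_t + h + h2)             # refined confidence

refiner = RelationRefiner()
scores = refiner(torch.rand(2, 50, 128), torch.rand(2, 50, 128))  # (2, 50, 1)
```

Since it only runs attention over a few dozen box features per frame, the extra cost on top of the first stage is indeed negligible.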
arXiv Detail & Related papers (2022-08-18T03:48:58Z)
- AiATrack: Attention in Attention for Transformer Visual Tracking [89.94386868729332]
Transformer trackers have achieved impressive advancements recently, where the attention mechanism plays an important role.
We propose an attention in attention (AiA) module, which enhances appropriate correlations and suppresses erroneous ones by seeking consensus among all correlation vectors.
Our AiA module can be readily applied to both self-attention blocks and cross-attention blocks to facilitate feature aggregation and information propagation for visual tracking.
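
A compact sketch of the attention-in-attention idea as summarized: a second attention applied over the correlation vectors themselves, so each query's correlations are corrected toward the consensus of similar queries. The inner embedding size and the residual form are assumptions.

```python
import torch
import torch.nn as nn

class AiA(nn.Module):
    def __init__(self, n_keys, d_inner=64):
        super().__init__()
        self.q = nn.Linear(n_keys, d_inner)
        self.k = nn.Linear(n_keys, d_inner)

    def forward(self, corr):
        # corr: (B, n_queries, n_keys) correlation map from the outer attention
        q, k = self.q(corr), self.k(corr)                  # embed each corr vector
        consensus = torch.softmax(
            q @ k.transpose(1, 2) / q.size(-1) ** 0.5, -1)  # query-query agreement
        return corr + consensus @ corr  # agree-with-neighbors residual correction

outer = torch.softmax(torch.rand(2, 100, 64), dim=-1)  # stand-in outer attention map
refined = AiA(n_keys=64)(outer)                        # (2, 100, 64)
```

Because the module only consumes and emits correlation maps, it can wrap either self-attention or cross-attention blocks, matching the summary's claim of broad applicability.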
arXiv Detail & Related papers (2022-07-20T00:44:03Z)
- Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework [76.70603443624012]
We propose a novel one-stream tracking (OSTrack) framework that unifies feature learning and relation modeling.
In this way, discriminative target-oriented features can be dynamically extracted by mutual guidance.
OSTrack achieves state-of-the-art performance on multiple benchmarks, in particular, it shows impressive results on the one-shot tracking benchmark GOT-10k.
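
The one-stream idea reduces to a simple sketch: template and search tokens are concatenated and passed through a single shared encoder, so feature extraction and template-search relation modeling happen in the same attention layers. Dimensions and depth below are placeholders.

```python
import torch
import torch.nn as nn

class OneStreamEncoder(nn.Module):
    def __init__(self, d=256, heads=8, depth=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d, heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, z_tokens, x_tokens):
        # z_tokens: (B, Nz, d) template; x_tokens: (B, Nx, d) search region
        joint = torch.cat([z_tokens, x_tokens], dim=1)  # one stream, mutual guidance
        joint = self.blocks(joint)
        return joint[:, z_tokens.size(1):]              # search tokens -> box head

enc = OneStreamEncoder()
out = enc(torch.rand(2, 64, 256), torch.rand(2, 256, 256))  # (2, 256, 256)
```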
arXiv Detail & Related papers (2022-03-22T18:37:11Z)
- Forecast Network-Wide Traffic States for Multiple Steps Ahead: A Deep Learning Approach Considering Dynamic Non-Local Spatial Correlation and Non-Stationary Temporal Dependency [6.019104024723682]
This research studies two particular problems in traffic forecasting: (1) capturing the dynamic, non-local spatial correlation between traffic links and (2) modeling the dynamics of temporal dependency for accurate predictions multiple steps ahead.
We propose a deep learning framework named Spatial-Temporal Sequence to Sequence model (STSeq2Seq) to address these issues.
This model builds on a sequence-to-sequence (seq2seq) architecture to capture temporal features and relies on graph convolution to aggregate spatial information.
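
A minimal sketch of a recurrent seq2seq cell whose per-step update first aggregates spatial information with a graph convolution over the traffic-link adjacency; the single-hop GCN step and GRU cell are assumptions standing in for the paper's architecture.

```python
import torch
import torch.nn as nn

class GraphGRUCell(nn.Module):
    def __init__(self, n_nodes, d):
        super().__init__()
        self.gru = nn.GRUCell(d, d)
        self.n_nodes, self.d = n_nodes, d

    def forward(self, x, h, adj):
        # x, h: (B*N, d); adj: (N, N) normalized adjacency between traffic links
        b = x.size(0) // self.n_nodes
        xs = x.view(b, self.n_nodes, self.d)
        x = (adj @ xs).reshape(-1, self.d)   # graph convolution: mix neighbor links
        return self.gru(x, h)                # temporal update per node

n_nodes, d, B = 5, 16, 2
cell = GraphGRUCell(n_nodes, d)
adj = torch.eye(n_nodes)                      # stand-in normalized adjacency
h = torch.zeros(B * n_nodes, d)
for step in torch.rand(6, B * n_nodes, d):    # encoder: consume observed steps
    h = cell(step, h, adj)
# a decoder would unroll the same cell to emit multiple steps ahead
```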
arXiv Detail & Related papers (2020-04-06T03:40:56Z)