TCTrack: Temporal Contexts for Aerial Tracking
- URL: http://arxiv.org/abs/2203.01885v2
- Date: Sat, 5 Mar 2022 05:13:29 GMT
- Title: TCTrack: Temporal Contexts for Aerial Tracking
- Authors: Ziang Cao, Ziyuan Huang, Liang Pan, Shiwei Zhang, Ziwei Liu, Changhong Fu
- Abstract summary: TCTrack is a comprehensive framework to fully exploit temporal contexts for aerial tracking.
For feature extraction, an online temporally adaptive convolution is proposed to enhance the spatial features.
For similarity map refinement, we propose an adaptive temporal transformer, which first effectively encodes temporal knowledge in a memory-efficient way.
- Score: 38.87248176223548
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal contexts among consecutive frames are far from being fully utilized
in existing visual trackers. In this work, we present TCTrack, a comprehensive
framework to fully exploit temporal contexts for aerial tracking. The temporal
contexts are incorporated at two levels: the extraction of features and the
refinement of similarity maps. Specifically,
for feature extraction, an online temporally adaptive convolution is proposed
to enhance the spatial features using temporal information, which is achieved
by dynamically calibrating the convolution weights according to the previous
frames. For similarity map refinement, we propose an adaptive temporal
transformer, which first effectively encodes temporal knowledge in a
memory-efficient way, before the temporal knowledge is decoded for accurate
adjustment of the similarity map. TCTrack is effective and efficient:
evaluation on four aerial tracking benchmarks shows its impressive performance;
real-world UAV tests show its high speed of over 27 FPS on NVIDIA Jetson AGX
Xavier.
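To make the first mechanism concrete, here is a minimal PyTorch sketch (not the authors' code) of an online temporally adaptive convolution: a base kernel is rescaled per output channel by a tiny calibration net driven by a running descriptor of previous frames. The calibration net, descriptor, and momentum update are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OnlineTAdaConv(nn.Module):
    """Sketch: convolution whose weights are calibrated online from
    temporal context. Names and the momentum scheme are assumptions."""

    def __init__(self, in_ch: int, out_ch: int, k: int = 3, momentum: float = 0.9):
        super().__init__()
        self.base_weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.02)
        # Tiny calibration net: temporal descriptor -> per-output-channel scale.
        self.calib = nn.Sequential(
            nn.Linear(in_ch, in_ch), nn.ReLU(),
            nn.Linear(in_ch, out_ch), nn.Sigmoid(),
        )
        self.momentum = momentum
        self.register_buffer("temporal_desc", torch.zeros(in_ch))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W), one frame at a time in an online tracking loop.
        frame_desc = x.mean(dim=(0, 2, 3))  # global descriptor of this frame
        # Running summary of the frames seen so far.
        self.temporal_desc.mul_(self.momentum).add_(
            frame_desc.detach(), alpha=1.0 - self.momentum)
        scale = self.calib(self.temporal_desc)               # (out_ch,)
        weight = self.base_weight * scale.view(-1, 1, 1, 1)  # calibrated kernel
        return F.conv2d(x, weight, padding=self.base_weight.shape[-1] // 2)
```

Storing only one running descriptor keeps the calibration online and cheap: no previous-frame feature maps need to be retained.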
Related papers
- Local All-Pair Correspondence for Point Tracking [59.76186266230608]
We introduce LocoTrack, a highly accurate and efficient model designed for the task of tracking any point (TAP) across video sequences.
LocoTrack achieves unmatched accuracy on all TAP-Vid benchmarks and operates at a speed almost 6 times faster than the current state-of-the-art.
arXiv Detail & Related papers (2024-07-22T06:49:56Z)
- MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping [21.5611219371754]
The paper presents a vector HD-mapping algorithm that formulates the mapping as a tracking task and uses a history of memory latents to ensure consistent reconstructions over time.
MapTracker significantly outperforms existing methods on both the nuScenes and Argoverse 2 datasets by over 8% and 19% on the conventional and the new consistency-aware metrics, respectively.
arXiv Detail & Related papers (2024-03-23T23:05:25Z)
- Multi-step Temporal Modeling for UAV Tracking [14.687636301587045]
We introduce MT-Track, a streamlined and efficient multi-step temporal modeling framework for enhanced UAV tracking.
We unveil a unique temporal correlation module that dynamically assesses the interplay between the template and search region features.
We propose a mutual transformer module to refine the correlation maps of historical and current frames by modeling the temporal knowledge in the tracking sequence.
arXiv Detail & Related papers (2024-03-07T09:48:13Z)
- ACTrack: Adding Spatio-Temporal Condition for Visual Object Tracking [0.5371337604556311]
Efficiently modeling spatio-temporal relations of objects is a key challenge in visual object tracking (VOT).
Existing methods track by appearance-based similarity or long-term relation modeling, resulting in rich temporal contexts between consecutive frames being easily overlooked.
In this paper we present ACTrack, a new tracking framework with additive spatio-temporal conditions. It preserves the quality and capabilities of the pre-trained backbone by freezing its parameters, and adds a trainable lightweight additive net to model temporal relations in tracking.
We design an additive siamese convolutional network to ensure the integrity of spatial features and temporal sequence.
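The additive idea reads like a frozen backbone plus a trainable side branch whose output is summed in; the wrapper below is a hypothetical sketch of that pattern, not ACTrack's actual architecture.

```python
import torch.nn as nn

class AdditiveTracker(nn.Module):
    """Sketch: frozen pre-trained backbone + lightweight trainable
    additive net. Module names are illustrative assumptions."""

    def __init__(self, backbone: nn.Module, additive_net: nn.Module):
        super().__init__()
        for p in backbone.parameters():
            p.requires_grad = False  # preserve pre-trained quality/capabilities
        self.backbone = backbone
        self.additive_net = additive_net  # small net modeling temporal relations

    def forward(self, x):
        # Frozen features plus a learned additive temporal correction.
        return self.backbone(x) + self.additive_net(x)
```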
arXiv Detail & Related papers (2024-02-27T07:34:08Z)
- Towards Real-World Visual Tracking with Temporal Contexts [64.7981374129495]
We propose a two-level framework (TCTrack) that can exploit temporal contexts efficiently.
Based on it, we propose a stronger version for real-world visual tracking, i.e., TCTrack++.
For feature extraction, we propose an attention-based temporally adaptive convolution to enhance the spatial features.
For similarity map refinement, we introduce an adaptive temporal transformer to encode the temporal knowledge efficiently.
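As a rough illustration of the second mechanism, the sketch below refines a flattened similarity map by cross-attending to a compact temporal memory; the single attention layer, the residual update, and keeping only the last refined map as memory are simplifying assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class TemporalMapRefiner(nn.Module):
    """Sketch: refine the current similarity map against a compact
    memory of past refined maps. Layer choices are assumptions."""

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.memory = None  # encoded temporal knowledge from past frames

    def forward(self, sim_map: torch.Tensor) -> torch.Tensor:
        # sim_map: (B, H*W, dim), the flattened similarity features.
        if self.memory is None:
            self.memory = sim_map.detach()      # first frame: bootstrap memory
        refined, _ = self.attn(sim_map, self.memory, self.memory)
        out = self.norm(sim_map + refined)      # residual adjustment of the map
        self.memory = out.detach()              # memory-efficient: keep one map
        return out
```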
arXiv Detail & Related papers (2023-08-20T17:59:40Z)
- ProContEXT: Exploring Progressive Context Transformer for Tracking [20.35886416084831]
Existing Visual Object Tracking (VOT) methods only take the target area in the first frame as a template.
This causes tracking to inevitably fail in fast-changing and crowded scenes, as it cannot account for changes in object appearance between frames.
We revamp the framework with the Progressive Context Transformer Tracker (ProContEXT), which coherently exploits spatial and temporal contexts to predict object motion trajectories.
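A toy sketch of progressive template management in this spirit: the first-frame template supplies spatial context while a small queue of recent high-confidence crops supplies temporal context. The confidence-thresholded update rule is an assumption, not the paper's exact scheme.

```python
from collections import deque

class TemplateBank:
    """Sketch: static first-frame template plus dynamically refreshed
    temporal templates. Update rule and threshold are assumptions."""

    def __init__(self, init_template, max_dynamic: int = 2, conf_thresh: float = 0.7):
        self.static = init_template               # spatial context: frame 1
        self.dynamic = deque(maxlen=max_dynamic)  # temporal context: recent crops
        self.conf_thresh = conf_thresh

    def update(self, crop, confidence: float) -> None:
        # Only trust confident predictions, to avoid polluting the templates.
        if confidence >= self.conf_thresh:
            self.dynamic.append(crop)

    def templates(self):
        # All templates are fed jointly to the tracker's matching stage.
        return [self.static, *self.dynamic]
```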
arXiv Detail & Related papers (2022-10-27T14:47:19Z)
- Progressive Temporal Feature Alignment Network for Video Inpainting [51.26380898255555]
Video inpainting aims to fill spatio-temporal "corrupted regions" with plausible content.
Current methods achieve this goal through attention, flow-based warping, or 3D temporal convolution.
We propose 'Progressive Temporal Feature Alignment Network', which progressively enriches features extracted from the current frame with the warped feature from neighbouring frames.
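The flow-based warping such alignment builds on can be sketched as a standard grid_sample warp (the paper's progressive fusion around it is omitted here):

```python
import torch
import torch.nn.functional as F

def warp_features(feat_prev: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp a neighbouring frame's features to the current frame.

    feat_prev: (B, C, H, W); flow: (B, 2, H, W), pixel offsets that map
    current-frame locations to the neighbouring frame.
    """
    b, _, h, w = feat_prev.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).to(feat_prev)  # (2, H, W), x then y
    coords = base.unsqueeze(0) + flow                  # sampling locations
    # Normalize to [-1, 1] as required by grid_sample.
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)               # (B, H, W, 2)
    return F.grid_sample(feat_prev, grid, align_corners=True)
```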
arXiv Detail & Related papers (2021-04-08T04:50:33Z)
- AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition [68.70214388982545]
Temporal modelling is the key to efficient video action recognition.
We introduce an adaptive temporal fusion network, called AdaFuse, that fuses channels from current and past feature maps.
Our approach can achieve about 40% computation savings with comparable accuracy to state-of-the-art methods.
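A simplified sketch of per-channel fusion between current and past feature maps; note AdaFuse learns hard keep/reuse/skip decisions per channel (which is where the compute savings come from), whereas this toy version uses a soft gate.

```python
import torch
import torch.nn as nn

class ChannelFusionGate(nn.Module):
    """Sketch: per-channel gate mixing current and previous-frame
    features. Soft gating here; real savings need hard decisions."""

    def __init__(self, channels: int):
        super().__init__()
        self.policy = nn.Sequential(nn.Linear(2 * channels, channels), nn.Sigmoid())

    def forward(self, cur: torch.Tensor, past: torch.Tensor) -> torch.Tensor:
        # cur, past: (B, C, H, W) feature maps from consecutive frames.
        desc = torch.cat([cur.mean(dim=(2, 3)), past.mean(dim=(2, 3))], dim=1)
        gate = self.policy(desc)[..., None, None]  # (B, C, 1, 1), per-channel
        return gate * cur + (1.0 - gate) * past    # channel-wise fusion
```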
arXiv Detail & Related papers (2021-02-10T23:31:02Z)
- DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of temporal and spatial information.
We show that the proposed method achieves superior performance over state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z)