ProContEXT: Exploring Progressive Context Transformer for Tracking
- URL: http://arxiv.org/abs/2210.15511v4
- Date: Thu, 30 Mar 2023 06:12:26 GMT
- Title: ProContEXT: Exploring Progressive Context Transformer for Tracking
- Authors: Jin-Peng Lan, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Xu Bao,
Wangmeng Xiang, Yifeng Geng, Xuansong Xie
- Abstract summary: Existing Visual Object Tracking (VOT) methods take only the target area in the first frame as a template.
This causes tracking to inevitably fail in fast-changing and crowded scenes, as it cannot account for changes in object appearance between frames.
We revamped the framework with the Progressive Context Encoding Transformer Tracker (ProContEXT), which coherently exploits spatial and temporal contexts to predict object motion trajectories.
- Score: 20.35886416084831
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing Visual Object Tracking (VOT) methods take only the target
area in the first frame as a template. This causes tracking to inevitably fail
in fast-changing and crowded scenes, as it cannot account for changes in object
appearance between frames. To this end, we revamped the tracking framework with
the Progressive Context Encoding Transformer Tracker (ProContEXT), which
coherently exploits spatial and temporal contexts to predict object motion
trajectories. Specifically, ProContEXT leverages a context-aware self-attention
module to encode the spatial and temporal context, refining and updating the
multi-scale static and dynamic templates to progressively perform accurate
tracking. It explores the complementarity between spatial and temporal
contexts, opening a new pathway to multi-context modeling for transformer-based
trackers. In addition, ProContEXT revises the token pruning technique to reduce
computational complexity. Extensive experiments on popular benchmark datasets
such as GOT-10k and TrackingNet demonstrate that the proposed ProContEXT
achieves state-of-the-art performance.
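The abstract names two concrete mechanisms: joint self-attention over multi-scale static and dynamic template tokens together with search-region tokens, and pruning of search tokens to reduce computational cost. Below is a minimal PyTorch sketch of that general pattern, not the authors' implementation; the module name `ProgressiveContextBlock`, the `keep_ratio` parameter, the token shapes, and the attention-based pruning score are all illustrative assumptions.

```python
# Hypothetical sketch: one transformer block over [template tokens | search tokens]
# with attention-based pruning of search tokens. Not the authors' code.
import torch
import torch.nn as nn


class ProgressiveContextBlock(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8, keep_ratio: float = 0.7):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.keep_ratio = keep_ratio  # fraction of search tokens kept after pruning

    def forward(self, template_tokens, search_tokens):
        # Joint sequence: spatial context (static template) and temporal context
        # (dynamic templates) interact with the search region in one attention pass.
        x = torch.cat([template_tokens, search_tokens], dim=1)
        h = self.norm1(x)
        attn_out, attn_weights = self.attn(h, h, h, need_weights=True,
                                           average_attn_weights=True)
        x = x + attn_out
        x = x + self.mlp(self.norm2(x))

        n_t = template_tokens.shape[1]
        templates, search = x[:, :n_t], x[:, n_t:]

        # Token pruning (illustrative criterion): score each search token by its
        # mean attention to the template tokens and keep only the top-k.
        scores = attn_weights[:, n_t:, :n_t].mean(dim=-1)  # (B, n_search)
        k = max(1, int(self.keep_ratio * search.shape[1]))
        idx = scores.topk(k, dim=1).indices                # (B, k)
        search = search.gather(1, idx.unsqueeze(-1).expand(-1, -1, search.shape[-1]))
        return templates, search


# Usage: first-frame (static) and recent-frame (dynamic) templates at two
# scales, flattened into token sequences by a hypothetical patch embedding.
B, dim = 2, 256
static_tpl = torch.randn(B, 64, dim)   # e.g. 8x8 patches of the first frame
dynamic_tpl = torch.randn(B, 16, dim)  # coarser patches of a recent frame
search = torch.randn(B, 256, dim)      # 16x16 patches of the search region

block = ProgressiveContextBlock(dim)
templates, pruned_search = block(torch.cat([static_tpl, dynamic_tpl], dim=1), search)
print(pruned_search.shape)  # torch.Size([2, 179, 256]) with keep_ratio=0.7
```

Scoring search tokens by their attention to the templates is one common pruning heuristic in transformer trackers; the paper's exact criterion and template-update schedule may differ.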
Related papers
- Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking.
DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget.
Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z) - Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers [55.46413719810273]
Rich spatio-temporal information is crucial for capturing complicated target appearance variations in visual tracking.
Our method improves the tracker's performance on six popular tracking benchmarks.
arXiv Detail & Related papers (2024-03-15T02:39:26Z) - Multi-step Temporal Modeling for UAV Tracking [14.687636301587045]
We introduce MT-Track, a streamlined and efficient multi-step temporal modeling framework for enhanced UAV tracking.
We unveil a unique temporal correlation module that dynamically assesses the interplay between the template and search region features.
We propose a mutual transformer module to refine the correlation maps of historical and current frames by modeling the temporal knowledge in the tracking sequence.
arXiv Detail & Related papers (2024-03-07T09:48:13Z) - ACTrack: Adding Spatio-Temporal Condition for Visual Object Tracking [0.5371337604556311]
Efficiently modeling spatio-temporal relations of objects is a key challenge in visual object tracking (VOT).
Existing methods track by appearance-based similarity or long-term relation modeling, resulting in rich temporal contexts between consecutive frames being easily overlooked.
In this paper we present ACTrack, a new tracking framework with additive spatio-temporal conditions. It preserves the quality and capabilities of the pre-trained backbone by freezing its parameters, and adds a trainable lightweight additive net to model temporal relations in tracking.
We design an additive siamese convolutional network to ensure the integrity of spatial features and the temporal sequence.
arXiv Detail & Related papers (2024-02-27T07:34:08Z) - Tracking Objects and Activities with Attention for Temporal Sentence
Grounding [51.416914256782505]
Temporal sentence grounding (TSG) aims to localize the temporal segment which is semantically aligned with a natural language query in an untrimmed video.
We propose a novel Temporal Sentence Tracking Network (TSTNet), which contains (A) a Cross-modal Targets Generator to generate multi-modal templates and a search space, and (B) a Temporal Sentence Tracker to track the multi-modal targets' behavior and predict the query-related segment.
arXiv Detail & Related papers (2023-02-21T16:42:52Z) - PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer Towards
Video Object Detection [28.879484515844375]
We introduce a progressive way of incorporating both temporal and spatial information for integrated enhancement.
PTSEFormer follows an end-to-end fashion to avoid heavy post-processing procedures while achieving 88.1% mAP on the ImageNet VID dataset.
arXiv Detail & Related papers (2022-09-06T06:32:57Z) - Joint Spatial-Temporal and Appearance Modeling with Transformer for
Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z) - Context-aware Visual Tracking with Joint Meta-updating [11.226947525556813]
We propose a context-aware tracking model to optimize the tracker over the representation space, which jointly meta-updates both branches by exploiting information along the whole sequence.
The proposed tracking method achieves an EAO score of 0.514 on VOT2018 at a speed of 40 FPS, demonstrating its ability to improve the accuracy and robustness of the underlying tracker with only a small drop in speed.
arXiv Detail & Related papers (2022-04-04T14:16:00Z) - Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual
Tracking [47.205979159070445]
We bridge the individual video frames and explore the temporal contexts across them via a transformer architecture for robust object tracking.
Different from classic usage of the transformer in natural language processing tasks, we separate its encoder and decoder into two parallel branches.
Our method sets several new state-of-the-art records on prevalent tracking benchmarks.
arXiv Detail & Related papers (2021-03-22T09:20:05Z) - TrackFormer: Multi-Object Tracking with Transformers [92.25832593088421]
TrackFormer is an end-to-end multi-object tracking and segmentation model based on an encoder-decoder Transformer architecture.
New track queries are spawned by the DETR object detector and embed the position of their corresponding object over time.
TrackFormer achieves a seamless data association between frames in a new tracking-by-attention paradigm.
arXiv Detail & Related papers (2021-01-07T18:59:29Z)