Related papers: ACTrack: Adding Spatio-Temporal Condition for Visual Object Tracking

ACTrack: Adding Spatio-Temporal Condition for Visual Object Tracking

URL: http://arxiv.org/abs/2403.07914v1
Date: Tue, 27 Feb 2024 07:34:08 GMT
Title: ACTrack: Adding Spatio-Temporal Condition for Visual Object Tracking
Authors: Yushan Han, Kaer Huang,
Abstract summary: Efficiently modeling-temporal relations of objects is a key challenge in visual object tracking (VOT) Existing methods track by appearance-based similarity or long-term relation modeling, resulting in rich temporal contexts between consecutive frames being easily overlooked. In this paper we present ACTrack, a new framework with additive pre-temporal tracking framework with large memory conditions. It preserves the quality and capabilities of the pre-trained backbone by freezing its parameters, and makes a trainable lightweight additive net to model temporal relations in tracking. We design an additive siamese convolutional network to ensure the integrity of spatial features and temporal sequence
Score: 0.5371337604556311
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Efficiently modeling spatio-temporal relations of objects is a key challenge in visual object tracking (VOT). Existing methods track by appearance-based similarity or long-term relation modeling, resulting in rich temporal contexts between consecutive frames being easily overlooked. Moreover, training trackers from scratch or fine-tuning large pre-trained models needs more time and memory consumption. In this paper, we present ACTrack, a new tracking framework with additive spatio-temporal conditions. It preserves the quality and capabilities of the pre-trained Transformer backbone by freezing its parameters, and makes a trainable lightweight additive net to model spatio-temporal relations in tracking. We design an additive siamese convolutional network to ensure the integrity of spatial features and perform temporal sequence modeling to simplify the tracking pipeline. Experimental results on several benchmarks prove that ACTrack could balance training efficiency and tracking performance.

Related papers

Decoupled Spatio-Temporal Consistency Learning for Self-Supervised Tracking [12.910676293067231]
We present a Self-Supervised Tracking framework named textbftracker designed to eliminate the need of box annotations.<n>We show that tracker surpasses textitOTA self-supervised tracking methods, achieving an improvement of more than 25.3%, 20.4%, and 14.8% in AUC (AO) score on the GOT10K, LaSOT, TrackingNet datasets, respectively.
arXiv Detail & Related papers (2025-07-29T09:04:03Z)
Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better [61.381599921020175]
Temporal consistency is critical in video prediction to ensure that outputs are coherent and free of artifacts. Traditional methods, such as temporal attention and 3D convolution, may struggle with significant object motion. We propose the Tracktention Layer, a novel architectural component that explicitly integrates motion information using point tracks.
arXiv Detail & Related papers (2025-03-25T17:58:48Z)
Online Dense Point Tracking with Streaming Memory [54.22820729477756]
Dense point tracking is a challenging task requiring the continuous tracking of every point in the initial frame throughout a substantial portion of a video. Recent point tracking algorithms usually depend on sliding windows for indirect information propagation from the first frame to the current one. We present a lightweight and fast model with textbfStreaming memory for dense textbfPOint textbfTracking and online video processing.
arXiv Detail & Related papers (2025-03-09T06:16:49Z)
STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking [13.269416985959404]
Multiple object tracking (MOT) in Unmanned Aerial Vehicle (UAV) videos is important for diverse applications in computer vision. We propose a novel Spatio-Temporal Cohesion Multiple Object Tracking framework (STCMOT) We use historical embedding features to model the representation of ReID and detection features in a sequential order. Our framework sets a new state-of-the-art performance in MOTA and IDF1 metrics.
arXiv Detail & Related papers (2024-09-17T14:34:18Z)
Representation Alignment Contrastive Regularization for Multi-Object Tracking [29.837560662395713]
Mainstream-performance in multi-object tracking algorithms relies on modeling heavily-temporal relationships during the data association stage. This work aims to simplify deep learning-based,temporal relationship models and introduce interpretability into data association designs.
arXiv Detail & Related papers (2024-04-03T08:33:08Z)
Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking. DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget. Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z)
Autoregressive Queries for Adaptive Tracking with Spatio-TemporalTransformers [55.46413719810273]
rich-temporal information is crucial to the complicated target appearance in visual tracking. Our method improves the tracker's performance on six popular tracking benchmarks.
arXiv Detail & Related papers (2024-03-15T02:39:26Z)
Multi-step Temporal Modeling for UAV Tracking [14.687636301587045]
We introduce MT-Track, a streamlined and efficient multi-step temporal modeling framework for enhanced UAV tracking. We unveil a unique temporal correlation module that dynamically assesses the interplay between the template and search region features. We propose a mutual transformer module to refine the correlation maps of historical and current frames by modeling the temporal knowledge in the tracking sequence.
arXiv Detail & Related papers (2024-03-07T09:48:13Z)
LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry [52.131996528655094]
We present the Long-term Effective Any Point Tracking (LEAP) module. LEAP innovatively combines visual, inter-track, and temporal cues with mindfully selected anchors for dynamic track estimation. Based on these traits, we develop LEAP-VO, a robust visual odometry system adept at handling occlusions and dynamic scenes.
arXiv Detail & Related papers (2024-01-03T18:57:27Z)
Tracking Objects and Activities with Attention for Temporal Sentence Grounding [51.416914256782505]
Temporal sentence (TSG) aims to localize the temporal segment which is semantically aligned with a natural language query in an untrimmed segment. We propose a novel Temporal Sentence Tracking Network (TSTNet), which contains (A) a Cross-modal Targets Generator to generate multi-modal and search space, and (B) a Temporal Sentence Tracker to track multi-modal targets' behavior and to predict query-related segment.
arXiv Detail & Related papers (2023-02-21T16:42:52Z)
ProContEXT: Exploring Progressive Context Transformer for Tracking [20.35886416084831]
Existing Visual Object Tracking (VOT) only takes the target area in the first frame as a template. This causes tracking to inevitably fail in fast-changing and crowded scenes, as it cannot account for changes in object appearance between frames. We revamped the framework with Progressive Context. Transformer Tracker (ProContEXT), which coherently exploits spatial and temporal contexts to predict object motion trajectories.
arXiv Detail & Related papers (2022-10-27T14:47:19Z)
Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework [76.70603443624012]
We propose a novel one-stream tracking (OSTrack) framework that unifies feature learning and relation modeling. In this way, discriminative target-oriented features can be dynamically extracted by mutual guidance. OSTrack achieves state-of-the-art performance on multiple benchmarks, in particular, it shows impressive results on the one-shot tracking benchmark GOT-10k.
arXiv Detail & Related papers (2022-03-22T18:37:11Z)
Continuity-Discrimination Convolutional Neural Network for Visual Object Tracking [150.51667609413312]
This paper proposes a novel model, named Continuity-Discrimination Convolutional Neural Network (CD-CNN) for visual object tracking. To address this problem, CD-CNN models temporal appearance continuity based on the idea of temporal slowness. In order to alleviate inaccurate target localization and drifting, we propose a novel notion, object-centroid.
arXiv Detail & Related papers (2021-04-18T06:35:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.