SMTrack: State-Aware Mamba for Efficient Temporal Modeling in Visual Tracking
- URL: http://arxiv.org/abs/2602.01677v1
- Date: Mon, 02 Feb 2026 05:44:59 GMT
- Title: SMTrack: State-Aware Mamba for Efficient Temporal Modeling in Visual Tracking
- Authors: Yinchao Ma, Dengqing Yang, Zhangyu He, Wenfei Yang, Tianzhu Zhang,
- Abstract summary: We propose a novel temporal modeling paradigm for visual tracking, termed State-aware Mamba Tracker (SMTrack). SMTrack provides a neat pipeline for training and tracking without needing customized modules or substantial computational costs to build long-range temporal dependencies. Extensive experimental results demonstrate that SMTrack achieves promising performance with low computational costs.
- Score: 39.1131712751769
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual tracking aims to automatically estimate the state of a target object in a video sequence, which is challenging especially in dynamic scenarios. Thus, numerous methods are proposed to introduce temporal cues to enhance tracking robustness. However, conventional CNN and Transformer architectures exhibit inherent limitations in modeling long-range temporal dependencies in visual tracking, often necessitating either complex customized modules or substantial computational costs to integrate temporal cues. Inspired by the success of the state space model, we propose a novel temporal modeling paradigm for visual tracking, termed State-aware Mamba Tracker (SMTrack), providing a neat pipeline for training and tracking without needing customized modules or substantial computational costs to build long-range temporal dependencies. It enjoys several merits. First, we propose a novel selective state-aware space model with state-wise parameters to capture more diverse temporal cues for robust tracking. Second, SMTrack facilitates long-range temporal interactions with linear computational complexity during training. Third, SMTrack enables each frame to interact with previously tracked frames via hidden state propagation and updating, which releases computational costs of handling temporal cues during tracking. Extensive experimental results demonstrate that SMTrack achieves promising performance with low computational costs.
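The hidden-state propagation described in the abstract follows the general state-space recurrence behind Mamba-style models: each frame updates a fixed-size hidden state, so the per-frame cost during tracking is constant and training over a sequence is linear in its length. The sketch below is an illustrative toy, not SMTrack's actual architecture; the scalar per-frame features, the state dimension, and the per-state (state-wise) parameter shapes are all assumptions made for clarity.

```python
# Minimal sketch of a state-space recurrence with state-wise parameters,
# illustrating hidden state propagation and updating across frames.
# All names and shapes here are illustrative assumptions.
import numpy as np

def ssm_step(h, x, A, B, C):
    """One recurrence step: fold the current frame feature into the
    hidden state, then read out a temporally informed feature.
    h: (state_dim,) hidden state carried across frames
    x: scalar input feature for this frame (single channel for clarity)
    A, B, C: per-state ("state-wise") parameters, each of shape (state_dim,)
    """
    h = A * h + B * x   # propagate previous state and update with new input
    y = np.dot(C, h)    # read out from the hidden state
    return h, y

rng = np.random.default_rng(0)
state_dim = 4
A = np.full(state_dim, 0.9)        # decaying dynamics keep the state stable
B = rng.standard_normal(state_dim)
C = rng.standard_normal(state_dim)

h = np.zeros(state_dim)
features = [1.0, 0.5, -0.2, 0.8]   # toy per-frame scalar features
outputs = []
for x in features:                 # linear in sequence length; O(1) state
    h, y = ssm_step(h, x, A, B, C)
    outputs.append(y)
```

Because only `h` is carried forward, a tracker built this way never re-reads past frames: temporal context from the whole history is compressed into the fixed-size hidden state.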
Related papers
- Exploring Reliable Spatiotemporal Dependencies for Efficient Visual Tracking [9.64398631601942]
STDTrack is a framework that pioneers the integration of reliable spatiotemporal dependencies into lightweight trackers. We introduce a temporally propagating token to guide per-frame feature extraction. We develop a multi-scale prediction head to dynamically adapt to objects of different sizes.
arXiv Detail & Related papers (2026-01-14T02:22:05Z) - TrackingMiM: Efficient Mamba-in-Mamba Serialization for Real-time UAV Object Tracking [4.6672950054734255]
We propose TrackingMiM, a model with minimal computational burden for handling the image sequences of the tracking problem. In our framework, the Mamba scan is performed in a nested way while independently processing temporally and spatially coherent patch tokens.
arXiv Detail & Related papers (2025-07-02T09:40:37Z) - CAMELTrack: Context-Aware Multi-cue ExpLoitation for Online Multi-Object Tracking [68.24998698508344]
We introduce CAMEL, a novel association module for Context-Aware Multi-Cue ExpLoitation. Unlike end-to-end detection-by-tracking approaches, our method remains lightweight and fast to train while being able to leverage external off-the-shelf models. Our proposed online tracking pipeline, CAMELTrack, achieves state-of-the-art performance on multiple tracking benchmarks.
arXiv Detail & Related papers (2025-05-02T13:26:23Z) - Online Dense Point Tracking with Streaming Memory [54.22820729477756]
Dense point tracking is a challenging task requiring the continuous tracking of every point in the initial frame throughout a substantial portion of a video. Recent point tracking algorithms usually depend on sliding windows for indirect information propagation from the first frame to the current one. We present a lightweight and fast model with Streaming memory for dense POint Tracking and online video processing.
arXiv Detail & Related papers (2025-03-09T06:16:49Z) - Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking [53.33637391723555]
We propose a unified multimodal spatial-temporal tracking approach named STTrack. In contrast to previous paradigms, we introduce a temporal state generator (TSG) that continuously generates a sequence of tokens containing multimodal temporal information. These temporal information tokens are used to guide the localization of the target in the next time state, establish long-range contextual relationships between video frames, and capture the temporal trajectory of the target.
arXiv Detail & Related papers (2024-12-20T09:10:17Z) - Samba: Synchronized Set-of-Sequences Modeling for Multiple Object Tracking [97.25156823720211]
Multiple object tracking in complex scenarios, such as coordinated dance performances, team sports, or dynamic animal groups, presents unique challenges.
We introduce Samba, a novel linear-time set-of-sequences model designed to jointly process multiple tracklets.
Samba autoregressively predicts the future track query for each sequence while maintaining synchronized long-term memory representations.
We introduce an effective technique for dealing with uncertain observations (MaskObs) and an efficient training recipe to scale SambaMOTR to longer sequences.
arXiv Detail & Related papers (2024-10-02T17:59:57Z) - Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking. DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget. Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z) - Multi-step Temporal Modeling for UAV Tracking [14.687636301587045]
We introduce MT-Track, a streamlined and efficient multi-step temporal modeling framework for enhanced UAV tracking.
We unveil a unique temporal correlation module that dynamically assesses the interplay between the template and search region features.
We propose a mutual transformer module to refine the correlation maps of historical and current frames by modeling the temporal knowledge in the tracking sequence.
arXiv Detail & Related papers (2024-03-07T09:48:13Z) - ACTrack: Adding Spatio-Temporal Condition for Visual Object Tracking [0.5371337604556311]
Efficiently modeling spatio-temporal relations of objects is a key challenge in visual object tracking (VOT).
Existing methods track by appearance-based similarity or long-term relation modeling, resulting in rich temporal contexts between consecutive frames being easily overlooked.
In this paper we present ACTrack, a new tracking framework with additive spatio-temporal conditions. It preserves the quality and capabilities of the pre-trained backbone by freezing its parameters, and adds a trainable lightweight additive net to model temporal relations in tracking.
We design an additive siamese convolutional network to ensure the integrity of spatial features and the temporal sequence.
arXiv Detail & Related papers (2024-02-27T07:34:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.