Multi-State Tracker: Enhancing Efficient Object Tracking via Multi-State Specialization and Interaction
- URL: http://arxiv.org/abs/2508.11531v1
- Date: Fri, 15 Aug 2025 15:19:39 GMT
- Title: Multi-State Tracker: Enhancing Efficient Object Tracking via Multi-State Specialization and Interaction
- Authors: Shilei Wang, Gong Cheng, Pujian Lai, Dong Gao, Junwei Han,
- Abstract summary: Multi-State Tracker (MST) outperforms all previous efficient trackers across multiple datasets. MST generates multiple state representations at multiple stages during feature extraction, and incurs only 0.1 GFLOPs in computation and 0.66 M in parameters.
- Score: 49.36913716757758
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Efficient trackers achieve faster runtime by reducing computational complexity and model parameters. However, this efficiency often comes at the expense of weakened feature representation capacity, limiting their ability to accurately capture target states using single-layer features. To overcome this limitation, we propose Multi-State Tracker (MST), which uses highly lightweight state-specific enhancement (SSE) to perform specialized enhancement on the multi-state features produced by multi-state generation (MSG), and aggregates them in an interactive and adaptive manner using cross-state interaction (CSI). This design greatly enhances feature representation while incurring minimal computational overhead, leading to improved tracking robustness in complex environments. Specifically, MSG generates multiple state representations at multiple stages during feature extraction, while SSE refines them to highlight target-specific features. The CSI module facilitates information exchange between these states and ensures the integration of complementary features. Notably, the introduced SSE and CSI modules adopt a highly lightweight hidden state adaptation-based state space duality (HSA-SSD) design, incurring only 0.1 GFLOPs in computation and 0.66 M in parameters. Experimental results demonstrate that MST outperforms all previous efficient trackers across multiple datasets, significantly improving tracking accuracy and robustness. In particular, it shows excellent runtime performance, with an AO score improvement of 4.5% over the previous SOTA efficient tracker HCAT on the GOT-10K dataset. The code is available at https://github.com/wsumel/MST.
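The abstract describes a three-stage pipeline: MSG derives several state representations from the backbone features, SSE applies a lightweight per-state enhancement, and CSI fuses the enhanced states adaptively. The sketch below illustrates that data flow only; the function bodies (random projections for MSG, a sigmoid gate for SSE, softmax-weighted fusion for CSI) are hypothetical stand-ins, not the HSA-SSD design from the paper — see the linked repository for the actual implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def multi_state_generation(feature, num_states=3, seed=0):
    """Stand-in for MSG: derive several 'state' views of one feature map.
    Here each state is just a different random projection of the input."""
    rng = np.random.default_rng(seed)
    d = feature.shape[-1]
    return [feature @ (rng.standard_normal((d, d)) / np.sqrt(d))
            for _ in range(num_states)]

def state_specific_enhancement(state):
    """Stand-in for SSE: a lightweight per-channel sigmoid gate."""
    gate = 1.0 / (1.0 + np.exp(-state.mean(axis=0, keepdims=True)))
    return state * gate

def cross_state_interaction(states):
    """Stand-in for CSI: adaptive softmax-weighted fusion of the states."""
    weights = softmax(np.array([s.mean() for s in states]))
    return sum(w * s for w, s in zip(weights, states))

feat = np.ones((4, 8))                              # toy feature map: 4 tokens, 8 channels
states = multi_state_generation(feat)               # MSG
enhanced = [state_specific_enhancement(s) for s in states]  # SSE
fused = cross_state_interaction(enhanced)           # CSI
print(fused.shape)                                  # (4, 8)
```

The point of the sketch is the shape of the computation: the fused output has the same dimensions as the input feature map, so the three modules can be dropped between backbone stages without changing the rest of the tracker.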
Related papers
- UETrack: A Unified and Efficient Framework for Single Object Tracking [46.50641228786134]
UETrack is an efficient framework for single object tracking. It efficiently handles multiple modalities including RGB, Depth, Thermal, Event, and Language, and achieves a superior speed-accuracy trade-off compared to previous methods.
arXiv Detail & Related papers (2026-03-02T03:32:30Z) - Exploring Reliable Spatiotemporal Dependencies for Efficient Visual Tracking [9.64398631601942]
STDTrack is a framework that pioneers the integration of reliable spatiotemporal dependencies into lightweight trackers. We introduce a temporally propagated spatiotemporal token to guide per-frame feature extraction, and develop a multi-scale prediction head to dynamically adapt to objects of different sizes.
arXiv Detail & Related papers (2026-01-14T02:22:05Z) - SocialTrack: Multi-Object Tracking in Complex Urban Traffic Scenes Inspired by Social Behavior [17.501890320034693]
This paper proposes a novel multi-object tracking framework, SocialTrack, to enhance the tracking accuracy and robustness of small targets in complex urban traffic environments. The specialized small-target detector improves detection performance by employing a multi-scale feature enhancement mechanism. Extensive experiments on the UAVDT and MOT17 datasets demonstrate that SocialTrack outperforms existing state-of-the-art (SOTA) methods across several key metrics.
arXiv Detail & Related papers (2025-08-18T09:53:32Z) - Efficient Semantic Segmentation via Lightweight Multiple-Information Interaction Network [37.84039482457571]
We propose a Lightweight Multiple-Information Interaction Network (LMIINet) for real-time semantic segmentation. With only 0.72M parameters and 11.74G FLOPs, LMIINet excels at balancing accuracy and efficiency.
arXiv Detail & Related papers (2024-10-03T05:45:24Z) - Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking. DyTrack automatically learns to configure proper reasoning routes for various inputs, making better use of the available computational budget. Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z) - SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking [19.50096632818305]
Multimodal Visual Object Tracking (VOT) has recently gained significant attention due to its robustness.
Recent studies have utilized prompt tuning to transfer pre-trained RGB-based trackers to multimodal data.
We propose a novel symmetric multimodal tracking framework called SDSTrack.
arXiv Detail & Related papers (2024-03-24T04:15:50Z) - Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z) - Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline [80.13652104204691]
In this paper, we construct a large-scale benchmark with high diversity for visible-thermal UAV tracking (VTUAV)
We provide a coarse-to-fine attribute annotation, where frame-level attributes are provided to exploit the potential of challenge-specific trackers.
In addition, we design a new RGB-T baseline, named Hierarchical Multi-modal Fusion Tracker (HMFT), which fuses RGB-T data in various levels.
arXiv Detail & Related papers (2022-04-08T15:22:33Z) - RGBT Tracking via Multi-Adapter Network with Hierarchical Divergence Loss [37.99375824040946]
We propose a novel multi-adapter network to jointly perform modality-shared, modality-specific and instance-aware target representation learning.
Experiments on two RGBT tracking benchmark datasets demonstrate the outstanding performance of the proposed tracker.
arXiv Detail & Related papers (2020-11-14T01:50:46Z) - Object Tracking through Residual and Dense LSTMs [67.98948222599849]
Deep learning-based trackers built on LSTM (Long Short-Term Memory) recurrent neural networks have emerged as a powerful alternative. Dense LSTMs outperform residual and regular LSTMs, and offer higher resilience to nuisances.
Our case study supports the adoption of residual-based RNNs for enhancing the robustness of other trackers.
arXiv Detail & Related papers (2020-06-22T08:20:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.