Related papers: Mamba-FETrack: Frame-Event Tracking via State Space Model

Mamba-FETrack: Frame-Event Tracking via State Space Model

URL: http://arxiv.org/abs/2404.18174v1
Date: Sun, 28 Apr 2024 13:12:49 GMT
Title: Mamba-FETrack: Frame-Event Tracking via State Space Model
Authors: Ju Huang, Shiao Wang, Shuai Wang, Zhe Wu, Xiao Wang, Bo Jiang,
Abstract summary: This paper proposes a novel RGB-Event tracking framework, Mamba-FETrack, based on the State Space Model (SSM) Specifically, we adopt two modality-specific Mamba backbone networks to extract the features of RGB frames and Event streams. Extensive experiments on FELT and FE108 datasets fully validated the efficiency and effectiveness of our proposed tracker.
Score: 14.610806117193116
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: RGB-Event based tracking is an emerging research topic, focusing on how to effectively integrate heterogeneous multi-modal data (synchronized exposure video frames and asynchronous pulse Event stream). Existing works typically employ Transformer based networks to handle these modalities and achieve decent accuracy through input-level or feature-level fusion on multiple datasets. However, these trackers require significant memory consumption and computational complexity due to the use of self-attention mechanism. This paper proposes a novel RGB-Event tracking framework, Mamba-FETrack, based on the State Space Model (SSM) to achieve high-performance tracking while effectively reducing computational costs and realizing more efficient tracking. Specifically, we adopt two modality-specific Mamba backbone networks to extract the features of RGB frames and Event streams. Then, we also propose to boost the interactive learning between the RGB and Event features using the Mamba network. The fused features will be fed into the tracking head for target object localization. Extensive experiments on FELT and FE108 datasets fully validated the efficiency and effectiveness of our proposed tracker. Specifically, our Mamba-based tracker achieves 43.5/55.6 on the SR/PR metric, while the ViT-S based tracker (OSTrack) obtains 40.0/50.9. The GPU memory cost of ours and ViT-S based tracker is 13.98GB and 15.44GB, which decreased about $9.5\%$. The FLOPs and parameters of ours/ViT-S based OSTrack are 59GB/1076GB and 7MB/60MB, which decreased about $94.5\%$ and $88.3\%$, respectively. We hope this work can bring some new insights to the tracking field and greatly promote the application of the Mamba architecture in tracking. The source code of this work will be released on \url{https://github.com/Event-AHU/Mamba_FETrack}.

Related papers

Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking [9.353589376846902]
We propose an efficient RGB-Event object tracking framework based on the linear-complexity Vision Mamba network.<n>The source code and pre-trained models will be released at https://github.com/Event-AHU/Mamba_FETrack.
arXiv Detail & Related papers (2025-06-30T12:24:01Z)
Lightweight RGB-T Tracking with Mobile Vision Transformers [2.209921757303168]
We propose a novel lightweight RGB-T tracking algorithm based on Mobile Vision Transformers (MobileViT)<n>Compared to state-of-the-art efficient multimodal trackers, our model achieves comparable accuracy while offering significantly lower parameter counts.<n>This paper is the first to propose a tracker using Mobile Vision Transformers for RGB-T tracking and multimodal tracking at large.
arXiv Detail & Related papers (2025-06-23T21:46:22Z)
Temporal Correlation Meets Embedding: Towards a 2nd Generation of JDE-based Real-Time Multi-Object Tracking [52.04679257903805]
Joint Detection and Embedding (JDE) trackers have demonstrated excellent performance in Multi-Object Tracking (MOT) tasks. Our tracker, named TCBTrack, achieves state-of-the-art performance on multiple public benchmarks.
arXiv Detail & Related papers (2024-07-19T07:48:45Z)
OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning [33.521077115333696]
We present a general framework to unify various tracking tasks, termed as OneTracker. OneTracker first performs a large-scale pre-training on a RGB tracker called Foundation Tracker. Then we regard other modality information as prompt and build Prompt Tracker upon Foundation Tracker.
arXiv Detail & Related papers (2024-03-14T17:59:13Z)
Long-Term Visual Object Tracking with Event Cameras: An Associative Memory Augmented Tracker and A Benchmark Dataset [9.366068518600583]
We propose a new long-term, large-scale frame-event visual object tracking dataset, termed FELT.<n>We also propose a novel Associative Memory Transformer based RGB-Event long-term visual tracker, termed AMTTrack.
arXiv Detail & Related papers (2024-03-09T08:49:50Z)
Single-Model and Any-Modality for Video Object Tracking [85.83753760853142]
We introduce Un-Track, a Unified Tracker of a single set of parameters for any modality. To handle any modality, our method learns their common latent space through low-rank factorization and reconstruction techniques. Our Un-Track achieves +8.1 absolute F-score gain, on the DepthTrack dataset, by introducing only +2.14 (over 21.50) GFLOPs with +6.6M (over 93M) parameters.
arXiv Detail & Related papers (2023-11-27T14:17:41Z)
Separable Self and Mixed Attention Transformers for Efficient Object Tracking [3.9160947065896803]
This paper proposes an efficient self and mixed attention transformer-based architecture for lightweight tracking. With these contributions, the proposed lightweight tracker deploys a transformer-based backbone and head module concurrently for the first time. Simulations show that our Separable Self and Mixed Attention-based Tracker, SMAT, surpasses the performance of related lightweight trackers on GOT10k, TrackingNet, LaSOT, NfS30, UAV123, and AVisT datasets.
arXiv Detail & Related papers (2023-09-07T19:23:02Z)
Learning Dual-Fused Modality-Aware Representations for RGBD Tracking [67.14537242378988]
Compared with the traditional RGB object tracking, the addition of the depth modality can effectively solve the target and background interference. Some existing RGBD trackers use the two modalities separately and thus some particularly useful shared information between them is ignored. We propose a novel Dual-fused Modality-aware Tracker (termed DMTracker) which aims to learn informative and discriminative representations of the target objects for robust RGBD tracking.
arXiv Detail & Related papers (2022-11-06T07:59:07Z)
Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline [80.13652104204691]
In this paper, we construct a large-scale benchmark with high diversity for visible-thermal UAV tracking (VTUAV) We provide a coarse-to-fine attribute annotation, where frame-level attributes are provided to exploit the potential of challenge-specific trackers. In addition, we design a new RGB-T baseline, named Hierarchical Multi-modal Fusion Tracker (HMFT), which fuses RGB-T data in various levels.
arXiv Detail & Related papers (2022-04-08T15:22:33Z)
LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search [104.84999119090887]
We present LightTrack, which uses neural architecture search (NAS) to design more lightweight and efficient object trackers. Comprehensive experiments show that our LightTrack is effective. It can find trackers that achieve superior performance compared to handcrafted SOTA trackers, such as SiamRPN++ and Ocean.
arXiv Detail & Related papers (2021-04-29T17:55:24Z)
STMTrack: Template-free Visual Tracking with Space-time Memory Networks [42.06375415765325]
Existing trackers with template updating mechanisms rely on time-consuming numerical optimization and complex hand-designed strategies to achieve competitive performance. We propose a novel tracking framework built on top of a space-time memory network that is competent to make full use of historical information related to the target. Specifically, a novel memory mechanism is introduced, which stores the historical information of the target to guide the tracker to focus on the most informative regions in the current frame.
arXiv Detail & Related papers (2021-04-01T08:10:56Z)
Simultaneous Detection and Tracking with Motion Modelling for Multiple Object Tracking [94.24393546459424]
We introduce Deep Motion Modeling Network (DMM-Net) that can estimate multiple objects' motion parameters to perform joint detection and association. DMM-Net achieves PR-MOTA score of 12.80 @ 120+ fps for the popular UA-DETRAC challenge, which is better performance and orders of magnitude faster. We also contribute a synthetic large-scale public dataset Omni-MOT for vehicle tracking that provides precise ground-truth annotations.
arXiv Detail & Related papers (2020-08-20T08:05:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.