SMMT: Siamese Motion Mamba with Self-attention for Thermal Infrared Target Tracking
- URL: http://arxiv.org/abs/2505.04088v3
- Date: Wed, 11 Jun 2025 14:19:28 GMT
- Title: SMMT: Siamese Motion Mamba with Self-attention for Thermal Infrared Target Tracking
- Authors: Shang Zhang, Huanbin Zhang, Dali Feng, Yujie Cui, Ruoyan Xiong, Cen He,
- Abstract summary: This paper proposes a novel Siamese Motion Mamba Tracker (SMMT). We introduce the Motion Mamba module into the Siamese architecture to extract motion features and recover overlooked edge details. In addition, we design a motion edge-aware regression loss to improve tracking accuracy, especially for motion-blurred targets.
- Score: 0.32985979395737786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Thermal infrared (TIR) object tracking often suffers from challenges such as target occlusion, motion blur, and background clutter, which significantly degrade the performance of trackers. To address these issues, this paper proposes a novel Siamese Motion Mamba Tracker (SMMT), which integrates a bidirectional state-space model and a self-attention mechanism. Specifically, we introduce the Motion Mamba module into the Siamese architecture to extract motion features and recover overlooked edge details using bidirectional modeling and self-attention. We propose a Siamese parameter-sharing strategy that allows certain convolutional layers to share weights. This approach reduces computational redundancy while preserving strong feature representation. In addition, we design a motion edge-aware regression loss to improve tracking accuracy, especially for motion-blurred targets. Extensive experiments are conducted on four TIR tracking benchmarks, including LSOTB-TIR, PTB-TIR, VOT-TIR2015, and VOT-TIR2017. The results show that SMMT achieves superior performance in TIR target tracking.
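The abstract names three ingredients: a weight-shared Siamese backbone, a bidirectional Motion Mamba module with self-attention, and a motion edge-aware loss. As a rough illustration of how the first two fit together (this is not the authors' code: the bidirectional block below substitutes GRU scans for the actual Mamba state-space kernels, and all module names and sizes are hypothetical):

```python
# Minimal sketch of a Siamese TIR tracker with shared conv weights and a
# bidirectional-scan + self-attention block. Illustrative only.
import torch
import torch.nn as nn

class BidirectionalBlock(nn.Module):
    """Stand-in for a bidirectional state-space (Mamba) block: scans the
    flattened feature sequence in both directions, then applies self-attention."""
    def __init__(self, dim):
        super().__init__()
        self.fwd = nn.GRU(dim, dim, batch_first=True)
        self.bwd = nn.GRU(dim, dim, batch_first=True)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, x):                       # x: (B, L, C)
        f, _ = self.fwd(x)
        b, _ = self.bwd(torch.flip(x, dims=[1]))
        h = self.proj(torch.cat([f, torch.flip(b, dims=[1])], dim=-1))
        out, _ = self.attn(h, h, h)             # self-attention over fused scans
        return out + x                           # residual connection

class SiameseTracker(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        # One backbone instance => template and search branches share weights.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, dim, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.motion = BidirectionalBlock(dim)

    def embed(self, img):
        f = self.backbone(img)                   # (B, C, H, W)
        B, C, H, W = f.shape
        seq = f.flatten(2).transpose(1, 2)       # (B, H*W, C)
        return self.motion(seq).transpose(1, 2).reshape(B, C, H, W)

    def forward(self, template, search):
        z, x = self.embed(template), self.embed(search)
        # Cross-correlate template features against the search region.
        B, C, Hz, Wz = z.shape
        x_ = x.reshape(1, B * C, *x.shape[-2:])
        score = nn.functional.conv2d(x_, z.reshape(B * C, 1, Hz, Wz), groups=B * C)
        return score.reshape(B, C, *score.shape[-2:]).sum(1, keepdim=True)

if __name__ == "__main__":
    net = SiameseTracker()
    z = torch.randn(2, 1, 64, 64)                # single-channel TIR template crop
    x = torch.randn(2, 1, 128, 128)              # TIR search region
    print(net(z, x).shape)                       # (2, 1, 17, 17) response map
```

Instantiating the backbone once and calling it on both crops is what makes the convolutional layers weight-shared between the template and search branches, which is the redundancy-reducing effect the parameter-sharing strategy describes.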
Related papers
- Tracking the Unstable: Appearance-Guided Motion Modeling for Robust Multi-Object Tracking in UAV-Captured Videos [58.156141601478794]
Multi-object tracking in UAV-captured videos (UAVT) aims to track multiple objects while maintaining consistent identities across frames of a given video. Existing methods typically model motion cues and appearance separately, overlooking their interplay and resulting in suboptimal tracking performance. We propose AMOT, which exploits appearance and motion cues jointly through two key components: an Appearance-Motion Consistency (AMC) matrix and a Motion-aware Track Continuation (MTC) module.
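The summary does not give the AMC matrix's exact form; one plausible reading, sketched here, fuses appearance (embedding cosine similarity) and motion (IoU against motion-predicted boxes) into a single assignment cost. The function names are hypothetical:

```python
# Hedged sketch of an appearance-motion consistency cost: neither cue alone
# can force a match, because the similarities are fused multiplicatively.
import numpy as np

def iou_matrix(pred_boxes, det_boxes):
    """Pairwise IoU between motion-predicted track boxes and detections (x1,y1,x2,y2)."""
    x1 = np.maximum(pred_boxes[:, None, 0], det_boxes[None, :, 0])
    y1 = np.maximum(pred_boxes[:, None, 1], det_boxes[None, :, 1])
    x2 = np.minimum(pred_boxes[:, None, 2], det_boxes[None, :, 2])
    y2 = np.minimum(pred_boxes[:, None, 3], det_boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_p = (pred_boxes[:, 2] - pred_boxes[:, 0]) * (pred_boxes[:, 3] - pred_boxes[:, 1])
    area_d = (det_boxes[:, 2] - det_boxes[:, 0]) * (det_boxes[:, 3] - det_boxes[:, 1])
    return inter / (area_p[:, None] + area_d[None, :] - inter + 1e-9)

def amc_cost(track_emb, det_emb, pred_boxes, det_boxes):
    """Appearance similarity gated by motion agreement; low cost = both cues agree."""
    a = track_emb / (np.linalg.norm(track_emb, axis=1, keepdims=True) + 1e-9)
    d = det_emb / (np.linalg.norm(det_emb, axis=1, keepdims=True) + 1e-9)
    app = a @ d.T                               # cosine similarity in [-1, 1]
    mot = iou_matrix(pred_boxes, det_boxes)     # motion agreement in [0, 1]
    return 1.0 - app * mot
```

A Hungarian assignment (e.g. `scipy.optimize.linear_sum_assignment`) over this cost then yields matches that both cues support.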
arXiv Detail & Related papers (2025-08-03T12:06:47Z)
- FGSGT: Saliency-Guided Siamese Network Tracker Based on Key Fine-Grained Feature Information for Thermal Infrared Target Tracking [11.599952876425736]
We propose a novel saliency-guided Siamese network tracker based on key fine-grained feature information. This design captures essential global features from shallow layers, enhances feature diversity, and minimizes the loss of fine-grained information. Experimental results demonstrate that the proposed tracker achieves the highest precision and success rates.
arXiv Detail & Related papers (2025-04-19T14:13:15Z)
- Temporal Correlation Meets Embedding: Towards a 2nd Generation of JDE-based Real-Time Multi-Object Tracking [52.04679257903805]
Joint Detection and Embedding (JDE) trackers have demonstrated excellent performance in Multi-Object Tracking (MOT) tasks.
Our tracker, named TCBTrack, achieves state-of-the-art performance on multiple public benchmarks.
arXiv Detail & Related papers (2024-07-19T07:48:45Z)
- MM-Tracker: Motion Mamba with Margin Loss for UAV-platform Multiple Object Tracking [12.326023523101806]
Multiple object tracking (MOT) from unmanned aerial vehicle platforms requires efficient motion modeling. We propose the Motion Mamba Module, which explores both local and global motion features. We also design a motion margin loss to effectively improve the detection accuracy of motion-blurred objects. Based on the Motion Mamba Module and motion margin loss, our proposed MM-Tracker surpasses the state of the art on two widely used open-source UAV-MOT datasets.
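The formula for the motion margin loss is not given in this summary; one hedged guess at the idea is a hinge whose required margin grows with the object's motion magnitude, so blur-prone fast movers must be matched more confidently. The names and scaling below are assumptions:

```python
# Speculative sketch of a motion-scaled margin loss, not the paper's formula.
import torch

def motion_margin_loss(pos_scores, neg_scores, motion_mag, base_margin=0.2, alpha=0.5):
    """pos_scores/neg_scores: (N,) matching scores; motion_mag: (N,) normalized
    per-object displacement. Larger motion => larger required score margin."""
    margin = base_margin + alpha * motion_mag
    return torch.clamp(margin - (pos_scores - neg_scores), min=0).mean()
```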
arXiv Detail & Related papers (2024-07-15T07:13:27Z)
- Mamba-FETrack: Frame-Event Tracking via State Space Model [14.610806117193116]
This paper proposes a novel RGB-Event tracking framework, Mamba-FETrack, based on the State Space Model (SSM).
Specifically, we adopt two modality-specific Mamba backbone networks to extract the features of RGB frames and Event streams.
Extensive experiments on FELT and FE108 datasets fully validated the efficiency and effectiveness of our proposed tracker.
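As a loose sketch of the two-branch design (plain conv stems stand in for the paper's modality-specific Mamba backbones; the two-channel event voxel input and all layer sizes are assumptions):

```python
# Illustrative two-branch encoder: separate stems per modality, fused by 1x1 conv.
import torch
import torch.nn as nn

class TwoBranchEncoder(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.rgb = nn.Sequential(nn.Conv2d(3, dim, 3, 2, 1), nn.ReLU())
        self.event = nn.Sequential(nn.Conv2d(2, dim, 3, 2, 1), nn.ReLU())  # assumed 2 polarity channels
        self.fuse = nn.Conv2d(2 * dim, dim, 1)

    def forward(self, rgb, event_voxel):
        # Each modality keeps its own weights; fusion happens after encoding.
        return self.fuse(torch.cat([self.rgb(rgb), self.event(event_voxel)], dim=1))
```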
arXiv Detail & Related papers (2024-04-28T13:12:49Z)
- Spatio-Temporal Bi-directional Cross-frame Memory for Distractor Filtering Point Cloud Single Object Tracking [2.487142846438629]
3D single object tracking within LiDAR point clouds is a pivotal task in computer vision.
Existing methods, which depend solely on appearance matching via networks or utilize information from successive frames, encounter significant challenges.
We design an innovative cross-frame bi-temporal motion tracker, named STMD-Tracker, to mitigate these challenges.
arXiv Detail & Related papers (2024-03-23T13:15:44Z)
- MotionTrack: Learning Motion Predictor for Multiple Object Tracking [68.68339102749358]
We introduce a novel motion-based tracker, MotionTrack, centered around a learnable motion predictor.
Our experimental results demonstrate that MotionTrack yields state-of-the-art performance on datasets such as DanceTrack and SportsMOT.
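A learnable motion predictor in this spirit can be as small as an MLP regressing the next box offset from the last K observed boxes; the sketch below is illustrative, not the paper's architecture:

```python
# Illustrative learnable motion predictor: next-box offset from a short history.
import torch
import torch.nn as nn

class MotionPredictor(nn.Module):
    def __init__(self, k=5, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                      # (B, K, 4) -> (B, 4K)
            nn.Linear(4 * k, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),              # predicted (dx, dy, dw, dh)
        )

    def forward(self, history):                # history: (B, K, 4) boxes as cxcywh
        return history[:, -1] + self.net(history)
```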
arXiv Detail & Related papers (2023-06-05T04:24:11Z)
- An Effective Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds [50.19288542498838]
3D single object tracking in LiDAR point clouds (LiDAR SOT) plays a crucial role in autonomous driving.
Current approaches all follow the Siamese paradigm based on appearance matching.
We introduce a motion-centric paradigm to handle LiDAR SOT from a new perspective.
arXiv Detail & Related papers (2023-03-21T17:28:44Z)
- Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking [83.75789829291475]
We introduce a probabilistic autoregressive motion model to score tracklet proposals.
This is achieved by training our model to learn the underlying distribution of natural tracklets.
Our experiments demonstrate the superiority of our approach at tracking objects in challenging sequences.
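One way to realize such a scorer, sketched under the assumption of a diagonal-Gaussian autoregressive model over box displacements (not the paper's exact parameterization):

```python
# Illustrative autoregressive tracklet scorer: a recurrent model predicts a
# Gaussian over the next displacement; a tracklet's score is its log-likelihood.
import torch
import torch.nn as nn

class ARMotionModel(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(4, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 8)        # mean and log-variance of next delta

    def log_prob(self, tracklet):               # tracklet: (B, T, 4) boxes
        deltas = tracklet[:, 1:] - tracklet[:, :-1]
        h, _ = self.rnn(tracklet[:, :-1])       # h_t conditions on boxes up to t
        mu, log_var = self.head(h).chunk(2, dim=-1)
        # Diagonal-Gaussian log-likelihood per step (up to an additive constant),
        # summed over time; higher = more "natural" tracklet.
        ll = -0.5 * (((deltas - mu) ** 2) / log_var.exp() + log_var).sum(dim=(-1, -2))
        return ll
```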
arXiv Detail & Related papers (2020-12-03T23:59:27Z)
- Jointly Modeling Motion and Appearance Cues for Robust RGB-T Tracking [85.333260415532]
We develop a novel late fusion method to infer the fusion weight maps of both RGB and thermal (T) modalities.
When the appearance cue is unreliable, we take motion cues into account to make the tracker robust.
Numerous results on three recent RGB-T tracking datasets show that the proposed tracker performs significantly better than other state-of-the-art algorithms.
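A minimal sketch of late fusion with inferred weight maps, assuming per-pixel weights predicted from the stacked single-modality response maps (the head and its inputs are illustrative):

```python
# Illustrative late fusion: per-pixel reliability weights over RGB/T responses.
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight_head = nn.Conv2d(2, 2, 3, padding=1)  # from stacked responses

    def forward(self, resp_rgb, resp_t):        # each: (B, 1, H, W) response map
        stacked = torch.cat([resp_rgb, resp_t], dim=1)
        w = torch.softmax(self.weight_head(stacked), dim=1)  # weights sum to 1 per pixel
        return w[:, :1] * resp_rgb + w[:, 1:] * resp_t
```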
arXiv Detail & Related papers (2020-07-04T08:11:33Z)