Efficient Motion Prompt Learning for Robust Visual Tracking
- URL: http://arxiv.org/abs/2505.16321v1
- Date: Thu, 22 May 2025 07:22:58 GMT
- Title: Efficient Motion Prompt Learning for Robust Visual Tracking
- Authors: Jie Zhao, Xin Chen, Yongsheng Yuan, Michael Felsberg, Dong Wang, Huchuan Lu
- Abstract summary: We propose a lightweight and plug-and-play motion prompt tracking method. It can be easily integrated into existing vision-based trackers to build a joint tracking framework. Experiments on seven tracking benchmarks demonstrate that the proposed motion module significantly improves the robustness of vision-based trackers.
- Score: 58.59714916705317
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the challenges of processing temporal information, most trackers depend solely on visual discriminability and overlook the unique temporal coherence of video data. In this paper, we propose a lightweight and plug-and-play motion prompt tracking method. It can be easily integrated into existing vision-based trackers to build a joint tracking framework leveraging both motion and vision cues, thereby achieving robust tracking through efficient prompt learning. A motion encoder with three different positional encodings is proposed to encode the long-term motion trajectory into the visual embedding space, while a fusion decoder and an adaptive weight mechanism are designed to dynamically fuse visual and motion features. We integrate our motion module into three different trackers with five models in total. Experiments on seven challenging tracking benchmarks demonstrate that the proposed motion module significantly improves the robustness of vision-based trackers, with minimal training costs and negligible speed sacrifice. Code is available at https://github.com/zj5559/Motion-Prompt-Tracking.
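To make the described pipeline concrete, here is a minimal PyTorch sketch of such a motion prompt module, assuming a simple box-trajectory encoder with one positional encoding, a cross-attention fusion decoder, and a scalar adaptive weight. All names and shapes are illustrative assumptions; the authors' actual implementation is in the linked repository.

```python
import torch
import torch.nn as nn


class MotionPromptModule(nn.Module):
    """Hypothetical sketch: encode a box trajectory and fuse it with visual tokens."""

    def __init__(self, traj_len=32, embed_dim=256, num_heads=8):
        super().__init__()
        # Project each past (x, y, w, h) box into the visual embedding space.
        self.box_proj = nn.Linear(4, embed_dim)
        # Learnable positional encoding over trajectory time steps
        # (the paper uses three positional encodings; only one is sketched here).
        self.pos_embed = nn.Parameter(torch.zeros(1, traj_len, embed_dim))
        enc_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.motion_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        # Fusion decoder: visual tokens attend to the encoded motion tokens.
        self.fusion = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Adaptive weight: a scalar gate deciding how much motion to mix in.
        self.gate = nn.Sequential(nn.Linear(embed_dim, 1), nn.Sigmoid())

    def forward(self, visual_tokens, trajectory):
        # visual_tokens: (B, N, C) features from the underlying vision tracker
        # trajectory:    (B, traj_len, 4) past target boxes
        motion = self.motion_encoder(self.box_proj(trajectory) + self.pos_embed)
        fused, _ = self.fusion(visual_tokens, motion, motion)
        w = self.gate(fused.mean(dim=1, keepdim=True))  # (B, 1, 1) adaptive weight
        return visual_tokens + w * fused  # motion-aware residual on the visual stream


# Example: fuse 196 visual tokens with a 32-step trajectory.
out = MotionPromptModule()(torch.randn(2, 196, 256), torch.rand(2, 32, 4))
```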
Related papers
- MoSiC: Optimal-Transport Motion Trajectory for Dense Self-Supervised Learning [66.53533434848369]
We propose a motion-guided self-supervised learning framework that learns densely consistent representations. We improve the state of the art by 1% to 6% on six image and video datasets and four evaluation benchmarks.
arXiv Detail & Related papers (2025-06-10T11:20:32Z)
- TrackNetV4: Enhancing Fast Sports Object Tracking with Motion Attention Maps [6.548400020461624]
We introduce an enhancement to the TrackNet family by fusing high-level visual features with learnable motion attention maps.
Our approach leverages frame differencing maps, modulated by a motion prompt layer, to highlight key motion regions over time.
We refer to our lightweight, plug-and-play solution, built on top of the existing TrackNet, as TrackNetV4.
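As a rough illustration of the frame-differencing idea described above, the following sketch turns inter-frame differences into soft attention maps via a small learnable layer. The names, shapes, and sigmoid parameterization are assumptions for illustration, not TrackNetV4's actual code.

```python
import torch
import torch.nn as nn


class MotionPromptLayer(nn.Module):
    """Turns raw frame differences into soft motion attention maps (toy version)."""

    def __init__(self):
        super().__init__()
        # Learnable slope/shift of a sigmoid applied to motion energy.
        self.scale = nn.Parameter(torch.tensor(1.0))
        self.shift = nn.Parameter(torch.tensor(0.0))

    def forward(self, frames):
        # frames: (B, T, C, H, W) consecutive video frames
        diff = (frames[:, 1:] - frames[:, :-1]).abs().mean(dim=2)  # (B, T-1, H, W)
        return torch.sigmoid(self.scale * diff + self.shift)       # attention maps


# Usage: emphasize moving regions in per-frame visual features.
layer = MotionPromptLayer()
attn = layer(torch.rand(2, 4, 3, 288, 512))   # (2, 3, 288, 512)
feats = torch.rand(2, 3, 64, 288, 512)        # hypothetical per-frame features
feats = feats * attn.unsqueeze(2)             # modulate by motion attention
```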
arXiv Detail & Related papers (2024-09-22T17:58:09Z)
- Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking [14.382072224997074]
Single-stream architectures utilizing pre-trained ViT backbones offer improved performance, efficiency, and robustness.
We boost the efficiency of this framework by tailoring it into an adaptive framework that dynamically exits Transformer blocks for real-time UAV tracking.
We also improve the effectiveness of ViTs in handling motion blur, a common issue in UAV tracking caused by the fast movements of either the UAV, the tracked objects, or both.
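As a rough illustration of the dynamic early-exit idea, the sketch below attaches a small halting head to each Transformer block and stops the forward pass once its confidence clears a threshold. The halting criterion and all names are assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn


class EarlyExitEncoder(nn.Module):
    def __init__(self, embed_dim=256, num_heads=8, depth=12, threshold=0.9):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True)
            for _ in range(depth))
        # One tiny halting head per block predicts "confident enough to exit".
        self.halt_heads = nn.ModuleList(
            nn.Sequential(nn.Linear(embed_dim, 1), nn.Sigmoid())
            for _ in range(depth))
        self.threshold = threshold

    def forward(self, tokens):
        # tokens: (B, N, C) patch embeddings
        for block, halt in zip(self.blocks, self.halt_heads):
            tokens = block(tokens)
            # Exit as soon as the mean halting score clears the threshold,
            # skipping the remaining blocks at inference time.
            if halt(tokens.mean(dim=1)).mean() > self.threshold:
                break
        return tokens
```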
arXiv Detail & Related papers (2024-07-07T14:10:04Z)
- Motion-Guided Dual-Camera Tracker for Endoscope Tracking and Motion Analysis in a Mechanical Gastric Simulator [5.073179848641095]
A motion-guided dual-camera vision tracker is proposed to provide robust and accurate tracking of the endoscope tip's 3D position. The proposed tracker outperforms state-of-the-art vision trackers, improving on the second-best method by 42% in average error and 72% in maximum error.
arXiv Detail & Related papers (2024-03-08T08:31:46Z)
- Delving into Motion-Aware Matching for Monocular 3D Object Tracking [81.68608983602581]
We find that the motion cues of objects across time frames are critical in 3D multi-object tracking.
We propose MoMA-M3T, a framework that mainly consists of three motion-aware components.
We conduct extensive experiments on the nuScenes and KITTI datasets to demonstrate that MoMA-M3T achieves competitive performance against state-of-the-art methods.
arXiv Detail & Related papers (2023-08-22T17:53:58Z)
- MotionTrack: Learning Motion Predictor for Multiple Object Tracking [68.68339102749358]
We introduce a novel motion-based tracker, MotionTrack, centered around a learnable motion predictor.
Our experimental results demonstrate that MotionTrack yields state-of-the-art performance on datasets such as DanceTrack and SportsMOT.
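A toy stand-in for a learnable motion predictor of this kind: an MLP that regresses the target's next box from a short history of past boxes, in place of, e.g., a hand-crafted Kalman filter. The architecture and names below are illustrative assumptions, not MotionTrack's actual design.

```python
import torch
import torch.nn as nn


class MotionPredictor(nn.Module):
    def __init__(self, history=10, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(history * 4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4))  # predicted (x, y, w, h) for the next frame

    def forward(self, boxes):
        # boxes: (B, history, 4) past observations of one track
        return self.net(boxes.flatten(1))


# Predicted boxes can then be matched to detections (e.g., by IoU)
# to associate tracks across frames.
pred = MotionPredictor()(torch.rand(8, 10, 4))  # (8, 4)
```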
arXiv Detail & Related papers (2023-06-05T04:24:11Z)
- An Effective Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds [50.19288542498838]
3D single object tracking in LiDAR point clouds (LiDAR SOT) plays a crucial role in autonomous driving.
Current approaches all follow the Siamese paradigm based on appearance matching.
We introduce a motion-centric paradigm to handle LiDAR SOT from a new perspective.
arXiv Detail & Related papers (2023-03-21T17:28:44Z)
- MotionTrack: Learning Robust Short-term and Long-term Motions for Multi-Object Tracking [56.92165669843006]
We propose MotionTrack, which learns robust short-term and long-term motions in a unified framework to associate trajectories over both short and long ranges.
For dense crowds, we design a novel Interaction Module to learn interaction-aware motions from short-term trajectories, which can estimate the complex movement of each target.
For extreme occlusions, we build a novel Refind Module to learn reliable long-term motions from the target's history trajectory, which can link the interrupted trajectory with its corresponding detection.
arXiv Detail & Related papers (2023-03-18T12:38:33Z)
- ChallenCap: Monocular 3D Capture of Challenging Human Performances using Multi-Modal References [18.327101908143113]
We propose ChallenCap, a template-based approach to capturing challenging 3D human motions using a single RGB camera.
We adopt a novel learning-and-optimization framework, with the aid of multi-modal references.
Experiments on our new challenging motion dataset demonstrate the effectiveness and robustness of our approach in capturing challenging human motions.
arXiv Detail & Related papers (2021-03-11T15:49:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.