Less is More: Token Context-aware Learning for Object Tracking
- URL: http://arxiv.org/abs/2501.00758v1
- Date: Wed, 01 Jan 2025 07:05:31 GMT
- Title: Less is More: Token Context-aware Learning for Object Tracking
- Authors: Chenlong Xu, Bineng Zhong, Qihua Liang, Yaozong Zheng, Guorong Li, Shuxiang Song
- Abstract summary: LMTrack is a token context-aware tracking pipeline.
It automatically learns high-quality reference tokens for efficient visual tracking.
It achieves state-of-the-art results on tracking benchmarks such as GOT-10K, TrackingNet, and LaSOT.
- Abstract: Recently, several studies have shown that utilizing contextual information to perceive target states is crucial for object tracking. They typically capture context by incorporating multiple video frames. However, these naive frame-context methods fail to consider the importance of each patch within a reference frame, making them susceptible to noise and redundant tokens, which deteriorates tracking performance. To address this challenge, we propose a new token context-aware tracking pipeline named LMTrack, designed to automatically learn high-quality reference tokens for efficient visual tracking. Embracing the principle of Less is More, the core idea of LMTrack is to analyze the importance distribution of all reference tokens, where important tokens are collected, continually attended to, and updated. Specifically, a novel Token Context Memory module is designed to dynamically collect high-quality spatio-temporal information of a target in an autoregressive manner, eliminating redundant background tokens from the reference frames. Furthermore, an effective Unidirectional Token Attention mechanism is designed to establish dependencies between reference tokens and search frame, enabling robust cross-frame association and target localization. Extensive experiments demonstrate the superiority of our tracker, achieving state-of-the-art results on tracking benchmarks such as GOT-10K, TrackingNet, and LaSOT.
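The two mechanisms named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`select_reference_tokens`, `unidirectional_mask`, `attend`), the top-k selection rule, and the exact masking direction (reference tokens attend only to reference tokens, while search tokens attend to every token) are all assumptions made for illustration, and the importance scores are taken as given rather than learned.

```python
import math


def select_reference_tokens(tokens, importance, k):
    """Keep the k tokens with the highest importance, preserving their order.

    Hypothetical stand-in for collecting important reference tokens; in a
    real tracker the importance scores would come from the model.
    """
    ranked = sorted(range(len(tokens)), key=lambda i: importance[i], reverse=True)[:k]
    return [tokens[i] for i in sorted(ranked)]


def unidirectional_mask(num_ref, num_search):
    """Build a boolean mask where mask[q][k] is True if query q may attend to key k.

    Assumed layout: the first num_ref positions are reference tokens and the
    rest are search tokens. Reference queries see only reference keys;
    search queries see every key.
    """
    n = num_ref + num_search
    return [[(k < num_ref) or (q >= num_ref) for k in range(n)] for q in range(n)]


def attend(queries, keys, values, mask):
    """Masked scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(queries[0]))
    dim = len(values[0])
    outputs = []
    for query, allowed_row in zip(queries, mask):
        # Disallowed keys get a score of -inf, so their softmax weight is 0.
        scores = [
            sum(a * b for a, b in zip(query, key)) / scale if allowed else -math.inf
            for key, allowed in zip(keys, allowed_row)
        ]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]
        )
    return outputs
```

Under these assumptions, changing a search token leaves the outputs at the reference positions unchanged, because no reference query ever attends to a search key.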
Related papers
- RTracker: Recoverable Tracking via PN Tree Structured Memory [71.05904715104411]
We propose a recoverable tracking framework, RTracker, that uses a tree-structured memory to dynamically associate a tracker and a detector to enable self-recovery.
Specifically, we propose a Positive-Negative Tree-structured memory to chronologically store and maintain positive and negative target samples.
Our core idea is to use the support samples of positive and negative target categories to establish a relative distance-based criterion for a reliable assessment of target loss.
arXiv Detail & Related papers (2024-03-28T08:54:40Z)
- Optimized Information Flow for Transformer Tracking [0.7199733380797579]
One-stream Transformer trackers have shown outstanding performance in challenging benchmark datasets.
We propose a novel OIFTrack framework to enhance the discriminative capability of the tracker.
arXiv Detail & Related papers (2024-02-13T03:39:15Z)
- Tracking with Human-Intent Reasoning [64.69229729784008]
This work proposes a new tracking task -- Instruction Tracking.
It involves providing implicit tracking instructions that require the trackers to perform tracking automatically in video frames.
TrackGPT is capable of performing complex reasoning-based tracking.
arXiv Detail & Related papers (2023-12-29T03:22:18Z)
- iKUN: Speak to Trackers without Retraining [21.555469501789577]
We propose an insertable Knowledge Unification Network, termed iKUN, to enable communication with off-the-shelf trackers.
To improve the localization accuracy, we present a neural version of Kalman filter (NKF) to dynamically adjust process noise.
We also contribute a more challenging dataset, Refer-Dance, by extending public DanceTrack dataset with motion and dressing descriptions.
arXiv Detail & Related papers (2023-12-25T11:48:55Z)
- Single-Shot and Multi-Shot Feature Learning for Multi-Object Tracking [55.13878429987136]
We propose a simple yet effective two-stage feature learning paradigm to jointly learn single-shot and multi-shot features for different targets.
Our method has achieved significant improvements on MOT17 and MOT20 datasets while reaching state-of-the-art performance on DanceTrack dataset.
arXiv Detail & Related papers (2023-11-17T08:17:49Z)
- Towards Unified Token Learning for Vision-Language Tracking [65.96561538356315]
We present a vision-language (VL) tracking pipeline, termed MMTrack, which casts VL tracking as a token generation task.
Our proposed framework serializes language description and bounding box into a sequence of discrete tokens.
In this new design paradigm, all token queries are required to perceive the desired target and directly predict spatial coordinates of the target.
arXiv Detail & Related papers (2023-08-27T13:17:34Z)
- Context-aware Visual Tracking with Joint Meta-updating [11.226947525556813]
We propose a context-aware tracking model to optimize the tracker over the representation space, which jointly meta-updates both branches by exploiting information along the whole sequence.
The proposed tracking method achieves an EAO score of 0.514 on VOT2018 at a speed of 40 FPS, demonstrating its capability of improving the accuracy and robustness of the underlying tracker with little speed drop.
arXiv Detail & Related papers (2022-04-04T14:16:00Z)
- Learning Dynamic Compact Memory Embedding for Deformable Visual Object Tracking [82.34356879078955]
We propose a compact memory embedding to enhance the discrimination of the segmentation-based deformable visual tracking method.
Our method outperforms strong segmentation-based trackers, namely D3S and SiamMask, on the DAVIS 2017 benchmark.
arXiv Detail & Related papers (2021-11-23T03:07:12Z)
- Tracking by Joint Local and Global Search: A Target-aware Attention based Approach [63.50045332644818]
We propose a novel target-aware attention mechanism (termed TANet) to conduct joint local and global search for robust tracking.
Specifically, we extract the features of target object patch and continuous video frames, then we track and feed them into a decoder network to generate target-aware global attention maps.
In the tracking procedure, we integrate the target-aware attention with multiple trackers by exploring candidate search regions for robust tracking.
arXiv Detail & Related papers (2021-06-09T06:54:15Z)