Context-Aware Token Pruning and Discriminative Selective Attention for Transformer Tracking
- URL: http://arxiv.org/abs/2511.19928v1
- Date: Tue, 25 Nov 2025 05:12:17 GMT
- Title: Context-Aware Token Pruning and Discriminative Selective Attention for Transformer Tracking
- Authors: Janani Kugarajeevan, Thanikasalam Kokul, Amirthalingam Ramanan, Subha Fernando
- Abstract summary: One-stream Transformer-based trackers have demonstrated remarkable performance by concatenating template and search region tokens. Allowing an excessive proportion of background search tokens to attend to the target template tokens weakens the tracker's discriminative capability. We propose CPDATrack, a novel tracking framework designed to suppress interference from background and distractor tokens.
- Score: 2.557588419790226
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One-stream Transformer-based trackers have demonstrated remarkable performance by concatenating template and search region tokens, thereby enabling joint attention across all tokens. However, enabling an excessive proportion of background search tokens to attend to the target template tokens weakens the tracker's discriminative capability. Several token pruning methods have been proposed to mitigate background interference; however, they often remove tokens near the target, leading to the loss of essential contextual information and degraded tracking performance. Moreover, the presence of distractors within the search tokens further reduces the tracker's ability to accurately identify the target. To address these limitations, we propose CPDATrack, a novel tracking framework designed to suppress interference from background and distractor tokens while enhancing computational efficiency. First, a learnable module is integrated between two designated encoder layers to estimate the probability of each search token being associated with the target. Based on these estimates, less-informative background tokens are pruned from the search region while preserving the contextual cues surrounding the target. To further suppress background interference, a discriminative selective attention mechanism is employed that fully blocks search-to-template attention in the early layers. In the subsequent encoder layers, high-probability target tokens are selectively extracted from a localized region to attend to the template tokens, thereby reducing the influence of background and distractor tokens. The proposed CPDATrack achieves state-of-the-art performance across multiple benchmarks, particularly on GOT-10k, where it attains an average overlap of 75.1 percent.
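The two mechanisms described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes tokens are ordered as template tokens followed by search tokens, it reads "search-to-template attention" literally as search tokens acting as queries over template keys (the paper may define the direction differently), and it uses plain top-k selection as a stand-in for the learned, context-preserving pruning module. The localized-region step is omitted. The function names `prune_search_tokens` and `build_attention_mask` and the `keep_ratio` parameter are hypothetical.

```python
import math
from typing import List


def prune_search_tokens(probs: List[float], keep_ratio: float) -> List[int]:
    """Return the sorted indices of the search tokens that survive pruning.

    Assumption: keeps the ceil(keep_ratio * n) tokens with the highest
    estimated target probability. The paper's module also preserves context
    around the target, which this simple top-k rule does not model.
    """
    n_keep = max(1, math.ceil(keep_ratio * len(probs)))
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return sorted(ranked[:n_keep])


def build_attention_mask(
    n_template: int, n_search: int, allowed_search: List[int]
) -> List[List[bool]]:
    """Build a boolean mask where mask[q][k] is True if query q may attend to key k.

    Token order is [template tokens | search tokens]. Every search token whose
    index is not in allowed_search is blocked from attending to template keys.
    Passing an empty list blocks all search-to-template attention, as in the
    early layers. Passing the high-probability indices models later layers.
    """
    n_total = n_template + n_search
    mask = [[True] * n_total for _ in range(n_total)]
    allowed = set(allowed_search)
    for s in range(n_search):
        if s in allowed:
            continue
        query_row = n_template + s
        for key_col in range(n_template):
            mask[query_row][key_col] = False
    return mask


# Illustrative values only: four search tokens, two template tokens.
example_probs = [0.1, 0.9, 0.8, 0.2]
kept_indices = prune_search_tokens(example_probs, keep_ratio=0.5)
early_mask = build_attention_mask(n_template=2, n_search=4, allowed_search=[])
late_mask = build_attention_mask(n_template=2, n_search=4, allowed_search=kept_indices)
```

In a real tracker, a mask of this kind would typically be passed to the attention operation of each encoder layer, with the allowed set changing by layer depth.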
Related papers
- Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection [13.937483869660648]
Token Sparse Attention is a dynamic token-level sparsification mechanism that compresses per-head $Q$, $K$, $V$ to a reduced token set during attention. We show that Token Sparse Attention consistently improves the accuracy-latency trade-off, achieving up to $3.23\times$ attention speedup at 128K context with less than 1% accuracy degradation.
arXiv Detail & Related papers (2026-02-03T07:31:14Z) - Training-Free Token Pruning via Zeroth-Order Gradient Estimation in Vision-Language Models [16.540220733551823]
Large Vision-Language Models (VLMs) enable strong multimodal reasoning but incur heavy inference costs from redundant visual tokens. Attention-based methods rely on raw attention scores, which are often unstable across layers and heads. We propose a training-free framework built on a simple intuition.
arXiv Detail & Related papers (2025-09-29T14:20:05Z) - Less is More: Token Context-aware Learning for Object Tracking [20.222950380244377]
LMTrack is a token context-aware tracking pipeline. It automatically learns high-quality reference tokens for efficient visual tracking. It achieves state-of-the-art results on tracking benchmarks such as GOT-10K, TrackingNet, and LaSOT.
arXiv Detail & Related papers (2025-01-01T07:05:31Z) - Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability [53.51560766150442]
Critical tokens are elements within reasoning trajectories that significantly influence incorrect outcomes. We present a novel framework for identifying these tokens through rollout sampling. We show that identifying and replacing critical tokens significantly improves model accuracy.
arXiv Detail & Related papers (2024-11-29T18:58:22Z) - ToSA: Token Selective Attention for Efficient Vision Transformers [50.13756218204456]
ToSA is a token selective attention approach that can identify tokens that need to be attended as well as those that can skip a transformer layer.
We show that ToSA can significantly reduce computation costs while maintaining accuracy on the ImageNet classification benchmark.
arXiv Detail & Related papers (2024-06-13T05:17:21Z) - Optimized Information Flow for Transformer Tracking [0.7199733380797579]
One-stream Transformer trackers have shown outstanding performance in challenging benchmark datasets.
We propose a novel OIFTrack framework to enhance the discriminative capability of the tracker.
arXiv Detail & Related papers (2024-02-13T03:39:15Z) - Token-Level Adversarial Prompt Detection Based on Perplexity Measures
and Contextual Information [67.78183175605761]
Large Language Models are susceptible to adversarial prompt attacks.
This vulnerability underscores a significant concern regarding the robustness and reliability of LLMs.
We introduce a novel approach to detecting adversarial prompts at a token level.
arXiv Detail & Related papers (2023-11-20T03:17:21Z) - Dynamic Focus-aware Positional Queries for Semantic Segmentation [94.6834904076914]
We propose a simple yet effective query design for semantic segmentation termed Dynamic Focus-aware Positional Queries.
Our framework achieves SOTA performance and outperforms Mask2former by clear margins of 1.1%, 1.9%, and 1.1% single-scale mIoU with ResNet-50, Swin-T, and Swin-B backbones.
arXiv Detail & Related papers (2022-04-04T05:16:41Z) - Graph Attention Tracking [76.19829750144564]
We propose a simple target-aware Siamese graph attention network for general object tracking.
Experiments on challenging benchmarks including GOT-10k, UAV123, OTB-100 and LaSOT demonstrate that the proposed SiamGAT outperforms many state-of-the-art trackers.
arXiv Detail & Related papers (2020-11-23T04:26:45Z) - A Self-Training Approach for Point-Supervised Object Detection and Counting in Crowds [54.73161039445703]
We propose a novel self-training approach that enables a typical object detector to be trained only with point-level annotations.
During training, we utilize the available point annotations to supervise the estimation of the center points of objects.
Experimental results show that our approach significantly outperforms state-of-the-art point-supervised methods under both detection and counting tasks.
arXiv Detail & Related papers (2020-07-25T02:14:42Z) - Semi-Supervised Crowd Counting via Self-Training on Surrogate Tasks [50.78037828213118]
This paper tackles the semi-supervised crowd counting problem from the perspective of feature learning.
We propose a novel semi-supervised crowd counting method which is built upon two innovative components.
arXiv Detail & Related papers (2020-07-07T05:30:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.