Related papers: UETrack: A Unified and Efficient Framework for Single Object Tracking

UETrack: A Unified and Efficient Framework for Single Object Tracking

URL: http://arxiv.org/abs/2603.01412v2
Date: Tue, 03 Mar 2026 06:18:46 GMT
Title: UETrack: A Unified and Efficient Framework for Single Object Tracking
Authors: Ben Kang, Jie Zhao, Xin Chen, Wanting Geng, Bin Zhang, Lu Zhang, Dong Wang, Huchuan Lu,
Abstract summary: UETrack is an efficient framework for single object tracking.<n>It efficiently handles multiple modalities including RGB, Depth, Thermal, Event, and Language.<n>It achieves a superior speed-accuracy trade-off compared to previous methods.
Score: 46.50641228786134
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With growing real-world demands, efficient tracking has received increasing attention. However, most existing methods are limited to RGB inputs and struggle in multi-modal scenarios. Moreover, current multi-modal tracking approaches typically use complex designs, making them too heavy and slow for resource-constrained deployment. To tackle these limitations, we propose UETrack, an efficient framework for single object tracking. UETrack demonstrates high practicality and versatility, efficiently handling multiple modalities including RGB, Depth, Thermal, Event, and Language, and addresses the gap in efficient multi-modal tracking. It introduces two key components: a Token-Pooling-based Mixture-of-Experts mechanism that enhances modeling capacity through feature aggregation and expert specialization, and a Target-aware Adaptive Distillation strategy that selectively performs distillation based on sample characteristics, reducing redundant supervision and improving performance. Extensive experiments on 12 benchmarks across 3 hardware platforms show that UETrack achieves a superior speed-accuracy trade-off compared to previous methods. For instance, UETrack-B achieves 69.2% AUC on LaSOT and runs at 163/56/60 FPS on GPU/CPU/AGX, demonstrating strong practicality and versatility. Code is available at https://github.com/kangben258/UETrack.

Related papers

UTPTrack: Towards Simple and Unified Token Pruning for Visual Tracking [23.83535022949326]
One-stream Transformer-based trackers achieve advanced performance in visual object tracking but suffer from significant computational overhead.<n>We introduce UTPTrack, a simple and Unified Token Pruning framework that, for the first time, jointly compresses all three components.<n>Extensive evaluations on 10 benchmarks demonstrate that UTPTrack achieves a new state-of-the-art in the accuracy-efficiency trade-off for pruning-based trackers.
arXiv Detail & Related papers (2026-02-27T06:58:09Z)
UBATrack: Spatio-Temporal State Space Model for General Multi-Modal Tracking [40.8191099453086]
We introduce a novel multi-modal tracking framework based on a mamba-style state, termed UBATrack.<n>UBATrack comprises two simple yet effective work space: a S-temporal Mamba Adapter (MA) and a Dynamic Multi-modal Feature Mixer.<n>Experiments show that UBATrack outperforms state-of-the-art methods on RGB-T, RGB-D, and RGB-E tracking benchmarks.
arXiv Detail & Related papers (2026-01-21T09:24:19Z)
Towards Low-Latency Event Stream-based Visual Object Tracking: A Slow-Fast Approach [32.91982063297922]
We propose a novel Slow-Fast Tracking paradigm that flexibly adapts to different operational requirements, termed SFTrack.<n>The proposed framework supports two complementary modes, i.e., a high-precision slow tracker for scenarios with sufficient computational resources, and an efficient fast tracker tailored for latency-aware, resource-constrained environments.<n>Our framework first performs graph-based representation learning from high-temporal-resolution event streams, and then integrates the learned graph-structured information into two FlashAttention-based vision backbones.
arXiv Detail & Related papers (2025-05-19T09:37:23Z)
CAMELTrack: Context-Aware Multi-cue ExpLoitation for Online Multi-Object Tracking [68.24998698508344]
We introduce CAMEL, a novel association module for Context-Aware Multi-Cue ExpLoitation.<n>Unlike end-to-end detection-by-tracking approaches, our method remains lightweight and fast to train while being able to leverage external off-the-shelf models.<n>Our proposed online tracking pipeline, CAMELTrack, achieves state-of-the-art performance on multiple tracking benchmarks.
arXiv Detail & Related papers (2025-05-02T13:26:23Z)
Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking.<n>DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget.<n>Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z)
Single-Model and Any-Modality for Video Object Tracking [85.83753760853142]
We introduce Un-Track, a Unified Tracker of a single set of parameters for any modality. To handle any modality, our method learns their common latent space through low-rank factorization and reconstruction techniques. Our Un-Track achieves +8.1 absolute F-score gain, on the DepthTrack dataset, by introducing only +2.14 (over 21.50) GFLOPs with +6.6M (over 93M) parameters.
arXiv Detail & Related papers (2023-11-27T14:17:41Z)
Prompting for Multi-Modal Tracking [70.0522146292258]
We propose a novel multi-modal prompt tracker (ProTrack) for multi-modal tracking. ProTrack can transfer the multi-modal inputs to a single modality by the prompt paradigm. Our ProTrack can achieve high-performance multi-modal tracking by only altering the inputs, even without any extra training on multi-modal data.
arXiv Detail & Related papers (2022-07-29T09:35:02Z)
Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework [76.70603443624012]
We propose a novel one-stream tracking (OSTrack) framework that unifies feature learning and relation modeling. In this way, discriminative target-oriented features can be dynamically extracted by mutual guidance. OSTrack achieves state-of-the-art performance on multiple benchmarks, in particular, it shows impressive results on the one-shot tracking benchmark GOT-10k.
arXiv Detail & Related papers (2022-03-22T18:37:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.