Deep Learning-based Lightweight RGB Object Tracking for Augmented Reality Devices
- URL: http://arxiv.org/abs/2511.17508v1
- Date: Sat, 04 Oct 2025 02:39:55 GMT
- Title: Deep Learning-based Lightweight RGB Object Tracking for Augmented Reality Devices
- Authors: Alice Smith, Bob Johnson, Xiaoyu Zhu, Carol Lee
- Abstract summary: Augmented Reality (AR) applications require robust real-time tracking of objects in the user's environment to correctly overlay virtual content. Recent advances in computer vision have produced highly accurate deep learning-based object trackers, but these models are typically too heavy in computation and memory for wearable AR devices. We present a lightweight RGB object tracking algorithm designed specifically for resource-constrained AR platforms.
- Score: 2.3102477806624084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Augmented Reality (AR) applications often require robust real-time tracking of objects in the user's environment to correctly overlay virtual content. Recent advances in computer vision have produced highly accurate deep learning-based object trackers, but these models are typically too heavy in computation and memory for wearable AR devices. In this paper, we present a lightweight RGB object tracking algorithm designed specifically for resource-constrained AR platforms. The proposed tracker employs a compact Siamese neural network architecture and incorporates optimization techniques such as model pruning, quantization, and knowledge distillation to drastically reduce model size and inference cost while maintaining high tracking accuracy. We train the tracker offline on large video datasets using deep convolutional neural networks and then deploy it on-device for real-time tracking. Experimental results on standard tracking benchmarks show that our approach achieves comparable accuracy to state-of-the-art trackers, yet runs in real-time on a mobile AR headset at around 30 FPS -- more than an order of magnitude faster than prior high-performance trackers on the same hardware. This work enables practical, robust object tracking for AR use-cases, opening the door to more interactive and dynamic AR experiences on lightweight devices.
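The abstract names two core ingredients: a compact Siamese matching network and int8 quantization to shrink the on-device model. As a minimal, hedged sketch of those two ideas (not the paper's actual implementation; all function names here are illustrative), a Siamese tracker correlates a template feature map against a search-region feature map and takes the response peak, while int8 quantization stores weights at a quarter of float32 size:

```python
import numpy as np

def cross_correlate(template, search):
    """Slide the template feature map over the search-region feature map,
    returning a response map of similarity scores (the cross-correlation
    operation at the heart of Siamese trackers)."""
    th, tw = template.shape
    out = np.empty((search.shape[0] - th + 1, search.shape[1] - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(search[i:i + th, j:j + tw] * template)
    return out

def locate(template, search):
    """Return the (row, col) of the response-map peak, i.e. the
    estimated target position in the search region."""
    resp = cross_correlate(template, search)
    return np.unravel_index(np.argmax(resp), resp.shape)

def quantize_int8(w):
    """Symmetric affine quantization of a float weight array to int8.
    Stores 1 byte per weight instead of 4, at a small accuracy cost."""
    scale = max(np.abs(w).max(), 1e-8) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float weight array from int8 values."""
    return q.astype(np.float32) * scale
```

In a real deployment the correlation would run over learned CNN features (and typically as a batched depthwise operation on the GPU/NPU), and quantization would be applied per-channel with calibration data; the sketch above only illustrates the arithmetic.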
Related papers
- SwiTrack: Tri-State Switch for Cross-Modal Object Tracking [74.15663758681849]
Cross-modal object tracking (CMOT) is an emerging task that maintains target consistency while the video stream switches between different modalities. We propose SwiTrack, a novel state-switching framework that redefines CMOT through the deployment of three specialized streams.
arXiv Detail & Related papers (2025-11-20T10:52:54Z) - SMTrack: End-to-End Trained Spiking Neural Networks for Multi-Object Tracking in RGB Videos [8.673924616309698]
Brain-inspired Spiking Neural Networks (SNNs) exhibit significant potential for low-power computation. Their application in visual tasks remains largely confined to image classification, object detection, and event-based tracking. We propose SMTrack, the first directly trained deep SNN framework for end-to-end multi-object tracking on standard RGB videos.
arXiv Detail & Related papers (2025-08-20T10:47:37Z) - RGBTrack: Fast, Robust Depth-Free 6D Pose Estimation and Tracking [24.866881488130407]
We introduce a robust framework, RGBTrack, for real-time 6D pose estimation and tracking. We devise a novel binary search strategy combined with a render-and-compare mechanism to efficiently infer depth. We show that RGBTrack's novel depth-free approach achieves competitive accuracy and real-time performance.
arXiv Detail & Related papers (2025-06-20T16:19:28Z) - LiteTracker: Leveraging Temporal Causality for Accurate Low-latency Tissue Tracking [86.67583223579851]
LiteTracker is a low-latency method for tissue tracking in endoscopic video streams. LiteTracker builds on a state-of-the-art long-term point tracking method, and introduces a set of training-free runtime optimizations.
arXiv Detail & Related papers (2025-04-14T05:53:57Z) - Depth Attention for Robust RGB Tracking [21.897255266278275]
We propose a new framework that leverages monocular depth estimation to counter the challenges of tracking targets that are out of view or affected by motion blur in RGB video sequences.
To the best of our knowledge, we are the first to propose a depth attention mechanism and to formulate a simple framework that allows seamless integration of depth information with state-of-the-art tracking algorithms.
arXiv Detail & Related papers (2024-10-27T09:47:47Z) - Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking. DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget. Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z) - PNAS-MOT: Multi-Modal Object Tracking with Pareto Neural Architecture Search [64.28335667655129]
Multiple object tracking is a critical task in autonomous driving.
As tracking accuracy improves, neural networks become increasingly complex, posing challenges for their practical application in real driving scenarios due to the high level of latency.
In this paper, we explore the use of the neural architecture search (NAS) methods to search for efficient architectures for tracking, aiming for low real-time latency while maintaining relatively high accuracy.
arXiv Detail & Related papers (2024-03-23T04:18:49Z) - BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View [54.48052449493636]
3D Single Object Tracking (SOT) is a fundamental task in computer vision and plays a critical role in applications like autonomous driving. We propose BEVTrack, a simple yet effective motion-based tracking method. We show that BEVTrack achieves state-of-the-art results while operating at 200 FPS, enabling real-time applicability.
arXiv Detail & Related papers (2023-09-05T12:42:26Z) - Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z) - Robust Visual Object Tracking with Two-Stream Residual Convolutional Networks [62.836429958476735]
We propose a Two-Stream Residual Convolutional Network (TS-RCN) for visual tracking.
Our TS-RCN can be integrated with existing deep learning based visual trackers.
To further improve the tracking performance, we adopt a "wider" residual network ResNeXt as its feature extraction backbone.
arXiv Detail & Related papers (2020-05-13T19:05:42Z)