PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework
- URL: http://arxiv.org/abs/2211.11629v3
- Date: Mon, 17 Jul 2023 03:33:14 GMT
- Title: PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework
- Authors: Bowen Li, Ziyuan Huang, Junjie Ye, Yiming Li, Sebastian Scherer, Hang Zhao, Changhong Fu
- Abstract summary: We present a framework for end-to-end latency-aware tracking, i.e., end-to-end predictive visual tracking (PVT++).
Unlike existing solutions that naively append Kalman Filters after trackers, PVT++ can be jointly optimized.
PVT++ achieves significant performance gains on various trackers and exhibits higher accuracy than prior solutions.
- Score: 33.7932898514321
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual object tracking is essential to intelligent robots. Most existing
approaches have ignored the online latency that can cause severe performance
degradation during real-world processing. Especially for unmanned aerial
vehicles (UAVs), where robust tracking is more challenging and onboard
computation is limited, the latency issue can be fatal. In this work, we
present a simple framework for end-to-end latency-aware tracking, i.e.,
end-to-end predictive visual tracking (PVT++). Unlike existing solutions that
naively append Kalman Filters after trackers, PVT++ can be jointly optimized,
so that it not only takes motion information but also leverages the rich
visual knowledge in most pre-trained tracker models for robust prediction.
Besides, to bridge the training-evaluation domain gap, we propose a relative
motion factor, empowering PVT++ to generalize to the challenging and complex
UAV tracking scenes. These careful designs have made the small-capacity
lightweight PVT++ a widely effective solution. Additionally, this work presents
an extended latency-aware evaluation benchmark for assessing any-speed
trackers in the online setting. Empirical results on a robotic platform from the
aerial perspective show that PVT++ can achieve significant performance gain on
various trackers and exhibit higher accuracy than prior solutions, largely
mitigating the degradation brought by latency.
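To make the abstract's two main ideas concrete, below is a minimal, illustrative Python sketch. It is not the authors' PVT++ implementation, and the names (TimedResult, predict_box, latency_aware_iou) are hypothetical. A hand-crafted constant-velocity rule operating on box-size-normalized displacements stands in for the learned, jointly optimized motion-and-appearance predictor, and a small scoring loop matches each incoming frame against the newest tracker result that has actually finished by that frame's timestamp, which is the essence of latency-aware evaluation.

```python
# Illustrative sketch only -- NOT the authors' PVT++ implementation.
# (1) A motion-only predictor whose displacements are normalized by the
#     current box size (a stand-in for the "relative motion factor" idea);
#     PVT++ instead learns prediction jointly with the tracker and also
#     uses visual features from the pre-trained tracker model.
# (2) A latency-aware scoring loop: each frame is compared against the
#     newest tracker result that had finished by that frame's timestamp.
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h) in pixels


@dataclass
class TimedResult:
    frame_idx: int   # index of the (stale) frame the tracker ran on
    done_at: float   # wall-clock time the result became available (s)
    box: Box         # estimated box for that frame


def relative_motion(prev: Box, curr: Box) -> Tuple[float, float]:
    """Center displacement normalized by the previous box size."""
    (px, py, pw, ph), (cx, cy, _, _) = prev, curr
    return (cx - px) / max(pw, 1e-6), (cy - py) / max(ph, 1e-6)


def predict_box(history: List[Box], steps_ahead: int) -> Box:
    """Constant-velocity prediction in the normalized motion space."""
    if len(history) < 2 or steps_ahead <= 0:
        return history[-1]
    dx, dy = relative_motion(history[-2], history[-1])
    cx, cy, w, h = history[-1]
    return (cx + dx * w * steps_ahead, cy + dy * h * steps_ahead, w, h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (cx, cy, w, h) boxes."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def latency_aware_iou(results: List[TimedResult],  # assumed ordered by frame_idx
                      gt: List[Box],               # ground-truth box per frame
                      frame_times: List[float],    # arrival time of each frame (s)
                      predictive: bool = True) -> float:
    """Average IoU when each frame only sees results finished by its timestamp."""
    scores = []
    for t_idx, t in enumerate(frame_times):
        ready = [r for r in results if r.done_at <= t]
        if not ready:
            continue  # no output yet; a stricter metric could count this as 0
        latest = ready[-1]
        lag = t_idx - latest.frame_idx  # how stale the latest result is, in frames
        history = [r.box for r in ready]
        box = predict_box(history, lag) if predictive else latest.box
        scores.append(iou(box, gt[t_idx]))
    return sum(scores) / len(scores) if scores else 0.0
```

Comparing predictive=False (score the stale box as-is) against predictive=True illustrates, in miniature, the latency-induced degradation the paper targets and how onboard prediction can offset it.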
Related papers
- Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking [14.382072224997074]
Single-stream architectures utilizing pre-trained ViT backbones offer improved performance, efficiency, and robustness.
We boost the efficiency of this framework by tailoring it into an adaptive framework that dynamically exits Transformer blocks for real-time UAV tracking.
We also improve the effectiveness of ViTs in handling motion blur, a common issue in UAV tracking caused by the fast movements of either the UAV, the tracked objects, or both.
arXiv Detail & Related papers (2024-07-07T14:10:04Z)
- Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking.
DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget.
Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z)
- PNAS-MOT: Multi-Modal Object Tracking with Pareto Neural Architecture Search [64.28335667655129]
Multiple object tracking is a critical task in autonomous driving.
As tracking accuracy improves, neural networks become increasingly complex, posing challenges for practical deployment in real driving scenarios due to high latency.
In this paper, we explore the use of neural architecture search (NAS) methods to find efficient architectures for tracking, aiming for low real-time latency while maintaining relatively high accuracy.
arXiv Detail & Related papers (2024-03-23T04:18:49Z)
- Real-time Object Detection for Streaming Perception [84.2559631820007]
Streaming perception is proposed to jointly evaluate latency and accuracy with a single metric for online video perception.
We build a simple and effective framework for streaming perception.
Our method achieves competitive performance on Argoverse-HD dataset and improves the AP by 4.9% compared to the strong baseline.
arXiv Detail & Related papers (2022-03-23T11:33:27Z)
- Predictive Visual Tracking: A New Benchmark and Baseline Approach [27.87099869398515]
In the real-world scenarios, the onboard processing time of the image streams inevitably leads to a discrepancy between the tracking results and the real-world states.
Existing visual tracking benchmarks commonly run the trackers offline and ignore such latency in the evaluation.
In this work, we aim to deal with a more realistic problem of latency-aware tracking.
arXiv Detail & Related papers (2021-03-08T01:50:05Z)
- PnPNet: End-to-End Perception and Prediction with Tracking in the Loop [82.97006521937101]
We tackle the problem of joint perception and motion forecasting in the context of self-driving vehicles.
We propose PnPNet, an end-to-end model that takes sensor data as input and outputs, at each time step, object tracks and their future trajectories.
arXiv Detail & Related papers (2020-05-29T17:57:25Z)
- Robust Visual Object Tracking with Two-Stream Residual Convolutional Networks [62.836429958476735]
We propose a Two-Stream Residual Convolutional Network (TS-RCN) for visual tracking.
Our TS-RCN can be integrated with existing deep learning based visual trackers.
To further improve the tracking performance, we adopt a "wider" residual network ResNeXt as its feature extraction backbone.
arXiv Detail & Related papers (2020-05-13T19:05:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.