BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View
- URL: http://arxiv.org/abs/2309.02185v7
- Date: Fri, 26 Sep 2025 07:03:08 GMT
- Title: BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View
- Authors: Yuxiang Yang, Yingqi Deng, Mian Pan, Jing Zhang, Zheng-Jun Zha,
- Abstract summary: 3D Single Object Tracking (SOT) is a fundamental task in computer vision and plays a critical role in applications like autonomous driving.<n>We propose BEVTrack, a simple yet effective motion-based tracking method.<n>We show that BEVTrack achieves state-of-the-art results while operating at 200 FPS, enabling real-time applicability.
- Score: 54.48052449493636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D Single Object Tracking (SOT) is a fundamental task in computer vision and plays a critical role in applications like autonomous driving. However, existing algorithms often involve complex designs and multiple loss functions, making model training and deployment challenging. Furthermore, their reliance on fixed probability distribution assumptions (e.g., Laplacian or Gaussian) hinders their ability to adapt to diverse target characteristics such as varying sizes and motion patterns, ultimately affecting tracking precision and robustness. To address these issues, we propose BEVTrack, a simple yet effective motion-based tracking method. BEVTrack directly estimates object motion in Bird's-Eye View (BEV) using a single regression loss. To enhance accuracy for targets with diverse attributes, it learns adaptive likelihood functions tailored to individual targets, avoiding the limitations of fixed distribution assumptions in previous methods. This approach provides valuable priors for tracking and significantly boosts performance. Comprehensive experiments on KITTI, NuScenes, and Waymo Open Dataset demonstrate that BEVTrack achieves state-of-the-art results while operating at 200 FPS, enabling real-time applicability. The code will be released at https://github.com/xmm-prio/BEVTrack.
Related papers
- Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking [11.146155422858824]
Single-stream architectures using Vision Transformer (ViT) backbones show great potential for real-time UAV tracking.
We propose to learn Occlusion-Robust Representations (ORR) based on ViTs for UAV tracking.
We also propose an Adaptive Feature-Based Knowledge Distillation (AFKD) method to create a more compact tracker.
arXiv Detail & Related papers (2025-04-12T14:06:50Z) - Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking [14.382072224997074]
Single-stream architectures utilizing pre-trained ViT backbones offer improved performance, efficiency, and robustness.
We boost the efficiency of this framework by tailoring it into an adaptive framework that dynamically exits Transformer blocks for real-time UAV tracking.
We also improve the effectiveness of ViTs in handling motion blur, a common issue in UAV tracking caused by the fast movements of either the UAV, the tracked objects, or both.
arXiv Detail & Related papers (2024-07-07T14:10:04Z) - Robust Visual Tracking via Iterative Gradient Descent and Threshold Selection [4.978166837959101]
We introduce a novel robust linear regression estimator, which achieves favorable performance when the error vector follows i.i.d Gaussian-Laplacian distribution.
In addition, we expend IGDTS to a generative tracker, and apply IGDTS-distance to measure the deviation between the sample and the model.
Experimental results on several challenging image sequences show that the proposed tracker outperformance existing trackers.
arXiv Detail & Related papers (2024-06-02T01:51:09Z) - PillarTrack: Redesigning Pillar-based Transformer Network for Single Object Tracking on Point Clouds [5.524413892353708]
LiDAR-based 3D single object tracking (3D SOT) is a critical issue in robotics and autonomous driving.
We propose PillarTrack, a pillar-based 3D single object tracking framework.
PillarTrack achieves state-of-the-art performance on the KITTI and nuScenes dataset and enables real-time tracking speed.
arXiv Detail & Related papers (2024-04-11T06:06:56Z) - EasyTrack: Efficient and Compact One-stream 3D Point Clouds Tracker [35.74677036815288]
We propose a neat and compact one-stream transformer 3D SOT paradigm, termed as textbfEasyTrack.
A 3D point clouds tracking feature pre-training module is developed to exploit the masked autoencoding for learning 3D point clouds tracking representations.
A target location network in the dense bird's eye view (BEV) feature space is constructed for target classification and regression.
arXiv Detail & Related papers (2024-04-09T02:47:52Z) - Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking.
DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget.
Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z) - SeqTrack3D: Exploring Sequence Information for Robust 3D Point Cloud
Tracking [26.405519771454102]
We introduce Sequence-to-Sequence tracking paradigm and a tracker named SeqTrack3D to capture target motion across continuous frames.
This novel method ensures robust tracking by leveraging location priors from historical boxes, even in scenes with sparse points.
Experiments conducted on large-scale datasets show that SeqTrack3D achieves new state-of-the-art performances.
arXiv Detail & Related papers (2024-02-26T02:14:54Z) - ByteTrackV2: 2D and 3D Multi-Object Tracking by Associating Every
Detection Box [81.45219802386444]
Multi-object tracking (MOT) aims at estimating bounding boxes and identities of objects across video frames.
We propose a hierarchical data association strategy to mine the true objects in low-score detection boxes.
In 3D scenarios, it is much easier for the tracker to predict object velocities in the world coordinate.
arXiv Detail & Related papers (2023-03-27T15:35:21Z) - An Effective Motion-Centric Paradigm for 3D Single Object Tracking in
Point Clouds [50.19288542498838]
3D single object tracking in LiDAR point clouds (LiDAR SOT) plays a crucial role in autonomous driving.
Current approaches all follow the Siamese paradigm based on appearance matching.
We introduce a motion-centric paradigm to handle LiDAR SOT from a new perspective.
arXiv Detail & Related papers (2023-03-21T17:28:44Z) - CAMO-MOT: Combined Appearance-Motion Optimization for 3D Multi-Object
Tracking with Camera-LiDAR Fusion [34.42289908350286]
3D Multi-object tracking (MOT) ensures consistency during continuous dynamic detection.
It can be challenging to accurately track the irregular motion of objects for LiDAR-based methods.
We propose a novel camera-LiDAR fusion 3D MOT framework based on the Combined Appearance-Motion Optimization (CAMO-MOT)
arXiv Detail & Related papers (2022-09-06T14:41:38Z) - Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified and learning based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z) - Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z) - Tracklets Predicting Based Adaptive Graph Tracking [51.352829280902114]
We present an accurate and end-to-end learning framework for multi-object tracking, namely textbfTPAGT.
It re-extracts the features of the tracklets in the current frame based on motion predicting, which is the key to solve the problem of features inconsistent.
arXiv Detail & Related papers (2020-10-18T16:16:49Z) - Factor Graph based 3D Multi-Object Tracking in Point Clouds [8.411514688735183]
We propose a novel optimization-based approach that does not rely on explicit and fixed assignments.
We demonstrate its performance on the real world KITTI tracking dataset and achieve better results than many state-of-the-art algorithms.
arXiv Detail & Related papers (2020-08-12T13:34:46Z) - Robust Visual Object Tracking with Two-Stream Residual Convolutional
Networks [62.836429958476735]
We propose a Two-Stream Residual Convolutional Network (TS-RCN) for visual tracking.
Our TS-RCN can be integrated with existing deep learning based visual trackers.
To further improve the tracking performance, we adopt a "wider" residual network ResNeXt as its feature extraction backbone.
arXiv Detail & Related papers (2020-05-13T19:05:42Z) - ArTIST: Autoregressive Trajectory Inpainting and Scoring for Tracking [80.02322563402758]
One of the core components in online multiple object tracking (MOT) frameworks is associating new detections with existing tracklets.
We introduce a probabilistic autoregressive generative model to score tracklet proposals by directly measuring the likelihood that a tracklet represents natural motion.
arXiv Detail & Related papers (2020-04-16T06:43:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.