Related papers: BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View

BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View

URL: http://arxiv.org/abs/2309.02185v6
Date: Mon, 28 Oct 2024 05:18:14 GMT
Title: BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View
Authors: Yuxiang Yang, Yingqi Deng, Jinlong Fan, Jing Zhang, Zheng-Jun Zha,
Abstract summary: 3D Single Object Tracking (SOT) is a fundamental task of computer vision, proving essential for applications like autonomous driving. In this paper, we propose BEVTrack, a simple yet effective baseline method. By estimating the target motion in Bird's-Eye View (BEV) to perform tracking, BEVTrack demonstrates surprising simplicity from various aspects, i.e., network designs, training objectives, and tracking pipeline, while achieving superior performance.
Score: 56.77287041917277
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: 3D Single Object Tracking (SOT) is a fundamental task of computer vision, proving essential for applications like autonomous driving. It remains challenging to localize the target from surroundings due to appearance variations, distractors, and the high sparsity of point clouds. To address these issues, prior Siamese and motion-centric trackers both require elaborate designs and solving multiple subtasks. In this paper, we propose BEVTrack, a simple yet effective baseline method. By estimating the target motion in Bird's-Eye View (BEV) to perform tracking, BEVTrack demonstrates surprising simplicity from various aspects, i.e., network designs, training objectives, and tracking pipeline, while achieving superior performance. Besides, to achieve accurate regression for targets with diverse attributes (e.g., sizes and motion patterns), BEVTrack constructs the likelihood function with the learned underlying distributions adapted to different targets, rather than making a fixed Laplacian or Gaussian assumption as in previous works. This provides valuable priors for tracking and thus further boosts performance. While only using a single regression loss with a plain convolutional architecture, BEVTrack achieves state-of-the-art performance on three large-scale datasets, KITTI, NuScenes, and Waymo Open Dataset while maintaining a high inference speed of about 200 FPS. The code will be released at https://github.com/xmm-prio/BEVTrack.

Related papers

Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking [11.146155422858824]
Single-stream architectures using Vision Transformer (ViT) backbones show great potential for real-time UAV tracking. We propose to learn Occlusion-Robust Representations (ORR) based on ViTs for UAV tracking. We also propose an Adaptive Feature-Based Knowledge Distillation (AFKD) method to create a more compact tracker.
arXiv Detail & Related papers (2025-04-12T14:06:50Z)
Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking [14.382072224997074]
Single-stream architectures utilizing pre-trained ViT backbones offer improved performance, efficiency, and robustness. We boost the efficiency of this framework by tailoring it into an adaptive framework that dynamically exits Transformer blocks for real-time UAV tracking. We also improve the effectiveness of ViTs in handling motion blur, a common issue in UAV tracking caused by the fast movements of either the UAV, the tracked objects, or both.
arXiv Detail & Related papers (2024-07-07T14:10:04Z)
PillarTrack: Redesigning Pillar-based Transformer Network for Single Object Tracking on Point Clouds [5.524413892353708]
LiDAR-based 3D single object tracking (3D SOT) is a critical issue in robotics and autonomous driving. We propose PillarTrack, a pillar-based 3D single object tracking framework. PillarTrack achieves state-of-the-art performance on the KITTI and nuScenes dataset and enables real-time tracking speed.
arXiv Detail & Related papers (2024-04-11T06:06:56Z)
EasyTrack: Efficient and Compact One-stream 3D Point Clouds Tracker [35.74677036815288]
We propose a neat and compact one-stream transformer 3D SOT paradigm, termed as textbfEasyTrack. A 3D point clouds tracking feature pre-training module is developed to exploit the masked autoencoding for learning 3D point clouds tracking representations. A target location network in the dense bird's eye view (BEV) feature space is constructed for target classification and regression.
arXiv Detail & Related papers (2024-04-09T02:47:52Z)
Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking. DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget. Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z)
SeqTrack3D: Exploring Sequence Information for Robust 3D Point Cloud Tracking [26.405519771454102]
We introduce Sequence-to-Sequence tracking paradigm and a tracker named SeqTrack3D to capture target motion across continuous frames. This novel method ensures robust tracking by leveraging location priors from historical boxes, even in scenes with sparse points. Experiments conducted on large-scale datasets show that SeqTrack3D achieves new state-of-the-art performances.
arXiv Detail & Related papers (2024-02-26T02:14:54Z)
ByteTrackV2: 2D and 3D Multi-Object Tracking by Associating Every Detection Box [81.45219802386444]
Multi-object tracking (MOT) aims at estimating bounding boxes and identities of objects across video frames. We propose a hierarchical data association strategy to mine the true objects in low-score detection boxes. In 3D scenarios, it is much easier for the tracker to predict object velocities in the world coordinate.
arXiv Detail & Related papers (2023-03-27T15:35:21Z)
Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified and learning based approach to the 3D MOT problem. We employ a Neural Message Passing network for data association that is fully trainable. We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving. We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
Factor Graph based 3D Multi-Object Tracking in Point Clouds [8.411514688735183]
We propose a novel optimization-based approach that does not rely on explicit and fixed assignments. We demonstrate its performance on the real world KITTI tracking dataset and achieve better results than many state-of-the-art algorithms.
arXiv Detail & Related papers (2020-08-12T13:34:46Z)
Robust Visual Object Tracking with Two-Stream Residual Convolutional Networks [62.836429958476735]
We propose a Two-Stream Residual Convolutional Network (TS-RCN) for visual tracking. Our TS-RCN can be integrated with existing deep learning based visual trackers. To further improve the tracking performance, we adopt a "wider" residual network ResNeXt as its feature extraction backbone.
arXiv Detail & Related papers (2020-05-13T19:05:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.