Exploring Point-BEV Fusion for 3D Point Cloud Object Tracking with
Transformer
- URL: http://arxiv.org/abs/2208.05216v1
- Date: Wed, 10 Aug 2022 08:36:46 GMT
- Title: Exploring Point-BEV Fusion for 3D Point Cloud Object Tracking with
Transformer
- Authors: Zhipeng Luo, Changqing Zhou, Liang Pan, Gongjie Zhang, Tianrui Liu,
Yueru Luo, Haiyu Zhao, Ziwei Liu, Shijian Lu
- Abstract summary: 3D object tracking aims to predict the location and orientation of an object in consecutive frames given an object template.
Motivated by the success of transformers, we propose Point Tracking TRansformer (PTTR), which efficiently predicts high-quality 3D tracking results.
- Score: 62.68401838976208
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the prevalence of LiDAR sensors in autonomous driving, 3D object
tracking has received increasing attention. In a point cloud sequence, 3D
object tracking aims to predict the location and orientation of an object in
consecutive frames given an object template. Motivated by the success of
transformers, we propose Point Tracking TRansformer (PTTR), which efficiently
predicts high-quality 3D tracking results in a coarse-to-fine manner with the
help of transformer operations. PTTR consists of three novel designs. 1)
Instead of random sampling, we design Relation-Aware Sampling to preserve
relevant points to the given template during subsampling. 2) We propose a Point
Relation Transformer for effective feature aggregation and feature matching
between the template and search region. 3) Based on the coarse tracking
results, we employ a novel Prediction Refinement Module to obtain the final
refined prediction through local feature pooling. In addition, motivated by the
favorable properties of the Bird's-Eye View (BEV) of point clouds in capturing
object motion, we further design a more advanced framework named PTTR++, which
incorporates both the point-wise view and BEV representation to exploit their
complementary effect in generating high-quality tracking results. PTTR++
substantially boosts the tracking performance on top of PTTR with low
computational overhead. Extensive experiments over multiple datasets show that
our proposed approaches achieve superior 3D tracking accuracy and efficiency.
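To make the point-BEV fusion idea concrete, here is a minimal numpy sketch of one plausible scheme: each point is projected into its BEV grid cell, the BEV feature at that cell is gathered, and the result is concatenated with the point-wise feature. The function name, grid ranges, and nearest-cell gathering are illustrative assumptions; the paper's actual fusion module is more elaborate than plain concatenation.

```python
import numpy as np

def point_bev_fusion(points, point_feats, bev_feats,
                     x_range=(-40.0, 40.0), y_range=(-40.0, 40.0)):
    """Concatenate each point's feature with the BEV feature of the
    grid cell the point projects into.

    points:      (N, 3) xyz coordinates
    point_feats: (N, C1) per-point features
    bev_feats:   (H, W, C2) BEV feature map covering x_range x y_range
    Returns (N, C1 + C2) fused per-point features.
    """
    H, W, _ = bev_feats.shape
    # Map x/y coordinates to integer grid indices (nearest cell).
    xi = ((points[:, 0] - x_range[0]) / (x_range[1] - x_range[0]) * H).astype(int)
    yi = ((points[:, 1] - y_range[0]) / (y_range[1] - y_range[0]) * W).astype(int)
    xi = np.clip(xi, 0, H - 1)
    yi = np.clip(yi, 0, W - 1)
    # Gather the BEV feature under each point and fuse by concatenation.
    return np.concatenate([point_feats, bev_feats[xi, yi]], axis=1)
```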
Related papers
- PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection [66.94819989912823] (2023-12-13)
We propose a point-trajectory transformer with long short-term memory for efficient temporal 3D object detection.
We use point clouds of current-frame objects and their historical trajectories as input to minimize the memory bank storage requirement.
We conduct extensive experiments on a large-scale dataset to demonstrate that our approach performs well against state-of-the-art methods.
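As a rough sketch of the storage scheme described above, a memory bank can retain only each object's current-frame points plus a short, fixed-length history of its box parameters, rather than full historical point clouds. All class and field names here are hypothetical, not PTT's actual interface:

```python
import numpy as np
from collections import deque

class TrajectoryMemoryBank:
    """Per tracked object, store only current-frame points plus a short
    history of box parameters (x, y, z, l, w, h, yaw)."""

    def __init__(self, max_history=16):
        self.points = {}        # obj_id -> (N, 3) current-frame points only
        self.trajectories = {}  # obj_id -> deque of (7,) box vectors
        self.max_history = max_history

    def update(self, obj_id, frame_points, box):
        # Past frames keep only a 7-float box each, never their point
        # clouds, which bounds the memory footprint per object.
        self.points[obj_id] = frame_points
        traj = self.trajectories.setdefault(
            obj_id, deque(maxlen=self.max_history))
        traj.append(np.asarray(box, dtype=np.float32))
```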
- PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction [72.75478398447396] (2023-08-31)
We propose a cylindrical tri-perspective view to represent point clouds effectively and comprehensively.
Considering the distance distribution of LiDAR point clouds, we construct the tri-perspective view in the cylindrical coordinate system.
We employ spatial group pooling to maintain structural details during projection and adopt 2D backbones to efficiently process each TPV plane.
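A minimal sketch of a cylindrical tri-perspective projection: points are mapped to cylindrical coordinates and per-point features are pooled onto the three coordinate planes. The grid sizes, value ranges, and the use of simple max pooling (standing in for the paper's spatial group pooling) are illustrative assumptions:

```python
import numpy as np

def cylindrical_tpv(points, feats, grid=(64, 64, 32),
                    rho_max=50.0, z_range=(-3.0, 5.0)):
    """Pool per-point features onto the three cylindrical coordinate
    planes: (rho, phi), (rho, z) and (phi, z).

    points: (N, 3) xyz; feats: (N, C). Returns a dict of three planes.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.sqrt(x ** 2 + y ** 2)   # radial distance from the sensor
    phi = np.arctan2(y, x)           # azimuth in [-pi, pi)
    R, P, Z = grid
    ri = np.clip((rho / rho_max * R).astype(int), 0, R - 1)
    pi = np.clip(((phi + np.pi) / (2 * np.pi) * P).astype(int), 0, P - 1)
    zi = np.clip(((z - z_range[0]) / (z_range[1] - z_range[0]) * Z).astype(int),
                 0, Z - 1)

    C = feats.shape[1]
    planes = {"rho_phi": np.zeros((R, P, C)),
              "rho_z": np.zeros((R, Z, C)),
              "phi_z": np.zeros((P, Z, C))}
    for n in range(points.shape[0]):
        # Max-pool each point's feature along the dropped axis.
        planes["rho_phi"][ri[n], pi[n]] = np.maximum(planes["rho_phi"][ri[n], pi[n]], feats[n])
        planes["rho_z"][ri[n], zi[n]] = np.maximum(planes["rho_z"][ri[n], zi[n]], feats[n])
        planes["phi_z"][pi[n], zi[n]] = np.maximum(planes["phi_z"][pi[n], zi[n]], feats[n])
    return planes
```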
- OCBEV: Object-Centric BEV Transformer for Multi-View 3D Object Detection [29.530177591608297] (2023-06-02)
Multi-view 3D object detection is becoming popular in autonomous driving due to its high effectiveness and low cost.
Most of the current state-of-the-art detectors follow the query-based bird's-eye-view (BEV) paradigm.
We propose an Object-Centric query-BEV detector, OCBEV, which captures the temporal and spatial cues of moving targets more effectively.
- Real-time 3D Single Object Tracking with Transformer [5.000768859809606] (2022-09-02)
We propose a Point-Track-Transformer (PTT) module for the point-cloud-based 3D single object tracking task.
The PTT module generates fine-tuned attention features by computing attention weights.
In PTT-Net, we embed PTT into the voting stage and the proposal generation stage.
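The attention computation at the heart of such transformer trackers can be sketched as single-head scaled dot-product attention in which search-region features query template features; the single-head form and the shapes below are simplifications, not PTT's exact module:

```python
import numpy as np

def cross_attention(search_feats, template_feats):
    """Single-head scaled dot-product attention: search points act as
    queries over template points (keys and values).

    search_feats: (N, C); template_feats: (M, C). Returns (N, C).
    """
    scale = np.sqrt(search_feats.shape[1])
    scores = search_feats @ template_feats.T / scale   # (N, M) similarity
    scores -= scores.max(axis=1, keepdims=True)        # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)      # softmax over template points
    return weights @ template_feats                    # attention-weighted features
```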
- SRCN3D: Sparse R-CNN 3D for Compact Convolutional Multi-View 3D Object Detection and Tracking [12.285423418301683] (2022-06-29)
This paper proposes Sparse R-CNN 3D (SRCN3D), a novel two-stage fully-sparse detector that incorporates sparse queries, sparse attention with box-wise sampling, and sparse prediction.
Experiments on the nuScenes dataset demonstrate that SRCN3D achieves competitive performance in both 3D object detection and multi-object tracking tasks.
- SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object Detection [78.90102636266276] (2022-01-06)
We propose a novel set abstraction method named Semantics-Augmented Set Abstraction (SASA).
Based on the estimated point-wise foreground scores, we then propose a semantics-guided point sampling algorithm to help retain more important foreground points during down-sampling.
In practice, SASA proves effective in identifying valuable points related to foreground objects and in improving feature learning for point-based 3D detection.
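A minimal sketch of the semantics-guided sampling idea: farthest point sampling whose selection criterion is scaled by each point's estimated foreground score, so likely-foreground points survive down-sampling. The exact weighting SASA uses may differ from this simple product:

```python
import numpy as np

def semantics_guided_fps(points, fg_scores, n_sample):
    """Farthest point sampling biased toward foreground points by
    scaling the farthest-point criterion with the foreground score.

    points: (N, 3); fg_scores: (N,) in [0, 1]. Returns n_sample indices.
    """
    chosen = [int(np.argmax(fg_scores))]   # seed with the most confident point
    min_dist = np.full(points.shape[0], np.inf)
    for _ in range(n_sample - 1):
        d = np.linalg.norm(points - points[chosen[-1]], axis=1)
        min_dist = np.minimum(min_dist, d)   # distance to the nearest chosen point
        # Pick the point that is both far from the chosen set and likely
        # foreground; already-chosen points score zero and are skipped.
        chosen.append(int(np.argmax(min_dist * fg_scores)))
    return np.asarray(chosen)
```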
- PTTR: Relational 3D Point Cloud Object Tracking with Transformer [37.06516957454285] (2021-12-06)
In a point cloud sequence, 3D object tracking aims to predict the location and orientation of an object in the current search point cloud given a template point cloud.
We propose Point Tracking TRansformer (PTTR), which efficiently predicts high-quality 3D tracking results in a coarse-to-fine manner with the help of transformer operations.
- Improving 3D Object Detection with Channel-wise Transformer [58.668922561622466] (2021-08-23)
We propose a two-stage 3D object detection framework (CT3D) with minimal hand-crafted design.
CT3D simultaneously performs proposal-aware embedding and channel-wise context aggregation.
It achieves an AP of 81.77% on the moderate car category of the KITTI test 3D detection benchmark.
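Loosely in the spirit of channel-wise aggregation, one can pool a proposal's point features with a separate attention distribution per feature channel rather than one weight per point; this toy numpy version is a heavily simplified stand-in for CT3D's decoder, not its actual design:

```python
import numpy as np

def channelwise_aggregate(point_feats):
    """Aggregate a proposal's point features into a single vector using
    one softmax attention distribution per channel.

    point_feats: (N, C). Returns (C,).
    """
    w = np.exp(point_feats - point_feats.max(axis=0, keepdims=True))
    w /= w.sum(axis=0, keepdims=True)   # softmax over the N points, per channel
    return (w * point_feats).sum(axis=0)
```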
- DV-Det: Efficient 3D Point Cloud Object Detection with Dynamic Voxelization [0.0] (2021-07-27)
We propose a novel two-stage framework for efficient 3D point cloud object detection.
We parse the raw point cloud data directly in the 3D space yet achieve impressive efficiency and accuracy.
Our method reaches 75 FPS on the KITTI 3D object detection dataset and 25 FPS on the Waymo Open dataset with satisfactory accuracy.
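Dynamic voxelization, in the general sense used in the literature, groups points by voxel without a fixed per-voxel point budget, so no points are dropped and no padding is stored. A minimal sketch, with an arbitrary voxel size:

```python
import numpy as np
from collections import defaultdict

def dynamic_voxelize(points, voxel_size=(0.2, 0.2, 0.2)):
    """Group points by voxel with no fixed per-voxel capacity: nothing
    is dropped and no padding is stored.

    points: (N, 3). Returns {(ix, iy, iz): (K, 3) points in that voxel}.
    """
    idx = np.floor(points / np.asarray(voxel_size)).astype(int)
    voxels = defaultdict(list)
    for p, key in zip(points, map(tuple, idx)):
        voxels[key].append(p)
    return {k: np.stack(v) for k, v in voxels.items()}
```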
- ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection [78.71826145162092] (2021-03-09)
We present a new domain adaptive self-training pipeline, named ST3D, for unsupervised domain adaptation on 3D object detection from point clouds.
Our ST3D achieves state-of-the-art performance on all evaluated datasets and even surpasses fully supervised results on the KITTI 3D object detection benchmark.
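One ingredient of such a self-training pipeline is turning confident target-domain detections into pseudo-labels while setting ambiguous ones aside; the thresholds below are illustrative assumptions, not ST3D's actual values:

```python
import numpy as np

def select_pseudo_labels(boxes, scores, pos_thresh=0.6, ignore_thresh=0.25):
    """Split target-domain detections into confident pseudo-labels and
    ambiguous boxes to be ignored during retraining.

    boxes: (N, 7) box parameters; scores: (N,) detection confidences.
    Returns (pseudo_boxes, ignored_boxes).
    """
    keep = scores >= pos_thresh                 # confident -> pseudo ground truth
    ignore = (scores >= ignore_thresh) & ~keep  # ambiguous -> excluded from the loss
    return boxes[keep], boxes[ignore]
```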
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.