3D Siamese Voxel-to-BEV Tracker for Sparse Point Clouds
- URL: http://arxiv.org/abs/2111.04426v1
- Date: Mon, 8 Nov 2021 12:47:11 GMT
- Title: 3D Siamese Voxel-to-BEV Tracker for Sparse Point Clouds
- Authors: Le Hui, Lingpeng Wang, Mingmei Cheng, Jin Xie, Jian Yang
- Abstract summary: 3D object tracking in point clouds is still a challenging problem due to the sparsity of LiDAR points in dynamic environments.
We propose a Siamese voxel-to-BEV tracker, which can significantly improve the tracking performance in sparse 3D point clouds.
- Score: 19.97270407211052
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D object tracking in point clouds is still a challenging problem due to the
sparsity of LiDAR points in dynamic environments. In this work, we propose a
Siamese voxel-to-BEV tracker, which can significantly improve the tracking
performance in sparse 3D point clouds. Specifically, it consists of a Siamese
shape-aware feature learning network and a voxel-to-BEV target localization
network. The Siamese shape-aware feature learning network can capture 3D shape
information of the object to learn the discriminative features of the object so
that the potential target from the background in sparse point clouds can be
identified. To this end, we first perform template feature embedding to embed
the template's feature into the potential target and then generate a dense 3D
shape to characterize the shape information of the potential target. For
localizing the tracked target, the voxel-to-BEV target localization network
regresses the target's 2D center and the $z$-axis center from the dense bird's
eye view (BEV) feature map in an anchor-free manner. Concretely, we compress
the voxelized point cloud along the $z$-axis through max pooling to obtain a dense
BEV feature map, where the regression of the 2D center and the $z$-axis center
can be performed more effectively. Extensive evaluation on the KITTI and
nuScenes datasets shows that our method significantly outperforms the current
state-of-the-art methods by a large margin.
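The voxel-to-BEV compression step described in the abstract can be illustrated with a minimal sketch. The function name, array layout (channels, z, y, x), and grid sizes below are illustrative assumptions, not the authors' implementation; the key operation is simply a max pool over the $z$-axis that collapses the voxel volume into a dense BEV feature map.

```python
import numpy as np

def voxels_to_bev(voxel_features):
    """Collapse a voxelized feature volume into a dense BEV feature map
    by max pooling along the z-axis.

    voxel_features: array of shape (C, Z, Y, X) -- per-voxel features.
    Returns an array of shape (C, Y, X).
    """
    # Max over the z dimension (axis 1) keeps, per BEV cell, the strongest
    # feature response anywhere in that vertical column of voxels.
    return voxel_features.max(axis=1)

# Toy example: 4 feature channels over an 8 x 16 x 16 voxel grid.
vox = np.random.rand(4, 8, 16, 16)
bev = voxels_to_bev(vox)
print(bev.shape)  # (4, 16, 16)
```

Because every BEV cell is filled by the maximum over its vertical column, the resulting 2D map is dense even when the original voxel grid is sparse, which is what makes anchor-free 2D-center regression on it effective.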
Related papers
- FASTC: A Fast Attentional Framework for Semantic Traversability Classification Using Point Cloud [7.711666704468952]
We address the problem of traversability assessment using point clouds.
We propose a pillar feature extraction module that utilizes PointNet to capture features from point clouds organized in vertical volume.
We then propose a new temporal attention module to fuse multi-frame information, which can properly handle the varying density of LiDAR point clouds.
arXiv Detail & Related papers (2024-06-24T12:01:55Z) - Dynamic 3D Point Cloud Sequences as 2D Videos [81.46246338686478]
3D point cloud sequences serve as one of the most common and practical representation modalities of real-world environments.
We propose a novel generic representation called Structured Point Cloud Videos (SPCVs).
SPCVs re-organize a point cloud sequence as a 2D video with spatial smoothness and temporal consistency, where the pixel values correspond to the 3D coordinates of points.
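The core idea of the SPCV representation can be sketched in a few lines. This is a hypothetical simplification: the real method learns a spatially smooth, temporally consistent mapping from points to pixels, whereas here each frame's points are simply reshaped row-major into an image whose three channels hold the x, y, z coordinates.

```python
import numpy as np

def points_to_frame(points, height=32, width=32):
    """Hypothetical sketch of one SPCV frame: pack N = height*width points
    into an (H, W, 3) image whose pixel values are 3D coordinates.
    A naive row-major reshape stands in for the learned structured mapping.
    """
    assert points.shape == (height * width, 3)
    return points.reshape(height, width, 3)

# One frame of 1024 random points becomes a 32 x 32 "coordinate image".
frame = points_to_frame(np.random.rand(1024, 3))
print(frame.shape)  # (32, 32, 3)
```

Once a sequence of point clouds is packed this way, standard 2D video architectures can process it directly, which is the practical appeal of the representation.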
arXiv Detail & Related papers (2024-03-02T08:18:57Z) - BSH-Det3D: Improving 3D Object Detection with BEV Shape Heatmap [10.060577111347152]
We propose a novel LiDAR-based 3D object detection model named BSH-Det3D.
It applies an effective way to enhance spatial features by estimating complete shapes from a bird's eye view.
Experiments on the KITTI benchmark achieve state-of-the-art (SOTA) performance in terms of accuracy and speed.
arXiv Detail & Related papers (2023-03-03T15:13:11Z) - OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for
Multi-Camera 3D Object Detection [78.38062015443195]
OA-BEV is a network that can be plugged into the BEV-based 3D object detection framework.
Our method achieves consistent improvements over the BEV-based baselines in terms of both average precision and nuScenes detection score.
arXiv Detail & Related papers (2023-01-13T06:02:31Z) - CG-SSD: Corner Guided Single Stage 3D Object Detection from LiDAR Point
Cloud [4.110053032708927]
In a real-world scene, LiDAR can acquire only a limited set of object surface points, and the object's center point itself is not observed.
We propose a corner-guided anchor-free single-stage 3D object detection model (CG-SSD)
CG-SSD achieves state-of-the-art performance on the ONCE benchmark for supervised 3D object detection using single-frame point cloud data.
arXiv Detail & Related papers (2022-02-24T02:30:15Z) - Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based
Perception [122.53774221136193]
State-of-the-art methods for driving-scene LiDAR-based perception often project the point clouds to 2D space and then process them via 2D convolution.
A natural remedy is to utilize the 3D voxelization and 3D convolution network.
We propose a new framework for the outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
arXiv Detail & Related papers (2021-09-12T06:25:11Z) - Anchor-free 3D Single Stage Detector with Mask-Guided Attention for
Point Cloud [79.39041453836793]
We develop a novel single-stage 3D detector for point clouds in an anchor-free manner.
We convert the voxel-based sparse 3D feature volumes into sparse 2D feature maps.
We propose an IoU-based detection confidence re-calibration scheme to improve the correlation between the detection confidence score and the accuracy of the bounding box regression.
arXiv Detail & Related papers (2021-08-08T13:42:13Z) - DV-Det: Efficient 3D Point Cloud Object Detection with Dynamic
Voxelization [0.0]
We propose a novel two-stage framework for the efficient 3D point cloud object detection.
We parse the raw point cloud data directly in the 3D space yet achieve impressive efficiency and accuracy.
Our method runs at 75 FPS on the KITTI 3D object detection dataset and at 25 FPS on the Open dataset, with satisfactory accuracy.
arXiv Detail & Related papers (2021-07-27T10:07:39Z) - Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR
Segmentation [81.02742110604161]
State-of-the-art methods for large-scale driving-scene LiDAR segmentation often project the point clouds to 2D space and then process them via 2D convolution.
We propose a new framework for outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
Our method achieves 1st place on the SemanticKITTI leaderboard and outperforms existing methods on nuScenes by a noticeable margin of about 4%.
arXiv Detail & Related papers (2020-11-19T18:53:11Z) - InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic
Information Modeling [65.47126868838836]
We propose a novel 3D object detection framework with dynamic information modeling.
Coarse predictions are generated in the first stage via a voxel-based region proposal network.
Experiments are conducted on the large-scale nuScenes 3D detection benchmark.
arXiv Detail & Related papers (2020-07-16T18:27:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.