SemanticVoxels: Sequential Fusion for 3D Pedestrian Detection using
LiDAR Point Cloud and Semantic Segmentation
- URL: http://arxiv.org/abs/2009.12276v1
- Date: Fri, 25 Sep 2020 14:52:32 GMT
- Authors: Juncong Fei, Wenbo Chen, Philipp Heidenreich, Sascha Wirges, Christoph
Stiller
- Abstract summary: We propose a generalization of PointPainting to be able to apply fusion at different levels.
We show that SemanticVoxels achieves state-of-the-art performance in both 3D and bird's eye view pedestrian detection benchmarks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D pedestrian detection is a challenging task in automated driving because
pedestrians are relatively small, frequently occluded and easily confused with
narrow vertical objects. LiDAR and camera are two commonly used sensor
modalities for this task, which should provide complementary information.
Unexpectedly, LiDAR-only detection methods tend to outperform multisensor
fusion methods in public benchmarks. Recently, PointPainting has been presented
to eliminate this performance drop by effectively fusing the output of a
semantic segmentation network instead of the raw image information. In this
paper, we propose a generalization of PointPainting to be able to apply fusion
at different levels. After the semantic augmentation of the point cloud, we
encode raw point data in pillars to get geometric features and semantic point
data in voxels to get semantic features and fuse them in an effective way.
Experimental results on the KITTI test set show that SemanticVoxels achieves
state-of-the-art performance in both 3D and bird's eye view pedestrian
detection benchmarks. In particular, our approach demonstrates its strength in
detecting challenging pedestrian cases and outperforms current state-of-the-art
approaches.
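The sequential fusion described in the abstract — augmenting each LiDAR point with the output of an image semantic segmentation network before voxel/pillar encoding — can be sketched as follows. This is a minimal illustration of semantic point painting under assumed inputs (the function name `paint_points`, the projection matrix layout, and the score-map shape are all assumptions, not the authors' implementation):

```python
import numpy as np

def paint_points(points, seg_scores, proj):
    """Append per-pixel semantic scores to LiDAR points (semantic augmentation).

    points:     (N, 4) array of x, y, z, intensity in the LiDAR frame.
    seg_scores: (H, W, C) softmax output of a semantic segmentation network.
    proj:       (3, 4) matrix projecting LiDAR points into the image plane.
    Returns an (N, 4 + C) array; points falling outside the image keep
    all-zero semantic scores.
    """
    n = points.shape[0]
    h, w, c = seg_scores.shape
    # Homogeneous coordinates, then projection into pixel coordinates.
    homo = np.hstack([points[:, :3], np.ones((n, 1))])  # (N, 4)
    uvw = homo @ proj.T                                 # (N, 3)
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    # Keep only points in front of the camera and inside the image.
    valid = (uvw[:, 2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    painted = np.zeros((n, c))
    painted[valid] = seg_scores[v[valid], u[valid]]
    return np.hstack([points, painted])
```

The painted point cloud would then be encoded twice — raw coordinates in pillars for geometric features, semantic scores in voxels for semantic features — before the two feature maps are fused, as the paper describes.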
Related papers
- Paint and Distill: Boosting 3D Object Detection with Semantic Passing
Network [70.53093934205057]
3D object detection task from lidar or camera sensors is essential for autonomous driving.
We propose a novel semantic passing framework, named SPNet, to boost the performance of existing lidar-based 3D detection models.
arXiv Detail & Related papers (2022-07-12T12:35:34Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- CVFNet: Real-time 3D Object Detection by Learning Cross View Features [11.402076835949824]
We present a real-time view-based single stage 3D object detector, namely CVFNet.
We first propose a novel Point-Range feature fusion module that deeply integrates point and range view features in multiple stages.
Then, a special Slice Pillar is designed to preserve the 3D geometry when transforming the obtained deep point-view features into bird's eye view.
arXiv Detail & Related papers (2022-03-13T06:23:18Z)
- SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object Detection [78.90102636266276]
We propose a novel set abstraction method named Semantics-Augmented Set Abstraction (SASA).
Based on the estimated point-wise foreground scores, we then propose a semantics-guided point sampling algorithm to help retain more important foreground points during down-sampling.
In practice, SASA proves effective in identifying valuable points related to foreground objects and improving feature learning for point-based 3D detection.
arXiv Detail & Related papers (2022-01-06T08:54:47Z)
- Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z)
- FusionPainting: Multimodal Fusion with Adaptive Attention for 3D Object Detection [15.641616738865276]
We propose a general multimodal fusion framework FusionPainting to fuse the 2D RGB image and 3D point clouds at a semantic level for boosting the 3D object detection task.
Especially, the FusionPainting framework consists of three main modules: a multi-modal semantic segmentation module, an adaptive attention-based semantic fusion module, and a 3D object detector.
The effectiveness of the proposed framework has been verified on the large-scale nuScenes detection benchmark.
arXiv Detail & Related papers (2021-06-23T14:53:22Z)
- PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first, depth estimation is performed and a pseudo-LiDAR point cloud representation is computed from the depth estimates; then, object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
arXiv Detail & Related papers (2021-01-17T05:11:38Z)
- RoIFusion: 3D Object Detection from LiDAR and Vision [7.878027048763662]
We propose a novel fusion algorithm that projects a set of 3D Regions of Interest (RoIs) from the point clouds to the 2D RoIs of the corresponding images.
Our approach achieves state-of-the-art performance on the KITTI 3D object detection challenging benchmark.
arXiv Detail & Related papers (2020-09-09T20:23:27Z)
- Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scale pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z)
- SegVoxelNet: Exploring Semantic Context and Depth-aware Features for 3D Vehicle Detection from Point Cloud [39.99118618229583]
We propose a unified model SegVoxelNet to address the above two problems.
A semantic context encoder is proposed to leverage the free-of-charge semantic segmentation masks in the bird's eye view.
A novel depth-aware head is designed to explicitly model the distribution differences and each part of the depth-aware head is made to focus on its own target detection range.
arXiv Detail & Related papers (2020-02-13T02:42:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.