X-view: Non-egocentric Multi-View 3D Object Detector
- URL: http://arxiv.org/abs/2103.13001v1
- Date: Wed, 24 Mar 2021 06:13:35 GMT
- Title: X-view: Non-egocentric Multi-View 3D Object Detector
- Authors: Liang Xie, Guodong Xu, Deng Cai, Xiaofei He
- Abstract summary: We propose a novel multi-view-based 3D detection method, named X-view, to overcome the drawbacks of existing multi-view methods.
X-view removes the traditional constraint that the origin of the perspective view must coincide with the origin of the 3D Cartesian coordinate system.
We conduct experiments on KITTI and NuScenes datasets to demonstrate the robustness and effectiveness of our proposed X-view.
- Score: 40.25127812839952
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D object detection algorithms for autonomous driving reason about
3D obstacles from the bird's-eye view, the perspective view, or both. Recent
works attempt to improve detection performance by mining and fusing features
from multiple egocentric views. Although the egocentric perspective view
alleviates some weaknesses of the bird's-eye view, its sectored grid partition
becomes so coarse in the distance that targets and surrounding context mix
together, which makes the features less discriminative. In this paper, we
generalize the research on 3D multi-view learning and propose a novel
multi-view-based 3D detection method, named X-view, to overcome the drawbacks
of existing multi-view methods. Specifically, X-view removes the traditional
constraint that the origin of the perspective view must coincide with the
origin of the 3D Cartesian coordinate system. X-view is designed as a general
paradigm that can be applied to almost any LiDAR-based 3D detector, whether
voxel/grid-based or raw-point-based, with only a small increase in running
time. We conduct experiments on the KITTI and NuScenes datasets to demonstrate
the robustness and effectiveness of the proposed X-view. The results show that
X-view obtains consistent improvements when combined with four mainstream
state-of-the-art 3D methods: SECOND, PointRCNN, Part-A^2, and PV-RCNN.
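As a rough illustration of the non-egocentric idea (a minimal sketch, not the authors' implementation; the grid resolution, vertical field of view, and the `view_origin` parameter are assumptions), the snippet below projects LiDAR points into a perspective, range-image-style grid whose origin is an arbitrary 3D point rather than the sensor origin:

```python
import numpy as np

def perspective_view_indices(points, view_origin, h_bins=512, v_bins=64,
                             v_fov_deg=(-25.0, 3.0)):
    """Map 3D points to (row, col) cells of a perspective view whose
    origin is an arbitrary (possibly non-egocentric) 3D point."""
    rel = points - view_origin                  # shift to the chosen view origin
    r = np.linalg.norm(rel, axis=1) + 1e-9      # range from the view origin
    azimuth = np.arctan2(rel[:, 1], rel[:, 0])  # horizontal angle in [-pi, pi]
    elevation = np.arcsin(rel[:, 2] / r)        # vertical angle

    # Quantize both angles into a fixed v_bins x h_bins grid.
    cols = ((azimuth + np.pi) / (2 * np.pi) * h_bins).astype(int) % h_bins
    lo, hi = np.radians(v_fov_deg[0]), np.radians(v_fov_deg[1])
    rows = np.clip(((elevation - lo) / (hi - lo) * v_bins).astype(int),
                   0, v_bins - 1)
    return rows, cols

points = np.random.uniform(-40.0, 40.0, size=(1000, 3))
ego = perspective_view_indices(points, np.zeros(3))                 # egocentric
off = perspective_view_indices(points, np.array([10.0, 5.0, 0.0]))  # shifted origin
```

Moving `view_origin` away from the sensor changes how distant regions are partitioned, which is exactly the degree of freedom that purely egocentric perspective views lack.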
Related papers
- SOGDet: Semantic-Occupancy Guided Multi-view 3D Object Detection [19.75965521357068]
We propose a novel approach called SOGDet (Semantic-Occupancy Guided Multi-view 3D Object Detection) to improve the accuracy of 3D object detection.
Our results show that SOGDet consistently enhances the performance of three baseline methods in terms of nuScenes Detection Score (NDS) and mean Average Precision (mAP).
This indicates that combining 3D object detection with 3D semantic occupancy yields a more comprehensive perception of the 3D environment, thereby helping to build more robust autonomous driving systems.
arXiv Detail & Related papers (2023-08-26T07:38:21Z)
- XVTP3D: Cross-view Trajectory Prediction Using Shared 3D Queries for Autonomous Driving [7.616422495497465]
Trajectory prediction with uncertainty is a critical and challenging task for autonomous driving.
We present a cross-view trajectory prediction method using shared 3D queries (XVTP3D)
The results of experiments on two publicly available datasets show that XVTP3D achieves state-of-the-art performance with consistent cross-view predictions.
arXiv Detail & Related papers (2023-08-17T03:35:13Z)
- MVTN: Learning Multi-View Transformations for 3D Understanding [60.15214023270087]
We introduce the Multi-View Transformation Network (MVTN), which uses differentiable rendering to determine optimal viewpoints for 3D shape recognition.
MVTN can be trained end-to-end with any multi-view network for 3D shape recognition.
Our approach demonstrates state-of-the-art performance in 3D classification and shape retrieval on several benchmarks.
arXiv Detail & Related papers (2022-12-27T12:09:16Z)
- Scatter Points in Space: 3D Detection from Multi-view Monocular Images [8.71944437852952]
3D object detection from monocular images is a challenging, long-standing problem in computer vision.
Recent methods tend to aggregate multi-view features by densely sampling a regular 3D grid in space.
We propose a learnable keypoint sampling method that scatters pseudo surface points in 3D space to preserve data sparsity, as sketched below.
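As a loose sketch of the sparse-sampling idea (the module name, feature dimension, and point budget are illustrative assumptions, not the paper's code), a small head can predict a handful of 3D sample locations instead of enumerating a dense regular grid:

```python
import torch
import torch.nn as nn

class KeypointScatter(nn.Module):
    """Predict K sparse 3D points from a global image feature, replacing
    a dense D**3 grid of samples (illustrative only)."""
    def __init__(self, feat_dim=256, num_keypoints=128, scene_extent=40.0):
        super().__init__()
        self.k = num_keypoints
        self.extent = scene_extent            # half-size of the scene in metres
        self.head = nn.Linear(feat_dim, num_keypoints * 3)

    def forward(self, global_feat):                   # (B, feat_dim)
        offsets = torch.tanh(self.head(global_feat))  # bounded in [-1, 1]
        return offsets.view(-1, self.k, 3) * self.extent  # (B, K, 3) points

feat = torch.randn(2, 256)
pts = KeypointScatter()(feat)  # (2, 128, 3): 128 points vs 64**3 = 262144 cells
```

The point is the budget: features are gathered at K learned locations rather than at every cell of a dense volume.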
arXiv Detail & Related papers (2022-08-31T09:38:05Z)
- A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving.
We present SimMOD, a Simple baseline for Multi-camera Object Detection.
We conduct extensive experiments on the 3D object detection benchmark of nuScenes to demonstrate the effectiveness of SimMOD.
arXiv Detail & Related papers (2022-08-22T03:38:01Z)
- CVFNet: Real-time 3D Object Detection by Learning Cross View Features [11.402076835949824]
We present CVFNet, a real-time, view-based, single-stage 3D object detector.
We first propose a novel Point-Range feature fusion module that deeply integrates point and range-view features in multiple stages.
Then, a special Slice Pillar module is designed to preserve 3D geometry when transforming the deep point-view features into the bird's-eye view; a generic sketch of this point-to-BEV scatter follows.
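As a generic illustration of the point-to-BEV step (the cell size, scene extent, and max-pooling reduction are assumptions; this is not CVFNet's exact Slice Pillar design), per-point features can be scattered into a bird's-eye-view grid like so:

```python
import torch

def points_to_bev(xyz, feats, grid=(200, 200), extent=50.0):
    """Scatter per-point features into an H x W BEV grid, keeping the
    element-wise max over all points that fall into the same cell."""
    H, W = grid
    # Map x/y coordinates in [-extent, extent] metres to integer cell indices.
    ix = ((xyz[:, 0] + extent) / (2 * extent) * W).long().clamp(0, W - 1)
    iy = ((xyz[:, 1] + extent) / (2 * extent) * H).long().clamp(0, H - 1)
    cell = iy * W + ix                          # flat cell index for each point
    bev = feats.new_zeros(H * W, feats.shape[1])
    bev.scatter_reduce_(0, cell.unsqueeze(1).expand(-1, feats.shape[1]),
                        feats, reduce="amax", include_self=False)
    return bev.view(H, W, -1)                   # (H, W, C) BEV feature map

xyz = torch.rand(4096, 3) * 100 - 50   # random points in [-50, 50] metres
feats = torch.rand(4096, 32)           # per-point features
bev_map = points_to_bev(xyz, feats)    # (200, 200, 32)
```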
arXiv Detail & Related papers (2022-03-13T06:23:18Z)
- VoxelTrack: Multi-Person 3D Human Pose Estimation and Tracking in the Wild [98.69191256693703]
We present VoxelTrack for multi-person 3D pose estimation and tracking from a few cameras separated by wide baselines.
It employs a multi-branch network to jointly estimate 3D poses and re-identification (Re-ID) features for all people in the environment.
It outperforms the state-of-the-art methods by a large margin on three public datasets including Shelf, Campus and CMU Panoptic.
arXiv Detail & Related papers (2021-08-05T08:35:44Z) - From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object
Detection [101.20784125067559]
We propose a new architecture, namely Hallucinated Hollow-3D R-CNN, to address the problem of 3D object detection.
In our approach, we first extract multi-view features by sequentially projecting the point clouds into the perspective view and the bird's-eye view.
The 3D objects are detected via a box refinement module with a novel Hierarchical Voxel RoI Pooling operation.
arXiv Detail & Related papers (2021-07-30T02:00:06Z) - Unsupervised Learning of Visual 3D Keypoints for Control [104.92063943162896]
Learning sensorimotor control policies from high-dimensional images crucially relies on the quality of the underlying visual representations.
We propose a framework to learn a 3D keypoint-based geometric structure directly from images in an end-to-end unsupervised manner.
These discovered 3D keypoints tend to meaningfully capture robot joints as well as object movements in a consistent manner across both time and 3D space.
arXiv Detail & Related papers (2021-06-14T17:59:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.