V-DETR: DETR with Vertex Relative Position Encoding for 3D Object
Detection
- URL: http://arxiv.org/abs/2308.04409v1
- Date: Tue, 8 Aug 2023 17:14:14 GMT
- Title: V-DETR: DETR with Vertex Relative Position Encoding for 3D Object
Detection
- Authors: Yichao Shen, Zigang Geng, Yuhui Yuan, Yutong Lin, Ze Liu, Chunyu Wang,
Han Hu, Nanning Zheng, Baining Guo
- Abstract summary: We introduce a highly performant 3D object detector for point clouds using the DETR framework.
To address the limitation, we introduce a novel 3D Vertex Relative Position Encoding (3DV-RPE) method.
We show exceptional results on the challenging ScanNetV2 benchmark.
- Score: 73.37781484123536
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a highly performant 3D object detector for point clouds using
the DETR framework. The prior attempts all end up with suboptimal results
because they fail to learn accurate inductive biases from the limited scale of
training data. In particular, the queries often attend to points that are far
away from the target objects, violating the locality principle in object
detection. To address the limitation, we introduce a novel 3D Vertex Relative
Position Encoding (3DV-RPE) method which computes position encoding for each
point based on its relative position to the 3D boxes predicted by the queries
in each decoder layer, thus providing clear information to guide the model to
focus on points near the objects, in accordance with the principle of locality.
In addition, we systematically improve the pipeline from various aspects such
as data normalization based on our understanding of the task. We show
exceptional results on the challenging ScanNetV2 benchmark, achieving
significant improvements over the previous 3DETR in
$\rm{AP}_{25}$/$\rm{AP}_{50}$ from 65.0\%/47.0\% to 77.8\%/66.0\%,
respectively. In addition, our method sets a new record on ScanNetV2 and SUN
RGB-D datasets. Code will be released at http://github.com/yichaoshen-MS/V-DETR.
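The core of 3DV-RPE, as described in the abstract, is computing a position encoding for each point from its position relative to the 3D boxes predicted by the queries. The sketch below is a hypothetical illustration of that idea only, not the paper's implementation: it assumes axis-aligned boxes parameterized as center plus size, and encodes each point by its size-normalized offsets to the box's two extreme vertices. The actual method also handles box rotation and passes such offsets through a learned encoding before they bias the decoder attention.

```python
import numpy as np

def vertex_relative_positions(points, boxes):
    """For each (query box, point) pair, compute the point's offsets to the
    box's two extreme vertices (min and max corners), normalized by box size.

    points: (N, 3) array of xyz coordinates
    boxes:  (Q, 6) array of [cx, cy, cz, w, l, h] axis-aligned boxes
    returns: (Q, N, 6) array of normalized vertex-relative offsets
    """
    centers, sizes = boxes[:, :3], boxes[:, 3:]
    vmin = centers - sizes / 2.0  # (Q, 3) min-corner vertex of each box
    vmax = centers + sizes / 2.0  # (Q, 3) max-corner vertex of each box
    # Broadcast all points against each query's vertices.
    d_min = (points[None, :, :] - vmin[:, None, :]) / sizes[:, None, :]
    d_max = (points[None, :, :] - vmax[:, None, :]) / sizes[:, None, :]
    return np.concatenate([d_min, d_max], axis=-1)  # (Q, N, 6)
```

Under this parameterization a point inside a box has all min-corner offsets in [0, 1] and all max-corner offsets in [-1, 0], so the encoding gives the model a direct signal for suppressing attention to points far from a query's predicted box, in line with the locality principle the abstract describes.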
Related papers
- MVSDet: Multi-View Indoor 3D Object Detection via Efficient Plane Sweeps [51.44887282336391]
A key challenge of multi-view indoor 3D object detection is to infer accurate geometry information from images for precise 3D detection.
Previous methods rely on NeRF for geometry reasoning.
We propose MVSDet which utilizes plane sweep for geometry-aware 3D object detection.
arXiv Detail & Related papers (2024-10-28T21:58:41Z)
- Point-DETR3D: Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-supervised 3D Object Detection [32.86369670395974]
We introduce Point-DETR3D, a teacher-student framework for weakly semi-supervised 3D detection.
With only 5% of labeled data, Point-DETR3D achieves over 90% performance of its fully supervised counterpart.
arXiv Detail & Related papers (2024-03-22T16:11:29Z)
- CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds [55.44204039410225]
We present a novel two-stage fully sparse convolutional 3D object detection framework, named CAGroup3D.
Our proposed method first generates some high-quality 3D proposals by leveraging the class-aware local group strategy on the object surface voxels.
To recover the features of missed voxels due to incorrect voxel-wise segmentation, we build a fully sparse convolutional RoI pooling module.
arXiv Detail & Related papers (2022-10-09T13:38:48Z)
- Anytime-Lidar: Deadline-aware 3D Object Detection [5.491655566898372]
We propose a scheduling algorithm which intelligently selects a subset of the components to make an effective time/accuracy trade-off on the fly.
We apply our approach to a state-of-the-art 3D object detection network, PointPillars, and evaluate its performance on the Jetson Xavier AGX platform.
arXiv Detail & Related papers (2022-08-25T16:07:10Z)
- RBGNet: Ray-based Grouping for 3D Object Detection [104.98776095895641]
We propose the RBGNet framework, a voting-based 3D detector for accurate 3D object detection from point clouds.
We propose a ray-based feature grouping module, which aggregates the point-wise features on object surfaces using a group of determined rays.
Our model achieves state-of-the-art 3D detection performance on ScanNet V2 and SUN RGB-D with remarkable performance gains.
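The RBGNet summary mentions aggregating point-wise features along a group of determined rays cast from an object candidate. The sketch below is a simplified, hypothetical version of that grouping: it uses six fixed axis-aligned rays as a stand-in for the paper's ray set, assigns each point to the ray it best aligns with, and mean-pools features per ray.

```python
import numpy as np

def ray_group_features(center, points, feats, cos_thresh=0.9):
    """Aggregate point features along fixed rays cast from a candidate
    object center. Each point contributes to a ray when its direction
    from the center aligns closely with that ray.

    center: (3,) candidate object center
    points: (N, 3) point coordinates
    feats:  (N, D) per-point features
    returns: (6, D) mean-pooled features, one row per ray
    """
    # Six axis-aligned ray directions; a simple stand-in for the
    # ray set used in the paper.
    rays = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                     [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
    rel = points - center
    norm = np.linalg.norm(rel, axis=1, keepdims=True) + 1e-8
    cos = (rel / norm) @ rays.T          # (N, 6) alignment scores
    grouped = []
    for r in range(rays.shape[0]):
        mask = cos[:, r] > cos_thresh    # points lying near this ray
        grouped.append(feats[mask].mean(axis=0) if mask.any()
                       else np.zeros(feats.shape[1]))
    return np.stack(grouped)             # (6, D)
```

The per-ray pooled features could then be concatenated into a candidate descriptor; RBGNet additionally learns where object surfaces intersect the rays, which this sketch omits.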
arXiv Detail & Related papers (2022-04-05T14:42:57Z)
- 3D Object Detection Combining Semantic and Geometric Features from Point Clouds [19.127930862527666]
We propose a novel end-to-end two-stage 3D object detector named SGNet for point clouds scenes.
The VTPM is a voxel-point-based module that performs the final 3D object detection in point space.
As of September 19, 2021, for KITTI dataset, SGNet ranked 1st in 3D and BEV detection on cyclists with easy difficulty level, and 2nd in the 3D detection of moderate cyclists.
arXiv Detail & Related papers (2021-10-10T04:43:27Z)
- Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z)
- H3D: Benchmark on Semantic Segmentation of High-Resolution 3D Point Clouds and Textured Meshes from UAV LiDAR and Multi-View-Stereo [4.263987603222371]
This paper introduces a 3D dataset which is unique in three ways.
It depicts the village of Hessigheim (Germany) henceforth referred to as H3D.
It is designed both to promote research in the field of 3D data analysis and to evaluate and rank emerging approaches.
arXiv Detail & Related papers (2021-02-10T09:33:48Z)
- DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes [54.239416488865565]
We propose a fast single-stage 3D object detection method for LIDAR data.
The core novelty of our method is a fast, single-pass architecture that both detects objects in 3D and estimates their shapes.
We find that our proposed method improves over the state of the art by 5% on object detection in ScanNet scenes and achieves top results by 3.4% on the Waymo Open Dataset.
arXiv Detail & Related papers (2020-04-02T17:48:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.