Frustum VoxNet for 3D object detection from RGB-D or Depth images
- URL: http://arxiv.org/abs/1910.05483v3
- Date: Thu, 25 May 2023 02:56:37 GMT
- Title: Frustum VoxNet for 3D object detection from RGB-D or Depth images
- Authors: Xiaoke Shen and Ioannis Stamos
- Abstract summary: We describe a new 3D object detection system from an RGB-D or depth-only point cloud.
Our system first detects objects in 2D (either RGB or pseudo-RGB constructed from depth).
The main novelty of our system lies in determining which parts (3D proposals) of the frustums to voxelize.
- Score: 1.14219428942199
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, there has been a plethora of classification and detection systems
from RGB as well as 3D images. In this work, we describe a new 3D object
detection system from an RGB-D or depth-only point cloud. Our system first
detects objects in 2D (either RGB or pseudo-RGB constructed from depth). The
next step is to detect 3D objects within the 3D frustums these 2D detections
define. This is achieved by voxelizing parts of the frustums (since frustums
can be very large), instead of using the whole frustums as done in earlier
work. The main novelty of our system lies in determining which parts (3D
proposals) of the frustums to voxelize, allowing us to provide high-resolution
representations around the objects of interest while also reducing the
system's memory requirements. These 3D proposals are fed to an
efficient ResNet-based 3D Fully Convolutional Network (FCN). Our 3D detection
system is fast and can be integrated into a robotics platform. With respect to
systems that do not perform voxelization (such as PointNet), our method can
operate without subsampling the data. We have also
introduced a pipelining approach that further improves the efficiency of our
system. Results on SUN RGB-D dataset show that our system, which is based on a
small network, can process 20 frames per second with detection results
comparable to the state of the art, achieving a 2x speedup.
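A minimal sketch of the two core steps, cropping the point cloud to the frustum behind a 2D detection and voxelizing only a proposal region around the object, is given below. The intrinsics, grid size, and helper names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def frustum_mask(points, box2d, fx, fy, cx, cy):
    """Keep points whose projection falls inside a 2D detection box.
    points: (N, 3) in camera coordinates (x right, y down, z forward)."""
    z = np.maximum(points[:, 2], 1e-6)        # guard against division by zero
    u = fx * points[:, 0] / z + cx
    v = fy * points[:, 1] / z + cy
    u0, v0, u1, v1 = box2d
    return (points[:, 2] > 0) & (u >= u0) & (u <= u1) & (v >= v0) & (v <= v1)

def voxelize_proposal(points, center, extent, grid=(32, 32, 32)):
    """Voxelize only a proposal region around the object, giving it high
    resolution instead of spending voxels on the whole frustum."""
    lo = center - extent / 2.0
    idx = np.floor((points - lo) / extent * np.array(grid)).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(grid)), axis=1)
    occupancy = np.zeros(grid, dtype=np.float32)
    occupancy[tuple(idx[inside].T)] = 1.0
    return occupancy

# toy usage: random cloud, one 2D detection box, one 3D proposal in its frustum
pts = np.random.rand(10000, 3) * [4.0, 2.0, 8.0] - [2.0, 1.0, 0.0]
in_frustum = pts[frustum_mask(pts, (200, 150, 440, 330), 525.0, 525.0, 320.0, 240.0)]
vox = voxelize_proposal(in_frustum, np.array([0.0, 0.0, 3.0]), np.array([1.5, 1.5, 1.5]))
print(vox.shape, int(vox.sum()))
```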
Related papers
- 3D Instance Segmentation Using Deep Learning on RGB-D Indoor Data [0.0]
A 2D region-based convolutional neural network (Mask R-CNN) with a point-based rendering module is adapted to integrate depth information and to recognize and segment 3D instances of objects.
To generate 3D point-cloud coordinates, the segmented 2D pixels of recognized object regions in the RGB image are merged with the corresponding (u, v) points of the depth image.
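The merge step amounts to standard pinhole back-projection of the segmented (u, v) pixels through the aligned depth image; a minimal sketch with assumed intrinsics:

```python
import numpy as np

def backproject_mask(depth, mask, fx, fy, cx, cy):
    """Lift segmented pixels to 3D points using the aligned depth image.
    depth: (H, W) metric depth; mask: (H, W) boolean instance mask."""
    v, u = np.nonzero(mask & (depth > 0))   # pixel coordinates of the instance
    z = depth[v, u]
    x = (u - cx) * z / fx                   # pinhole model: u = fx * x / z + cx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)      # (N, 3) point cloud for the instance

# toy usage with a synthetic depth map and a square instance mask
depth = np.full((480, 640), 2.0)
mask = np.zeros((480, 640), dtype=bool)
mask[200:280, 300:380] = True
cloud = backproject_mask(depth, mask, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
print(cloud.shape)  # (6400, 3)
```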
arXiv Detail & Related papers (2024-06-19T08:00:35Z)
- 3D Small Object Detection with Dynamic Spatial Pruning [62.72638845817799]
We propose an efficient feature pruning strategy for 3D small object detection.
We present a multi-level 3D detector named DSPDet3D, which benefits from high spatial resolution.
It takes less than 2 s to directly process a whole building of more than 4500k points while detecting almost all objects.
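A simplified picture of spatial pruning, keeping only the most promising voxel features before the next level so that compute scales with object regions rather than scene size, assuming a per-voxel score (the paper's actual pruning criterion is learned and more involved):

```python
import numpy as np

def prune_voxels(features, coords, scores, keep_ratio=0.3):
    """Keep only the highest-scoring voxels before the next (finer) level,
    so deep high-resolution processing stays tractable on large scenes."""
    k = max(1, int(len(scores) * keep_ratio))
    top = np.argsort(scores)[-k:]
    return features[top], coords[top]

# toy usage: 100k voxel features pruned to 30%
feats = np.random.rand(100000, 64).astype(np.float32)
coords = np.random.randint(0, 512, size=(100000, 3))
scores = np.random.rand(100000)
feats, coords = prune_voxels(feats, coords, scores)
print(feats.shape)  # (30000, 64)
```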
arXiv Detail & Related papers (2023-05-05T17:57:04Z)
- TR3D: Towards Real-Time Indoor 3D Object Detection [6.215404942415161]
TR3D is a fully-convolutional 3D object detection model trained end-to-end.
To take advantage of both point cloud and RGB inputs, we introduce an early fusion of 2D and 3D features.
Our model with early feature fusion, which we refer to as TR3D+FF, outperforms existing 3D object detection approaches on the SUN RGB-D dataset.
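Early fusion of this kind typically projects each 3D point into the image, samples the 2D feature map there, and concatenates the sample with the point's 3D feature; a rough numpy sketch under assumed shapes and intrinsics (not TR3D's code):

```python
import numpy as np

def early_fusion(points, point_feats, image_feats, fx, fy, cx, cy, stride=4):
    """Append the 2D feature sampled at each point's projection to its 3D feature."""
    z = np.maximum(points[:, 2], 1e-6)
    u = (fx * points[:, 0] / z + cx) / stride     # feature-map coordinates
    v = (fy * points[:, 1] / z + cy) / stride
    H, W, _ = image_feats.shape
    u = np.clip(u, 0, W - 1).astype(int)          # nearest-neighbor sampling
    v = np.clip(v, 0, H - 1).astype(int)
    return np.concatenate([point_feats, image_feats[v, u]], axis=1)

# toy usage: 2048 points, 32-dim 3D features, 64-dim 2D feature map
pts = np.random.rand(2048, 3) + np.array([0.0, 0.0, 1.0])
fused = early_fusion(pts, np.random.rand(2048, 32),
                     np.random.rand(120, 160, 64), 525.0, 525.0, 320.0, 240.0)
print(fused.shape)  # (2048, 96)
```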
arXiv Detail & Related papers (2023-02-06T15:25:50Z)
- CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds [55.44204039410225]
We present a novel two-stage fully sparse convolutional 3D object detection framework, named CAGroup3D.
Our proposed method first generates high-quality 3D proposals by applying a class-aware local grouping strategy to the object surface voxels.
To recover the features of missed voxels due to incorrect voxel-wise segmentation, we build a fully sparse convolutional RoI pooling module.
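The first stage can be pictured as grouping nearby surface voxels that share a predicted class and emitting one proposal per group; the toy region-growing sketch below is a simplification of the paper's grouping strategy:

```python
import numpy as np

def class_aware_groups(voxel_centers, voxel_classes, radius=0.3):
    """Group voxels of the same predicted class by simple region growing;
    each group yields one 3D proposal (axis-aligned corners here)."""
    proposals = []
    for c in np.unique(voxel_classes):
        pts = voxel_centers[voxel_classes == c]
        unassigned = np.ones(len(pts), dtype=bool)
        while unassigned.any():
            seed = pts[unassigned][0]
            member = unassigned & (np.linalg.norm(pts - seed, axis=1) < radius)
            group = pts[member]
            proposals.append((c, group.min(0), group.max(0)))  # class, box corners
            unassigned &= ~member
    return proposals

# toy usage: two well-separated clusters of same-class voxels
centers = np.vstack([np.random.randn(50, 3) * 0.05,
                     np.random.randn(50, 3) * 0.05 + 2.0])
print(len(class_aware_groups(centers, np.zeros(100, dtype=int))))  # typically 2
```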
arXiv Detail & Related papers (2022-10-09T13:38:48Z)
- RBGNet: Ray-based Grouping for 3D Object Detection [104.98776095895641]
We propose the RBGNet framework, a voting-based 3D detector for accurate 3D object detection from point clouds.
We propose a ray-based feature grouping module, which aggregates the point-wise features on object surfaces using a group of determined rays.
Our model achieves state-of-the-art 3D detection performance on ScanNet V2 and SUN RGB-D with remarkable performance gains.
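Ray-based grouping can be approximated as casting a fixed bundle of rays from a candidate center and pooling the features of points near each ray; in the toy sketch below, the ray count, radius, and max-pooling are assumptions:

```python
import numpy as np

def ray_group_features(center, points, feats, n_rays=6, radius=0.1, max_dist=1.0):
    """Aggregate point features along a bundle of rays cast from a candidate
    center; one pooled feature per ray captures the object surface shape."""
    angles = np.linspace(0, 2 * np.pi, n_rays, endpoint=False)
    dirs = np.stack([np.cos(angles), np.sin(angles), np.zeros(n_rays)], axis=1)
    rel = points - center
    pooled = []
    for d in dirs:
        t = rel @ d                                   # distance along the ray
        perp = np.linalg.norm(rel - np.outer(t, d), axis=1)
        near = (t > 0) & (t < max_dist) & (perp < radius)
        pooled.append(feats[near].max(0) if near.any() else np.zeros(feats.shape[1]))
    return np.concatenate(pooled)                     # (n_rays * C,)

# toy usage: 500 points with 16-dim features around the origin
pts = np.random.randn(500, 3) * 0.5
desc = ray_group_features(np.zeros(3), pts, np.random.rand(500, 16))
print(desc.shape)  # (96,)
```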
arXiv Detail & Related papers (2022-04-05T14:42:57Z)
- ODAM: Object Detection, Association, and Mapping using Posed RGB Video [36.16010611723447]
We present ODAM, a system for 3D Object Detection, Association, and Mapping using posed RGB videos.
The proposed system relies on a deep learning front-end to detect 3D objects from a given RGB frame and associate them with a global object-based map using a graph neural network (GNN).
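The association step can be illustrated, in greatly simplified form, by matching each frame's detections to existing map objects by centroid distance; the paper uses a learned GNN, so the greedy matcher below is only a stand-in:

```python
import numpy as np

def associate(map_centroids, det_centroids, max_dist=0.5):
    """Greedily match new detections to map objects by 3D centroid distance;
    unmatched detections would spawn new map objects."""
    matches, used = [], set()
    for i, d in enumerate(det_centroids):
        dists = np.linalg.norm(map_centroids - d, axis=1)
        j = int(np.argmin(dists))
        if dists[j] < max_dist and j not in used:
            matches.append((i, j))
            used.add(j)
    return matches

# toy usage: 3 map objects, 2 detections near the first two
m = np.array([[0.0, 0.0, 1.0], [2.0, 0.0, 1.0], [5.0, 1.0, 2.0]])
d = np.array([[0.1, 0.0, 1.0], [2.1, 0.1, 1.0]])
print(associate(m, d))  # [(0, 0), (1, 1)]
```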
arXiv Detail & Related papers (2021-08-23T13:28:10Z)
- 3D-to-2D Distillation for Indoor Scene Parsing [78.36781565047656]
We present a new approach that enables us to leverage 3D features extracted from a large-scale 3D data repository to enhance 2D features extracted from RGB images.
First, we distill 3D knowledge from a pretrained 3D network to supervise a 2D network to learn simulated 3D features from 2D features during training.
Second, we design a two-stage dimension normalization scheme to calibrate the 2D and 3D features for better integration.
Third, we design a semantic-aware adversarial training model to extend our framework for training with unpaired 3D data.
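The first step reduces to a feature-matching loss that pushes 2D features toward the (projected) 3D teacher features; a minimal sketch of such an L2 distillation loss, with shapes and the NaN-masking convention as assumptions:

```python
import numpy as np

def distillation_loss(feats_2d, feats_3d_proj):
    """L2 loss pushing per-pixel 2D features toward the 3D teacher features
    projected into the image; only pixels with valid 3D coverage contribute."""
    valid = ~np.isnan(feats_3d_proj).any(axis=-1)       # pixels with 3D coverage
    diff = feats_2d[valid] - feats_3d_proj[valid]
    return float((diff ** 2).mean())

# toy usage: 64-dim features on a 60x80 map, roughly half covered by 3D data
f2d = np.random.rand(60, 80, 64).astype(np.float32)
f3d = np.where(np.random.rand(60, 80, 1) > 0.5,
               np.random.rand(60, 80, 64), np.nan).astype(np.float32)
print(distillation_loss(f2d, f3d))
```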
arXiv Detail & Related papers (2021-04-06T02:22:24Z)
- Learning to Predict the 3D Layout of a Scene [0.3867363075280544]
We propose a method that only uses a single RGB image, thus enabling applications in devices or vehicles that do not have LiDAR sensors.
We use the KITTI dataset for training, which consists of street traffic scenes with class labels, 2D bounding boxes and 3D annotations with seven degrees of freedom.
We achieve a mean average precision of 47.3% on moderately difficult data, measured at a 3D intersection-over-union threshold of 70% as required by the official KITTI benchmark, outperforming previous state-of-the-art single-RGB methods by a large margin.
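The 70% criterion means a prediction counts as correct only if its 3D IoU with the ground truth reaches 0.7. The sketch below computes IoU for axis-aligned boxes; the official KITTI metric handles yawed boxes, so this is a simplification:

```python
import numpy as np

def iou_3d_axis_aligned(a, b):
    """IoU of two axis-aligned 3D boxes given as (min_xyz, max_xyz)."""
    lo = np.maximum(a[0], b[0])
    hi = np.minimum(a[1], b[1])
    inter = np.prod(np.clip(hi - lo, 0, None))
    vol = lambda box: np.prod(box[1] - box[0])
    return float(inter / (vol(a) + vol(b) - inter))

# a box shifted by 10% of its length along one axis already loses noticeable IoU
gt   = (np.array([0.0, 0.0, 0.0]), np.array([4.0, 1.6, 1.5]))
pred = (np.array([0.4, 0.0, 0.0]), np.array([4.4, 1.6, 1.5]))
print(round(iou_3d_axis_aligned(gt, pred), 3))  # 0.818
```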
arXiv Detail & Related papers (2020-11-19T17:23:30Z)
- An Overview Of 3D Object Detection [21.159668390764832]
We propose a framework that uses both RGB and point cloud data to perform multiclass object recognition.
We use the recently released nuScenes dataset, a large-scale dataset containing many data formats, to train and evaluate our proposed architecture.
arXiv Detail & Related papers (2020-10-29T14:04:50Z)
- ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection [69.68263074432224]
We present a novel framework named ZoomNet for stereo imagery-based 3D detection.
The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes.
To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module, adaptive zooming.
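Adaptive zooming resizes each left/right box crop so the object occupies a canonical resolution regardless of distance, which helps disparity estimation on small, far objects; a minimal crop-and-resize sketch with assumed sizes and nearest-neighbor interpolation:

```python
import numpy as np

def adaptive_zoom(image, box, out_hw=(192, 192)):
    """Crop a detected box and resize it to a fixed resolution so near and
    far objects are analyzed at the same effective scale."""
    u0, v0, u1, v1 = box
    crop = image[v0:v1, u0:u1]
    H, W = out_hw
    vs = np.linspace(0, crop.shape[0] - 1, H).astype(int)   # nearest-neighbor
    us = np.linspace(0, crop.shape[1] - 1, W).astype(int)
    return crop[np.ix_(vs, us)]

# toy usage on a left/right pair of crops from the same detection
left = np.random.rand(480, 640, 3)
right = np.random.rand(480, 640, 3)
zl = adaptive_zoom(left, (300, 200, 340, 240))   # small, far object
zr = adaptive_zoom(right, (296, 200, 336, 240))  # disparity-shifted box
print(zl.shape, zr.shape)  # (192, 192, 3) twice
```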
arXiv Detail & Related papers (2020-03-01T17:18:08Z)
- DSGN: Deep Stereo Geometry Network for 3D Object Detection [79.16397166985706]
There is a large performance gap between image-based and LiDAR-based 3D object detectors.
Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap.
For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline.
arXiv Detail & Related papers (2020-01-10T11:44:37Z)