Bridged Transformer for Vision and Point Cloud 3D Object Detection
- URL: http://arxiv.org/abs/2210.01391v1
- Date: Tue, 4 Oct 2022 05:44:22 GMT
- Title: Bridged Transformer for Vision and Point Cloud 3D Object Detection
- Authors: Yikai Wang, TengQi Ye, Lele Cao, Wenbing Huang, Fuchun Sun, Fengxiang
He, Dacheng Tao
- Abstract summary: Bridged Transformer (BrT) is an end-to-end architecture for 3D object detection.
BrT learns to identify 3D and 2D object bounding boxes from both points and image patches.
We experimentally show that BrT surpasses state-of-the-art methods on SUN RGB-D and ScanNetV2 datasets.
- Score: 92.86856146086316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D object detection is a crucial research topic in computer vision, which
usually uses 3D point clouds as input in conventional setups. Recently, there
is a trend of leveraging multiple sources of input data, such as complementing
the 3D point cloud with 2D images that often have richer color and less noise.
However, the heterogeneous geometry of the 2D and 3D representations prevents
us from applying off-the-shelf neural networks to achieve multimodal fusion.
To that end, we propose Bridged Transformer (BrT),
an end-to-end architecture for 3D object detection. BrT is simple and
effective, which learns to identify 3D and 2D object bounding boxes from both
points and image patches. A key element of BrT lies in the utilization of
object queries for bridging 3D and 2D spaces, which unifies different sources
of data representations in the Transformer. We adopt a form of feature
aggregation realized by point-to-patch projections, which further strengthens
the correlation between images and points. Moreover, BrT works seamlessly for
fusing the point cloud with multi-view images. We experimentally show that BrT
surpasses state-of-the-art methods on SUN RGB-D and ScanNetV2 datasets.
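The point-to-patch projection underlying BrT's feature aggregation can be illustrated with a standard pinhole-camera model: each 3D point is projected into pixel coordinates and assigned to the ViT-style image patch containing it. The sketch below is a minimal illustration of that idea; the function name, intrinsics, image size, and patch layout are assumptions for this example, not the paper's actual implementation.

```python
import numpy as np

def project_points_to_patches(points, K, patch_size=16, image_hw=(224, 224)):
    """Map 3D points (camera frame, z > 0) to 2D patch indices.

    points: (N, 3) array of 3D points in camera coordinates.
    K: (3, 3) camera intrinsic matrix.
    Returns an (N,) array of row-major patch indices, -1 for points
    projecting outside the image.
    """
    # Pinhole projection: homogeneous pixel = K @ point, then divide by depth.
    uvw = points @ K.T                     # (N, 3)
    uv = uvw[:, :2] / uvw[:, 2:3]          # (N, 2) pixel coordinates (u, v)

    h, w = image_hw
    u, v = uv[:, 0], uv[:, 1]
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)

    # Assign each pixel to its ViT-style patch on a (h/p) x (w/p) grid.
    patches_per_row = w // patch_size
    patch_idx = (v // patch_size).astype(int) * patches_per_row \
        + (u // patch_size).astype(int)
    return np.where(inside, patch_idx, -1)
```

With such a mapping in hand, each point's feature can be aggregated with the token of the patch it lands in, tying the point-cloud and image streams together inside a single Transformer.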
Related papers
- PointAD: Comprehending 3D Anomalies from Points and Pixels for Zero-shot 3D Anomaly Detection [13.60524473223155]
This paper introduces PointAD, a novel approach that transfers the strong generalization capabilities of CLIP for recognizing 3D anomalies on unseen objects.
PointAD renders 3D anomalies into multiple 2D renderings and projects them back into 3D space.
Our model can directly integrate RGB information, further enhancing the understanding of 3D anomalies in a plug-and-play manner.
arXiv Detail & Related papers (2024-10-01T01:40:22Z) - CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoors Object Detection from Multi-view Images [11.152821406076486]
CN-RMA is a novel approach for 3D indoor object detection from multi-view images.
Our method achieves state-of-the-art performance in 3D object detection from multi-view images.
arXiv Detail & Related papers (2024-03-07T03:59:47Z) - 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z) - PointMCD: Boosting Deep Point Cloud Encoders via Multi-view Cross-modal
Distillation for 3D Shape Recognition [55.38462937452363]
We propose a unified multi-view cross-modal distillation architecture, including a pretrained deep image encoder as the teacher and a deep point encoder as the student.
By pair-wise aligning multi-view visual and geometric descriptors, we can obtain more powerful deep point encoders without exhaustive and complicated network modifications.
arXiv Detail & Related papers (2022-07-07T07:23:20Z) - VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and
Stereo Data Fusion [62.24001258298076]
VPFNet is a new architecture that cleverly aligns and aggregates the point cloud and image data at the "virtual" points.
Our VPFNet achieves 83.21% moderate 3D AP and 91.86% moderate BEV AP on the KITTI test set, ranking 1st since May 21st, 2021.
arXiv Detail & Related papers (2021-11-29T08:51:20Z) - An Overview Of 3D Object Detection [21.159668390764832]
We propose a framework that uses both RGB and point cloud data to perform multiclass object recognition.
We use the recently released nuScenes dataset---a large-scale dataset containing many data formats---to train and evaluate our proposed architecture.
arXiv Detail & Related papers (2020-10-29T14:04:50Z) - Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z) - ImVoteNet: Boosting 3D Object Detection in Point Clouds with Image Votes [93.82668222075128]
We propose a 3D detection architecture called ImVoteNet for RGB-D scenes.
ImVoteNet is based on fusing 2D votes in images and 3D votes in point clouds.
We validate our model on the challenging SUN RGB-D dataset.
arXiv Detail & Related papers (2020-01-29T05:09:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.