ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View
General-Purpose 3D Object Detection
- URL: http://arxiv.org/abs/2106.01178v1
- Date: Wed, 2 Jun 2021 14:20:24 GMT
- Authors: Danila Rukhovich, Anna Vorontsova, Anton Konushin
- Abstract summary: ImVoxelNet is a novel fully convolutional method of 3D object detection based on monocular or multi-view RGB images.
ImVoxelNet successfully handles both indoor and outdoor scenes.
It surpasses existing RGB-based 3D object detection methods on the SUN RGB-D dataset.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce the task of multi-view RGB-based 3D object
detection as an end-to-end optimization problem. To address this problem, we
propose ImVoxelNet, a novel fully convolutional method of 3D object detection
based on monocular or multi-view RGB images. The number of monocular images in
each multi-view input can vary during training and inference; in fact, this
number may differ for each multi-view input. ImVoxelNet successfully
handles both indoor and outdoor scenes, which makes it general-purpose.
Specifically, it achieves state-of-the-art results in car detection on KITTI
(monocular) and nuScenes (multi-view) benchmarks among all methods that accept
RGB images. Moreover, it surpasses existing RGB-based 3D object detection
methods on the SUN RGB-D dataset. On ScanNet, ImVoxelNet sets a new benchmark
for multi-view 3D object detection. The source code and the trained models are
available at \url{https://github.com/saic-vul/imvoxelnet}.
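The core idea behind the method's name is back-projecting 2D image features into a 3D voxel volume, which a fully convolutional head can then process. The sketch below illustrates that projection step only; it is not the authors' implementation, and the function name, signature, and nearest-pixel sampling are illustrative assumptions.

```python
import numpy as np

def project_features_to_voxels(feat_map, K, world_to_cam,
                               grid_origin, voxel_size, grid_shape):
    """Back-project a 2D feature map onto a 3D voxel grid.

    Illustrative sketch (hypothetical helper, not the paper's code):
    each voxel center is projected into the image with the pinhole
    model, and the feature at the nearest pixel is copied into the
    voxel. Voxels behind the camera or outside the image stay zero.
    """
    C, H, W = feat_map.shape
    volume = np.zeros((C, *grid_shape), dtype=feat_map.dtype)

    # Voxel-center coordinates in world space, shape (N, 3).
    idx = np.indices(grid_shape).reshape(3, -1).T
    centers = grid_origin + (idx + 0.5) * voxel_size

    # Transform centers into the camera frame (homogeneous coords).
    ones = np.ones((centers.shape[0], 1))
    cam = (world_to_cam @ np.hstack([centers, ones]).T)[:3]  # (3, N)
    z = cam[2]

    # Pinhole projection to pixel coordinates (guard against z <= 0).
    uv = (K @ cam)[:2] / np.where(z > 0, z, 1.0)
    u = np.round(uv[0]).astype(int)
    v = np.round(uv[1]).astype(int)
    valid = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)

    # Scatter sampled features into the flattened volume.
    vol_flat = volume.reshape(C, -1)
    vol_flat[:, valid] = feat_map[:, v[valid], u[valid]]
    return volume
```

For multi-view inputs, the same projection can be repeated per camera and the per-view volumes aggregated (e.g. averaged), which is what makes a variable number of input images unproblematic.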
Related papers
- CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoors Object Detection from Multi-view Images [11.152821406076486]
CN-RMA is a novel approach for 3D indoor object detection from multi-view images.
Our method achieves state-of-the-art performance in 3D object detection from multi-view images.
arXiv Detail & Related papers (2024-03-07T03:59:47Z)
- ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion [61.37481051263816]
Given a single image of a 3D object, this paper proposes a method (named ConsistNet) that is able to generate multiple images of the same object.
Our method effectively learns 3D consistency over a frozen Zero123 backbone and can generate 16 surrounding views of the object within 40 seconds on a single A100 GPU.
arXiv Detail & Related papers (2023-10-16T12:29:29Z)
- Anyview: Generalizable Indoor 3D Object Detection with Variable Frames [63.51422844333147]
We present a novel 3D detection framework named AnyView for our practical applications.
Our method achieves both great generalizability and high detection accuracy with a simple and clean architecture.
arXiv Detail & Related papers (2023-10-09T02:15:45Z)
- ImGeoNet: Image-induced Geometry-aware Voxel Representation for Multi-view 3D Object Detection [24.29296860815032]
ImGeoNet is an image-based 3D object detection framework that models a 3D space by an image-induced geometry-aware voxel representation.
We conduct experiments on three indoor datasets, namely ARKitScenes, ScanNetV2, and ScanNet200.
Our studies indicate that our proposed image-induced geometry-aware representation can enable image-based methods to attain superior detection accuracy.
arXiv Detail & Related papers (2023-08-17T16:49:38Z)
- Bridged Transformer for Vision and Point Cloud 3D Object Detection [92.86856146086316]
Bridged Transformer (BrT) is an end-to-end architecture for 3D object detection.
BrT learns to identify 3D and 2D object bounding boxes from both points and image patches.
We experimentally show that BrT surpasses state-of-the-art methods on SUN RGB-D and ScanNetV2 datasets.
arXiv Detail & Related papers (2022-10-04T05:44:22Z)
- A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving.
We present SimMOD, a Simple baseline for Multi-camera Object Detection.
We conduct extensive experiments on the 3D object detection benchmark of nuScenes to demonstrate the effectiveness of SimMOD.
arXiv Detail & Related papers (2022-08-22T03:38:01Z)
- VPIT: Real-time Embedded Single Object 3D Tracking Using Voxel Pseudo Images [90.60881721134656]
We propose a novel voxel-based 3D single object tracking (3D SOT) method called Voxel Pseudo Image Tracking (VPIT).
Experiments on KITTI Tracking dataset show that VPIT is the fastest 3D SOT method and maintains competitive Success and Precision values.
arXiv Detail & Related papers (2022-06-06T14:02:06Z)
- An Overview Of 3D Object Detection [21.159668390764832]
We propose a framework that uses both RGB and point cloud data to perform multiclass object recognition.
We use the recently released nuScenes dataset---a large-scale dataset containing many data formats---to train and evaluate our proposed architecture.
arXiv Detail & Related papers (2020-10-29T14:04:50Z)
- Single-Shot 3D Detection of Vehicles from Monocular RGB Images via Geometry Constrained Keypoints in Real-Time [6.82446891805815]
We propose a novel 3D single-shot object detection method for detecting vehicles in monocular RGB images.
Our approach lifts 2D detections to 3D space by predicting additional regression and classification parameters.
We test our approach on different datasets for autonomous driving and evaluate it using the challenging KITTI 3D Object Detection and the novel nuScenes Object Detection benchmarks.
arXiv Detail & Related papers (2020-06-23T15:10:19Z)
- EPOS: Estimating 6D Pose of Objects with Symmetries [57.448933686429825]
We present a new method for estimating the 6D pose of rigid objects with available 3D models from a single RGB input.
An object is represented by compact surface fragments, which allow handling symmetries in a systematic manner.
Correspondences between densely sampled pixels and the fragments are predicted using an encoder-decoder network.
arXiv Detail & Related papers (2020-04-01T17:41:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.