AnyView: Generalizable Indoor 3D Object Detection with Variable Frames
- URL: http://arxiv.org/abs/2310.05346v1
- Date: Mon, 9 Oct 2023 02:15:45 GMT
- Title: AnyView: Generalizable Indoor 3D Object Detection with Variable Frames
- Authors: Zhenyu Wu, Xiuwei Xu, Ziwei Wang, Chong Xia, Linqing Zhao, Jiwen Lu
and Haibin Yan
- Abstract summary: We present a novel 3D detection framework named AnyView for practical applications.
Our method achieves both great generalizability and high detection accuracy with a simple and clean architecture.
- Score: 63.51422844333147
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel network framework for indoor 3D object
detection that handles variable numbers of input frames in practical scenarios.
Existing methods consider only fixed-frame input for a single detector, such as
monocular RGB-D images or point clouds reconstructed from dense multi-view
RGB-D images. In practical application scenes such as robot navigation and
manipulation, however, the raw input to a 3D detector is a set of RGB-D images
with a variable number of frames rather than a reconstructed scene point cloud,
and previous approaches, which can only handle fixed-frame input, perform
poorly on such variable-frame data. To make 3D object detection suitable for
these practical tasks, we present a novel detection framework named AnyView,
which generalizes well across different numbers of input frames with a single
model. Specifically, we propose a geometric learner to mine the local geometric
features of each input RGB-D frame and implement local-global feature
interaction through a designed spatial mixture module. Meanwhile, we further
utilize a dynamic token strategy to adaptively adjust the number of extracted
features for each frame, which ensures a consistent global feature density and
further enhances generalization after fusion. Extensive experiments on the
ScanNet dataset show that our method achieves both great generalizability and
high detection accuracy with a simple and clean architecture containing a
similar number of parameters to the baselines.
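The dynamic token strategy described in the abstract can be illustrated with a small allocation sketch. The function name, the proportional rule, and the fixed budget of 1024 tokens are illustrative assumptions, not the paper's actual implementation; the point is only that the global token count stays constant however many frames arrive.

```python
def allocate_tokens(frame_feature_counts, total_budget=1024):
    """Distribute a fixed global token budget across a variable number of
    frames, proportional to each frame's raw feature count, so the fused
    scene representation keeps a constant overall feature density."""
    total = sum(frame_feature_counts)
    if total == 0:
        return [0] * len(frame_feature_counts)
    # Proportional allocation, floored to integers.
    budgets = [total_budget * c // total for c in frame_feature_counts]
    # Hand out any remainder one token at a time to the densest frames.
    remainder = total_budget - sum(budgets)
    order = sorted(range(len(budgets)), key=lambda i: -frame_feature_counts[i])
    for i in order[:remainder]:
        budgets[i] += 1
    return budgets
```

For example, `allocate_tokens([500, 300, 200])` yields `[513, 307, 204]`, while a single-frame input receives the entire budget, so downstream fusion always sees the same number of tokens.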
Related papers
- 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z)
- Viewpoint Equivariance for Multi-View 3D Object Detection [35.4090127133834]
State-of-the-art methods focus on reasoning and decoding object bounding boxes from multi-view camera input.
We introduce VEDet, a novel 3D object detection framework that exploits 3D multi-view geometry.
arXiv Detail & Related papers (2023-03-25T19:56:41Z)
- Bridged Transformer for Vision and Point Cloud 3D Object Detection [92.86856146086316]
Bridged Transformer (BrT) is an end-to-end architecture for 3D object detection.
BrT learns to identify 3D and 2D object bounding boxes from both points and image patches.
We experimentally show that BrT surpasses state-of-the-art methods on SUN RGB-D and ScanNetV2 datasets.
arXiv Detail & Related papers (2022-10-04T05:44:22Z)
- CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z)
- Unifying Voxel-based Representation with Transformer for 3D Object Detection [143.91910747605107]
We present a unified framework for multi-modality 3D object detection, named UVTR.
The proposed method aims to unify multi-modality representations in the voxel space for accurate and robust single- or cross-modality 3D detection.
UVTR achieves leading performance in the nuScenes test set with 69.7%, 55.1%, and 71.1% NDS for LiDAR, camera, and multi-modality inputs, respectively.
arXiv Detail & Related papers (2022-06-01T17:02:40Z)
- ODAM: Object Detection, Association, and Mapping using Posed RGB Video [36.16010611723447]
We present ODAM, a system for 3D Object Detection, Association, and Mapping using posed RGB videos.
The proposed system relies on a deep learning front-end to detect 3D objects from a given RGB frame and associate them to a global object-based map using a graph neural network (GNN).
arXiv Detail & Related papers (2021-08-23T13:28:10Z)
- ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection [3.330229314824913]
ImVoxelNet is a novel fully convolutional method of 3D object detection based on monocular or multi-view RGB images.
ImVoxelNet successfully handles both indoor and outdoor scenes.
It surpasses existing RGB-based 3D object detection methods on the SUN RGB-D dataset.
arXiv Detail & Related papers (2021-06-02T14:20:24Z)
- LCD -- Line Clustering and Description for Place Recognition [29.053923938306323]
We introduce a novel learning-based approach to place recognition, using RGB-D cameras and line clusters as visual and geometric features.
We present a neural network architecture based on the attention mechanism for frame-wise line clustering.
A similar neural network is used for the description of these clusters with a compact embedding of 128 floating point numbers.
arXiv Detail & Related papers (2020-10-21T09:52:47Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
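The stepwise refinement described above can be illustrated with a simplified sketch. The paper learns this behavior with reinforcement learning under a delayed reward; here a greedy oracle with access to the target box stands in for the learned policy, and the one-parameter-per-step structure is the only part taken from the summary.

```python
def refine_box(init_box, target_box, step=0.1, max_steps=100):
    """Greedy stand-in for a learned refinement policy: at each step,
    change a single box parameter by a fixed increment, picking the
    change that most reduces the squared distance to the target box."""
    box = list(init_box)
    for _ in range(max_steps):
        best = None
        best_err = sum((b - t) ** 2 for b, t in zip(box, target_box))
        for i in range(len(box)):
            for delta in (step, -step):
                cand = list(box)
                cand[i] += delta
                err = sum((c - t) ** 2 for c, t in zip(cand, target_box))
                if err < best_err:
                    best, best_err = cand, err
        if best is None:  # no single-parameter move improves the fit
            break
        box = best
    return box
```

Starting from `[0, 0, 0]` with target `[0.3, -0.2, 0.1]`, the loop converges in six single-parameter steps, one coordinate increment at a time.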
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
- Single-Shot 3D Detection of Vehicles from Monocular RGB Images via Geometry Constrained Keypoints in Real-Time [6.82446891805815]
We propose a novel 3D single-shot object detection method for detecting vehicles in monocular RGB images.
Our approach lifts 2D detections to 3D space by predicting additional regression and classification parameters.
We test our approach on different datasets for autonomous driving and evaluate it using the challenging KITTI 3D Object Detection and the novel nuScenes Object Detection benchmarks.
arXiv Detail & Related papers (2020-06-23T15:10:19Z)
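The 2D-to-3D lifting mentioned in the summary above can be illustrated with standard pinhole back-projection. The paper predicts the required quantities with regression heads; this sketch instead assumes the depth and camera intrinsics are already known, so it shows only the geometric step, not the paper's method.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift a 2D pixel (u, v) with known depth to a 3D point in the
    camera frame using the pinhole camera model: x = (u - cx) * z / fx,
    y = (v - cy) * z / fy, z = depth."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)
```

A pixel at the principal point maps straight onto the optical axis, e.g. `backproject(320.0, 240.0, 2.0, 500.0, 500.0, 320.0, 240.0)` returns `(0.0, 0.0, 2.0)`.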
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.