HyperDet3D: Learning a Scene-conditioned 3D Object Detector
- URL: http://arxiv.org/abs/2204.05599v1
- Date: Tue, 12 Apr 2022 07:57:58 GMT
- Title: HyperDet3D: Learning a Scene-conditioned 3D Object Detector
- Authors: Yu Zheng, Yueqi Duan, Jiwen Lu, Jie Zhou, Qi Tian
- Abstract summary: We propose HyperDet3D to explore scene-conditioned prior knowledge for 3D object detection.
Our HyperDet3D achieves state-of-the-art results on the 3D object detection benchmarks of the ScanNet and SUN RGB-D datasets.
- Score: 154.84798451437032
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A bathtub in a library, a sink in an office, a bed in a laundry room -- the
counter-intuitiveness of such pairings suggests that the scene provides important
prior knowledge for 3D object detection, which helps eliminate ambiguous
detections of similar objects. In this paper, we propose HyperDet3D to explore
scene-conditioned prior knowledge for 3D object detection. Existing methods
strive for better representation of local elements and their relations without
scene-conditioned knowledge, which may cause ambiguity because detection relies
merely on the understanding of individual points and object candidates. Instead, HyperDet3D
simultaneously learns scene-agnostic embeddings and scene-specific knowledge
through scene-conditioned hypernetworks. More specifically, our HyperDet3D not
only explores the shareable abstractions from various 3D scenes, but also adapts
the detector to the given scene at test time. We propose a discriminative
Multi-head Scene-specific Attention (MSA) module to dynamically control the
layer parameters of the detector conditioned on the fusion of scene-conditioned
knowledge. Our HyperDet3D achieves state-of-the-art results on the 3D object
detection benchmarks of the ScanNet and SUN RGB-D datasets. Moreover, through
cross-dataset evaluation, we show that the acquired scene-conditioned prior
knowledge remains effective on 3D scenes with a domain gap.
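The abstract describes a detector whose layer parameters are controlled by scene-conditioned hypernetworks with multi-head attention. The sketch below is only a toy illustration of that general idea, not the authors' architecture: the class `SceneHyperLayer`, the mean-pooled scene descriptor, the learned key/value "slots", and the row-wise rescaling of shared weights are all assumptions introduced here for illustration.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_pool(points):
    """Average per-point features into one scene descriptor (an assumption)."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


class SceneHyperLayer:
    """Hypothetical linear layer whose weights depend on a scene descriptor."""

    def __init__(self, dim_in, dim_out, num_heads=2, num_slots=4, seed=0):
        rng = random.Random(seed)
        rand = lambda n: [rng.uniform(-0.1, 0.1) for _ in range(n)]
        self.dim_in, self.dim_out = dim_in, dim_out
        # Scene-agnostic weights shared by every scene.
        self.base = [rand(dim_in) for _ in range(dim_out)]
        # Per head: keys matched against the scene descriptor, and values
        # holding per-output-row scale offsets (stand-ins for stored knowledge).
        self.keys = [[rand(dim_in) for _ in range(num_slots)] for _ in range(num_heads)]
        self.values = [[rand(dim_out) for _ in range(num_slots)] for _ in range(num_heads)]

    def generate_weights(self, scene):
        # Each head attends over its slots; head outputs are averaged.
        delta = [0.0] * self.dim_out
        for keys, values in zip(self.keys, self.values):
            attn = softmax([dot(k, scene) / math.sqrt(self.dim_in) for k in keys])
            for a, v in zip(attn, values):
                for j in range(self.dim_out):
                    delta[j] += a * v[j] / len(self.keys)
        # Scene-specific modulation: rescale each output row of the shared weights.
        return [[w * (1.0 + delta[j]) for w in row] for j, row in enumerate(self.base)]

    def forward(self, points):
        weights = self.generate_weights(mean_pool(points))
        return [[dot(row, p) for row in weights] for p in points]


# Two toy "scenes", each a list of 3-dimensional point features.
layer = SceneHyperLayer(dim_in=3, dim_out=2)
scene_a = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
scene_b = [[5.0, -2.0, 1.0], [4.0, -1.0, 0.5]]
weights_a = layer.generate_weights(mean_pool(scene_a))
weights_b = layer.generate_weights(mean_pool(scene_b))
features_a = layer.forward(scene_a)
```

In this toy version the same `base` weights serve every scene, while the attention over slots yields a different modulation per scene, so `weights_a` and `weights_b` generally differ even though no parameter is retrained at test time.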
Related papers
- Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers [65.51132104404051]
We introduce the use of object identifiers and object-centric representations to interact with scenes at the object level.
Our model significantly outperforms existing methods on benchmarks including ScanRefer, Multi3DRefer, Scan2Cap, ScanQA, and SQA3D.
arXiv Detail & Related papers (2023-12-13T14:27:45Z)
- Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection [24.871590175483096]
Point cloud-based open-vocabulary 3D object detection aims to detect 3D categories that do not have ground-truth annotations in the training set.
Previous approaches leverage large-scale richly-annotated image datasets as a bridge between 3D and category semantics.
We propose Object2Scene, the first approach that leverages large-scale large-vocabulary 3D object datasets to augment existing 3D scene datasets for open-vocabulary 3D object detection.
arXiv Detail & Related papers (2023-09-18T03:31:53Z)
- Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding [56.00186960144545]
3D visual grounding is the task of localizing the object in a 3D scene that is referred to by a natural-language description.
We propose a dense 3D grounding network, featuring four novel stand-alone modules that aim to improve grounding performance.
arXiv Detail & Related papers (2023-09-08T19:27:01Z)
- Surface-biased Multi-Level Context 3D Object Detection [1.9723551683930771]
This work addresses the object detection task in 3D point clouds using a highly efficient, surface-biased feature extraction method (wang2022rbgnet).
We propose a 3D object detector that extracts accurate feature representations of object candidates and leverages self-attention on point patches, object candidates, and the global 3D scene.
arXiv Detail & Related papers (2023-02-13T11:50:04Z)
- CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose the Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z)
- Point2Seq: Detecting 3D Objects as Sequences [58.63662049729309]
We present a simple and effective framework, named Point2Seq, for 3D object detection from point clouds.
We view each 3D object as a sequence of words and reformulate the 3D object detection task as decoding words from 3D scenes in an auto-regressive manner.
arXiv Detail & Related papers (2022-03-25T00:20:31Z)
- Weakly Supervised 3D Object Detection from Point Clouds [27.70180601788613]
3D object detection aims to detect and localize the 3D bounding boxes of objects belonging to specific classes.
Existing 3D object detectors rely on annotated 3D bounding boxes during training, but these annotations can be expensive to obtain and are available only in limited scenarios.
We propose VS3D, a framework for weakly supervised 3D object detection from point clouds without using any ground truth 3D bounding box for training.
arXiv Detail & Related papers (2020-07-28T03:30:11Z)
- DSGN: Deep Stereo Geometry Network for 3D Object Detection [79.16397166985706]
There is a large performance gap between image-based and LiDAR-based 3D object detectors.
Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap.
For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline.
arXiv Detail & Related papers (2020-01-10T11:44:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.