3DRM: Pair-wise relation module for 3D object detection
- URL: http://arxiv.org/abs/2202.09721v1
- Date: Sun, 20 Feb 2022 03:06:35 GMT
- Title: 3DRM: Pair-wise relation module for 3D object detection
- Authors: Yuqing Lan, Yao Duan, Yifei Shi, Hui Huang, Kai Xu
- Abstract summary: We argue that scene understanding benefits from object relation reasoning, which is capable of mitigating the ambiguity of 3D object detections.
We propose a novel 3D relation module (3DRM) which reasons about object relations at pair-wise levels.
The 3DRM predicts the semantic and spatial relationships between objects and extracts the object-wise relation features.
- Score: 17.757203529615815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Context has proven to be one of the most important factors in object layout
reasoning for 3D scene understanding. Existing deep contextual models either
learn holistic features for context encoding or rely on pre-defined scene
templates for context modeling. We argue that scene understanding benefits from
object relation reasoning, which is capable of mitigating the ambiguity of 3D
object detections and thus helps locate and classify the 3D objects more
accurately and robustly. To achieve this, we propose a novel 3D relation module
(3DRM) which reasons about object relations at pair-wise levels. The 3DRM
predicts the semantic and spatial relationships between objects and extracts
the object-wise relation features. We demonstrate the effects of 3DRM by
plugging it into proposal-based and voting-based 3D object detection pipelines,
respectively. Extensive evaluations show the effectiveness and generalization
of 3DRM on 3D object detection. Our source code is available at
https://github.com/lanlan96/3DRM.
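The abstract describes the 3DRM as reasoning over object pairs: it predicts semantic and spatial relationships between proposals and aggregates them into object-wise relation features. The sketch below is only an illustrative reconstruction of that idea, not the authors' implementation (see the linked repository for that): every ordered pair of proposals is represented by the two appearance features concatenated with their spatial offset, passed through a stand-in single-layer "MLP" with random weights, and aggregated per object. All names and shapes here are hypothetical.

```python
import numpy as np

def pairwise_relation_features(obj_feats, obj_centers, rng=None):
    """Hypothetical sketch of a pair-wise relation module.

    obj_feats:   (n, d) appearance features of n object proposals
    obj_centers: (n, 3) 3D centers of the proposals
    Returns an (n, d) object-wise relation feature per proposal.
    """
    n, d = obj_feats.shape
    # Spatial offset between every ordered pair of proposal centers: (n, n, 3)
    offsets = obj_centers[:, None, :] - obj_centers[None, :, :]
    # Concatenate features of object i, object j, and their offset: (n, n, 2d + 3)
    pair_in = np.concatenate(
        [np.repeat(obj_feats[:, None, :], n, axis=1),
         np.repeat(obj_feats[None, :, :], n, axis=0),
         offsets],
        axis=-1)
    # Random-weight linear layer + ReLU stands in for a learned relation MLP
    rng = np.random.default_rng(0) if rng is None else rng
    w = rng.standard_normal((2 * d + 3, d)) / np.sqrt(2 * d + 3)
    rel = np.maximum(pair_in @ w, 0.0)   # (n, n, d) pair-wise relation features
    # Aggregate over partner objects to get one relation feature per object
    return rel.mean(axis=1)              # (n, d)
```

In a real detector this per-object relation feature would be concatenated with (or added to) the proposal feature before the classification and box-regression heads, which is the "plug-in" usage the abstract describes.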
Related papers
- Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model [108.35777542298224]
This paper introduces Reason3D, a novel large language model for comprehensive 3D understanding.
We propose a hierarchical mask decoder to locate small objects within expansive scenes.
Experiments validate that Reason3D achieves remarkable results on large-scale ScanNet and Matterport3D datasets.
arXiv Detail & Related papers (2024-05-27T17:59:41Z)
- PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model [19.333506797686695]
We introduce a novel segmentation task known as reasoning part segmentation for 3D objects.
We output a segmentation mask based on complex and implicit textual queries about specific parts of a 3D object.
We propose a model that is capable of segmenting parts of 3D objects based on implicit textual queries and generating natural language explanations.
arXiv Detail & Related papers (2024-04-04T23:38:45Z)
- Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers [65.51132104404051]
We introduce the use of object identifiers and object-centric representations to interact with scenes at the object level.
Our model significantly outperforms existing methods on benchmarks including ScanRefer, Multi3DRefer, Scan2Cap, ScanQA, and SQA3D.
arXiv Detail & Related papers (2023-12-13T14:27:45Z)
- 3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding [58.924180772480504]
3D visual grounding aims to localize the target object in a 3D point cloud by a free-form language description.
We propose a relation-aware one-stage framework, named 3D Relative Position-aware Network (3DRP-Net).
arXiv Detail & Related papers (2023-07-25T09:33:25Z)
- Tracking Objects with 3D Representation from Videos [57.641129788552675]
With 3D object representation learning from pseudo 3D object labels in monocular videos, we propose a new 2D Multiple Object Tracking paradigm, called P3DTrack.
arXiv Detail & Related papers (2023-06-08T17:58:45Z)
- Attention-Based Depth Distillation with 3D-Aware Positional Encoding for Monocular 3D Object Detection [10.84784828447741]
ADD is an Attention-based Depth knowledge Distillation framework with 3D-aware positional encoding.
Credit to our teacher design, our framework is seamless, domain-gap free, easily implementable, and is compatible with object-wise ground-truth depth.
We implement our framework on three representative monocular detectors, and we achieve state-of-the-art performance with no additional inference computational cost.
arXiv Detail & Related papers (2022-11-30T06:39:25Z)
- CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z)
- Point2Seq: Detecting 3D Objects as Sequences [58.63662049729309]
We present a simple and effective framework, named Point2Seq, for 3D object detection from point clouds.
We view each 3D object as a sequence of words and reformulate the 3D object detection task as decoding words from 3D scenes in an auto-regressive manner.
arXiv Detail & Related papers (2022-03-25T00:20:31Z)
- ARM3D: Attention-based relation module for indoor 3D object detection [18.58659759308696]
We propose a novel 3D attention-based relation module (ARM3D).
It encompasses object-aware relation reasoning to extract pair-wise relation contexts among qualified proposals.
ARM3D can take full advantage of the useful relation context and filter those less relevant or even confusing contexts.
arXiv Detail & Related papers (2022-02-20T02:43:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.