ARM3D: Attention-based relation module for indoor 3D object detection
- URL: http://arxiv.org/abs/2202.09715v1
- Date: Sun, 20 Feb 2022 02:43:42 GMT
- Title: ARM3D: Attention-based relation module for indoor 3D object detection
- Authors: Yuqing Lan, Yao Duan, Chenyi Liu, Chenyang Zhu, Yueshan Xiong, Hui
Huang, Kai Xu
- Abstract summary: We propose a novel 3D attention-based relation module (ARM3D).
It encompasses object-aware relation reasoning to extract pair-wise relation contexts among qualified proposals.
ARM3D can take full advantage of useful relation contexts and filter out those that are less relevant or even confusing.
- Score: 18.58659759308696
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Relation context has proven useful for many challenging vision
tasks. In the field of 3D object detection, previous methods have taken
advantage of context encoding, graph embedding, or explicit relation reasoning
to extract relation context. However, redundant relation context inevitably
arises from noisy or low-quality proposals. In fact, invalid relation context
usually indicates underlying scene misunderstanding and ambiguity, which may
instead reduce performance in complex scenes. Inspired by recent attention
mechanisms such as the Transformer, we propose a novel 3D attention-based
relation module (ARM3D). It encompasses object-aware relation reasoning to
extract pair-wise relation contexts among qualified proposals and an attention
module to distribute attention weights across the different relation contexts.
In this way, ARM3D can take full advantage of useful relation contexts and
filter out those that are less relevant or even confusing, which mitigates
ambiguity in detection. We have evaluated the effectiveness of ARM3D by
plugging it into several state-of-the-art 3D object detectors, obtaining more
accurate and robust detection results. Extensive experiments demonstrate the
capability and generalization of ARM3D for 3D object detection. Our source
code is available at https://github.com/lanlan96/ARM3D.
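To make the described mechanism concrete, below is a minimal, hypothetical sketch of attention-weighted pair-wise relation aggregation in the spirit of the abstract. It is not the authors' implementation (see the repository above for that); the module name, feature dimensions, and MLP design are illustrative assumptions.

```python
# Minimal, hypothetical sketch (PyTorch) of attention-weighted pair-wise
# relation aggregation as described in the abstract. This is NOT the
# authors' code; names, dimensions, and MLP design are assumptions.
import torch
import torch.nn as nn


class AttentionRelationModule(nn.Module):
    def __init__(self, feat_dim: int = 128, rel_dim: int = 64):
        super().__init__()
        # Encode each ordered pair of proposal features into a relation context.
        self.rel_mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, rel_dim),
            nn.ReLU(),
            nn.Linear(rel_dim, rel_dim),
        )
        # Score each relation context with a scalar attention logit.
        self.attn = nn.Linear(rel_dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (N, feat_dim) features of N qualified proposals.
        n = feats.size(0)
        fi = feats.unsqueeze(1).expand(n, n, -1)         # (N, N, D): proposal i
        fj = feats.unsqueeze(0).expand(n, n, -1)         # (N, N, D): proposal j
        rel = self.rel_mlp(torch.cat([fi, fj], dim=-1))  # (N, N, rel_dim)
        logits = self.attn(rel).squeeze(-1)              # (N, N)
        # Mask self-pairs so each proposal attends only to other proposals.
        self_mask = torch.eye(n, dtype=torch.bool, device=feats.device)
        logits = logits.masked_fill(self_mask, float("-inf"))
        weights = torch.softmax(logits, dim=1)           # (N, N)
        # The weighted sum down-weights less relevant or confusing contexts.
        return (weights.unsqueeze(-1) * rel).sum(dim=1)  # (N, rel_dim)


if __name__ == "__main__":
    proposals = torch.randn(8, 128)  # toy stand-in for proposal features
    context = AttentionRelationModule()(proposals)
    print(context.shape)  # torch.Size([8, 64])
```

In the actual ARM3D, proposals are filtered for quality beforehand and the relation features also carry object-aware cues; this sketch only illustrates the attention-weighting idea over pair-wise relation contexts.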
Related papers
- 3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding [58.924180772480504]
3D visual grounding aims to localize the target object in a 3D point cloud by a free-form language description.
We propose a relation-aware one-stage framework, named 3D Relative Position-aware Network (3DRP-Net).
arXiv Detail & Related papers (2023-07-25T09:33:25Z) - Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly
Supervised 3D Visual Grounding [58.924180772480504]
3D visual grounding involves finding a target object in a 3D scene that corresponds to a given sentence query.
We propose leveraging weakly supervised annotations to learn a 3D visual grounding model.
We design a novel semantic matching model that analyzes the semantic similarity between object proposals and sentences in a coarse-to-fine manner.
arXiv Detail & Related papers (2023-07-18T13:49:49Z) - Attention-Based Depth Distillation with 3D-Aware Positional Encoding for
Monocular 3D Object Detection [10.84784828447741]
ADD is an Attention-based Depth knowledge Distillation framework with 3D-aware positional encoding.
Thanks to our teacher design, our framework is seamless, domain-gap free, easily implementable, and compatible with object-wise ground-truth depth.
We implement our framework on three representative monocular detectors, and we achieve state-of-the-art performance with no additional inference computational cost.
arXiv Detail & Related papers (2022-11-30T06:39:25Z) - Ret3D: Rethinking Object Relations for Efficient 3D Object Detection in
Driving Scenes [82.4186966781934]
We introduce a simple, efficient, and effective two-stage detector, termed Ret3D.
At the core of Ret3D is the utilization of novel intra-frame and inter-frame relation modules.
With negligible extra overhead, Ret3D achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-08-18T03:48:58Z) - Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed the Homography Loss, is proposed; it exploits both 2D and 3D information.
Our method outperforms the other state-of-the-art methods by a large margin on the KITTI 3D dataset.
arXiv Detail & Related papers (2022-04-02T03:48:03Z) - DisARM: Displacement Aware Relation Module for 3D Detection [38.4380420322491]
Displacement Aware Relation Module (DisARM) is a novel neural network module for enhancing the performance of 3D object detection in point cloud scenes.
To find the anchors, we first apply a preliminary relation anchor module with an objectness-aware sampling approach.
This lightweight relation module leads to significantly higher accuracy of object instance detection when plugged into state-of-the-art detectors.
arXiv Detail & Related papers (2022-03-02T14:49:55Z) - 3DRM:Pair-wise relation module for 3D object detection [17.757203529615815]
We argue that scene understanding benefits from object relation reasoning, which is capable of mitigating the ambiguity of 3D object detection.
We propose a novel 3D relation module (3DRM) that reasons about object relations at the pair-wise level.
The 3DRM predicts the semantic and spatial relationships between objects and extracts the object-wise relation features.
arXiv Detail & Related papers (2022-02-20T03:06:35Z) - Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
In this paper, we propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z) - TransRefer3D: Entity-and-Relation Aware Transformer for Fine-Grained 3D
Visual Grounding [15.617150859765024]
We exploit the Transformer for its natural suitability to permutation-invariant 3D point cloud data.
We propose a TransRefer3D network to extract entity-and-relation aware multimodal context.
Our proposed model significantly outperforms existing approaches by up to 10.6%.
arXiv Detail & Related papers (2021-08-05T05:47:12Z) - SESS: Self-Ensembling Semi-Supervised 3D Object Detection [138.80825169240302]
We propose SESS, a self-ensembling semi-supervised 3D object detection framework. Specifically, we design a thorough perturbation scheme to enhance the generalization of the network on unlabeled and new, unseen data.
Our SESS achieves competitive performance compared to the state-of-the-art fully-supervised method while using only 50% of the labeled data.
arXiv Detail & Related papers (2019-12-26T08:48:04Z)