3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding
- URL: http://arxiv.org/abs/2307.13363v1
- Date: Tue, 25 Jul 2023 09:33:25 GMT
- Title: 3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding
- Authors: Zehan Wang, Haifeng Huang, Yang Zhao, Linjun Li, Xize Cheng, Yichen
Zhu, Aoxiong Yin, Zhou Zhao
- Abstract summary: 3D visual grounding aims to localize the target object in a 3D point cloud by a free-form language description.
We propose a relation-aware one-stage framework, named 3D Relative Position-aware Network (3DRP-Net).
- Score: 58.924180772480504
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D visual grounding aims to localize the target object in a 3D point cloud by
a free-form language description. Typically, the sentences describing the
target object tend to provide information about its relative relations to
other objects and its position within the whole scene. In this work, we propose
a relation-aware one-stage framework, named 3D Relative Position-aware Network
(3DRP-Net), which can effectively capture the relative spatial relationships
between objects and enhance object attributes. Specifically, 1) we propose a 3D
Relative Position Multi-head Attention (3DRP-MA) module to analyze relative
relations from different directions in the context of object pairs, which helps
the model to focus on the specific object relations mentioned in the sentence.
2) We design a soft-labeling strategy to alleviate the spatial ambiguity
caused by redundant points, which further stabilizes and enhances the learning
process through a constant and discriminative distribution. Extensive
experiments conducted on three benchmarks (ScanRefer, Nr3D, and Sr3D)
demonstrate that our method outperforms all the state-of-the-art methods in
general. The source code will be released on GitHub.
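Below is a minimal PyTorch sketch of the core idea in 3DRP-MA: multi-head attention over object proposals whose logits are biased by pairwise relative-position descriptors. The class name, feature dimensions, the (dx, dy, dz, distance) descriptor, and the bias-style fusion are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class RelativePositionAttention(nn.Module):
    """Attention over object proposals with a relative-position bias per head
    (a generic stand-in for the 3DRP-MA module, with assumed dimensions)."""
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.num_heads, self.head_dim = num_heads, dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Hypothetical encoder: (dx, dy, dz, distance) -> one additive logit per head.
        self.rel_bias = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, num_heads))

    def forward(self, feats, centers):
        # feats: (B, N, dim) proposal features; centers: (B, N, 3) predicted box centers
        B, N, _ = feats.shape
        q, k, v = self.qkv(feats).reshape(B, N, 3, self.num_heads, self.head_dim).permute(2, 0, 3, 1, 4)
        delta = centers[:, :, None, :] - centers[:, None, :, :]          # (B, N, N, 3)
        rel = torch.cat([delta, delta.norm(dim=-1, keepdim=True)], -1)   # (B, N, N, 4)
        bias = self.rel_bias(rel).permute(0, 3, 1, 2)                    # (B, H, N, N)
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5 + bias
        out = (attn.softmax(-1) @ v).transpose(1, 2).reshape(B, N, -1)
        return self.proj(out)

# Example: 32 object proposals with 256-d features in a single scene.
out = RelativePositionAttention()(torch.randn(1, 32, 256), torch.rand(1, 32, 3))
```

The soft-labeling strategy is not shown here; a rough analogue would be supervising proposal selection with a distance-weighted soft target distribution over proposals rather than a single one-hot label.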
Related papers
- R2G: Reasoning to Ground in 3D Scenes [22.917172452931844]
Reasoning to Ground (R2G) is a neural symbolic model that grounds the target objects within 3D scenes in a reasoning manner.
R2G explicitly models the 3D scene with a semantic concept-based scene graph and recurrently simulates attention transfer across object entities.
Experiments on Sr3D/Nr3D benchmarks show that R2G achieves a comparable result with the prior works while maintaining improved interpretability.
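A toy, non-neural sketch of the "attention transferring" idea described above: attention mass over scene objects is re-distributed along scene-graph edges that match the relation mentioned in the query. The node/edge structure and the update rule are assumptions for illustration, not R2G's actual neural-symbolic model.

```python
# Objects and a directed scene graph of (subject, relation, object) triples.
objects = {0: "table", 1: "chair", 2: "chair", 3: "lamp"}
edges = [(1, "next to", 0), (2, "next to", 0), (3, "on", 0)]

def transfer(attention, relation):
    """Move attention from currently attended nodes to nodes related to them by `relation`."""
    new_attn = {i: 0.0 for i in objects}
    for subj, rel, obj in edges:
        if rel == relation:
            new_attn[subj] += attention[obj]   # hop from the anchor object to the subject
    total = sum(new_attn.values()) or 1.0
    return {i: a / total for i, a in new_attn.items()}

# Query: "the lamp on the table" -> start from "table", hop along "on".
attention = {i: float(cat == "table") for i, cat in objects.items()}
attention = transfer(attention, "on")
print(max(attention, key=attention.get))   # -> 3 (the lamp)
```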
arXiv Detail & Related papers (2024-08-24T06:52:14Z)
- Generating Visual Spatial Description via Holistic 3D Scene Understanding [88.99773815159345]
Visual spatial description (VSD) aims to generate texts that describe the spatial relations of the given objects within images.
With an external 3D scene extractor, we obtain the 3D objects and scene features for input images.
We construct a target object-centered 3D spatial scene graph (Go3D-S2G), such that we model the spatial semantics of target objects within the holistic 3D scenes.
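A minimal sketch of building a target-object-centered spatial graph from 3D centers, loosely in the spirit of Go3D-S2G. The axis conventions, relation vocabulary, and distance threshold are assumptions; the paper derives its graph from a full 3D scene extractor.

```python
import numpy as np

def spatial_relation(target_center, other_center, near_thresh=1.0):
    """Coarse label describing where `other_center` lies relative to `target_center`."""
    dx, dy, dz = np.asarray(other_center) - np.asarray(target_center)
    if abs(dz) >= max(abs(dx), abs(dy)):
        rel = "above" if dz > 0 else "below"
    elif abs(dx) >= abs(dy):
        rel = "right of" if dx > 0 else "left of"
    else:
        rel = "behind" if dy > 0 else "in front of"
    if np.linalg.norm([dx, dy, dz]) < near_thresh:
        rel += " (near)"
    return rel

# Edges point from neighboring objects to the target ("mug").
target = ("mug", [0.0, 0.0, 0.8])
others = [("table", [0.0, 0.0, 0.0]), ("window", [2.5, 0.1, 1.2])]
graph = [(name, spatial_relation(target[1], c), target[0]) for name, c in others]
print(graph)  # [('table', 'below (near)', 'mug'), ('window', 'right of', 'mug')]
```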
arXiv Detail & Related papers (2023-05-19T15:53:56Z)
- Grounding 3D Object Affordance from 2D Interactions in Images [128.6316708679246]
Grounding 3D object affordance seeks to locate objects' "action possibilities" regions in 3D space.
Humans possess the ability to perceive object affordances in the physical world through demonstration images or videos.
We devise an Interaction-driven 3D Affordance Grounding Network (IAG), which aligns the region feature of objects from different sources.
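The summary above only says that region features from different sources are aligned. A generic cross-attention alignment block of that kind might look like the sketch below; the class name, dimensions, and residual fusion are assumptions and this is not IAG itself.

```python
import torch
import torch.nn as nn

class CrossSourceAlignment(nn.Module):
    """Aligns 3D object-region features with 2D interaction-region features
    via cross-attention (a generic stand-in, not the paper's module)."""
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, point_regions, image_regions):
        # point_regions: (B, Np, dim); image_regions: (B, Ni, dim)
        aligned, _ = self.attn(point_regions, image_regions, image_regions)
        return self.norm(point_regions + aligned)

fused = CrossSourceAlignment()(torch.randn(2, 16, 256), torch.randn(2, 8, 256))
```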
arXiv Detail & Related papers (2023-03-18T15:37:35Z)
- Language Conditioned Spatial Relation Reasoning for 3D Object Grounding [87.03299519917019]
Localizing objects in 3D scenes based on natural language requires understanding and reasoning about spatial relations.
We propose a language-conditioned transformer model for grounding 3D objects and their spatial relations.
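A compact sketch of a language-conditioned grounding head of the kind described: objects cross-attend to word tokens, interact with each other for spatial reasoning, and are then scored as the referent. All names and dimensions are assumptions; this is a generic pattern, not the paper's model.

```python
import torch
import torch.nn as nn

class LanguageConditionedGrounder(nn.Module):
    """Cross-modal fusion followed by object self-attention and referent scoring."""
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.self_attn = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, obj_feats, lang_feats):
        # obj_feats: (B, N, dim) object proposals; lang_feats: (B, T, dim) word tokens
        fused, _ = self.cross_attn(obj_feats, lang_feats, lang_feats)
        ctx = self.self_attn(fused)             # reasoning among objects
        return self.score(ctx).squeeze(-1)      # (B, N) referent logits

logits = LanguageConditionedGrounder()(torch.randn(1, 32, 256), torch.randn(1, 12, 256))
```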
arXiv Detail & Related papers (2022-11-17T16:42:39Z)
- CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose the Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
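One loose way to read "explicitly integrate useful contextual information" is a second-stage head that augments each proposal with pooled scene context before regressing a box residual, as sketched below. The pooling scheme, box parameterization, and dimensions are assumptions; this is not CMR3D's actual architecture.

```python
import torch
import torch.nn as nn

class ContextualRefinement(nn.Module):
    """Refines first-stage boxes using a scene-level context feature."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.ctx_proj = nn.Linear(dim, dim)
        self.refine = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 7))

    def forward(self, proposal_feats, boxes):
        # proposal_feats: (B, N, dim); boxes: (B, N, 7) as (x, y, z, w, l, h, yaw)
        scene_ctx = self.ctx_proj(proposal_feats.mean(dim=1, keepdim=True))  # (B, 1, dim)
        ctx = scene_ctx.expand_as(proposal_feats)
        residual = self.refine(torch.cat([proposal_feats, ctx], dim=-1))
        return boxes + residual   # context-refined boxes

refined = ContextualRefinement()(torch.randn(2, 50, 256), torch.randn(2, 50, 7))
```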
arXiv Detail & Related papers (2022-09-13T05:26:09Z)
- Point2Seq: Detecting 3D Objects as Sequences [58.63662049729309]
We present a simple and effective framework, named Point2Seq, for 3D object detection from point clouds.
We view each 3D object as a sequence of words and reformulate the 3D object detection task as decoding words from 3D scenes in an auto-regressive manner.
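A minimal sketch of the object-as-sequence idea: a decoder emits one discretized attribute token per step (e.g., class, x, y, z, w, l, h) conditioned on a scene feature at a candidate location. The GRU decoder, token order, and vocabulary size are assumptions, not Point2Seq's actual design.

```python
import torch
import torch.nn as nn

class ObjectAsSequenceDecoder(nn.Module):
    """Greedy auto-regressive decoding of one box as a short token sequence."""
    def __init__(self, dim: int = 256, vocab: int = 128, steps: int = 7):
        super().__init__()
        self.steps = steps
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.GRUCell(dim, dim)
        self.head = nn.Linear(dim, vocab)

    def forward(self, scene_feat):
        # scene_feat: (B, dim) feature at a candidate object location
        B = scene_feat.shape[0]
        h = scene_feat
        token = torch.zeros(B, dtype=torch.long)       # start token (id 0, assumed)
        outputs = []
        for _ in range(self.steps):
            h = self.rnn(self.embed(token), h)
            token = self.head(h).argmax(dim=-1)        # next attribute "word"
            outputs.append(token)
        return torch.stack(outputs, dim=1)             # (B, steps) attribute tokens

tokens = ObjectAsSequenceDecoder()(torch.randn(4, 256))
```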
arXiv Detail & Related papers (2022-03-25T00:20:31Z)
- 3DRM: Pair-wise relation module for 3D object detection [17.757203529615815]
We argue that scene understanding benefits from object relation reasoning, which is capable of mitigating the ambiguity of 3D object detections.
We propose a novel 3D relation module (3DRM) which reasons about object relations at pair-wise levels.
The 3DRM predicts the semantic and spatial relationships between objects and extracts the object-wise relation features.
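A short sketch of a pair-wise relation module of this kind: every object pair is scored for relation classes from concatenated features, and the pair features are pooled back into per-object relation features. Dimensions, the number of relation classes, and max-pooling are assumptions.

```python
import torch
import torch.nn as nn

class PairwiseRelationModule(nn.Module):
    """Pairwise relation logits plus per-object relation features."""
    def __init__(self, dim: int = 256, num_relations: int = 10):
        super().__init__()
        self.pair_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        self.rel_cls = nn.Linear(dim, num_relations)

    def forward(self, obj_feats):
        # obj_feats: (B, N, dim)
        B, N, D = obj_feats.shape
        a = obj_feats[:, :, None, :].expand(B, N, N, D)
        b = obj_feats[:, None, :, :].expand(B, N, N, D)
        pair = self.pair_mlp(torch.cat([a, b], dim=-1))   # (B, N, N, dim)
        rel_logits = self.rel_cls(pair)                   # (B, N, N, num_relations)
        rel_feats = pair.max(dim=2).values                # (B, N, dim) pooled over partners
        return rel_logits, rel_feats

logits, feats = PairwiseRelationModule()(torch.randn(1, 20, 256))
```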
arXiv Detail & Related papers (2022-02-20T03:06:35Z)
- OCM3D: Object-Centric Monocular 3D Object Detection [35.804542148335706]
We propose a novel object-centric voxel representation tailored for monocular 3D object detection.
Specifically, voxels are built on each object proposal, and their sizes are adaptively determined by the 3D spatial distribution of the points.
Our method outperforms state-of-the-art methods by a large margin.
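A rough sketch of per-object adaptive voxelization as described above: the voxel size of each proposal's grid is set by the 3D extent of the points inside it. The grid resolution and the binary occupancy feature are assumptions for illustration.

```python
import numpy as np

def object_centric_voxelize(points, grid=(16, 16, 16)):
    """Voxelize the points of one proposal on a grid sized to their own extent."""
    points = np.asarray(points, dtype=np.float32)        # (M, 3) points in the proposal
    lo, hi = points.min(axis=0), points.max(axis=0)
    voxel_size = (hi - lo) / np.asarray(grid)            # adaptive per-object voxel size
    voxel_size[voxel_size == 0] = 1e-6                   # guard against flat extents
    idx = np.clip(((points - lo) / voxel_size).astype(int), 0, np.asarray(grid) - 1)
    occupancy = np.zeros(grid, dtype=np.float32)
    occupancy[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0     # binary occupancy feature
    return occupancy, voxel_size

occ, vsize = object_centric_voxelize(np.random.rand(500, 3) * [2.0, 1.0, 1.5])
```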
arXiv Detail & Related papers (2021-04-13T09:15:40Z)
- Object as Hotspots: An Anchor-Free 3D Object Detection Approach via Firing of Hotspots [37.16690737208046]
We argue for an approach opposite to existing methods that use object-level anchors.
Inspired by compositional models, we represent an object as a composition of its interior non-empty voxels, termed hotspots.
Based on OHS, we propose an anchor-free detection head with a novel ground truth assignment strategy.
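A simplified sketch of hotspot-style ground-truth assignment: non-empty voxels whose centers fall inside a ground-truth box become positive targets. The axis-aligned box format is an assumption, and the real strategy also balances hotspots across objects.

```python
import numpy as np

def assign_hotspots(voxel_centers, voxel_nonempty, gt_box):
    """Boolean mask of non-empty voxels inside an axis-aligned ground-truth box."""
    cx, cy, cz, w, l, h = gt_box                       # (center, size); assumed format
    centers = np.asarray(voxel_centers)                # (V, 3)
    inside = (np.abs(centers - [cx, cy, cz]) <= np.asarray([w, l, h]) / 2).all(axis=1)
    return inside & np.asarray(voxel_nonempty)         # positive "hotspot" voxels

mask = assign_hotspots(np.random.rand(1000, 3) * 4,
                       np.random.rand(1000) > 0.7,
                       (2.0, 2.0, 2.0, 1.0, 1.0, 1.0))
```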
arXiv Detail & Related papers (2019-12-30T03:02:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.