Text to Point Cloud Localization with Relation-Enhanced Transformer
- URL: http://arxiv.org/abs/2301.05372v1
- Date: Fri, 13 Jan 2023 02:58:49 GMT
- Title: Text to Point Cloud Localization with Relation-Enhanced Transformer
- Authors: Guangzhi Wang, Hehe Fan, Mohan Kankanhalli
- Abstract summary: We focus on the text-to-point-cloud cross-modal localization problem.
It aims to identify the described location from city-scale point clouds.
We propose a unified Relation-Enhanced Transformer (RET) to improve representation discriminability.
- Score: 14.635206837740231
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatically localizing a position based on a few natural language
instructions is essential for future robots to communicate and collaborate with
humans. To approach this goal, we focus on the text-to-point-cloud cross-modal
localization problem. Given a textual query, it aims to identify the described
location from city-scale point clouds. The task involves two challenges. 1) In
city-scale point clouds, similar ambient instances may exist in several
locations. Searching each location in a huge point cloud with only instances as
guidance may lead to less discriminative signals and incorrect results. 2) In
textual descriptions, the hints are provided separately. In this case, the
relations among those hints are not explicitly described, which makes them
difficult to learn. To overcome these two challenges, we
propose a unified Relation-Enhanced Transformer (RET) to improve representation
discriminability for both point cloud and natural language queries. The core of
the proposed RET is a novel Relation-enhanced Self-Attention (RSA) mechanism,
which explicitly encodes instance (hint)-wise relations for the two modalities.
Moreover, we propose a fine-grained cross-modal matching method to further
refine the location predictions in a subsequent instance-hint matching stage.
Experimental results on the KITTI360Pose dataset demonstrate that our approach
surpasses the previous state-of-the-art method by a large margin.
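The abstract says that RSA explicitly encodes instance-wise (or hint-wise) relations inside self-attention, but it does not give the formulation. As a rough illustration only, the sketch below shows one common way to do this: a pairwise relation term is added to the attention logits. The function name `relation_biased_attention`, the use of center offsets as the relation feature, the identity query/key/value projections, and the `rel_weight` vector are all assumptions made for this example; the actual RSA in the paper may differ.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def relation_biased_attention(features, centers, rel_weight):
    """Self-attention with an additive pairwise-relation bias (hypothetical sketch).

    features:   list of n instance feature vectors, each of length d
    centers:    list of n instance centers (e.g. 2D positions)
    rel_weight: vector that scores the offset centers[j] - centers[i]
                (assumed here; stands in for a learned relation encoder)
    Returns (outputs, weights), where weights[i][j] is the attention from i to j.
    """
    n = len(features)
    d = len(features[0])
    outputs, weights = [], []
    for i in range(n):
        logits = []
        for j in range(n):
            # Content term: scaled dot product (identity projections for brevity).
            content = sum(a * b for a, b in zip(features[i], features[j])) / math.sqrt(d)
            # Relation term: linear score of the geometric offset between i and j.
            offset = [cj - ci for ci, cj in zip(centers[i], centers[j])]
            relation = sum(w * o for w, o in zip(rel_weight, offset))
            logits.append(content + relation)
        attn = softmax(logits)
        weights.append(attn)
        # Output for instance i: attention-weighted sum of all instance features.
        outputs.append([sum(attn[j] * features[j][k] for j in range(n)) for k in range(d)])
    return outputs, weights


# Toy example: three instances with 2D features and 2D centers.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
centers = [[0.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
_, plain = relation_biased_attention(features, centers, rel_weight=[0.0, 0.0])
_, biased = relation_biased_attention(features, centers, rel_weight=[1.0, 0.0])
```

With `rel_weight` set to zero the function reduces to ordinary dot-product self-attention. A non-zero `rel_weight` shifts attention toward instances in a particular relative direction, which is the kind of extra signal the abstract argues is needed when similar instances appear in several locations.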
Related papers
- AddressCLIP: Empowering Vision-Language Models for City-wide Image Address Localization [57.34659640776723]
We propose an end-to-end framework named AddressCLIP to solve the problem with more semantics.
We have built three datasets from Pittsburgh and San Francisco on different scales specifically for the IAL problem.
arXiv Detail & Related papers (2024-07-11T03:18:53Z)
- Spatial Semantic Recurrent Mining for Referring Image Segmentation [63.34997546393106]
We propose S$^2$RM to achieve high-quality cross-modality fusion.
It follows a working strategy of trilogy: distributing language feature, spatial semantic recurrent coparsing, and parsed-semantic balancing.
Our proposed method performs favorably against other state-of-the-art algorithms.
arXiv Detail & Related papers (2024-05-15T00:17:48Z)
- Instance-free Text to Point Cloud Localization with Relative Position Awareness [37.22900045434484]
Text-to-point-cloud cross-modal localization is an emerging vision-language task critical for future robot-human collaboration.
We address two key limitations of existing approaches: 1) their reliance on ground-truth instances as input; and 2) their neglect of the relative positions among potential instances.
Our proposed model follows a two-stage pipeline, including a coarse stage for text-cell retrieval and a fine stage for position estimation.
arXiv Detail & Related papers (2024-04-27T09:46:49Z)
- Coupled Laplacian Eigenmaps for Locally-Aware 3D Rigid Point Cloud Matching [0.0]
We propose a new technique, based on graph Laplacian eigenmaps, to match point clouds by taking into account fine local structures.
To deal with the order and sign ambiguity of Laplacian eigenmaps, we introduce a new operator, called Coupled Laplacian.
We show that the similarity between those aligned high-dimensional spaces provides a locally meaningful score to match shapes.
arXiv Detail & Related papers (2024-02-27T10:10:12Z)
- Text2Loc: 3D Point Cloud Localization from Natural Language [49.01851743372889]
We tackle the problem of 3D point cloud localization based on a few natural linguistic descriptions.
We introduce a novel neural network, Text2Loc, that fully interprets the semantic relationship between points and text.
Text2Loc improves the localization accuracy by up to $2\times$ over the state-of-the-art on the KITTI360Pose dataset.
arXiv Detail & Related papers (2023-11-27T16:23:01Z)
- Collect-and-Distribute Transformer for 3D Point Cloud Analysis [82.03517861433849]
We propose a new transformer network equipped with a collect-and-distribute mechanism to communicate short- and long-range contexts of point clouds.
Results show the effectiveness of the proposed CDFormer, delivering several new state-of-the-art performances on point cloud classification and segmentation tasks.
arXiv Detail & Related papers (2023-06-02T03:48:45Z)
- SIRI: Spatial Relation Induced Network For Spatial Description Resolution [64.38872296406211]
We propose a novel relationship induced (SIRI) network for language-guided localization.
We show that our method is around 24% better than the state-of-the-art method in terms of accuracy, measured by an 80-pixel radius.
Our method also generalizes well on our proposed extended dataset collected using the same settings as Touchdown.
arXiv Detail & Related papers (2020-10-27T14:04:05Z)
- Semantic Graph Based Place Recognition for 3D Point Clouds [22.608115489674653]
This paper presents a novel semantic graph based approach for place recognition.
First, we propose a novel semantic graph representation for the point cloud scenes.
We then design a fast and effective graph similarity network to compute the similarity.
arXiv Detail & Related papers (2020-08-26T09:27:26Z)
- Rethinking Positional Encoding in Language Pre-training [111.2320727291926]
We show that in absolute positional encoding, the addition operation applied on positional embeddings and word embeddings brings mixed correlations.
We propose a new positional encoding method called Transformer with Untied Positional Encoding (TUPE).
arXiv Detail & Related papers (2020-06-28T13:11:02Z)