CLIP-Loc: Multi-modal Landmark Association for Global Localization in Object-based Maps
- URL: http://arxiv.org/abs/2402.06092v1
- Date: Thu, 8 Feb 2024 22:59:12 GMT
- Title: CLIP-Loc: Multi-modal Landmark Association for Global Localization in Object-based Maps
- Authors: Shigemichi Matsuzaki, Takuma Sugino, Kazuhito Tanaka, Zijun Sha,
Shintaro Nakaoka, Shintaro Yoshizawa, Kazuhiro Shintani
- Abstract summary: This paper describes a multi-modal data association method for global localization using object-based maps and camera images.
We propose labeling landmarks with natural language descriptions and extracting correspondences based on conceptual similarity with image observations.
- Score: 0.16492989697868893
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper describes a multi-modal data association method for global
localization using object-based maps and camera images. In global localization,
or relocalization, using object-based maps, existing methods typically resort
to matching all possible combinations of detected objects and landmarks with
the same object category, followed by inlier extraction using RANSAC or
brute-force search. This approach becomes infeasible as the number of landmarks
increases due to the exponential growth of correspondence candidates. In this
paper, we propose labeling landmarks with natural language descriptions and
extracting correspondences based on conceptual similarity with image
observations using a Vision Language Model (VLM). By leveraging detailed text
information, our approach efficiently extracts correspondences compared to
methods using only object categories. Through experiments, we demonstrate that
the proposed method achieves more accurate global localization with fewer
iterations than baseline methods, highlighting its efficiency.
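As a rough illustration of the core idea, the sketch below scores candidate (landmark, detection) pairs by embedding similarity and keeps only the top-k landmarks per detection, rather than enumerating every same-category combination. The toy vectors, labels, and the `candidate_correspondences` helper are illustrative stand-ins for real CLIP text/image features, not the paper's implementation.

```python
# Minimal sketch: pruning correspondence candidates with VLM-style similarity.
# The embeddings here are toy vectors; CLIP-Loc uses a real Vision Language Model.
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def candidate_correspondences(landmark_embs, detection_embs, top_k=2):
    """For each detection, keep only the top-k most similar landmarks
    instead of all landmarks sharing the same object category."""
    pairs = []
    for d_id, d in detection_embs.items():
        ranked = sorted(landmark_embs, key=lambda l: -cosine(landmark_embs[l], d))
        pairs.extend((l_id, d_id) for l_id in ranked[:top_k])
    return pairs

# Toy example: three landmarks labeled with natural language, one detection.
landmarks = {"red chair": [1.0, 0.1, 0.0],
             "blue sofa": [0.0, 1.0, 0.2],
             "wooden desk": [0.1, 0.0, 1.0]}
detections = {0: [0.9, 0.2, 0.1]}  # image feature of an observed red chair
print(candidate_correspondences(landmarks, detections, top_k=1))
```

With detailed text labels, each detection is linked to a handful of plausible landmarks, so downstream inlier search runs over far fewer hypotheses.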
Related papers
- CLIP-Clique: Graph-based Correspondence Matching Augmented by Vision Language Models for Object-based Global Localization [0.0]
One of the most promising approaches for localization on object maps is to use semantic graph matching.
To address the former issue, we augment the correspondence matching using Vision Language Models.
In addition, inliers are estimated deterministically using a graph-theoretic approach.
arXiv Detail & Related papers (2024-10-04T00:23:20Z)
- Mapping High-level Semantic Regions in Indoor Environments without Object Recognition [50.624970503498226]
The present work proposes a method for semantic region mapping via embodied navigation in indoor environments.
To enable region identification, the method uses a vision-to-language model to provide scene information for mapping.
By projecting egocentric scene understanding into the global frame, the proposed method generates a semantic map as a distribution over possible region labels at each location.
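A distribution over region labels per map location can be sketched as a grid that accumulates label evidence from projected observations. The `RegionMap` class, cell coordinates, and confidence values below are illustrative assumptions, not the paper's data structures.

```python
# Minimal sketch, assuming a 2D grid map: each cell accumulates evidence
# for region labels and normalizes it into a distribution.
from collections import defaultdict

class RegionMap:
    def __init__(self):
        # cell -> label -> accumulated evidence
        self.counts = defaultdict(lambda: defaultdict(float))

    def observe(self, cell, label, confidence=1.0):
        """Accumulate evidence for a region label at a global-frame cell,
        e.g. after projecting an egocentric scene description."""
        self.counts[cell][label] += confidence

    def distribution(self, cell):
        """Normalized distribution over region labels at a cell."""
        total = sum(self.counts[cell].values())
        return {lab: c / total for lab, c in self.counts[cell].items()}

m = RegionMap()
m.observe((3, 4), "kitchen", 0.8)
m.observe((3, 4), "kitchen", 0.6)
m.observe((3, 4), "hallway", 0.6)
print(m.distribution((3, 4)))
```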
arXiv Detail & Related papers (2024-03-11T18:09:50Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
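One common reading of PCA-based localization is to project per-pixel features onto the leading principal component and threshold the projection to separate object regions from background. The power-iteration routine and toy feature map below are illustrative; the paper's features and thresholding may differ.

```python
# Minimal sketch: first principal component of per-pixel features
# used to split foreground object pixels from background.

def first_pc(features, iters=50):
    """Power iteration for the leading eigenvector of the feature covariance."""
    dim = len(features[0])
    mean = [sum(f[i] for f in features) / len(features) for i in range(dim)]
    centered = [[f[i] - mean[i] for i in range(dim)] for f in features]
    v = [1.0] * dim
    for _ in range(iters):
        # Apply the covariance implicitly: C v  is proportional to  X^T (X v)
        proj = [sum(c[i] * v[i] for i in range(dim)) for c in centered]
        v = [sum(p * c[i] for p, c in zip(proj, centered)) for i in range(dim)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return mean, v

def foreground_mask(features):
    """Threshold the projection onto the first PC at zero."""
    mean, v = first_pc(features)
    scores = [sum((f[i] - mean[i]) * v[i] for i in range(len(v))) for f in features]
    return [s > 0 for s in scores]

# Toy feature map: background pixels near (0, 0), object pixels near (1, 1).
feats = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]]
mask = foreground_mask(feats)
```

The sign of a principal component is arbitrary, so a real system would orient the mask, e.g. by which side of the split is more compact.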
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Single-Shot Global Localization via Graph-Theoretic Correspondence Matching [16.956872056232633]
The proposed framework employs correspondence matching based on the maximum clique problem (MCP)
We implement it with a semantically labeled 3D point cloud map, and a semantic segmentation image as a query.
The method shows promising results on multiple large-scale simulated maps of urban scenes.
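The maximum-clique formulation can be sketched as follows: candidate (map object, query object) pairs are graph nodes, two candidates are connected if they preserve inter-object distances, and the largest mutually consistent set is the inlier clique. The brute-force solver, point sets, and tolerance below are illustrative; real systems use dedicated MCP solvers and richer consistency checks.

```python
# Minimal sketch of MCP-based correspondence selection.
from itertools import combinations
from math import dist

def consistent(c1, c2, map_pts, query_pts, tol=0.2):
    """Two candidate pairs are consistent if the inter-object distance
    is preserved between the map and the query frames."""
    (m1, q1), (m2, q2) = c1, c2
    if m1 == m2 or q1 == q2:  # one object cannot match twice
        return False
    return abs(dist(map_pts[m1], map_pts[m2]) - dist(query_pts[q1], query_pts[q2])) < tol

def max_clique(candidates, map_pts, query_pts):
    """Brute-force maximum clique over the consistency graph (toy sizes only)."""
    for r in range(len(candidates), 0, -1):
        for subset in combinations(candidates, r):
            if all(consistent(a, b, map_pts, query_pts)
                   for a, b in combinations(subset, 2)):
                return list(subset)
    return []

map_pts = {"A": (0.0, 0.0), "B": (2.0, 0.0), "C": (0.0, 3.0)}
query_pts = {0: (1.0, 1.0), 1: (3.0, 1.0), 2: (1.0, 4.0)}  # map shifted by (1, 1)
cands = [("A", 0), ("B", 1), ("C", 2), ("A", 1)]  # last candidate is an outlier
print(max_clique(cands, map_pts, query_pts))
```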
arXiv Detail & Related papers (2023-06-06T12:52:07Z)
- An Object SLAM Framework for Association, Mapping, and High-Level Tasks [12.62957558651032]
We present a comprehensive object SLAM framework that focuses on object-based perception and object-oriented robot tasks.
The framework is evaluated on a range of public datasets and real-world experiments, demonstrating its efficient performance.
arXiv Detail & Related papers (2023-05-12T08:10:14Z)
- Spatial Likelihood Voting with Self-Knowledge Distillation for Weakly Supervised Object Detection [54.24966006457756]
We propose a WSOD framework called the Spatial Likelihood Voting with Self-knowledge Distillation Network (SLV-SD Net)
SLV-SD Net refines region proposal localization without bounding box annotations.
Experiments on the PASCAL VOC 2007/2012 and MS-COCO datasets demonstrate the excellent performance of SLV-SD Net.
arXiv Detail & Related papers (2022-04-14T11:56:19Z)
- Fusing Local Similarities for Retrieval-based 3D Orientation Estimation of Unseen Objects [70.49392581592089]
We tackle the task of estimating the 3D orientation of previously-unseen objects from monocular images.
We follow a retrieval-based strategy and prevent the network from learning object-specific features.
Our experiments on the LineMOD, LineMOD-Occluded, and T-LESS datasets show that our method yields a significantly better generalization to unseen objects than previous works.
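A retrieval-based strategy of this kind can be sketched as scoring database renderings with known orientations against a query by fused local similarities, then returning the best-scoring orientation. The fusion rule, feature vectors, and `retrieve_orientation` helper below are illustrative assumptions, not the paper's network.

```python
# Minimal sketch: orientation retrieval by fusing local feature similarities.

def fused_similarity(query_locals, ref_locals):
    """Fuse local similarities: each query feature matches its best
    reference feature, and the matches are averaged."""
    total = 0.0
    for q in query_locals:
        total += max(sum(a * b for a, b in zip(q, r)) for r in ref_locals)
    return total / len(query_locals)

def retrieve_orientation(query_locals, database):
    """database maps orientation (degrees) -> local features of a rendering."""
    return max(database, key=lambda ori: fused_similarity(query_locals, database[ori]))

db = {
    0:  [[1.0, 0.0], [0.9, 0.1]],
    90: [[0.0, 1.0], [0.1, 0.9]],
}
query = [[0.95, 0.05], [1.0, 0.0]]  # local features of the observed object
best = retrieve_orientation(query, db)
```

Because scoring relies on generic local similarities rather than object-specific features, the same database lookup applies to objects unseen at training time.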
arXiv Detail & Related papers (2022-03-16T08:53:00Z)
- Local Context Attention for Salient Object Segmentation [5.542044768017415]
We propose a novel Local Context Attention Network (LCANet) to generate locally reinforced feature maps in a uniform representational architecture.
The proposed network introduces an Attentional Correlation Filter (ACF) module to generate explicit local attention by calculating the correlation feature map between coarse prediction and global context.
Comprehensive experiments are conducted on several salient object segmentation datasets, demonstrating the superior performance of the proposed LCANet against the state-of-the-art methods.
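The correlation step can be sketched as a plain cross-correlation between a coarse foreground template and a context feature map, yielding an explicit attention map that peaks where the template aligns. The shapes and values below are illustrative; the ACF module's internals are learned and differ in detail.

```python
# Minimal sketch: explicit local attention via cross-correlation
# between a coarse prediction and a global-context feature map.

def correlate(context, template):
    """Valid 2D cross-correlation of a feature map with a small template."""
    ch, cw = len(context), len(context[0])
    th, tw = len(template), len(template[0])
    out = []
    for i in range(ch - th + 1):
        row = []
        for j in range(cw - tw + 1):
            row.append(sum(context[i + a][j + b] * template[a][b]
                           for a in range(th) for b in range(tw)))
        out.append(row)
    return out

context = [[0, 0, 0, 0],
           [0, 1, 1, 0],
           [0, 1, 1, 0],
           [0, 0, 0, 0]]
template = [[1, 1],
            [1, 1]]  # coarse prediction of the salient object
attn = correlate(context, template)  # peaks where template matches context
```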
arXiv Detail & Related papers (2020-09-24T09:20:06Z)
- Pairwise Similarity Knowledge Transfer for Weakly Supervised Object Localization [53.99850033746663]
We study the problem of learning localization model on target classes with weakly supervised image labels.
In this work, we argue that learning only an objectness function is a weak form of knowledge transfer.
Experiments on the COCO and ILSVRC 2013 detection datasets show that the performance of the localization model improves significantly with the inclusion of pairwise similarity function.
arXiv Detail & Related papers (2020-03-18T17:53:33Z)
- Weakly-supervised Object Localization for Few-shot Learning and Fine-grained Few-shot Learning [0.5156484100374058]
Few-shot learning aims to learn novel visual categories from very few samples.
We propose a Self-Attention Based Complementary Module (SAC Module) to fulfill the weakly-supervised object localization.
We also produce the activated masks for selecting discriminative deep descriptors for few-shot classification.
arXiv Detail & Related papers (2020-03-02T14:07:05Z)
- Universal-RCNN: Universal Object Detector via Transferable Graph R-CNN [117.80737222754306]
We present a novel universal object detector called Universal-RCNN.
We first generate a global semantic pool by integrating all high-level semantic representation of all the categories.
An Intra-Domain Reasoning Module learns and propagates the sparse graph representation within one dataset guided by a spatial-aware GCN.
arXiv Detail & Related papers (2020-02-18T07:57:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.