UniLoc: Towards Universal Place Recognition Using Any Single Modality
- URL: http://arxiv.org/abs/2412.12079v1
- Date: Mon, 16 Dec 2024 18:48:58 GMT
- Title: UniLoc: Towards Universal Place Recognition Using Any Single Modality
- Authors: Yan Xia, Zhendong Li, Yun-Jin Li, Letian Shi, Hu Cao, João F. Henriques, Daniel Cremers
- Abstract summary: We develop a universal solution to place recognition, UniLoc, that works with any single query modality.
UniLoc learns by matching hierarchically at two levels: instance-level matching and scene-level matching.
Experiments on the KITTI-360 dataset demonstrate the benefits of cross-modality for place recognition.
- Abstract: To date, most place recognition methods focus on single-modality retrieval. While they perform well in specific environments, cross-modal methods offer greater flexibility by allowing seamless switching between map and query sources. A unified model also promises to reduce computation requirements and to achieve greater sample efficiency through parameter sharing. In this work, we develop a universal solution to place recognition, UniLoc, that works with any single query modality (natural language, image, or point cloud). UniLoc leverages recent advances in large-scale contrastive learning and learns by matching hierarchically at two levels: instance-level matching and scene-level matching. Specifically, we propose a novel Self-Attention based Pooling (SAP) module to evaluate the importance of instance descriptors when aggregating them into a place-level descriptor. Experiments on the KITTI-360 dataset demonstrate the benefits of cross-modality for place recognition, achieving superior performance in cross-modal settings and competitive results in uni-modal scenarios. Our project page is publicly available at https://yan-xia.github.io/projects/UniLoc/.
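The abstract describes a Self-Attention based Pooling (SAP) module that weights instance descriptors when aggregating them into a single place-level descriptor. A minimal NumPy sketch of attention-weighted pooling is shown below; the function and parameter names (`w_score`, `b_score`) are illustrative assumptions, not the authors' implementation, and in the paper the scorer would be a learned layer trained end to end.

```python
import numpy as np

def self_attention_pooling(x, w_score, b_score=0.0):
    """Sketch of SAP-style pooling (illustrative, not the paper's code).

    x        : (n, d) array of n instance descriptors of dimension d
    w_score  : (d,) weights of a linear importance scorer (assumed learned)
    b_score  : scalar bias of the scorer
    returns  : (d,) place-level descriptor, a softmax-weighted sum of rows
    """
    scores = x @ w_score + b_score          # one importance score per instance
    w = np.exp(scores - scores.max())       # numerically stable softmax
    w = w / w.sum()
    return w @ x                            # weighted aggregation into one vector

# Usage: pool 32 random 8-dimensional instance descriptors into one place descriptor.
rng = np.random.default_rng(0)
instances = rng.normal(size=(32, 8))
place_descriptor = self_attention_pooling(instances, rng.normal(size=8))
print(place_descriptor.shape)  # (8,)
```

With all-zero scorer weights the softmax is uniform and the result reduces to the mean of the instance descriptors, which makes the mechanism easy to sanity-check.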
Related papers
- YOLO-UniOW: Efficient Universal Open-World Object Detection [63.71512991320627]
We introduce Universal Open-World Object Detection (Uni-OWD), a new paradigm that unifies open-vocabulary and open-world object detection tasks.
YOLO-UniOW incorporates Adaptive Decision Learning to replace computationally expensive cross-modality fusion with lightweight alignment in the CLIP latent space.
Experiments validate the superiority of YOLO-UniOW, achieving 34.6 AP and 30.0 APr with an inference speed of 69.6 FPS.
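The summary above contrasts expensive cross-modality fusion with lightweight alignment in a shared (CLIP-style) latent space. A hedged sketch of what such alignment typically amounts to: L2-normalise both sets of embeddings and score by dot product, with no fusion network. The function name and shapes are assumptions for illustration only.

```python
import numpy as np

def clip_alignment_scores(region_feats, text_feats):
    """Sketch of lightweight latent-space alignment (illustrative only).

    region_feats : (num_regions, d) visual embeddings
    text_feats   : (num_classes, d) text/class embeddings in the same space
    returns      : (num_regions, num_classes) cosine-similarity scores
    """
    r = region_feats / np.linalg.norm(region_feats, axis=-1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=-1, keepdims=True)
    return r @ t.T  # each entry is the cosine similarity of a region/class pair
```

Because scoring is a single matrix product over pre-computed embeddings, it avoids the per-pair forward passes a fusion module would require, which is consistent with the inference-speed gains the summary reports.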
arXiv Detail & Related papers (2024-12-30T01:34:14Z)
- CLIP-Loc: Multi-modal Landmark Association for Global Localization in Object-based Maps [0.16492989697868893]
This paper describes a multi-modal data association method for global localization using object-based maps and camera images.
We propose labeling landmarks with natural language descriptions and extracting correspondences based on conceptual similarity with image observations.
arXiv Detail & Related papers (2024-02-08T22:59:12Z)
- SQLNet: Scale-Modulated Query and Localization Network for Few-Shot Class-Agnostic Counting [71.38754976584009]
The class-agnostic counting (CAC) task has recently been proposed to solve the problem of counting all objects of an arbitrary class with several exemplars given in the input image.
We propose a novel localization-based CAC approach, termed Scale-modulated Query and Localization Network (SQLNet).
It fully explores the scales of exemplars in both the query and localization stages and achieves effective counting by accurately locating each object and predicting its approximate size.
arXiv Detail & Related papers (2023-11-16T16:50:56Z)
- Re-thinking Federated Active Learning based on Inter-class Diversity [16.153683223016973]
We show that the superiority of two selector models depends on the global and local inter-class diversity.
We propose LoGo, a FAL sampling strategy robust to varying local heterogeneity levels and global imbalance ratio.
LoGo consistently outperforms six active learning strategies in the total number of 38 experimental settings.
arXiv Detail & Related papers (2023-03-22T05:21:21Z)
- Prototype-Based Layered Federated Cross-Modal Hashing [14.844848099134648]
In this paper, we propose a novel method called prototype-based layered federated cross-modal hashing.
Specifically, the prototype is introduced to learn the similarity between instances and classes on the server.
To realize personalized federated learning, a hypernetwork is deployed on the server to dynamically update the weights of different layers of the local model.
arXiv Detail & Related papers (2022-10-27T15:11:12Z)
- Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification [33.19451499073551]
Few-shot learning aims to learn a classifier that can be easily adapted to accommodate new tasks not seen during training.
Recent methods tend to collectively use a set of local features to densely represent an image instead of using a mixed global feature.
arXiv Detail & Related papers (2021-06-10T06:16:00Z)
- Multi-Center Federated Learning [62.57229809407692]
This paper proposes a novel multi-center aggregation mechanism for federated learning.
It learns multiple global models from the non-IID user data and simultaneously derives the optimal matching between users and centers.
Our experimental results on benchmark datasets show that our method outperforms several popular federated learning methods.
arXiv Detail & Related papers (2020-05-03T09:14:31Z)
- Pairwise Similarity Knowledge Transfer for Weakly Supervised Object Localization [53.99850033746663]
We study the problem of learning a localization model on target classes with weakly supervised image labels.
In this work, we argue that learning only an objectness function is a weak form of knowledge transfer.
Experiments on the COCO and ILSVRC 2013 detection datasets show that the performance of the localization model improves significantly with the inclusion of a pairwise similarity function.
arXiv Detail & Related papers (2020-03-18T17:53:33Z)
- Improving Few-shot Learning by Spatially-aware Matching and CrossTransformer [116.46533207849619]
We study the impact of scale and location mismatch in the few-shot learning scenario.
We propose a novel Spatially-aware Matching scheme to effectively perform matching across multiple scales and locations.
arXiv Detail & Related papers (2020-01-06T14:10:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences arising from its use.