A Real-Time Fusion Framework for Long-term Visual Localization
- URL: http://arxiv.org/abs/2210.09757v1
- Date: Tue, 18 Oct 2022 11:13:20 GMT
- Title: A Real-Time Fusion Framework for Long-term Visual Localization
- Authors: Yuchen Yang, Xudong Zhang, Shuang Gao, Jixiang Wan, Yishan Ping, Yuyue Liu, Jijunnan Li, Yandong Guo
- Abstract summary: We present an efficient client-server visual localization architecture that fuses global and local pose estimations.
We conduct the evaluations on two typical open-source benchmarks, 4Seasons and OpenLORIS.
- Score: 7.606637557727453
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Visual localization is a fundamental task that regresses the 6 Degrees of Freedom (6DoF) pose from image features in order to serve the high-precision localization requests of many robotics applications. Degenerate conditions such as motion blur, illumination changes, and environment variations pose great challenges to this task. Fusing additional information, such as sequential observations and Inertial Measurement Unit (IMU) inputs, greatly alleviates these problems. In this paper, we present an efficient client-server visual localization architecture that fuses global and local pose estimates to achieve both high precision and efficiency. We include additional geometry hints in the mapping and global pose regression modules to improve the measurement quality. A loosely coupled fusion policy is adopted to balance computational complexity and accuracy. We conduct evaluations on two typical open-source benchmarks, 4Seasons and OpenLORIS. Quantitative results show that our framework achieves competitive performance with respect to other state-of-the-art visual localization solutions.
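The abstract does not spell out the fusion equations; as a rough illustration of a loosely coupled policy, the Python sketch below blends a server-side global pose with a client-side local pose using a single confidence weight alpha (a hypothetical parameter, not from the paper): translations are mixed linearly and rotations are interpolated by SLERP.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def fuse_poses(t_local, q_local, t_global, q_global, alpha):
    """Blend a local (client odometry) pose with a global (server) pose.

    alpha in [0, 1] weights the global estimate; quaternions use the
    (x, y, z, w) convention expected by SciPy.
    """
    t_fused = (1.0 - alpha) * np.asarray(t_local) + alpha * np.asarray(t_global)
    key_rots = Rotation.from_quat([q_local, q_global])
    q_fused = Slerp([0.0, 1.0], key_rots)(alpha).as_quat()
    return t_fused, q_fused

# e.g. lean toward the global fix when the server reports high confidence
t, q = fuse_poses([0.0, 0.0, 0.0], [0, 0, 0, 1],
                  [0.1, 0.0, 0.0], [0, 0, 0.0998, 0.9950], alpha=0.7)
```

In practice alpha would be derived from the quality of each estimate (e.g. inlier counts of the global pose); a Kalman-style update is another common loosely coupled choice.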
Related papers
- Holistic Fusion: Task- and Setup-Agnostic Robot Localization and State Estimation with Factor Graphs [19.359422457413576]
This work introduces a flexible open-source solution for task- and setup-agnostic multimodal sensor fusion.
Holistic Fusion (HF) formulates sensor fusion as a combined estimation problem of i) the local and global robot state and ii) a (theoretically unlimited) number of dynamic context variables.
HF enables low-latency and smooth online state estimation on typical robot hardware while simultaneously providing low-drift global localization at the IMU measurement rate.
arXiv Detail & Related papers (2025-04-08T22:54:52Z)
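Factor graphs make this combined estimation concrete; the toy below (my own 1-D illustration, far simpler than Holistic Fusion itself) jointly solves odometry "between" factors, which keep the trajectory smooth, and sparse absolute factors, which anchor it globally, as weighted linear least squares.

```python
import numpy as np

# States x_0..x_4 on a line; odometry says each step moves +1.0,
# while two absolute fixes disagree slightly (x_0 ~ 0.0, x_4 ~ 4.3).
n_states = 5
odometry = [1.0, 1.0, 1.0, 1.0]      # between factors: x_{k+1} - x_k
absolute = {0: 0.0, 4: 4.3}          # global factors:  x_k
w_odo, w_abs = 1.0, 0.5              # information (confidence) weights

rows, rhs, w = [], [], []
for k, meas in enumerate(odometry):  # one row per between factor
    r = np.zeros(n_states)
    r[k], r[k + 1] = -1.0, 1.0
    rows.append(r); rhs.append(meas); w.append(w_odo)
for k, meas in absolute.items():     # one row per global factor
    r = np.zeros(n_states)
    r[k] = 1.0
    rows.append(r); rhs.append(meas); w.append(w_abs)

sw = np.sqrt(np.array(w))            # whiten rows by sqrt of the weights
A = np.array(rows) * sw[:, None]
b = np.array(rhs) * sw
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x)  # a smooth trajectory gently pulled toward the absolute fixes
```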
- GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks [84.86699025256705]
We present GEOBench-VLM, a benchmark specifically designed to evaluate Vision-Language Models (VLMs) on geospatial tasks.
Our benchmark features over 10,000 manually verified instructions spanning diverse visual conditions, object types, and scales.
We evaluate several state-of-the-art VLMs to assess performance on geospatial-specific challenges.
arXiv Detail & Related papers (2024-11-28T18:59:56Z)
- Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework [51.26566634946208]
We introduce smileGeo, a novel visual geo-localization framework.
Through inter-agent communication, smileGeo integrates the inherent knowledge of multiple LVLM agents with additional retrieved information.
Results show that our approach significantly outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2024-08-21T03:31:30Z)
- HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models [96.76995840807615]
HiRes-LLaVA is a novel framework designed to process any size of high-resolution input without altering the original contextual and geometric information.
HiRes-LLaVA comprises two innovative components: (i) a SliceRestore adapter that reconstructs sliced patches into their original form, efficiently extracting both global and local features via down-up-sampling and convolution layers, and (ii) a Self-Mining Sampler to compress the vision tokens based on themselves.
arXiv Detail & Related papers (2024-07-11T17:42:17Z)
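The summary only names "down-up-sampling and convolution layers"; the PyTorch sketch below is a guessed minimal structure for a SliceRestore-style adapter, not the paper's implementation: reassemble per-slice features into the full map, form a global view by pooling and upsampling, and fuse the two paths with a convolution. The slice layout and all names are assumptions.

```python
import torch
import torch.nn as nn

class SliceRestoreSketch(nn.Module):
    """Reassemble sliced patch features and fuse global/local context."""

    def __init__(self, dim):
        super().__init__()
        self.down = nn.AvgPool2d(2)                       # global path: down...
        self.up = nn.Upsample(scale_factor=2, mode="bilinear",
                              align_corners=False)        # ...then back up
        self.fuse = nn.Conv2d(2 * dim, dim, kernel_size=3, padding=1)

    def forward(self, slices, grid_hw):
        # slices: (B * gh * gw, C, h, w), assumed row-major per image;
        # even h, w are assumed so the down/up round trip is shape-safe
        gh, gw = grid_hw
        bs, c, h, w = slices.shape
        b = bs // (gh * gw)
        x = slices.view(b, gh, gw, c, h, w).permute(0, 3, 1, 4, 2, 5)
        x = x.reshape(b, c, gh * h, gw * w)               # restored full map
        g = self.up(self.down(x))                         # global context
        return self.fuse(torch.cat([x, g], dim=1))        # local + global

# e.g. a 2x2 grid of 24x24 slices with 64 channels
out = SliceRestoreSketch(64)(torch.randn(8, 64, 24, 24), (2, 2))
```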
- ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection [65.59969454655996]
We propose an efficient change detection framework, ELGC-Net, which leverages rich contextual information to precisely estimate change regions.
Our proposed ELGC-Net sets a new state-of-the-art performance in remote sensing change detection benchmarks.
We also introduce ELGC-Net-LW, a lighter variant with significantly reduced computational complexity, suitable for resource-constrained settings.
arXiv Detail & Related papers (2024-03-26T17:46:25Z)
- CurriculumLoc: Enhancing Cross-Domain Geolocalization through Multi-Stage Refinement [11.108860387261508]
Visual geolocalization is a cost-effective and scalable task that involves matching one or more query images taken at some unknown location, to a set of geo-tagged reference images.
We develop CurriculumLoc, a novel keypoint detection and description pipeline with global semantic awareness and local geometric verification.
We achieve new high recall@1 scores of 62.6% and 94.5% on ALTO under two different distance metrics, respectively.
arXiv Detail & Related papers (2023-11-20T08:40:01Z)
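The verifier itself is not described here; as a generic stand-in for "local geometric verification", the sketch below scores a retrieved reference by the number of RANSAC homography inliers among matched keypoints (OpenCV; the function and its inputs are illustrative, not CurriculumLoc's code).

```python
import cv2
import numpy as np

def geometric_verification(kpts_q, kpts_r, matches, ransac_thresh=3.0):
    """Count matched keypoints consistent with a single homography.

    kpts_q / kpts_r: (N, 2) keypoint locations in query / reference;
    matches: list of (i, j) index pairs between the two images.
    """
    if len(matches) < 4:              # a homography needs >= 4 pairs
        return 0, None
    src = np.float32([kpts_q[i] for i, j in matches])
    dst = np.float32([kpts_r[j] for i, j in matches])
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, ransac_thresh)
    inliers = int(mask.sum()) if mask is not None else 0
    return inliers, H                 # rank references by inlier count
```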
- Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art performance.
arXiv Detail & Related papers (2022-09-21T02:33:07Z)
- Transformer-based Context Condensation for Boosting Feature Pyramids in Object Detection [77.50110439560152]
Current object detectors typically have a feature pyramid (FP) module for multi-level feature fusion (MFF).
We propose a novel and efficient context modeling mechanism that can help existing FPs deliver better MFF results.
In particular, we introduce a novel insight that comprehensive contexts can be decomposed and condensed into two types of representations for higher efficiency.
arXiv Detail & Related papers (2022-07-14T01:45:03Z)
- Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and Local Information [15.32353270625554]
Cross-modal remote sensing text-image retrieval (RSCTIR) has recently become an urgent research hotspot due to its ability to enable fast and flexible information extraction from remote sensing (RS) images.
We first propose a novel RSCTIR framework based on global and local information (GaLR), and design a multi-level information dynamic fusion (MIDF) module to efficaciously integrate features of different levels.
Experiments on public datasets demonstrate the state-of-the-art performance of GaLR on the RSCTIR task.
arXiv Detail & Related papers (2022-04-21T03:18:09Z)
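The MIDF module's internals are not given in this summary; one common way to fuse global and local features dynamically is a learned gate, sketched below in PyTorch as a hypothetical module (not GaLR's code).

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Convexly combine a global and a local feature vector per channel."""

    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, f_global, f_local):
        g = self.gate(torch.cat([f_global, f_local], dim=-1))
        return g * f_global + (1.0 - g) * f_local

# e.g. fuse 256-d image-level and region-level embeddings
fused = GatedFusion(256)(torch.randn(8, 256), torch.randn(8, 256))
```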
- Retrieval and Localization with Observation Constraints [12.010135672015704]
We propose an integrated visual re-localization method called RLOCS.
It combines image retrieval, semantic consistency and geometry verification to achieve accurate estimations.
Our method achieves significant performance improvements on challenging localization benchmarks.
arXiv Detail & Related papers (2021-08-19T06:14:33Z)
- Active Visual Localization in Partially Calibrated Environments [35.48595012305253]
Humans who get lost can robustly localize themselves without a map by following prominent visual cues or landmarks.
In this work, we aim to endow autonomous agents with the same ability. This ability is important in robotics applications yet very challenging when an agent is exposed to partially calibrated environments.
We propose an indoor scene dataset ACR-6, which consists of both synthetic and real data and simulates challenging scenarios for active visual localization.
arXiv Detail & Related papers (2020-12-08T08:00:55Z)
- Unsupervised Metric Relocalization Using Transform Consistency Loss [66.19479868638925]
Training networks to perform metric relocalization traditionally requires accurate image correspondences.
We propose a self-supervised solution, which exploits a key insight: localizing a query image within a map should yield the same absolute pose, regardless of the reference image used for registration.
We evaluate our framework on synthetic and real-world data, showing our approach outperforms other supervised methods when a limited amount of ground-truth information is available.
arXiv Detail & Related papers (2020-11-01T19:24:27Z)
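That insight reduces to a small worked example: compute the query's absolute pose via two different reference images and penalize any disagreement. The sketch below uses 4x4 homogeneous transforms; the function name and notation are mine, not the paper's.

```python
import numpy as np

def pose_consistency_loss(T_wa, T_aq, T_wb, T_bq):
    """Penalize disagreement between two routes to the query pose.

    T_wa maps reference A's frame to the world, T_aq maps the query
    frame to A's frame, and likewise for reference B.
    """
    T_wq_a = T_wa @ T_aq              # world-from-query via reference A
    T_wq_b = T_wb @ T_bq              # world-from-query via reference B
    dt = np.linalg.norm(T_wq_a[:3, 3] - T_wq_b[:3, 3])
    R_rel = T_wq_a[:3, :3].T @ T_wq_b[:3, :3]
    # geodesic rotation distance from the trace of the relative rotation
    dR = np.arccos(np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0))
    return dt + dR
```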