Multi-modal Loop Closure Detection with Foundation Models in Severely Unstructured Environments
- URL: http://arxiv.org/abs/2511.05404v1
- Date: Fri, 07 Nov 2025 16:30:35 GMT
- Title: Multi-modal Loop Closure Detection with Foundation Models in Severely Unstructured Environments
- Authors: Laura Alejandra Encinar Gonzalez, John Folkesson, Rudolph Triebel, Riccardo Giubilato,
- Abstract summary: This paper presents MPRF, a multimodal pipeline that leverages transformer-based foundation models for both vision and LiDAR modalities to achieve robust loop closure.<n> Experiments on the S3LI dataset and S3LI Vulcano dataset show that MPRF outperforms state-of-the-art retrieval methods in precision.<n>By providing interpretable correspondences suitable for SLAM back-ends, MPRF achieves a favorable trade-off between accuracy, efficiency, and reliability.
- Score: 10.028232479762075
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Robust loop closure detection is a critical component of Simultaneous Localization and Mapping (SLAM) algorithms in GNSS-denied environments, such as in the context of planetary exploration. In these settings, visual place recognition often fails due to aliasing and weak textures, while LiDAR-based methods suffer from sparsity and ambiguity. This paper presents MPRF, a multimodal pipeline that leverages transformer-based foundation models for both vision and LiDAR modalities to achieve robust loop closure in severely unstructured environments. Unlike prior work limited to retrieval, MPRF integrates a two-stage visual retrieval strategy with explicit 6-DoF pose estimation, combining DINOv2 features with SALAD aggregation for efficient candidate screening and SONATA-based LiDAR descriptors for geometric verification. Experiments on the S3LI dataset and S3LI Vulcano dataset show that MPRF outperforms state-of-the-art retrieval methods in precision while enhancing pose estimation robustness in low-texture regions. By providing interpretable correspondences suitable for SLAM back-ends, MPRF achieves a favorable trade-off between accuracy, efficiency, and reliability, demonstrating the potential of foundation models to unify place recognition and pose estimation. Code and models will be released at github.com/DLR-RM/MPRF.
Related papers
- A Novel Multimodal RUL Framework for Remaining Useful Life Estimation with Layer-wise Explanations [2.312232949770907]
Rolling-element bearings are among the most frequent causes of machinery failure.<n>Rolling-element bearings are among the most frequent causes of machinery failure.<n>Existing approaches often suffer from poor generalization, lack of robustness, high data demands, and limited interpretability.
arXiv Detail & Related papers (2025-12-07T07:38:36Z) - Source-Free Object Detection with Detection Transformer [59.33653163035064]
Source-Free Object Detection (SFOD) enables knowledge transfer from a source domain to an unsupervised target domain for object detection without access to source data.<n>Most existing SFOD approaches are either confined to conventional object detection (OD) models like Faster R-CNN or designed as general solutions without tailored adaptations for novel OD architectures, especially Detection Transformer (DETR)<n>In this paper, we introduce Feature Reweighting ANd Contrastive Learning NetworK (FRANCK), a novel SFOD framework specifically designed to perform query-centric feature enhancement for DETRs.
arXiv Detail & Related papers (2025-10-13T07:35:04Z) - D$^2$GS: Depth-and-Density Guided Gaussian Splatting for Stable and Accurate Sparse-View Reconstruction [73.61056394880733]
3D Gaussian Splatting (3DGS) enables real-time, high-fidelity novel view synthesis (NVS) with explicit 3D representations.<n>We identify two key failure modes under sparse-view conditions: overfitting in regions with excessive Gaussian density near the camera, and underfitting in distant areas with insufficient Gaussian coverage.<n>We propose a unified framework D$2$GS, comprising two key components: a Depth-and-Density Guided Dropout strategy, and a Distance-Aware Fidelity Enhancement module.
arXiv Detail & Related papers (2025-10-09T17:59:49Z) - VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning [62.09195763860549]
Reinforcement learning with verifiable rewards (RLVR) improves reasoning in large language models (LLMs) but struggles with exploration.<n>We introduce $textbfVOGUE (Visual Uncertainty Guided Exploration)$, a novel method that shifts exploration from the output (text) to the input (visual) space.<n>Our work shows that grounding exploration in the inherent uncertainty of visual inputs is an effective strategy for improving multimodal reasoning.
arXiv Detail & Related papers (2025-10-01T20:32:08Z) - LDRFusion: A LiDAR-Dominant multimodal refinement framework for 3D object detection [5.820124526272312]
Existing LiDAR-Camera fusion methods have achieved strong results in 3D object detection.<n>We propose LDRFusion, a novel Lidar-dominant two-stage refinement framework for multi-sensor fusion.<n>Our framework consistently achieves strong performance across multiple categories and difficulty levels.
arXiv Detail & Related papers (2025-07-22T04:35:52Z) - Hi^2-GSLoc: Dual-Hierarchical Gaussian-Specific Visual Relocalization for Remote Sensing [6.997091164331322]
Visual relocalization is fundamental to remote sensing and UAV applications.<n>Existing methods face inherent trade-offs: image-based retrieval and pose regression approaches lack precision.<n>We introduce $mathrmHi2$-GSLoc, a dual-hierarchical relocalization framework that follows a sparse-to-dense and coarse-to-fine paradigm.
arXiv Detail & Related papers (2025-07-21T14:47:56Z) - Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T Salient Object Detection [70.84835546732738]
RGB-Thermal Salient Object Detection aims to pinpoint prominent objects within aligned pairs of visible and thermal infrared images.<n>Traditional encoder-decoder architectures may not have adequately considered the robustness against noise originating from defective modalities.<n>We propose the ConTriNet, a robust Confluent Triple-Flow Network employing a Divide-and-Conquer strategy.
arXiv Detail & Related papers (2024-12-02T14:44:39Z) - PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting [54.7468067660037]
PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices.<n>Our framework capitalizes on fast speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS.
arXiv Detail & Related papers (2024-10-29T15:28:15Z) - Lightweight Deep Learning Framework for Accurate Particle Flow Energy Reconstruction [8.598010350935596]
This paper systematically evaluates a deep learning reconstruction framework.<n>We design a hybrid loss function combining weighted mean squared with error structural similarity index.<n>We enhance the model's capability to capture cross-modaltemporal correlations and energy-displacement nonlinearities.
arXiv Detail & Related papers (2024-10-08T11:49:18Z) - Leveraging Neural Radiance Field in Descriptor Synthesis for Keypoints Scene Coordinate Regression [1.2974519529978974]
This paper introduces a pipeline for keypoint descriptor synthesis using Neural Radiance Field (NeRF)
generating novel poses and feeding them into a trained NeRF model to create new views, our approach enhances the KSCR's capabilities in data-scarce environments.
The proposed system could significantly improve localization accuracy by up to 50% and cost only a fraction of time for data synthesis.
arXiv Detail & Related papers (2024-03-15T13:40:37Z) - Benchmarking the Robustness of LiDAR Semantic Segmentation Models [78.6597530416523]
In this paper, we aim to comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions.
We propose a new benchmark called SemanticKITTI-C, which features 16 out-of-domain LiDAR corruptions in three groups, namely adverse weather, measurement noise and cross-device discrepancy.
We design a robust LiDAR segmentation model (RLSeg) which greatly boosts the robustness with simple but effective modifications.
arXiv Detail & Related papers (2023-01-03T06:47:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.