Hierarchical localization with panoramic views and triplet loss functions
- URL: http://arxiv.org/abs/2404.14117v2
- Date: Fri, 22 Nov 2024 15:51:52 GMT
- Title: Hierarchical localization with panoramic views and triplet loss functions
- Authors: Marcos Alfaro, Juan José Cabrera, María Flores, Óscar Reinoso, Luis Payá,
- Abstract summary: The main objective of this paper is to tackle visual localization, which is essential for the safe navigation of mobile robots.
The solution we propose employs panoramic images and triplet convolutional neural networks.
To explore the limits of our approach, triplet networks have been tested in different indoor environments simultaneously.
- Score: 2.663377882489275
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The main objective of this paper is to tackle visual localization, which is essential for the safe navigation of mobile robots. The solution we propose employs panoramic images and triplet convolutional neural networks. We seek to exploit the properties of such architectures to address both hierarchical and global localization in indoor environments, which are prone to visual aliasing and other phenomena. Considering their importance in these architectures, a complete comparative evaluation of different triplet loss functions is performed. The experimental section proves that triplet networks can be trained with a relatively low number of images captured under a specific lighting condition and even so, the resulting networks are a robust tool to perform visual localization under dynamic conditions. Our approach has been evaluated against some of these effects, such as changes in the lighting conditions, occlusions, noise and motion blurring. Furthermore, to explore the limits of our approach, triplet networks have been tested in different indoor environments simultaneously. In all the cases, these architectures have demonstrated a great capability to generalize to diverse and challenging scenarios. The code used in the experiments is available at https://github.com/MarcosAlfaro/TripletNetworksIndoorLocalization.git.
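The triplet loss at the core of these architectures can be sketched as follows. This is a minimal NumPy illustration of the standard margin formulation, not any of the specific variants the paper compares; the function name and default margin are illustrative:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Euclidean distances from the anchor embedding to the
    # positive (same place) and negative (different place) embeddings
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    # Hinge: loss is zero once the positive is closer than the
    # negative by at least `margin`
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()
```

Training with this objective pulls embeddings of images captured in the same place together and pushes embeddings of different places apart, which is what makes the retrieval-based localization step possible.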
Related papers
- Bridging the Gap Between Ideal and Real-world Evaluation: Benchmarking AI-Generated Image Detection in Challenging Scenarios [54.07895223545793]
This paper introduces the Real-World Robustness dataset (RRDataset) for comprehensive evaluation of detection models across three dimensions.
RRDataset includes high-quality images from seven major scenarios.
We benchmarked 17 detectors and 10 vision-language models (VLMs) on RRDataset and conducted a large-scale human study.
arXiv Detail & Related papers (2025-09-11T06:15:52Z)
- Task-based Loss Functions in Computer Vision: A Comprehensive Review [6.831703240486857]
Loss functions are at the heart of deep learning, shaping how models learn and perform across diverse tasks.
This paper presents a comprehensive review of loss functions, covering fundamental metrics like Mean Squared Error and Cross-Entropy to advanced functions such as Adversarial and Diffusion losses.
arXiv Detail & Related papers (2025-04-05T18:07:20Z)
- Underlying Semantic Diffusion for Effective and Efficient In-Context Learning [113.4003355229632]
Underlying Semantic Diffusion (US-Diffusion) is an enhanced diffusion model that boosts underlying semantics learning, computational efficiency, and in-context learning capabilities.
We present a Feedback-Aided Learning (FAL) framework, which leverages feedback signals to guide the model in capturing semantic details.
We also propose a plug-and-play Efficient Sampling Strategy (ESS) for dense sampling at time steps with high-noise levels.
arXiv Detail & Related papers (2025-03-06T03:06:22Z)
- Enhancing Environmental Robustness in Few-shot Learning via Conditional Representation Learning [27.549889991320203]
Few-shot learning has been extensively utilized to overcome the scarcity of training data in domain-specific visual recognition.
In real-world scenarios, environmental factors such as complex backgrounds, varying lighting conditions, long-distance shooting, and moving targets often cause test images to exhibit numerous incomplete targets or noise disruptions.
We propose a novel conditional representation learning network (CRLNet) that integrates the interactions between training and testing images as conditional information in their respective representation processes.
arXiv Detail & Related papers (2025-02-03T09:18:03Z)
- Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing, enabled by generative models, pose serious risks.
In this paper, we investigate how detection performance varies across model backbones, types, and datasets.
We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z)
- Visual Localization in 3D Maps: Comparing Point Cloud, Mesh, and NeRF Representations [8.522160106746478]
We present a global visual localization system capable of localizing a single camera image across various 3D map representations.
Our system generates a database by synthesizing novel views of the scene, creating RGB and depth image pairs.
NeRF synthesized images show superior performance, localizing query images at an average success rate of 72%.
arXiv Detail & Related papers (2024-08-21T19:37:17Z)
- An experimental evaluation of Siamese Neural Networks for robot localization using omnidirectional imaging in indoor environments [1.0485739694839669]
This paper addresses the localization problem using omnidirectional images captured by a catadioptric vision system mounted on the robot.
We explore the potential of Siamese Neural Networks for modeling indoor environments using panoramic images as the unique source of information.
arXiv Detail & Related papers (2024-07-15T08:44:37Z)
- Lacunarity Pooling Layers for Plant Image Classification using Texture Analysis [0.38366697175402226]
Pooling layers overlook important information encoded in the spatial arrangement of pixel intensity and/or feature values.
We propose a novel lacunarity pooling layer that aims to capture the spatial heterogeneity of the feature maps by evaluating the variability within local windows.
The lacunarity pooling layer can be seamlessly integrated into any artificial neural network architecture.
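As a rough sketch of the idea (hypothetical function; the paper's exact layer definition may differ), lacunarity within a local window can be computed as the ratio of the second moment to the squared mean, so uniform regions score near 1 and spatially heterogeneous regions score higher:

```python
import numpy as np

def lacunarity_pool(fmap, win=2):
    # Pool a 2D feature map by computing, per non-overlapping window,
    # E[x^2] / E[x]^2 -- a simple lacunarity-style heterogeneity measure
    h, w = fmap.shape
    out = np.empty((h // win, w // win))
    for i in range(0, h - h % win, win):
        for j in range(0, w - w % win, win):
            patch = fmap[i:i+win, j:j+win]
            m = patch.mean()
            out[i // win, j // win] = (patch ** 2).mean() / (m ** 2 + 1e-8)
    return out
```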
arXiv Detail & Related papers (2024-04-25T00:34:52Z)
- Improving Hand Recognition in Uncontrolled and Uncooperative Environments using Multiple Spatial Transformers and Loss Functions [13.47664951012019]
Many existing hand-based recognition methods perform well for hand images collected in controlled environments with user cooperation.
An algorithm integrating a multi-spatial transformer network (MSTN) and multiple loss functions is proposed to fully utilize information in full hand images.
Experimental results show that the proposed algorithm performs significantly better than the existing methods in these uncontrolled and uncooperative environments.
arXiv Detail & Related papers (2023-11-09T14:08:48Z)
- ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection [70.11264880907652]
Recent camouflaged object detection (COD) methods attempt to segment objects visually blended into their surroundings, which is extremely complex and difficult in real-world scenarios.
We propose an effective unified collaborative pyramid network that mimics human behavior when observing vague images, zooming in and out on camouflaged objects.
Our framework consistently outperforms existing state-of-the-art methods in image and video COD benchmarks.
arXiv Detail & Related papers (2023-10-31T06:11:23Z)
- Leveraging Neural Radiance Fields for Uncertainty-Aware Visual Localization [56.95046107046027]
We propose to leverage Neural Radiance Fields (NeRF) to generate training samples for scene coordinate regression.
Despite NeRF's efficiency in rendering, much of the rendered data is polluted by artifacts or contains only minimal information gain.
arXiv Detail & Related papers (2023-10-10T20:11:13Z)
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
- Enhancing Deformable Local Features by Jointly Learning to Detect and Describe Keypoints [8.390939268280235]
Local feature extraction is a standard approach in computer vision for tackling important tasks such as image matching and retrieval.
We propose DALF, a novel deformation-aware network for jointly detecting and describing keypoints.
Our approach also enhances the performance of two real-world applications: deformable object retrieval and non-rigid 3D surface registration.
arXiv Detail & Related papers (2023-04-02T18:01:51Z)
- Self-Supervised Feature Learning for Long-Term Metric Visual Localization [16.987148593917905]
We present a novel self-supervised feature learning framework for metric visual localization.
We use a sequence-based image matching algorithm to generate image correspondences without ground-truth labels.
We can then sample image pairs to train a deep neural network that learns sparse features with associated descriptors and scores without ground-truth pose supervision.
arXiv Detail & Related papers (2022-11-30T21:15:05Z)
- Supervised Fine-tuning Evaluation for Long-term Visual Place Recognition [14.632777952261716]
We present a comprehensive study on the utility of deep convolutional neural networks with two state-of-the-art pooling layers.
We compare deep learned global features trained with three different loss functions, i.e. triplet, contrastive and ArcFace, for learning the parameters of the architectures.
Our investigation demonstrates that fine-tuning architectures with ArcFace loss in an end-to-end manner outperforms the other two losses by approximately 14% on outdoor and 12% on indoor datasets.
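For reference, the contrastive loss in that comparison has the classic pairwise form. This is a minimal sketch of the standard formulation, not the authors' implementation; the function name and margin are illustrative:

```python
import numpy as np

def contrastive_loss(emb1, emb2, same, margin=1.0):
    # same=1 for a matching pair, same=0 for a non-matching pair
    d = np.linalg.norm(emb1 - emb2, axis=-1)
    # Matching pairs are pulled together (squared distance);
    # non-matching pairs are pushed apart up to `margin`
    return np.mean(same * d ** 2 + (1 - same) * np.maximum(margin - d, 0.0) ** 2)
```

Unlike the triplet loss, it operates on pairs rather than anchor/positive/negative triplets, so no hard-negative mining over triplets is needed.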
arXiv Detail & Related papers (2022-11-14T19:16:21Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- FuNNscope: Visual microscope for interactively exploring the loss landscape of fully connected neural networks [77.34726150561087]
We show how to explore high-dimensional landscape characteristics of neural networks.
We generalize observations on small neural networks to more complex systems.
An interactive dashboard opens up a number of possible applications.
arXiv Detail & Related papers (2022-04-09T16:41:53Z)
- On Exploring Pose Estimation as an Auxiliary Learning Task for Visible-Infrared Person Re-identification [66.58450185833479]
In this paper, we exploit Pose Estimation as an auxiliary learning task to assist the VI-ReID task in an end-to-end framework.
By jointly training these two tasks in a mutually beneficial manner, our model learns higher quality modality-shared and ID-related features.
Experimental results on two benchmark VI-ReID datasets show that the proposed method consistently improves state-of-the-art methods by significant margins.
arXiv Detail & Related papers (2022-01-11T09:44:00Z)
- Stereoscopic Universal Perturbations across Different Architectures and Datasets [60.021985610201156]
We study the effect of adversarial perturbations of images on deep stereo matching networks for the disparity estimation task.
We present a method to craft a single set of perturbations that, when added to any stereo image pair in a dataset, can fool a stereo network.
Our perturbations can increase D1-error (akin to fooling rate) of state-of-the-art stereo networks from 1% to as much as 87%.
arXiv Detail & Related papers (2021-12-12T02:11:31Z)
- A Decidability-Based Loss Function [2.5919311269669003]
Biometric problems often use deep learning models to extract features from images, also known as embeddings.
In this work, a loss function based on the decidability index is proposed to improve the quality of embeddings for the verification routine.
The proposed approach is compared against the Softmax (cross-entropy), Triplets Soft-Hard, and the Multi Similarity losses in four different benchmarks.
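The decidability index underlying this loss measures how well the genuine and impostor score distributions separate. A minimal sketch of the classic d' definition (illustrative function, not the authors' implementation):

```python
import numpy as np

def decidability(genuine, impostor):
    # d' index: distance between the means of the genuine and impostor
    # score distributions, normalized by their pooled spread
    m_g, m_i = np.mean(genuine), np.mean(impostor)
    v_g, v_i = np.var(genuine), np.var(impostor)
    return abs(m_g - m_i) / np.sqrt((v_g + v_i) / 2.0)
```

Higher values mean less overlap between the two distributions and therefore easier verification decisions.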
arXiv Detail & Related papers (2021-09-12T14:26:27Z)
- Frequency-aware Discriminative Feature Learning Supervised by Single-Center Loss for Face Forgery Detection [89.43987367139724]
Face forgery detection is raising ever-increasing interest in computer vision.
Recent works have reached sound achievements, but there are still unignorable problems.
A novel frequency-aware discriminative feature learning framework is proposed in this paper.
arXiv Detail & Related papers (2021-03-16T14:17:17Z)
- Unsupervised Metric Relocalization Using Transform Consistency Loss [66.19479868638925]
Training networks to perform metric relocalization traditionally requires accurate image correspondences.
We propose a self-supervised solution, which exploits a key insight: localizing a query image within a map should yield the same absolute pose, regardless of the reference image used for registration.
We evaluate our framework on synthetic and real-world data, showing our approach outperforms other supervised methods when a limited amount of ground-truth information is available.
arXiv Detail & Related papers (2020-11-01T19:24:27Z)
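The consistency insight in that last entry can be sketched with homogeneous 2D poses: the absolute query pose composed through either reference image should agree. This is a hypothetical SE(2) example to illustrate the idea, not the authors' implementation:

```python
import numpy as np

def se2(theta, tx, ty):
    # Homogeneous 2D rigid transform: rotation theta, translation (tx, ty)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0.0, 0.0, 1.0]])

def consistency_loss(T_world_ref1, T_ref1_query, T_world_ref2, T_ref2_query):
    # Absolute query pose estimated through each reference image;
    # a consistent localizer should make these two estimates agree
    P1 = T_world_ref1 @ T_ref1_query
    P2 = T_world_ref2 @ T_ref2_query
    return np.linalg.norm(P1 - P2)
```

A zero loss means both registrations imply the same absolute pose, which is the self-supervised signal that replaces ground-truth correspondences.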
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.