LatentBKI: Open-Dictionary Continuous Mapping in Visual-Language Latent Spaces with Quantifiable Uncertainty
- URL: http://arxiv.org/abs/2410.11783v2
- Date: Tue, 21 Jan 2025 21:46:26 GMT
- Title: LatentBKI: Open-Dictionary Continuous Mapping in Visual-Language Latent Spaces with Quantifiable Uncertainty
- Authors: Joey Wilson, Ruihan Xu, Yile Sun, Parker Ewen, Minghan Zhu, Kira Barton, Maani Ghaffari,
- Abstract summary: This paper introduces a novel probabilistic mapping algorithm, LatentBKI, which enables open-vocabulary mapping with quantifiable uncertainty.
LatentBKI is evaluated against similar explicit semantic mapping and VL mapping frameworks on the popular Matterport3D and Semantic KITTI datasets.
Real-world experiments demonstrate applicability to challenging indoor environments.
- Score: 6.986230616834552
- License:
- Abstract: This paper introduces a novel probabilistic mapping algorithm, LatentBKI, which enables open-vocabulary mapping with quantifiable uncertainty. Traditionally, semantic mapping algorithms focus on a fixed set of semantic categories which limits their applicability for complex robotic tasks. Vision-Language (VL) models have recently emerged as a technique to jointly model language and visual features in a latent space, enabling semantic recognition beyond a predefined, fixed set of semantic classes. LatentBKI recurrently incorporates neural embeddings from VL models into a voxel map with quantifiable uncertainty, leveraging the spatial correlations of nearby observations through Bayesian Kernel Inference (BKI). LatentBKI is evaluated against similar explicit semantic mapping and VL mapping frameworks on the popular Matterport3D and Semantic KITTI datasets, demonstrating that LatentBKI maintains the probabilistic benefits of continuous mapping with the additional benefit of open-dictionary queries. Real-world experiments demonstrate applicability to challenging indoor environments.
Related papers
- Post-hoc Probabilistic Vision-Language Models [51.12284891724463]
Vision-language models (VLMs) have found remarkable success in classification, retrieval, and generative tasks.
We propose post-hoc uncertainty estimation in VLMs that does not require additional training.
Our results show promise for safety-critical applications of large-scale models.
arXiv Detail & Related papers (2024-12-08T18:16:13Z) - Disentangling Dense Embeddings with Sparse Autoencoders [0.0]
Sparse autoencoders (SAEs) have shown promise in extracting interpretable features from complex neural networks.
We present one of the first applications of SAEs to dense text embeddings from large language models.
We show that the resulting sparse representations maintain semantic fidelity while offering interpretability.
arXiv Detail & Related papers (2024-08-01T15:46:22Z) - Uncertainty-aware Semantic Mapping in Off-road Environments with Dempster-Shafer Theory of Evidence [4.83420384410068]
We propose an evidential semantic mapping framework, which integrates the evidential reasoning of Dempster-Shafer Theory of Evidence (DST) into the entire mapping pipeline.
We show that our framework enhances the reliability of uncertainty maps, consistently outperforming existing methods in scenes with high perceptual uncertainties.
arXiv Detail & Related papers (2024-05-10T06:32:01Z) - Evidential Semantic Mapping in Off-road Environments with Uncertainty-aware Bayesian Kernel Inference [5.120567378386614]
We propose an evidential semantic mapping framework, which can enhance reliability in perceptually challenging off-road environments.
By adaptively handling semantic uncertainties, the proposed framework constructs robust representations of the surroundings even in previously unseen environments.
arXiv Detail & Related papers (2024-03-21T05:13:34Z) - Dirichlet Active Learning [1.4277428617774877]
Dirichlet Active Learning (DiAL) is a Bayesian-inspired approach to the design of active learning algorithms.
Our framework models feature-conditional class probabilities as a Dirichlet random field.
arXiv Detail & Related papers (2023-11-09T16:39:02Z) - Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos [63.94040814459116]
Self-supervised methods have shown remarkable progress in learning high-level semantics and low-level temporal correspondence.
We propose a novel semantic-aware masked slot attention on top of the fused semantic features and correspondence maps.
We adopt semantic- and instance-level temporal consistency as self-supervision to encourage temporally coherent object-centric representations.
arXiv Detail & Related papers (2023-08-19T09:12:13Z) - GFlowNet-EM for learning compositional latent variable models [115.96660869630227]
A key tradeoff in modeling the posteriors over latents is between expressivity and tractable optimization.
We propose the use of GFlowNets, algorithms for sampling from an unnormalized density.
By training GFlowNets to sample from the posterior over latents, we take advantage of their strengths as amortized variational algorithms.
arXiv Detail & Related papers (2023-02-13T18:24:21Z) - BEVBert: Multimodal Map Pre-training for Language-guided Navigation [75.23388288113817]
We propose a new map-based pre-training paradigm that is spatial-aware for use in vision-and-language navigation (VLN)
We build a local metric map to explicitly aggregate incomplete observations and remove duplicates, while modeling navigation dependency in a global topological map.
Based on the hybrid map, we devise a pre-training framework to learn a multimodal map representation, which enhances spatial-aware cross-modal reasoning thereby facilitating the language-guided navigation goal.
arXiv Detail & Related papers (2022-12-08T16:27:54Z) - Convolutional Bayesian Kernel Inference for 3D Semantic Mapping [1.7615233156139762]
We introduce a Convolutional Bayesian Kernel Inference layer which learns to perform explicit Bayesian inference.
We learn semantic-geometric probability distributions for LiDAR sensor information and incorporate semantic predictions into a global map.
We evaluate our network against state-of-the-art semantic mapping algorithms on the KITTI data set, demonstrating improved latency with comparable semantic label inference results.
arXiv Detail & Related papers (2022-09-21T21:15:12Z) - Dive into Ambiguity: Latent Distribution Mining and Pairwise Uncertainty
Estimation for Facial Expression Recognition [59.52434325897716]
We propose a solution, named DMUE, to address the problem of annotation ambiguity from two perspectives.
For the former, an auxiliary multi-branch learning framework is introduced to better mine and describe the latent distribution in the label space.
For the latter, the pairwise relationship of semantic feature between instances are fully exploited to estimate the ambiguity extent in the instance space.
arXiv Detail & Related papers (2021-04-01T03:21:57Z) - Inter-class Discrepancy Alignment for Face Recognition [55.578063356210144]
We propose a unified framework calledInter-class DiscrepancyAlignment(IDA)
IDA-DAO is used to align the similarity scores considering the discrepancy between the images and its neighbors.
IDA-SSE can provide convincing inter-class neighbors by introducing virtual candidate images generated with GAN.
arXiv Detail & Related papers (2021-03-02T08:20:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.