Related papers: LatentAM: Real-Time, Large-Scale Latent Gaussian Attention Mapping via Online Dictionary Learning

LatentAM: Real-Time, Large-Scale Latent Gaussian Attention Mapping via Online Dictionary Learning

URL: http://arxiv.org/abs/2602.12314v1
Date: Thu, 12 Feb 2026 17:25:00 GMT
Title: LatentAM: Real-Time, Large-Scale Latent Gaussian Attention Mapping via Online Dictionary Learning
Authors: Junwoon Lee, Yulun Tian,
Abstract summary: LatentAM builds latent feature maps from streaming RGB-D observations for open-vocabulary robotic perception.<n>We propose an online dictionary learning approach that is both model-agnostic and pretraining-free.<n>Experiments on public benchmarks and a large-scale custom dataset demonstrate that LatentAM attains significantly better feature reconstruction fidelity.
Score: 1.9229388624311596
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present LatentAM, an online 3D Gaussian Splatting (3DGS) mapping framework that builds scalable latent feature maps from streaming RGB-D observations for open-vocabulary robotic perception. Instead of distilling high-dimensional Vision-Language Model (VLM) embeddings using model-specific decoders, LatentAM proposes an online dictionary learning approach that is both model-agnostic and pretraining-free, enabling plug-and-play integration with different VLMs at test time. Specifically, our approach associates each Gaussian primitive with a compact query vector that can be converted into approximate VLM embeddings using an attention mechanism with a learnable dictionary. The dictionary is initialized efficiently from streaming observations and optimized online to adapt to evolving scene semantics under trust-region regularization. To scale to long trajectories and large environments, we further propose an efficient map management strategy based on voxel hashing, where optimization is restricted to an active local map on the GPU, while the global map is stored and indexed on the CPU to maintain bounded GPU memory usage. Experiments on public benchmarks and a large-scale custom dataset demonstrate that LatentAM attains significantly better feature reconstruction fidelity compared to state-of-the-art methods, while achieving near-real-time speed (12-35 FPS) on the evaluated datasets. Our project page is at: https://junwoonlee.github.io/projects/LatentAM

Related papers

LangGS-SLAM: Real-Time Language-Feature Gaussian Splatting SLAM [2.738569311610586]
RGB-D SLAM system reconstructs a language-aligned dense feature field while sustaining low-latency tracking and mapping.<n>System achieves superior geometric fidelity compared to geometric-only baselines and comparable semantic fidelity to offline approaches while operating at 15 FPS.
arXiv Detail & Related papers (2026-01-28T05:35:34Z)
Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians [87.48403838439391]
3D Splatting has emerged as a powerful representation of geometry and appearance for RGB-only dense Simultaneous SLAM. We propose the first RGB-only SLAM system with a dense 3D Gaussian map representation. Our experiments on the Replica, TUM-RGBD, and ScanNet datasets indicate the effectiveness of globally optimized 3D Gaussians.
arXiv Detail & Related papers (2024-05-26T12:26:54Z)
Loopy-SLAM: Dense Neural SLAM with Loop Closures [53.11936461015725]
We introduce Loopy-SLAM that globally optimize poses and the dense 3D model. We use frame-to-model tracking using a data-driven point-based submap generation method and trigger loop closures online by performing global place recognition. Evaluation on the synthetic Replica and real-world TUM-RGBD and ScanNet datasets demonstrate competitive or superior performance in tracking, mapping, and rendering accuracy when compared to existing dense neural RGBD SLAM methods.
arXiv Detail & Related papers (2024-02-14T18:18:32Z)
HI-SLAM: Monocular Real-time Dense Mapping with Hybrid Implicit Fields [11.627951040865568]
Recent neural mapping frameworks show promising results, but rely on RGB-D or pose inputs, or cannot run in real-time. Our approach integrates dense-SLAM with neural implicit fields. For efficient construction of neural fields, we employ multi-resolution grid encoding and signed distance function.
arXiv Detail & Related papers (2023-10-07T12:26:56Z)
Volumetric Semantically Consistent 3D Panoptic Mapping [77.13446499924977]
We introduce an online 2D-to-3D semantic instance mapping algorithm aimed at generating semantic 3D maps suitable for autonomous agents in unstructured environments. It introduces novel ways of integrating semantic prediction confidence during mapping, producing semantic and instance-consistent 3D regions. The proposed method achieves accuracy superior to the state of the art on public large-scale datasets, improving on a number of widely used metrics.
arXiv Detail & Related papers (2023-09-26T08:03:10Z)
EgoVM: Achieving Precise Ego-Localization using Lightweight Vectorized Maps [9.450650025266379]
We present EgoVM, an end-to-end localization network that achieves comparable localization accuracy to prior state-of-the-art methods. We employ a set of learnable semantic embeddings to encode the semantic types of map elements and supervise them with semantic segmentation. We adopt a robust histogram-based pose solver to estimate the optimal pose by searching exhaustively over candidate poses.
arXiv Detail & Related papers (2023-07-18T06:07:25Z)
Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner. Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping. Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo [55.30992853477754]
We present TANDEM, a real-time monocular tracking and dense framework. For pose estimation, TANDEM performs photometric bundle adjustment based on a sliding window of alignments. TANDEM shows state-of-the-art real-time 3D reconstruction performance.
arXiv Detail & Related papers (2021-11-14T19:01:02Z)
Spatial Pyramid Based Graph Reasoning for Semantic Segmentation [67.47159595239798]
We apply graph convolution into the semantic segmentation task and propose an improved Laplacian. The graph reasoning is directly performed in the original feature space organized as a spatial pyramid. We achieve comparable performance with advantages in computational and memory overhead.
arXiv Detail & Related papers (2020-03-23T12:28:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.