TextSLAM: Visual SLAM with Semantic Planar Text Features
- URL: http://arxiv.org/abs/2305.10029v2
- Date: Mon, 3 Jul 2023 12:06:12 GMT
- Title: TextSLAM: Visual SLAM with Semantic Planar Text Features
- Authors: Boying Li, Danping Zou, Yuan Huang, Xinghan Niu, Ling Pei, Wenxian Yu
- Abstract summary: We propose a novel visual SLAM method that integrates text objects tightly by treating them as semantic features.
We tested our method in various scenes with ground-truth data.
The results show that integrating texture features leads to a superior SLAM system that can match images across day and night.
- Score: 8.8100408194584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel visual SLAM method that integrates text objects tightly by
treating them as semantic features, fully exploiting their geometric and
semantic priors. Each text object is modeled as a texture-rich planar patch whose
semantic meaning is extracted and updated on the fly for better data
association. With the full exploration of locally planar characteristics and
semantic meaning of text objects, the SLAM system becomes more accurate and
robust even under challenging conditions such as image blurring, large
viewpoint changes, and significant illumination variations (day and night). We
tested our method in various scenes with ground-truth data. The results
show that integrating texture features leads to a superior SLAM system
that can match images across day and night. The reconstructed semantic 3D text
map could be useful for navigation and scene understanding in robotic and mixed
reality applications. Our project page: https://github.com/SJTU-ViSYS/TextSLAM .
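The abstract's planar-patch idea has a standard geometric core: points on a plane θᵀX = 1 (with θ = n/d for unit normal n and plane distance d) transfer between two views through the plane-induced homography H = R + t·θᵀ. A minimal sketch of that transfer, assuming this generic parameterization rather than the paper's exact formulation (the function names are illustrative):

```python
import numpy as np

def plane_homography(R, t, theta):
    """Plane-induced homography between two camera views.

    For points X on the plane theta^T X = 1 (theta = n / d), we have
    X' = R X + t = (R + t theta^T) X, so normalized image coordinates
    transfer linearly through H = R + t theta^T.
    """
    return R + np.outer(t, theta)

def warp_pixel(K, R, t, theta, uv):
    """Transfer a pixel lying on the planar patch from host to target view."""
    m = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])  # back-project to normalized coords
    m2 = plane_homography(R, t, theta) @ m                # plane-induced transfer
    p = K @ m2                                            # reproject into the target image
    return p[:2] / p[2]
```

With such a warp, the appearance of a text patch seen in one keyframe can be compared directly against another view, which is the kind of plane-aware data association the abstract describes.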
Related papers
- HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting [53.6394928681237]
Holistic understanding of urban scenes from RGB images is a challenging yet important problem.
Our main idea involves the joint optimization of geometry, appearance, semantics, and motion using a combination of static and dynamic 3D Gaussians.
Our approach offers the ability to render new viewpoints in real-time, yielding 2D and 3D semantic information with high accuracy.
arXiv Detail & Related papers (2024-03-19T13:39:05Z)
- TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion [64.49276500129092]
TextureDreamer is an image-guided texture synthesis method.
It can transfer relightable textures from a small number of input images to target 3D shapes across arbitrary categories.
arXiv Detail & Related papers (2024-01-17T18:55:49Z)
- DNS SLAM: Dense Neural Semantic-Informed SLAM [92.39687553022605]
DNS SLAM is a novel neural RGB-D semantic SLAM approach featuring a hybrid representation.
Our method integrates multi-view geometry constraints with image-based feature extraction to improve appearance details.
Our experimental results achieve state-of-the-art performance on both synthetic data and real-world data tracking.
arXiv Detail & Related papers (2023-11-30T21:34:44Z)
- Directional Texture Editing for 3D Models [51.31499400557996]
ITEM3D is designed for automatic 3D object editing according to text instructions.
Leveraging the diffusion models and the differentiable rendering, ITEM3D takes the rendered images as the bridge of text and 3D representation.
arXiv Detail & Related papers (2023-09-26T12:01:13Z)
- TANGO: Text-driven Photorealistic and Robust 3D Stylization via Lighting Decomposition [39.312567993736025]
We propose TANGO, which transfers the appearance style of a given 3D shape according to a text prompt in a photorealistic manner.
We show that TANGO outperforms existing methods of text-driven 3D style transfer in terms of photorealistic quality, consistency of 3D geometry, and robustness when stylizing low-quality meshes.
arXiv Detail & Related papers (2022-10-20T13:52:18Z)
- Semantic Visual Simultaneous Localization and Mapping: A Survey [18.372996585079235]
This paper first reviews the development of semantic vSLAM, focusing on its strengths and on how it differs from classical vSLAM.
Secondly, we explore three main issues of semantic vSLAM: the extraction and association of semantic information, the application of semantic information, and the advantages of semantic vSLAM.
Finally, we discuss future directions that will provide a blueprint for the future development of semantic vSLAM.
arXiv Detail & Related papers (2022-09-14T05:45:26Z)
- Zero-Shot Text-Guided Object Generation with Dream Fields [111.06026544180398]
We combine neural rendering with multi-modal image and text representations to synthesize diverse 3D objects.
Our method, Dream Fields, can generate the geometry and color of a wide range of objects without 3D supervision.
In experiments, Dream Fields produce realistic, multi-view consistent object geometry and color from a variety of natural language captions.
arXiv Detail & Related papers (2021-12-02T17:53:55Z)
- SSC: Semantic Scan Context for Large-Scale Place Recognition [13.228580954956342]
We explore the use of high-level features, namely semantics, to improve the representation ability of descriptors.
We propose a novel global descriptor, Semantic Scan Context, which explores semantic information to represent scenes more effectively.
Our approach outperforms state-of-the-art methods by a large margin.
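A scan-context-style descriptor enriched with semantics can be pictured as a polar (ring × sector) grid over a LiDAR scan whose cells store a point's class label rather than, say, maximum height. The sketch below is a simplified illustration under assumed parameters (ring/sector counts, last-write-wins labeling), not the authors' exact construction:

```python
import numpy as np

def semantic_scan_context(points_xy, labels, num_rings=20, num_sectors=60, max_range=50.0):
    """Build a ring x sector polar grid; each cell stores one semantic label.

    points_xy : (N, 2) array of x, y coordinates in the sensor frame.
    labels    : (N,) integer class labels (0 is reserved for 'empty').
    """
    desc = np.zeros((num_rings, num_sectors), dtype=np.int64)
    for (x, y), lab in zip(points_xy, labels):
        r = np.hypot(x, y)
        if r >= max_range:
            continue  # discard points beyond the descriptor's radius
        ring = int(r / max_range * num_rings)
        sector = int((np.arctan2(y, x) + np.pi) / (2.0 * np.pi) * num_sectors) % num_sectors
        desc[ring, sector] = lab  # last write wins here; real systems rank labels by priority
    return desc
```

Two scans can then be compared by column-shifting one descriptor (to account for yaw) and counting cells whose labels agree.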
arXiv Detail & Related papers (2021-07-01T11:51:19Z)
- TediGAN: Text-Guided Diverse Face Image Generation and Manipulation [52.83401421019309]
TediGAN is a framework for multi-modal image generation and manipulation with textual descriptions.
Its StyleGAN inversion module maps real images to the latent space of a well-trained StyleGAN.
Visual-linguistic similarity learning matches text and images by mapping both into a common embedding space.
Instance-level optimization preserves identity during manipulation.
arXiv Detail & Related papers (2020-12-06T16:20:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.