Related papers: OSMa-Bench: Evaluating Open Semantic Mapping Under Varying Lighting Conditions

OSMa-Bench: Evaluating Open Semantic Mapping Under Varying Lighting Conditions

URL: http://arxiv.org/abs/2503.10331v1
Date: Thu, 13 Mar 2025 13:07:51 GMT
Title: OSMa-Bench: Evaluating Open Semantic Mapping Under Varying Lighting Conditions
Authors: Maxim Popov, Regina Kurkova, Mikhail Iumanov, Jaafar Mahmoud, Sergey Kolyubin,
Abstract summary: Open Semantic Mapping (OSM) is a key technology in robotic perception, combining semantic segmentation and SLAM techniques.<n>The study focuses on evaluating state-of-the-art semantic mapping algorithms under varying indoor lighting conditions.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Open Semantic Mapping (OSM) is a key technology in robotic perception, combining semantic segmentation and SLAM techniques. This paper introduces a dynamically configurable and highly automated LLM/LVLM-powered pipeline for evaluating OSM solutions called OSMa-Bench (Open Semantic Mapping Benchmark). The study focuses on evaluating state-of-the-art semantic mapping algorithms under varying indoor lighting conditions, a critical challenge in indoor environments. We introduce a novel dataset with simulated RGB-D sequences and ground truth 3D reconstructions, facilitating the rigorous analysis of mapping performance across different lighting conditions. Through experiments on leading models such as ConceptGraphs, BBQ and OpenScene, we evaluate the semantic fidelity of object recognition and segmentation. Additionally, we introduce a Scene Graph evaluation method to analyze the ability of models to interpret semantic structure. The results provide insights into the robustness of these models, forming future research directions for developing resilient and adaptable robotic systems. Our code is available at https://be2rlab.github.io/OSMa-Bench/.

Related papers

Topology-Aware Modeling for Unsupervised Simulation-to-Reality Point Cloud Recognition [63.55828203989405]
We introduce a novel Topology-Aware Modeling (TAM) framework for Sim2Real UDA on object point clouds.<n>Our approach mitigates the domain gap by leveraging global spatial topology, characterized by low-level, high-frequency 3D structures.<n>We propose an advanced self-training strategy that combines cross-domain contrastive learning with self-training.
arXiv Detail & Related papers (2025-06-26T11:53:59Z)
OSMLoc: Single Image-Based Visual Localization in OpenStreetMap with Geometric and Semantic Guidances [11.085165252259042]
OSMLoc is a brain-inspired single-image visual localization method with semantic and geometric guidance to improve accuracy, robustness, and generalization ability. To validate the proposed OSMLoc, we collect a worldwide cross-area and cross-condition (CC) benchmark for extensive evaluation.
arXiv Detail & Related papers (2024-11-13T14:59:00Z)
Object-Oriented Material Classification and 3D Clustering for Improved Semantic Perception and Mapping in Mobile Robots [6.395242048226456]
We propose a complement-aware deep learning approach for RGB-D-based material classification built on top of an object-oriented pipeline. We show a significant improvement in material classification and 3D clustering accuracy compared to state-of-the-art approaches for 3D semantic scene mapping.
arXiv Detail & Related papers (2024-07-08T16:25:01Z)
Mixed Diffusion for 3D Indoor Scene Synthesis [55.94569112629208]
We present MiDiffusion, a novel mixed discrete-continuous diffusion model designed to synthesize plausible 3D indoor scenes.<n>We show it outperforms state-of-the-art autoregressive and diffusion models in floor-conditioned 3D scene synthesis.
arXiv Detail & Related papers (2024-05-31T17:54:52Z)
MOKA: Open-World Robotic Manipulation through Mark-Based Visual Prompting [97.52388851329667]
We introduce Marking Open-world Keypoint Affordances (MOKA) to solve robotic manipulation tasks specified by free-form language instructions. Central to our approach is a compact point-based representation of affordance, which bridges the VLM's predictions on observed images and the robot's actions in the physical world. We evaluate and analyze MOKA's performance on various table-top manipulation tasks including tool use, deformable body manipulation, and object rearrangement.
arXiv Detail & Related papers (2024-03-05T18:08:45Z)
Language-EXtended Indoor SLAM (LEXIS): A Versatile System for Real-time Visual Scene Understanding [0.0]
LEXIS is a real-time indoor Simultaneous Localization and Mapping system. It harnesses the open-vocabulary nature of Large Language Models to create a unified approach to scene understanding and place recognition. It successfully categorizes rooms with varying layouts and dimensions and outperforms the state-of-the-art (SOTA)
arXiv Detail & Related papers (2023-09-26T16:50:20Z)
Using Detection, Tracking and Prediction in Visual SLAM to Achieve Real-time Semantic Mapping of Dynamic Scenarios [70.70421502784598]
RDS-SLAM can build semantic maps at object level for dynamic scenarios in real time using only one commonly used Intel Core i7 CPU. We evaluate RDS-SLAM in TUM RGB-D dataset, and experimental results show that RDS-SLAM can run with 30.3 ms per frame in dynamic scenarios.
arXiv Detail & Related papers (2022-10-10T11:03:32Z)
Learning Invariant World State Representations with Predictive Coding [1.8963850600275547]
We develop a new predictive coding-based architecture and a hybrid fully-supervised/self-supervised learning method. We evaluate the robustness of our model on a new synthetic dataset.
arXiv Detail & Related papers (2022-07-06T21:08:30Z)
Multitask AET with Orthogonal Tangent Regularity for Dark Object Detection [84.52197307286681]
We propose a novel multitask auto encoding transformation (MAET) model to enhance object detection in a dark environment. In a self-supervision manner, the MAET learns the intrinsic visual structure by encoding and decoding the realistic illumination-degrading transformation. We have achieved the state-of-the-art performance using synthetic and real-world datasets.
arXiv Detail & Related papers (2022-05-06T16:27:14Z)
A Probabilistic Framework for Dynamic Object Recognition in 3D Environment With A Novel Continuous Ground Estimation Method [0.0]
A probabilistic framework is developed and proposed for Dynamic Object Recognition in 3D Environments. A novel Gaussian Process Regression (GPR) based method is developed to detect ground points in different urban scenarios.
arXiv Detail & Related papers (2022-01-27T16:07:10Z)
Multi-Branch Deep Radial Basis Function Networks for Facial Emotion Recognition [80.35852245488043]
We propose a CNN based architecture enhanced with multiple branches formed by radial basis function (RBF) units. RBF units capture local patterns shared by similar instances using an intermediate representation. We show it is the incorporation of local information what makes the proposed model competitive.
arXiv Detail & Related papers (2021-09-07T21:05:56Z)
Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency [114.02182755620784]
We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision. Our framework is shown to outperform the state-of-the-art depth and motion estimation methods.
arXiv Detail & Related papers (2021-02-04T14:26:42Z)
A User's Guide to Calibrating Robotics Simulators [54.85241102329546]
This paper proposes a set of benchmarks and a framework for the study of various algorithms aimed to transfer models and policies learnt in simulation to the real world. We conduct experiments on a wide range of well known simulated environments to characterize and offer insights into the performance of different algorithms. Our analysis can be useful for practitioners working in this area and can help make informed choices about the behavior and main properties of sim-to-real algorithms.
arXiv Detail & Related papers (2020-11-17T22:24:26Z)
Extending Maps with Semantic and Contextual Object Information for Robot Navigation: a Learning-Based Framework using Visual and Depth Cues [12.984393386954219]
This paper addresses the problem of building augmented metric representations of scenes with semantic information from RGB-D images. We propose a complete framework to create an enhanced map representation of the environment with object-level information.
arXiv Detail & Related papers (2020-03-13T15:05:23Z)
Image Matching across Wide Baselines: From Paper to Practice [80.9424750998559]
We introduce a comprehensive benchmark for local features and robust estimation algorithms. Our pipeline's modular structure allows easy integration, configuration, and combination of different methods. We show that with proper settings, classical solutions may still outperform the perceived state of the art.
arXiv Detail & Related papers (2020-03-03T15:20:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.