MCOO-SLAM: A Multi-Camera Omnidirectional Object SLAM System
- URL: http://arxiv.org/abs/2506.15402v1
- Date: Wed, 18 Jun 2025 12:20:34 GMT
- Title: MCOO-SLAM: A Multi-Camera Omnidirectional Object SLAM System
- Authors: Miaoxin Pan, Jinnan Li, Yaowen Zhang, Yi Yang, Yufeng Yue
- Abstract summary: We propose MCOO-SLAM, a novel Multi-Camera Omnidirectional Object SLAM system. Our approach integrates point features and object-level landmarks enhanced with open-vocabulary semantics. Extensive experiments in real-world scenarios demonstrate that MCOO-SLAM achieves accurate localization and scalable object-level mapping.
- Score: 19.16370123474815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object-level SLAM offers structured and semantically meaningful environment representations, making it more interpretable and suitable for high-level robotic tasks. However, most existing approaches rely on RGB-D sensors or monocular views, which suffer from narrow fields of view, occlusion sensitivity, and limited depth perception, especially in large-scale or outdoor environments. These limitations often restrict the system to observing only partial views of objects from limited perspectives, leading to inaccurate object modeling and unreliable data association. In this work, we propose MCOO-SLAM, a novel Multi-Camera Omnidirectional Object SLAM system that fully leverages surround-view camera configurations to achieve robust, consistent, and semantically enriched mapping in complex outdoor scenarios. Our approach integrates point features and object-level landmarks enhanced with open-vocabulary semantics. A semantic-geometric-temporal fusion strategy is introduced for robust object association across multiple views, leading to improved consistency and accurate object modeling. An omnidirectional loop closure module is designed to enable viewpoint-invariant place recognition using scene-level descriptors. Furthermore, the constructed map is abstracted into a hierarchical 3D scene graph to support downstream reasoning tasks. Extensive experiments in real-world scenarios demonstrate that MCOO-SLAM achieves accurate localization and scalable object-level mapping with improved robustness to occlusion, pose variation, and environmental complexity.
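The abstract gives no implementation details for the semantic-geometric-temporal fusion strategy, but the idea can be illustrated with a minimal association sketch. All field names, weights, and thresholds below are assumptions made for illustration, not values from the paper:

```python
import numpy as np

def association_score(det, lm, w_sem=0.5, w_geo=0.3, w_tmp=0.2):
    """Fuse semantic, geometric, and temporal cues into a single match score.

    `det` is a new surround-view detection and `lm` an existing object
    landmark; the field names and weights are illustrative assumptions.
    """
    # Semantic cue: cosine similarity of open-vocabulary embeddings.
    sem = float(np.dot(det["embed"], lm["embed"])
                / (np.linalg.norm(det["embed"]) * np.linalg.norm(lm["embed"]) + 1e-8))
    # Geometric cue: 3D centroid distance mapped into (0, 1].
    geo = float(np.exp(-np.linalg.norm(det["center"] - lm["center"])))
    # Temporal cue: decay with the number of frames since the last observation.
    tmp = float(np.exp(-0.1 * (det["frame"] - lm["last_frame"])))
    return w_sem * sem + w_geo * geo + w_tmp * tmp

def associate(detections, landmarks, thresh=0.6):
    """Greedy association; unmatched detections seed new object landmarks."""
    matches, new_objects = [], []
    for det in detections:
        scores = [association_score(det, lm) for lm in landmarks]
        best = int(np.argmax(scores)) if scores else -1
        if best >= 0 and scores[best] >= thresh:
            matches.append((det, landmarks[best]))
        else:
            new_objects.append(det)
    return matches, new_objects
```

A production system would solve the assignment jointly rather than greedily; the sketch only shows how the three cues might combine.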
Related papers
- Agentic 3D Scene Generation with Spatially Contextualized VLMs [67.31920821192323]
We introduce a new paradigm that enables vision-language models to generate, understand, and edit complex 3D environments. We develop an agentic 3D scene generation pipeline in which the VLM iteratively reads from and updates the spatial context. Results show that our framework can handle diverse and challenging inputs, achieving a level of generalization not observed in prior work.
arXiv Detail & Related papers (2025-05-26T15:28:17Z)
- IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments [56.85804719947]
We present IAAO, a framework that builds an explicit 3D model for intelligent agents to gain understanding of articulated objects in their environment through interaction. We first build hierarchical features and label fields for each object state using 3D Gaussian Splatting (3DGS) by distilling mask features and view-consistent labels from multi-view images. We then perform object- and part-level queries on the 3D Gaussian primitives to identify static and articulated elements, estimating global transformations and local articulation parameters along with affordances.
arXiv Detail & Related papers (2025-04-09T12:36:48Z)
- MObI: Multimodal Object Inpainting Using Diffusion Models [52.07640413626605]
This paper introduces MObI, a novel framework for Multimodal Object Inpainting. Using a single reference RGB image, MObI enables objects to be seamlessly inserted into existing multimodal scenes. Unlike traditional inpainting methods that rely solely on edit masks, our 3D bounding box conditioning gives objects accurate spatial positioning and realistic scaling.
arXiv Detail & Related papers (2025-01-06T17:43:26Z)
- Go-SLAM: Grounded Object Segmentation and Localization with Gaussian Splatting SLAM [12.934788858420752]
Go-SLAM is a novel framework that utilizes 3D Gaussian Splatting SLAM to reconstruct dynamic environments.
Our system facilitates open-vocabulary querying, allowing users to locate objects using natural language descriptions.
arXiv Detail & Related papers (2024-09-25T13:56:08Z)
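Open-vocabulary querying of the kind Go-SLAM describes reduces, at retrieval time, to ranking mapped objects by similarity between a text embedding and per-object feature vectors. A minimal sketch under that assumption (the CLIP-style encoder call is elided, and the `embed`/`center` fields are hypothetical):

```python
import numpy as np

def query_objects(text_embed, object_map, top_k=3):
    """Rank mapped objects by cosine similarity to a language query embedding."""
    q = text_embed / (np.linalg.norm(text_embed) + 1e-8)
    ranked = []
    for oid, obj in object_map.items():
        v = obj["embed"] / (np.linalg.norm(obj["embed"]) + 1e-8)
        ranked.append((float(np.dot(q, v)), oid, obj["center"]))
    ranked.sort(key=lambda r: r[0], reverse=True)  # best match first
    return ranked[:top_k]  # (similarity, object id, 3D location)
```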
- LOSS-SLAM: Lightweight Open-Set Semantic Simultaneous Localization and Mapping [9.289001828243512]
We show that a system for identifying, localizing, and encoding objects can be tightly coupled with probabilistic graphical models to perform open-set semantic simultaneous localization and mapping (SLAM).
Results are presented demonstrating that the proposed lightweight object encoding can be used to perform more accurate object-based SLAM than existing open-set methods.
arXiv Detail & Related papers (2024-04-05T19:42:55Z)
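LOSS-SLAM's "lightweight object encoding" suggests association by comparing compact descriptor vectors. A hedged sketch of that step as an optimal assignment problem (the cosine-distance cost and the 0.5 gate are assumptions, not the paper's choices):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_encodings(enc_new, enc_map, max_cost=0.5):
    """Assign new detection encodings (rows) to map object encodings (columns).

    Both inputs are unit-normalized (n, d) arrays; cost is cosine distance,
    and pairs above `max_cost` are rejected as implausible.
    """
    cost = 1.0 - enc_new @ enc_map.T           # cosine distance matrix
    rows, cols = linear_sum_assignment(cost)   # Hungarian algorithm
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]
```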
- ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection [70.11264880907652]
Recent camouflaged object detection (COD) methods attempt to segment objects visually blended into their surroundings, which is extremely complex and difficult in real-world scenarios.
We propose an effective unified collaborative pyramid network that mimics human behavior when observing vague images and camouflaged objects: zooming in and out.
Our framework consistently outperforms existing state-of-the-art methods in image and video COD benchmarks.
arXiv Detail & Related papers (2023-10-31T06:11:23Z)
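ZoomNeXt fuses multi-scale features inside the network; a crude test-time approximation of the same zoom-in-and-out intuition is to run any segmentation model at several input scales and average the resized predictions. A sketch of that approximation (not the paper's architecture):

```python
import torch
import torch.nn.functional as F

def multiscale_predict(model, image, scales=(0.5, 1.0, 1.5)):
    """Run a segmentation model at several scales and fuse the mask logits.

    `image` is an (N, C, H, W) tensor and `model` returns mask logits;
    the plain mean fusion and the scale set are illustrative choices.
    """
    _, _, h, w = image.shape
    preds = []
    for s in scales:
        scaled = F.interpolate(image, scale_factor=s, mode="bilinear",
                               align_corners=False)
        logits = model(scaled)
        preds.append(F.interpolate(logits, size=(h, w), mode="bilinear",
                                   align_corners=False))
    return torch.stack(preds).mean(dim=0)
```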
- NeuSE: Neural SE(3)-Equivariant Embedding for Consistent Spatial Understanding with Objects [53.111397800478294]
We present NeuSE, a novel Neural SE(3)-Equivariant Embedding for objects.
NeuSE serves as a compact point cloud surrogate for complete object models.
Our proposed SLAM paradigm, using NeuSE for object shape and pose characterization, can operate independently or in conjunction with typical SLAM systems.
arXiv Detail & Related papers (2023-03-13T17:30:43Z)
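SE(3)-equivariance means that transforming the input point cloud transforms the embedding in a predictable way. The toy check below uses the centroid, which is trivially equivariant (NeuSE's learned latent is far richer but satisfies the same algebraic property):

```python
import numpy as np

def embed(points):
    """Toy equivariant embedding: the centroid moves exactly with the cloud."""
    return points.mean(axis=0)

rng = np.random.default_rng(0)
cloud = rng.normal(size=(100, 3))

# A sample SE(3) transform: rotation about z plus a translation.
theta, t = 0.7, np.array([1.0, -2.0, 0.5])
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

# Equivariance property: embed(R @ x + t) == R @ embed(x) + t.
assert np.allclose(embed(cloud @ R.T + t), R @ embed(cloud) + t)
```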
- Visual-Inertial Multi-Instance Dynamic SLAM with Object-level Relocalisation [14.302118093865849]
We present a tightly-coupled visual-inertial object-level multi-instance dynamic SLAM system.
It can robustly optimise for the camera pose, velocity, and IMU biases, and build a dense, object-level 3D reconstruction of the environment.
arXiv Detail & Related papers (2022-08-08T17:13:24Z)
- SO-SLAM: Semantic Object SLAM with Scale Proportional and Symmetrical Texture Constraints [9.694083816665525]
This paper proposes a novel monocular Semantic Object SLAM (SO-SLAM) system that introduces object spatial constraints.
We have verified the performance of the algorithm on public datasets and an author-recorded mobile robot dataset.
arXiv Detail & Related papers (2021-09-10T13:55:37Z)
- Compositional Scalable Object SLAM [29.349829139625403]
We present a fast, scalable, and accurate Simultaneous Localization and Mapping (SLAM) system that represents indoor scenes as a graph of objects.
We show that a compositional, scalable object mapping formulation is amenable to a robust SLAM solution for drift-free, large-scale indoor reconstruction.
arXiv Detail & Related papers (2020-11-05T04:46:25Z)
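Representing a scene "as a graph of objects" can be made concrete with a small data structure: object nodes carrying a pose plus local geometry, and relative-pose edges between them. A minimal sketch (field names and the naive pose update are assumptions):

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class ObjectNode:
    oid: int
    pose: np.ndarray          # 4x4 world-from-object transform
    geometry: object = None   # e.g. a per-object volume or mesh

@dataclass
class ObjectGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (oid_a, oid_b, 4x4 relative pose)

    def observe(self, oid, world_pose, geometry=None):
        """Insert or naively refresh an object; a real system would fuse poses."""
        if oid not in self.nodes:
            self.nodes[oid] = ObjectNode(oid, world_pose, geometry)
        else:
            self.nodes[oid].pose = world_pose

    def relate(self, a, b):
        """Record the relative pose between two objects for later optimization."""
        rel = np.linalg.inv(self.nodes[a].pose) @ self.nodes[b].pose
        self.edges.append((a, b, rel))
```

Optimizing over the relative-pose edges rather than a monolithic map is what keeps large indoor reconstructions drift-free and scalable.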
- OmniSLAM: Omnidirectional Localization and Dense Mapping for Wide-baseline Multi-camera Systems [88.41004332322788]
We present an omnidirectional localization and dense mapping system for a wide-baseline multiview stereo setup with ultra-wide field-of-view (FOV) fisheye cameras.
For more practical and accurate reconstruction, we first introduce improved and lightweight deep neural networks for omnidirectional depth estimation.
We integrate our omnidirectional depth estimates into the visual odometry (VO) and add a loop closing module for global consistency.
arXiv Detail & Related papers (2020-03-18T05:52:10Z)
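Omnidirectional depth maps like OmniSLAM's are commonly indexed in a spherical (equirectangular) image, where unprojecting a pixel to a unit ray is the basic geometric step. The sketch below shows only that standard unprojection; it is not the paper's exact fisheye camera model:

```python
import numpy as np

def equirect_to_ray(u, v, width, height):
    """Unit viewing ray for pixel (u, v) in a 360 x 180 degree panorama.

    Multiplying the ray by an omnidirectional depth value yields a 3D point,
    which is how a dense spherical depth map becomes a point cloud.
    """
    lon = (u / width - 0.5) * 2.0 * np.pi   # longitude in [-pi, pi)
    lat = (0.5 - v / height) * np.pi        # latitude in [-pi/2, pi/2]
    return np.array([np.cos(lat) * np.sin(lon),   # x: right
                     np.sin(lat),                 # y: up
                     np.cos(lat) * np.cos(lon)])  # z: forward
```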