Artifacts Mapping: Multi-Modal Semantic Mapping for Object Detection and 3D Localization
- URL: http://arxiv.org/abs/2307.01121v2
- Date: Tue, 21 Nov 2023 21:04:24 GMT
- Title: Artifacts Mapping: Multi-Modal Semantic Mapping for Object Detection and 3D Localization
- Authors: Federico Rollo, Gennaro Raiola, Andrea Zunino, Nikolaos Tsagarakis, Arash Ajoudani
- Abstract summary: We propose a framework that can autonomously detect and localize objects in a known environment.
The framework consists of three key elements: understanding the environment through RGB data, estimating depth through multi-modal sensor fusion, and managing artifacts.
Experiments show that the proposed framework can accurately detect 98% of the objects in the real sample environment, without post-processing.
- Score: 13.473742114288616
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Geometric navigation is nowadays a well-established field of robotics and the
research focus is shifting towards higher-level scene understanding, such as
Semantic Mapping. When a robot needs to interact with its environment, it must
be able to comprehend the contextual information of its surroundings. This work
focuses on classifying and localising objects within a map that is either under
construction (SLAM) or already built. To further explore this direction, we
propose a framework that can autonomously detect and localize predefined
objects in a known environment using a multi-modal sensor fusion approach
(combining RGB and depth data from an RGB-D camera and a lidar). The framework
consists of three key elements: understanding the environment through RGB data,
estimating depth through multi-modal sensor fusion, and managing artifacts
(i.e., filtering and stabilizing measurements). The experiments show that the
proposed framework can accurately detect 98% of the objects in the real sample
environment, without post-processing, while 85% and 80% of the objects were
mapped using the single RGB-D camera or RGB + lidar setup, respectively. A
comparison with single-sensor (camera or lidar) experiments shows that sensor
fusion allows the robot to accurately detect both near and far obstacles whose
measurements would be noisy or imprecise in a purely visual or laser-based
approach.
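The paper itself does not ship reference code, but the three elements described in the abstract (RGB-based detection, multi-modal depth estimation, and artifact management) map naturally onto a small amount of glue logic. The Python sketch below is only a minimal illustration under assumptions made here: detections arrive as integer pixel bounding boxes from some external 2D detector (e.g. a YOLO model), the lidar cloud has already been transformed into the camera frame, and the camera intrinsics K are known. All names (fuse_depth, pixel_to_3d, ArtifactTracker) and the 0.5 m gating distance are invented for illustration and are not taken from the paper.

```python
import numpy as np

def fuse_depth(bbox, depth_image, lidar_points_cam, K):
    """Estimate an object's depth by combining the RGB-D depth image with
    lidar points projected into the camera frame.

    bbox: integer pixel bounding box (u_min, v_min, u_max, v_max).
    depth_image: HxW array of metric depths from the RGB-D camera (0 = invalid).
    lidar_points_cam: Nx3 lidar points already expressed in the camera frame.
    K: 3x3 camera intrinsic matrix.
    """
    u_min, v_min, u_max, v_max = bbox

    # Depth candidates from the RGB-D camera: valid pixels inside the box.
    roi = depth_image[v_min:v_max, u_min:u_max]
    camera_depths = roi[roi > 0]

    # Depth candidates from the lidar: project points that lie in front of
    # the camera and keep those falling inside the bounding box.
    pts = lidar_points_cam[lidar_points_cam[:, 2] > 0]
    uv = (K @ pts.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    in_box = ((uv[:, 0] >= u_min) & (uv[:, 0] < u_max) &
              (uv[:, 1] >= v_min) & (uv[:, 1] < v_max))
    lidar_depths = pts[in_box, 2]

    # Fuse by taking the median of all candidates: robust both to noisy
    # far-range RGB-D readings and to sparse lidar returns.
    candidates = np.concatenate([camera_depths, lidar_depths])
    return float(np.median(candidates)) if candidates.size else None

def pixel_to_3d(u, v, depth, K):
    """Back-project a pixel (e.g. the bounding-box centre) to a 3D point."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    return np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth])

class ArtifactTracker:
    """Stabilise repeated detections of the same object by averaging
    measurements that fall within a gating distance of an existing artifact."""

    def __init__(self, gate=0.5):
        self.gate = gate          # metres; purely illustrative value
        self.artifacts = []       # list of (label, mean position, observation count)

    def update(self, label, position):
        position = np.asarray(position, dtype=float)
        for i, (lbl, mean, n) in enumerate(self.artifacts):
            if lbl == label and np.linalg.norm(mean - position) < self.gate:
                self.artifacts[i] = (lbl, (mean * n + position) / (n + 1), n + 1)
                return
        self.artifacts.append((label, position, 1))
```

A typical per-frame flow under these assumptions: run the 2D detector on the RGB image, call fuse_depth for each detection to obtain a fused depth from camera and lidar candidates, back-project the box centre with pixel_to_3d, and pass the labelled 3D point to ArtifactTracker.update so that repeated, noisy observations of the same object are averaged into a stable map artifact, which corresponds roughly to the "managing artifacts" step.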
Related papers
- PickScan: Object discovery and reconstruction from handheld interactions [99.99566882133179]
We develop an interaction-guided and class-agnostic method to reconstruct 3D representations of scenes.
Our main contribution is a novel approach to detecting user-object interactions and extracting the masks of manipulated objects.
Compared to Co-Fusion, the only comparable interaction-based and class-agnostic baseline, this corresponds to a reduction in chamfer distance of 73%.
arXiv Detail & Related papers (2024-11-17T23:09:08Z)
- FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything [1.5728609542259502]
This paper introduces FusionVision, an exhaustive pipeline adapted for the robust 3D segmentation of objects in RGB-D imagery.
The proposed FusionVision pipeline employs YOLO for identifying objects within the RGB image domain.
The synergy between these components and their integration into 3D scene understanding ensures a cohesive fusion of object detection and segmentation.
arXiv Detail & Related papers (2024-02-29T22:59:27Z)
- Anyview: Generalizable Indoor 3D Object Detection with Variable Frames [63.51422844333147]
We present a novel 3D detection framework named AnyView for our practical applications.
Our method achieves both great generalizability and high detection accuracy with a simple and clean architecture.
arXiv Detail & Related papers (2023-10-09T02:15:45Z)
- Multi-Modal 3D Object Detection by Box Matching [109.43430123791684]
We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection.
With the learned assignments between 3D and 2D object proposals, the fusion for detection can be effectively performed by combining their ROI features.
arXiv Detail & Related papers (2023-05-12T18:08:51Z)
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
- MVTrans: Multi-View Perception of Transparent Objects [29.851395075937255]
We forgo the unreliable depth map from RGB-D sensors and extend the stereo-based method.
Our proposed method, MVTrans, is an end-to-end multi-view architecture with multiple perception capabilities.
We establish a novel procedural photo-realistic dataset generation pipeline and create a large-scale transparent object detection dataset.
arXiv Detail & Related papers (2023-02-22T22:45:28Z)
- Semi-Perspective Decoupled Heatmaps for 3D Robot Pose Estimation from Depth Maps [66.24554680709417]
Knowing the exact 3D location of workers and robots in a collaborative environment enables several real applications.
We propose a non-invasive framework based on depth devices and deep neural networks to estimate the 3D pose of robots from an external camera.
arXiv Detail & Related papers (2022-07-06T08:52:12Z)
- EagerMOT: 3D Multi-Object Tracking via Sensor Fusion [68.8204255655161]
Multi-object tracking (MOT) enables mobile robots to perform well-informed motion planning and navigation by localizing surrounding objects in 3D space and time.
Existing methods rely on depth sensors (e.g., LiDAR) to detect and track targets in 3D space, but only up to a limited sensing range due to the sparsity of the signal.
We propose EagerMOT, a simple tracking formulation that integrates all available object observations from both sensor modalities to obtain a well-informed interpretation of the scene dynamics (a toy greedy association step in this spirit is sketched after this list).
arXiv Detail & Related papers (2021-04-29T22:30:29Z)
- Camera-Lidar Integration: Probabilistic sensor fusion for semantic mapping [8.18198392834469]
An automated vehicle must be able to perceive and recognise objects/obstacles in a three-dimensional world while navigating in a constantly changing environment.
We present a probabilistic pipeline that incorporates uncertainties from the sensor readings (cameras, lidar, IMU and wheel encoders), compensation for the motion of the vehicle, and label probabilities for the semantic images (a minimal sketch of this kind of per-point label fusion is given after this list).
arXiv Detail & Related papers (2020-07-09T07:59:39Z)
- RGB-D Odometry and SLAM [20.02647320786556]
RGB-D sensors are low-cost, low-power and low-size alternatives to traditional range sensors such as LiDAR.
Unlike RGB cameras, RGB-D sensors provide the additional depth information that removes the need for frame-by-frame triangulation for 3D scene reconstruction.
This chapter consists of three main parts: In the first part, we introduce the basic concept of odometry and SLAM and motivate the use of RGB-D sensors.
In the second part, we detail the three main components of SLAM systems: camera pose tracking, scene mapping and loop closing.
arXiv Detail & Related papers (2020-01-19T17:56:11Z)
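The Camera-Lidar Integration entry above mentions fusing label probabilities from semantic images into the map. The sketch below shows one common way such a per-point Bayesian label update can look; it is an assumption-laden illustration written for this listing, not the paper's actual formulation. The class identifiers, the uniform prior, and the multiplicative update with renormalisation are all choices made here for brevity.

```python
import numpy as np

class SemanticPointFusion:
    """Maintain a class-probability vector per mapped point and fuse
    successive semantic-image observations with a Bayesian update."""

    def __init__(self, num_classes):
        self.num_classes = num_classes
        self.point_probs = {}     # point id -> probability vector over classes

    def update(self, point_id, observed_probs):
        """observed_probs: per-class probabilities for the pixel this point
        projects to (e.g. the softmax output of a semantic segmenter)."""
        prior = self.point_probs.get(
            point_id, np.full(self.num_classes, 1.0 / self.num_classes))
        # Element-wise Bayes update with a small epsilon to avoid collapsing
        # to an all-zero vector when prior and observation disagree completely.
        posterior = prior * (np.asarray(observed_probs) + 1e-9)
        posterior /= posterior.sum()
        self.point_probs[point_id] = posterior
        return posterior

    def label(self, point_id):
        """Most likely class for a point, or None if it was never observed."""
        probs = self.point_probs.get(point_id)
        return int(np.argmax(probs)) if probs is not None else None
```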
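For the EagerMOT entry, the core idea is to combine 2D (camera) and 3D (lidar) detections even when only one modality sees an object. The toy sketch below shows a greedy IoU-based association between 2D boxes and the image projections of 3D boxes, returning matched pairs plus the leftovers from each modality. The real method is a full two-stage tracker; the function names and the iou_threshold value here are illustrative assumptions only.

```python
def iou_2d(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def greedy_fusion(boxes_2d, projected_3d_boxes, iou_threshold=0.3):
    """Greedily pair camera detections with lidar detections whose 3D boxes,
    projected into the image, overlap them the most. Unmatched detections
    from either modality are returned as well, so a tracker can still use
    single-modality observations."""
    used_2d, used_3d, pairs = set(), set(), []
    # Score every candidate pair, then accept pairs from best to worst.
    scored = sorted(
        ((iou_2d(b2, b3), i, j)
         for i, b2 in enumerate(boxes_2d)
         for j, b3 in enumerate(projected_3d_boxes)),
        reverse=True)
    for score, i, j in scored:
        if score < iou_threshold:
            break
        if i in used_2d or j in used_3d:
            continue
        pairs.append((i, j))
        used_2d.add(i)
        used_3d.add(j)
    unmatched_2d = [i for i in range(len(boxes_2d)) if i not in used_2d]
    unmatched_3d = [j for j in range(len(projected_3d_boxes)) if j not in used_3d]
    return pairs, unmatched_2d, unmatched_3d
```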