Artifacts Mapping: Multi-Modal Semantic Mapping for Object Detection and
3D Localization
- URL: http://arxiv.org/abs/2307.01121v2
- Date: Tue, 21 Nov 2023 21:04:24 GMT
- Title: Artifacts Mapping: Multi-Modal Semantic Mapping for Object Detection and
3D Localization
- Authors: Federico Rollo, Gennaro Raiola, Andrea Zunino, Nikolaos Tsagarakis,
Arash Ajoudani
- Abstract summary: We propose a framework that can autonomously detect and localize objects in a known environment.
The framework consists of three key elements: understanding the environment through RGB data, estimating depth through multi-modal sensor fusion, and managing artifacts.
Experiments show that the proposed framework can accurately detect 98% of the objects in the real sample environment, without post-processing.
- Score: 13.473742114288616
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Geometric navigation is nowadays a well-established field of robotics and the
research focus is shifting towards higher-level scene understanding, such as
Semantic Mapping. When a robot needs to interact with its environment, it must
be able to comprehend the contextual information of its surroundings. This work
focuses on classifying and localizing objects within a map that is either under
construction (SLAM) or already built. To further explore this direction, we
propose a framework that can autonomously detect and localize predefined
objects in a known environment using a multi-modal sensor fusion approach
(combining RGB and depth data from an RGB-D camera and a lidar). The framework
consists of three key elements: understanding the environment through RGB data,
estimating depth through multi-modal sensor fusion, and managing artifacts
(i.e., filtering and stabilizing measurements). The experiments show that the
proposed framework can accurately detect 98% of the objects in the real sample
environment without post-processing, while 85% and 80% of the objects were
mapped using the single RGB-D camera or the RGB + lidar setup, respectively. A
comparison with single-sensor (camera-only or lidar-only) experiments shows
that sensor fusion allows the robot to accurately detect both near and far
obstacles, whose measurements would be noisy or imprecise in a purely visual or
laser-based approach.
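The abstract describes the three stages (RGB understanding, fused depth estimation, artifact management) only at a high level, so the following is a minimal Python sketch of what the camera-lidar depth-fusion and 3D-localization steps could look like. It is not the authors' implementation: the intrinsics K, the lidar-to-camera transform, the 4 m hand-off range, and all function names are illustrative assumptions.

```python
import numpy as np

# Hypothetical calibration; a real system would load these values from the
# camera/lidar calibration rather than hard-code them.
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])   # pinhole intrinsics of the RGB-D camera
T_CAM_LIDAR = np.eye(4)                 # lidar -> camera extrinsics (identity placeholder)

def project_lidar_to_image(points_lidar):
    """Project Nx3 lidar points into pixel coordinates, keeping each point's camera-frame depth."""
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_CAM_LIDAR @ pts_h.T).T[:, :3]
    pts_cam = pts_cam[pts_cam[:, 2] > 0.1]          # keep points in front of the camera
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    return uv, pts_cam[:, 2]

def fused_depth_for_box(box, depth_image, lidar_uv, lidar_z, rgbd_max_range=4.0):
    """Pick one depth for a 2D detection: dense RGB-D depth up close, lidar beyond it."""
    u0, v0, u1, v1 = box                            # integer pixel bounding box of the detection
    patch = depth_image[v0:v1, u0:u1]
    rgbd_d = np.median(patch[patch > 0]) if np.any(patch > 0) else np.nan
    in_box = ((lidar_uv[:, 0] >= u0) & (lidar_uv[:, 0] < u1) &
              (lidar_uv[:, 1] >= v0) & (lidar_uv[:, 1] < v1))
    lidar_d = np.median(lidar_z[in_box]) if np.any(in_box) else np.nan
    if not np.isnan(rgbd_d) and rgbd_d < rgbd_max_range:
        return rgbd_d                               # trust the dense camera depth at short range
    return lidar_d                                  # otherwise fall back to the projected lidar

def localize_in_camera_frame(box, depth):
    """Back-project the box centre at the fused depth; applying the robot's pose
    estimate would then express the object in map coordinates."""
    u = 0.5 * (box[0] + box[2])
    v = 0.5 * (box[1] + box[3])
    return depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))
```

The near/far hand-off mirrors the comparison in the abstract: dense RGB-D depth is accurate only at short range, while the sparser lidar returns stay reliable on distant obstacles.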
Related papers
- FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything [1.5728609542259502]
This paper introduces FusionVision, an exhaustive pipeline adapted for the robust 3D segmentation of objects in RGB-D imagery.
The proposed FusionVision pipeline employs YOLO for identifying objects within the RGB image domain.
The synergy between these components and their integration into 3D scene understanding ensures a cohesive fusion of object detection and segmentation.
arXiv Detail & Related papers (2024-02-29T22:59:27Z)
- Anyview: Generalizable Indoor 3D Object Detection with Variable Frames [63.51422844333147]
We present a novel 3D detection framework named AnyView for our practical applications.
Our method achieves both great generalizability and high detection accuracy with a simple and clean architecture.
arXiv Detail & Related papers (2023-10-09T02:15:45Z)
- Multi-Modal 3D Object Detection by Box Matching [109.43430123791684]
We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection.
With the learned assignments between 3D and 2D object proposals, the fusion for detection can be effectively performed by combining their ROI features.
arXiv Detail & Related papers (2023-05-12T18:08:51Z)
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
- MVTrans: Multi-View Perception of Transparent Objects [29.851395075937255]
We forgo the unreliable depth maps from RGB-D sensors and extend the stereo-based method.
Our proposed method, MVTrans, is an end-to-end multi-view architecture with multiple perception capabilities.
We establish a novel procedural photo-realistic dataset generation pipeline and create a large-scale transparent object detection dataset.
arXiv Detail & Related papers (2023-02-22T22:45:28Z)
- Semi-Perspective Decoupled Heatmaps for 3D Robot Pose Estimation from Depth Maps [66.24554680709417]
Knowing the exact 3D location of workers and robots in a collaborative environment enables several real applications.
We propose a non-invasive framework based on depth devices and deep neural networks to estimate the 3D pose of robots from an external camera.
arXiv Detail & Related papers (2022-07-06T08:52:12Z)
- Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations for monocular 3D object detection.
arXiv Detail & Related papers (2021-08-12T15:22:33Z)
- EagerMOT: 3D Multi-Object Tracking via Sensor Fusion [68.8204255655161]
Multi-object tracking (MOT) enables mobile robots to perform well-informed motion planning and navigation by localizing surrounding objects in 3D space and time.
Existing methods rely on depth sensors (e.g., LiDAR) to detect and track targets in 3D space, but only up to a limited sensing range due to the sparsity of the signal.
We propose EagerMOT, a simple tracking formulation that integrates all available object observations from both sensor modalities to obtain a well-informed interpretation of the scene dynamics.
arXiv Detail & Related papers (2021-04-29T22:30:29Z)
- Camera-Lidar Integration: Probabilistic sensor fusion for semantic mapping [8.18198392834469]
An automated vehicle must be able to perceive and recognise objects and obstacles in a three-dimensional world while navigating in a constantly changing environment.
We present a probabilistic pipeline that incorporates uncertainties from the sensor readings (cameras, lidar, IMU and wheel encoders), compensation for the motion of the vehicle, and label probabilities for the semantic images.
arXiv Detail & Related papers (2020-07-09T07:59:39Z)
- RGB-D Odometry and SLAM [20.02647320786556]
RGB-D sensors are low-cost, low-power and low-size alternatives to traditional range sensors such as LiDAR.
Unlike RGB cameras, RGB-D sensors provide additional depth information that removes the need for frame-by-frame triangulation in 3D scene reconstruction.
This chapter consists of three main parts: In the first part, we introduce the basic concept of odometry and SLAM and motivate the use of RGB-D sensors.
In the second part, we detail the three main components of SLAM systems: camera pose tracking, scene mapping and loop closing.
arXiv Detail & Related papers (2020-01-19T17:56:11Z)
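The chapter's point that per-pixel depth removes the need for frame-by-frame triangulation can be made concrete with a short back-projection sketch. This is only an illustration under a pinhole model; the intrinsics and the helper name below are assumptions, not code from the chapter.

```python
import numpy as np

# Illustrative intrinsics; a real system reads these from the RGB-D driver's calibration.
FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0

def depth_to_pointcloud(depth, T_world_cam):
    """Back-project a full depth image into world coordinates in one shot.

    With an RGB-D sensor each pixel already carries its range, so a single frame
    yields metric 3D structure; with an RGB-only camera the same points would have
    to be triangulated from at least two tracked views.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]                      # pixel row/column grids
    z = depth
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    pts = pts[pts[:, 2] > 0]                       # drop invalid (zero-depth) pixels
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    return (T_world_cam @ pts_h.T).T[:, :3]        # express the cloud in the map frame

# Example: a flat synthetic depth image two metres away, identity camera pose.
cloud = depth_to_pointcloud(np.full((480, 640), 2.0), np.eye(4))
print(cloud.shape)  # (307200, 3)
```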
This list is automatically generated from the titles and abstracts of the papers on this site.