RGB-D Odometry and SLAM
- URL: http://arxiv.org/abs/2001.06875v1
- Date: Sun, 19 Jan 2020 17:56:11 GMT
- Title: RGB-D Odometry and SLAM
- Authors: Javier Civera and Seong Hun Lee
- Abstract summary: RGB-D sensors are low-cost, low-power and small-size alternatives to traditional range sensors such as LiDAR.
Unlike RGB cameras, RGB-D sensors provide the additional depth information that removes the need for frame-by-frame triangulation for 3D scene reconstruction.
This chapter consists of three main parts: In the first part, we introduce the basic concepts of odometry and SLAM and motivate the use of RGB-D sensors.
In the second part, we detail the three main components of SLAM systems: camera pose tracking, scene mapping and loop closing.
- Score: 20.02647320786556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The emergence of modern RGB-D sensors has had a significant impact on many
application fields, including robotics, augmented reality (AR) and 3D scanning.
They are low-cost, low-power and small-size alternatives to traditional range
sensors such as LiDAR. Moreover, unlike RGB cameras, RGB-D sensors provide the
additional depth information that removes the need for frame-by-frame
triangulation for 3D scene reconstruction. These merits have made them very
popular in mobile robotics and AR, where it is of great interest to estimate
ego-motion and 3D scene structure. Such spatial understanding can enable robots
to navigate autonomously without collisions and allow users to insert virtual
entities consistent with the image stream. In this chapter, we review common
formulations of odometry and Simultaneous Localization and Mapping (known by
its acronym SLAM) using RGB-D stream input. The two topics are closely related,
as the former aims to track the incremental camera motion with respect to a
local map of the scene, and the latter to jointly estimate the camera
trajectory and the global map with consistency. In both cases, the standard
approaches minimize a cost function using nonlinear optimization techniques.
This chapter consists of three main parts: In the first part, we introduce the
basic concepts of odometry and SLAM and motivate the use of RGB-D sensors. We
also give mathematical preliminaries relevant to most odometry and SLAM
algorithms. In the second part, we detail the three main components of SLAM
systems: camera pose tracking, scene mapping and loop closing. For each
component, we describe different approaches proposed in the literature. In the
final part, we provide a brief discussion on advanced research topics with
references to the state of the art.
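As a concrete illustration of the formulation above, here is a minimal sketch, not code from the chapter, of RGB-D pose tracking as nonlinear least squares: depth pixels are back-projected to 3D points through the pinhole model (no triangulation required), and a Gauss-Newton loop estimates the rigid motion aligning them to local map points. The names (backproject, track_pose), the point-to-point residual and the synthetic data are illustrative assumptions; practical systems add correspondence search (e.g. ICP), photometric terms and robust cost functions.

```python
# Illustrative sketch only, not the chapter's implementation.
import numpy as np

def backproject(depth, K):
    """Lift a depth map (H x W, meters) to 3D points with the pinhole
    model: RGB-D gives per-pixel depth, so no triangulation is needed."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

def hat(w):
    """Skew-symmetric matrix so that hat(w) @ p == np.cross(w, p)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def so3_exp(w):
    """Rodrigues' formula: rotation matrix for an axis-angle vector w."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3) + hat(w)
    W = hat(w)
    return (np.eye(3) + np.sin(theta) / theta * W
            + (1.0 - np.cos(theta)) / theta**2 * W @ W)

def track_pose(src, dst, iters=10):
    """Gauss-Newton on E(R, t) = sum_i ||R src_i + t - dst_i||^2,
    assuming known point correspondences src_i <-> dst_i."""
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        p = src @ R.T + t                    # points under current pose
        r = (p - dst).reshape(-1)            # stacked residuals, shape (3N,)
        J = np.zeros((r.size, 6))            # Jacobian w.r.t. (omega, v)
        for i, pi in enumerate(p):
            J[3*i:3*i+3, :3] = -hat(pi)      # d r_i / d omega
            J[3*i:3*i+3, 3:] = np.eye(3)     # d r_i / d v
        delta = np.linalg.solve(J.T @ J, -J.T @ r)  # normal equations
        dR = so3_exp(delta[:3])              # left-multiplied increment
        R, t = dR @ R, dR @ t + delta[3:]
    return R, t

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.uniform(-1.0, 1.0, (200, 3))   # e.g. backproject(depth, K)
    R_true = so3_exp(np.array([0.05, -0.02, 0.10]))
    t_true = np.array([0.10, 0.00, 0.05])
    dst = src @ R_true.T + t_true            # simulated local map points
    R, t = track_pose(src, dst)
    print(np.allclose(R, R_true, atol=1e-6), np.allclose(t, t_true, atol=1e-6))
```

The 6-vector delta stacks a rotation and a translation increment; solving the normal equations and applying the increment on the left of the current pose is the standard Gauss-Newton step on SE(3).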
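Loop closing, the third component named above, is commonly cast as pose-graph optimization over keyframe poses; the generic cost below is the standard form from the SLAM literature, not a formula quoted from this chapter. Odometry and loop-closure constraints contribute edges, and the same nonlinear solvers (Gauss-Newton, Levenberg-Marquardt) minimize it:

```latex
% Generic pose-graph cost: T_i \in SE(3) are keyframe poses, Z_{ij} the
% measured relative pose on edge (i,j), \Sigma_{ij} its covariance, and
% \log(\cdot)^{\vee} maps the residual transform to a minimal 6-vector.
E(T_1,\dots,T_N) = \sum_{(i,j)\in\mathcal{E}}
    e_{ij}^{\top}\,\Sigma_{ij}^{-1}\,e_{ij},
\qquad
e_{ij} = \log\!\left(Z_{ij}^{-1}\,T_i^{-1}\,T_j\right)^{\vee}.
```

Detecting the loop itself (e.g. with appearance-based place recognition) supplies the extra edges; optimizing this cost then distributes the accumulated drift over the whole trajectory.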
Related papers
- MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements [59.70107451308687]
We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM.
Our method, MM3DGS, addresses the limitations of prior rendering approaches by enabling faster scale awareness and improved trajectory tracking.
We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit.
arXiv Detail & Related papers (2024-04-01T04:57:41Z)
- MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images [57.71600854525037]
We propose a Fuse-Describe-Match strategy for 6D pose estimation from RGB-D images.
MatchU is a generic approach that fuses 2D texture and 3D geometric cues for 6D pose prediction of unseen objects.
arXiv Detail & Related papers (2024-03-03T14:01:03Z)
- Anyview: Generalizable Indoor 3D Object Detection with Variable Frames [63.51422844333147]
We present a novel 3D detection framework named AnyView for practical applications.
Our method achieves both great generalizability and high detection accuracy with a simple and clean architecture.
arXiv Detail & Related papers (2023-10-09T02:15:45Z)
- RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery [72.13154206106259]
We propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations.
Specifically, we leverage a pre-trained monocular estimator to extract local geometric information.
A separate branch is designed to directly recover the metric scale of the object based on category-level statistics.
arXiv Detail & Related papers (2023-09-19T02:20:26Z)
- Artifacts Mapping: Multi-Modal Semantic Mapping for Object Detection and 3D Localization [13.473742114288616]
We propose a framework that can autonomously detect and localize objects in a known environment.
The framework consists of three key elements: understanding the environment through RGB data, estimating depth through multi-modal sensor fusion, and managing artifacts.
Experiments show that the proposed framework can accurately detect 98% of the objects in the real sample environment, without post-processing.
arXiv Detail & Related papers (2023-07-03T15:51:39Z)
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
- ODAM: Object Detection, Association, and Mapping using Posed RGB Video [36.16010611723447]
We present ODAM, a system for 3D Object Detection, Association, and Mapping using posed RGB videos.
The proposed system relies on a deep learning front-end to detect 3D objects from a given RGB frame and associate them to a global object-based map using a graph neural network (GNN).
arXiv Detail & Related papers (2021-08-23T13:28:10Z)
- RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video [76.86512780916827]
We present the first real-time method for motion capture of skeletal pose and 3D surface geometry of hands from a single RGB camera.
In order to address the inherent depth ambiguities in RGB data, we propose a novel multi-task CNN.
We experimentally verify the individual components of our RGB two-hand tracking and 3D reconstruction pipeline.
arXiv Detail & Related papers (2021-06-22T12:53:56Z)
- RGB-D-E: Event Camera Calibration for Fast 6-DOF Object Tracking [16.06615504110132]
We propose to use an event-based camera to increase the speed of 3D object tracking in 6 degrees of freedom.
This application requires handling very high object speed to convey compelling AR experiences.
We develop a deep learning approach that combines an existing RGB-D network with a novel event-based network in a cascade fashion.
arXiv Detail & Related papers (2020-06-09T01:55:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.