DVIO: Depth aided visual inertial odometry for RGBD sensors
- URL: http://arxiv.org/abs/2110.10805v1
- Date: Wed, 20 Oct 2021 22:12:01 GMT
- Title: DVIO: Depth aided visual inertial odometry for RGBD sensors
- Authors: Abhishek Tyagi, Yangwen Liang, Shuangquan Wang, Dongwoon Bai
- Abstract summary: This paper presents a new visual inertial odometry (VIO) system, which uses measurements from an RGBD sensor and an inertial measurement unit (IMU) sensor for estimating the motion state of the mobile device.
The resulting system is called the depth-aided VIO (DVIO) system.
- Score: 7.745106319694523
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In the past few years we have observed an increase in the usage of RGBD
sensors in mobile devices. These sensors provide a good estimate of the depth map for
the camera frame, which can be used in numerous augmented reality applications.
This paper presents a new visual inertial odometry (VIO) system, which uses
measurements from an RGBD sensor and an inertial measurement unit (IMU) sensor
for estimating the motion state of the mobile device. The resulting system is
called the depth-aided VIO (DVIO) system. In this system we add the depth
measurement as part of the nonlinear optimization process. Specifically, we
propose methods to incorporate the depth measurement using a one-dimensional (1D)
feature parameterization as well as a three-dimensional (3D) feature parameterization.
In addition, we propose to utilize the depth measurement for estimating the time
offset between the unsynchronized IMU and RGBD sensors. Last but not least,
we propose a novel block-based marginalization approach to speed up the
marginalization process and maintain the real-time performance of the overall
system. Experimental results validate that the proposed DVIO system outperforms
other state-of-the-art VIO systems in terms of trajectory accuracy as well
as processing time.
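To make the abstract's central idea concrete, below is a minimal sketch (not the authors' implementation) of how an RGBD depth measurement could be added as an extra residual alongside the usual reprojection residual in a nonlinear least-squares problem, assuming an inverse-depth (1D) feature parameterization anchored in the host frame. The intrinsics, relative pose, noise weights, and all function names are illustrative assumptions.

```python
# Minimal sketch (not the DVIO implementation): stacking a reprojection
# residual and an RGBD depth residual for one feature, using an
# inverse-depth (1D) parameterization anchored in the host frame.
import numpy as np
from scipy.optimize import least_squares

K = np.array([[500.0,   0.0, 320.0],   # assumed pinhole intrinsics
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project(p_cam):
    """Pinhole projection of a 3D point given in camera coordinates."""
    uv = K @ p_cam
    return uv[:2] / uv[2]

def residuals(x, uv_host, uv_target, depth_meas, R_th, t_th,
              sigma_px=1.0, sigma_d=0.05):
    """x = [inverse_depth]; the host->target pose (R_th, t_th) is assumed known here."""
    rho = x[0]                                     # inverse depth in host frame
    bearing = np.linalg.inv(K) @ np.array([uv_host[0], uv_host[1], 1.0])
    p_host = bearing / rho                         # 3D point in host frame
    p_target = R_th @ p_host + t_th                # transform into target frame

    r_reproj = (project(p_target) - uv_target) / sigma_px        # visual residual
    r_depth = np.array([(p_target[2] - depth_meas) / sigma_d])   # RGBD depth residual
    return np.concatenate([r_reproj, r_depth])

# Toy usage: refine the inverse depth of a single feature.
R_th, t_th = np.eye(3), np.array([0.1, 0.0, 0.0])      # assumed relative pose
uv_host, uv_target = np.array([300.0, 250.0]), np.array([312.5, 250.0])
depth_meas = 4.0                                         # metres, from the RGBD sensor
sol = least_squares(residuals, x0=[0.5], bounds=(1e-3, 10.0),
                    args=(uv_host, uv_target, depth_meas, R_th, t_th))
print("estimated depth:", 1.0 / sol.x[0])
```

In a full VIO back end the camera/IMU states and the IMU-camera time offset would be optimized jointly over a sliding window; this sketch only isolates how a depth factor can constrain a feature's inverse depth together with the reprojection factor.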
Related papers
- Camera Motion Estimation from RGB-D-Inertial Scene Flow [9.192660643226372]
We introduce a novel formulation for camera motion estimation that integrates RGB-D images and inertial data through scene flow.
Our goal is to accurately estimate the camera motion in a rigid 3D environment, along with the state of the inertial measurement unit (IMU).
arXiv Detail & Related papers (2024-04-26T08:42:59Z)
- MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements [59.70107451308687]
We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM.
Our method, MM3DGS, addresses the limitations of prior rendering by enabling faster scale awareness and improved trajectory tracking.
We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit.
arXiv Detail & Related papers (2024-04-01T04:57:41Z)
- Multi-Modal Neural Radiance Field for Monocular Dense SLAM with a Light-Weight ToF Sensor [58.305341034419136]
We present the first dense SLAM system with a monocular camera and a light-weight ToF sensor.
We propose a multi-modal implicit scene representation that supports rendering both the signals from the RGB camera and light-weight ToF sensor.
Experiments demonstrate that our system effectively exploits the signals of light-weight ToF sensors and achieves competitive results.
arXiv Detail & Related papers (2023-08-28T07:56:13Z)
- On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks [61.74608497496841]
Training on inaccurate or corrupt data induces model bias and hampers generalisation capabilities.
This paper investigates the effect of sensor errors on the dense 3D vision tasks of depth estimation and reconstruction.
arXiv Detail & Related papers (2023-03-26T22:32:44Z)
- DELTAR: Depth Estimation from a Light-weight ToF Sensor and RGB Image [39.389538555506256]
We propose DELTAR, a novel method to empower light-weight ToF sensors with the capability of measuring high-resolution and accurate depth.
At the core of DELTAR, a feature extractor customized for the depth distribution and an attention-based neural architecture are proposed to fuse information from the color and ToF domains efficiently.
Experiments show that our method produces more accurate depth than existing frameworks designed for depth completion and depth super-resolution, and achieves on-par performance with a commodity-level RGB-D sensor.
arXiv Detail & Related papers (2022-09-27T13:11:37Z)
- Depth Estimation Matters Most: Improving Per-Object Depth Estimation for Monocular 3D Detection and Tracking [47.59619420444781]
Approaches to monocular 3D perception, including detection and tracking, often yield inferior performance compared to LiDAR-based techniques.
We propose a multi-level fusion method that combines different representations (RGB and pseudo-LiDAR) and temporal information across multiple frames for objects (tracklets) to enhance per-object depth estimation.
arXiv Detail & Related papers (2022-06-08T03:37:59Z)
- Learning Online Multi-Sensor Depth Fusion [100.84519175539378]
SenFuNet is a depth fusion approach that learns sensor-specific noise and outlier statistics.
We conduct experiments with various sensor combinations on the real-world CoRBS and Scene3D datasets.
arXiv Detail & Related papers (2022-04-07T10:45:32Z)
- Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD).
Specifically, we unify three complementary tasks: depth estimation, salient object detection and contour estimation. The multi-task mechanism encourages the model to learn task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
arXiv Detail & Related papers (2022-03-09T17:20:18Z)
- Incremental learning of LSTM framework for sensor fusion in attitude estimation [2.064612766965483]
This paper presents a novel method for attitude estimation of an object in 3D space by incremental learning of a Long Short-Term Memory (LSTM) network.
Inertial sensor data are fed to the LSTM network, which is then updated incrementally to incorporate the dynamic changes in motion occurring at run time.
The proposed framework offers a significant improvement in the results compared to the traditional method, even in the case of a highly dynamic environment.
arXiv Detail & Related papers (2021-08-04T09:03:53Z)
- D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry [57.5549733585324]
D3VO is a novel framework for monocular visual odometry that exploits deep networks on three levels -- deep depth, pose and uncertainty estimation.
We first propose a novel self-supervised monocular depth estimation network trained on stereo videos without any external supervision.
We model the photometric uncertainties of pixels on the input images, which improves the depth estimation accuracy.
arXiv Detail & Related papers (2020-03-02T17:47:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.