Camera Motion Estimation from RGB-D-Inertial Scene Flow
- URL: http://arxiv.org/abs/2404.17251v1
- Date: Fri, 26 Apr 2024 08:42:59 GMT
- Title: Camera Motion Estimation from RGB-D-Inertial Scene Flow
- Authors: Samuel Cerezo, Javier Civera
- Abstract summary: We introduce a novel formulation for camera motion estimation that integrates RGB-D images and inertial data through scene flow.
Our goal is to accurately estimate the camera motion in a rigid 3D environment, along with the state of the inertial measurement unit (IMU).
- Score: 9.192660643226372
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce a novel formulation for camera motion estimation that integrates RGB-D images and inertial data through scene flow. Our goal is to accurately estimate the camera motion in a rigid 3D environment, along with the state of the inertial measurement unit (IMU). Our proposed method offers the flexibility to operate as a multi-frame optimization or to marginalize older data, thus effectively utilizing past measurements. To assess the performance of our method, we conducted evaluations using both synthetic data from the ICL-NUIM dataset and real data sequences from the OpenLORIS-Scene dataset. Our results show that the fusion of these two sensors enhances the accuracy of camera motion estimation when compared to using only visual data.
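As a rough illustration of the kind of objective the abstract describes, the sketch below jointly fits a rigid camera motion to RGB-D scene-flow point pairs and a gyro-derived rotation prior. The residual structure, weights, and variable names are assumptions, not the authors' implementation.

```python
# Hedged sketch: joint least-squares over camera motion, combining rigid
# scene-flow residuals from backprojected RGB-D point pairs with an IMU
# rotation prior. Weights and parameterization are illustrative assumptions.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def residuals(x, p, q, imu_rotvec, w_imu):
    """x = (rotation vector, translation); p, q: matched 3D points at t0, t1."""
    R = Rotation.from_rotvec(x[:3]).as_matrix()
    flow_res = (p @ R.T + x[3:] - q).ravel()   # rigid scene-flow residuals
    imu_res = w_imu * (x[:3] - imu_rotvec)     # gyro-integrated rotation prior
    return np.concatenate([flow_res, imu_res])

# Synthetic check: recover a known motion from noisy point pairs + noisy IMU.
rng = np.random.default_rng(0)
p = rng.uniform(-2, 2, (100, 3)) + [0, 0, 4]   # backprojected depth points
R_true = Rotation.from_rotvec([0.02, -0.05, 0.01])
t_true = np.array([0.10, 0.00, -0.03])
q = p @ R_true.as_matrix().T + t_true + rng.normal(0, 1e-3, p.shape)
imu_rotvec = R_true.as_rotvec() + rng.normal(0, 1e-3, 3)
sol = least_squares(residuals, np.zeros(6), args=(p, q, imu_rotvec, 10.0))
print(sol.x)   # approx [0.02, -0.05, 0.01, 0.10, 0.00, -0.03]
```

A multi-frame variant would stack such residuals over a window of frames and either keep them all or marginalize the oldest ones, as the abstract describes.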
Related papers
- Motion-prior Contrast Maximization for Dense Continuous-Time Motion Estimation [34.529280562470746]
We introduce a novel self-supervised loss combining the Contrast Maximization framework with a non-linear motion prior in the form of pixel-level trajectories.
Its effectiveness is demonstrated in two scenarios: in dense continuous-time motion estimation, our method improves the zero-shot performance of a synthetically trained model by 29%.
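The contrast-maximization idea can be sketched compactly: warp events by a candidate motion and score the sharpness of the resulting event image. The toy below uses a constant per-pixel flow rather than the paper's pixel-level trajectories.

```python
# Hedged sketch of contrast maximization: warp events with a candidate flow,
# accumulate an image of warped events, and score its variance (sharpness).
import numpy as np

def contrast(events, flow, shape=(64, 64)):
    """events: (N, 3) rows of (x, y, t); flow: (fx, fy) in pixels/second."""
    x, y, t = events.T
    xw = np.clip(np.round(x - flow[0] * t).astype(int), 0, shape[1] - 1)
    yw = np.clip(np.round(y - flow[1] * t).astype(int), 0, shape[0] - 1)
    img = np.zeros(shape)
    np.add.at(img, (yw, xw), 1.0)    # image of warped events
    return img.var()                 # higher = sharper = better motion

t = np.linspace(0, 1, 500)
ev = np.column_stack([10 + 30 * t, np.full_like(t, 32), t])  # edge moving in x
print(contrast(ev, (30, 0)) > contrast(ev, (0, 0)))          # True: true flow wins
```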
arXiv Detail & Related papers (2024-07-15T15:18:28Z)
- MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements [59.70107451308687]
We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM.
Our method, MM3DGS, addresses the limitations of prior rendering by enabling faster scale awareness and improved trajectory tracking.
We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit.
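For intuition, a bare-bones render of an isotropic 3D Gaussian map might look like the sketch below; real 3D Gaussian Splatting uses anisotropic covariances and depth-ordered alpha compositing, so treat this as illustrative only.

```python
# Hedged sketch: render an isotropic 3D Gaussian map with a pinhole camera.
import numpy as np

def render(means, colors, scales, K, res=(48, 64)):
    """means: (N,3) in camera frame; colors: (N,3); scales: (N,) metric radii."""
    H, W = res
    u = means @ K.T                          # pinhole projection
    px = u[:, :2] / u[:, 2:3]                # pixel centers
    sigma = scales * K[0, 0] / means[:, 2]   # screen-space std-dev
    ys, xs = np.mgrid[0:H, 0:W]
    img, wsum = np.zeros((H, W, 3)), np.zeros((H, W))
    for (cx, cy), col, s in zip(px, colors, sigma):
        w = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * s ** 2))
        img += w[..., None] * col
        wsum += w
    return img / np.maximum(wsum, 1e-8)[..., None]

K = np.array([[60.0, 0, 32], [0, 60.0, 24], [0, 0, 1]])
img = render(np.array([[0.0, 0, 2], [0.3, 0.1, 3]]),
             np.array([[1.0, 0, 0], [0, 1.0, 0]]), np.array([0.05, 0.08]), K)
```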
arXiv Detail & Related papers (2024-04-01T04:57:41Z)
- RPEFlow: Multimodal Fusion of RGB-PointCloud-Event for Joint Optical Flow and Scene Flow Estimation [43.358140897849616]
In this paper, we incorporate RGB images, point clouds, and events for joint optical flow and scene flow estimation with our proposed multi-stage multimodal fusion model, RPEFlow.
Experiments on both synthetic and real datasets show that our model outperforms the existing state-of-the-art by a wide margin.
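One stage of cross-modal fusion could be sketched as plain cross-attention between two modalities' features; the actual RPEFlow model is a learned multi-stage network, and all names here are illustrative.

```python
# Hedged sketch of one cross-modal fusion stage as simple cross-attention.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_attend(query_feats, context_feats):
    """Augment one modality's features (N, d) with attention over another's (M, d)."""
    attn = softmax(query_feats @ context_feats.T / np.sqrt(query_feats.shape[1]))
    return np.concatenate([query_feats, attn @ context_feats], axis=1)

rgb = np.random.default_rng(0).normal(size=(100, 32))   # e.g. RGB features
evt = np.random.default_rng(1).normal(size=(80, 32))    # e.g. event features
fused = cross_attend(rgb, evt)                          # (100, 64)
```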
arXiv Detail & Related papers (2023-09-26T17:23:55Z)
- RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery [72.13154206106259]
We propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations.
Specifically, we leverage a pre-trained monocular estimator to extract local geometric information.
A separate branch is designed to directly recover the metric scale of the object based on category-level statistics.
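The decoupling can be illustrated schematically: the rigid transform comes from a scale-free branch, while the metric scale is recovered separately from category statistics. The size priors below are made-up values, not the paper's.

```python
# Hedged sketch of the decoupling: a wrong scale rescales the translation but
# cannot distort the rotation. Size priors are illustrative assumptions.
import numpy as np

CATEGORY_MEAN_SIZE = {"mug": 0.10, "laptop": 0.34}   # assumed priors, meters

def compose_pose(R, t_normalized, category, predicted_ratio):
    """Combine a scale-normalized pose with a separately recovered scale."""
    scale = predicted_ratio * CATEGORY_MEAN_SIZE[category]
    return R, scale * t_normalized, scale

R, t, s = compose_pose(np.eye(3), np.array([0.1, 0.0, 1.0]), "mug", 1.2)
```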
arXiv Detail & Related papers (2023-09-19T02:20:26Z)
- DynaMoN: Motion-Aware Fast and Robust Camera Localization for Dynamic Neural Radiance Fields [71.94156412354054]
We propose Dynamic Motion-Aware Fast and Robust Camera Localization for Dynamic Neural Radiance Fields (DynaMoN).
DynaMoN handles dynamic content for initial camera pose estimation and statics-focused ray sampling for fast and accurate novel-view synthesis.
We extensively evaluate our approach on two real-world dynamic datasets, the TUM RGB-D dataset and the BONN RGB-D Dynamic dataset.
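A minimal version of motion-aware sampling is to draw rays only from pixels not flagged as dynamic, as in the sketch below; this is one possible reading of "statics-focused ray sampling", not DynaMoN's exact pipeline.

```python
# Hedged sketch: sample rays from static pixels only, given a motion mask
# (e.g. from segmentation or flow consistency).
import numpy as np

def sample_static_rays(motion_mask, n_rays, rng):
    """motion_mask: (H, W) bool, True where content is dynamic."""
    prob = np.where(motion_mask, 0.0, 1.0).ravel()
    prob /= prob.sum()
    idx = rng.choice(prob.size, size=n_rays, replace=False, p=prob)
    return np.column_stack(np.unravel_index(idx, motion_mask.shape))

mask = np.zeros((48, 64), bool)
mask[10:30, 20:40] = True                       # a moving object, e.g. a person
rays = sample_static_rays(mask, 1024, np.random.default_rng(0))   # (1024, 2)
```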
arXiv Detail & Related papers (2023-09-16T08:46:59Z)
- Event Camera-based Visual Odometry for Dynamic Motion Tracking of a Legged Robot Using Adaptive Time Surface [5.341864681049579]
Event cameras offer high temporal resolution and dynamic range, which can eliminate the issue of blurred RGB images during fast movements.
We introduce an adaptive time surface (ATS) method that addresses the whiteout and blackout issue in conventional time surfaces.
Lastly, we propose a nonlinear pose optimization formula that simultaneously performs 3D-2D alignment on both RGB-based and event-based maps and images.
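A time surface stores, per pixel, a decayed function of the last event timestamp. The sketch below makes the decay constant adaptive to local event activity; the specific adaptivity rule is an assumption, not necessarily the paper's ATS.

```python
# Hedged sketch of a time surface with per-pixel adaptive decay: busy regions
# decay faster, limiting the whiteout/blackout of a fixed decay constant.
import numpy as np

def adaptive_time_surface(last_ts, event_count, t_now, tau0=0.1):
    """last_ts: (H, W) last event time per pixel; event_count: recent activity."""
    tau = tau0 / (1.0 + event_count)          # assumed per-pixel decay rate
    return np.exp(-(t_now - last_ts) / tau)   # values in (0, 1]

surface = adaptive_time_surface(np.zeros((4, 4)),
                                np.arange(16.0).reshape(4, 4), t_now=0.05)
```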
arXiv Detail & Related papers (2023-05-15T19:03:45Z)
- ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild [57.37891682117178]
We present a robust dense indirect structure-from-motion method for videos that is based on dense correspondence from pairwise optical flow.
A novel neural network architecture is proposed for processing irregular point trajectory data.
Experiments on the MPI Sintel dataset show that our system produces significantly more accurate camera trajectories.
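The trajectory front-end can be sketched by chaining pairwise flow fields, as below; ParticleSfM additionally filters moving-object trajectories with a learned network before the structure-from-motion back-end.

```python
# Hedged sketch of the front-end: chain pairwise forward flow to track points
# across frames (nearest-neighbour flow lookup for brevity).
import numpy as np

def chain_trajectories(flows, starts):
    """flows: list of (H, W, 2) fields of (drow, dcol); starts: (N, 2) pixels."""
    H, W = flows[0].shape[:2]
    pts, traj = starts.copy(), [starts]
    for flow in flows:
        ij = np.clip(np.round(pts).astype(int), [0, 0], [H - 1, W - 1])
        pts = pts + flow[ij[:, 0], ij[:, 1]]
        traj.append(pts.copy())
    return np.stack(traj)                     # (T + 1, N, 2)

flows = [np.full((32, 32, 2), [0.0, 1.0]) for _ in range(5)]  # uniform x-shift
traj = chain_trajectories(flows, np.array([[16.0, 4.0]]))     # ends at col 9
```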
arXiv Detail & Related papers (2022-07-19T09:19:45Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
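The basic prerequisite for LiDAR-camera fusion is projecting points into the image so depth and pixels can be paired; a standard pinhole projection sketch follows, with assumed names for the extrinsics and intrinsics.

```python
# Hedged sketch: project LiDAR points into the image plane
# (T: assumed LiDAR-to-camera extrinsics; K: camera intrinsics).
import numpy as np

def project_lidar(points, T, K, shape):
    """points: (N, 3) LiDAR; T: (4, 4); K: (3, 3); shape: (H, W)."""
    pc = (np.c_[points, np.ones(len(points))] @ T.T)[:, :3]   # camera frame
    front = pc[:, 2] > 0                                      # points ahead only
    uv = pc[front] @ K.T
    uv = uv[:, :2] / uv[:, 2:3]
    ok = ((uv[:, 0] >= 0) & (uv[:, 0] < shape[1]) &
          (uv[:, 1] >= 0) & (uv[:, 1] < shape[0]))
    return uv[ok], pc[front][ok][:, 2]                        # pixels, depths

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
pts = np.random.default_rng(1).uniform([-5, -2, 1], [5, 2, 30], (1000, 3))
uv, depth = project_lidar(pts, np.eye(4), K, (480, 640))
```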
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- Asynchronous Optimisation for Event-based Visual Odometry [53.59879499700895]
Event cameras open up new possibilities for robotic perception due to their low latency and high dynamic range.
We focus on event-based visual odometry (VO) and propose an asynchronous structure-from-motion optimisation back-end.
arXiv Detail & Related papers (2022-03-02T11:28:47Z)
- Multi-Modal Fusion for Sensorimotor Coordination in Steering Angle Prediction [8.707695512525717]
Imitation learning is employed to learn sensorimotor coordination for steering angle prediction in an end-to-end fashion.
This work explores the fusion of frame-based RGB and event data for learning end-to-end lateral control.
We propose DRFuser, a novel convolutional encoder-decoder architecture for learning end-to-end lateral control.
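One common way to feed asynchronous events to a convolutional encoder is a time-voxel grid; the sketch below is a generic construction and may differ from DRFuser's actual input encoding.

```python
# Hedged sketch of a generic event input encoding for a convolutional encoder:
# bin events into a time-voxel grid, signed by polarity.
import numpy as np

def event_voxel_grid(events, n_bins, shape):
    """events: (N, 4) rows of (x, y, t, polarity in {-1, +1})."""
    x, y, t, p = events.T
    t = (t - t.min()) / max(t.max() - t.min(), 1e-9)   # normalize to [0, 1]
    b = np.clip((t * n_bins).astype(int), 0, n_bins - 1)
    grid = np.zeros((n_bins, *shape))
    np.add.at(grid, (b, y.astype(int), x.astype(int)), p)
    return grid

ev = np.array([[3, 5, 0.00, 1], [3, 5, 0.04, -1], [7, 2, 0.09, 1.0]])
grid = event_voxel_grid(ev, n_bins=3, shape=(8, 8))    # (3, 8, 8)
```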
arXiv Detail & Related papers (2022-02-11T08:22:36Z)
- DVIO: Depth aided visual inertial odometry for RGBD sensors [7.745106319694523]
This paper presents a new visual inertial odometry (VIO) system, which uses measurements from an RGB-D sensor and an inertial measurement unit (IMU) to estimate the motion state of the mobile device.
The resulting system is called the depth-aided VIO (DVIO) system.
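Depth aids VIO by making frame-to-frame motion observable from 3D-3D feature matches alone; the classic Kabsch alignment below illustrates this. It is a generic building block, not DVIO's estimator.

```python
# Hedged sketch: with metric depth, frame-to-frame motion follows from 3D-3D
# matches via Kabsch alignment, removing the monocular scale ambiguity.
import numpy as np

def kabsch(p, q):
    """Rigid (R, t) minimizing sum ||R p_i + t - q_i||^2 for (N, 3) arrays."""
    cp, cq = p.mean(0), q.mean(0)
    U, _, Vt = np.linalg.svd((p - cp).T @ (q - cq))
    R = Vt.T @ np.diag([1, 1, np.linalg.det(Vt.T @ U.T)]) @ U.T
    return R, cq - R @ cp

rng = np.random.default_rng(0)
p = rng.normal(size=(50, 3))                      # depth-backprojected features
c, s = np.cos(0.1), np.sin(0.1)
Rz = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
q = p @ Rz.T + [0.2, 0.0, -0.1]
R, t = kabsch(p, q)                               # recovers Rz and translation
```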
arXiv Detail & Related papers (2021-10-20T22:12:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.