RoMeO: Robust Metric Visual Odometry
- URL: http://arxiv.org/abs/2412.11530v2
- Date: Thu, 19 Dec 2024 06:32:22 GMT
- Title: RoMeO: Robust Metric Visual Odometry
- Authors: Junda Cheng, Zhipeng Cai, Zhaoxing Zhang, Wei Yin, Matthias Muller, Michael Paulitsch, Xin Yang
- Abstract summary: Visual odometry (VO) aims to estimate camera poses from visual inputs -- a fundamental building block for many applications such as VR/AR and robotics.
Existing approaches lack robustness in this challenging setting and fail to generalize to unseen data (especially outdoors).
We propose Robust Metric Visual Odometry (RoMeO), a novel method that resolves these issues by leveraging priors from pre-trained depth models.
- Score: 11.381243799745729
- License:
- Abstract: Visual odometry (VO) aims to estimate camera poses from visual inputs -- a fundamental building block for many applications such as VR/AR and robotics. This work focuses on monocular RGB VO, where the input is a monocular RGB video without IMU or 3D sensors. Existing approaches lack robustness in this challenging setting and fail to generalize to unseen data (especially outdoors); they also cannot recover metric-scale poses. We propose Robust Metric Visual Odometry (RoMeO), a novel method that resolves these issues by leveraging priors from pre-trained depth models. RoMeO incorporates both monocular metric depth and multi-view stereo (MVS) models to recover metric scale, simplify correspondence search, provide better initialization, and regularize optimization. Effective strategies are proposed to inject noise during training and adaptively filter noisy depth priors, which ensure the robustness of RoMeO on in-the-wild data. As shown in Fig. 1, RoMeO advances the state-of-the-art (SOTA) by a large margin across 6 diverse datasets covering both indoor and outdoor scenes. Compared to the current SOTA DPVO, RoMeO reduces both the relative (trajectory scale aligned with GT) and absolute trajectory errors by >50%. The performance gain also transfers to the full SLAM pipeline (with global BA & loop closure). Code will be released upon acceptance.
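The abstract distinguishes a relative trajectory error (scale aligned with ground truth) from an absolute one. A minimal sketch of that distinction, assuming a positional RMSE metric and a least-squares scale fit on centered trajectories; the function names are illustrative and not from the paper's evaluation code:

```python
import numpy as np

def fit_scale(est, gt):
    """Least-squares scale s minimizing ||s * est - gt||^2.
    est, gt: (N, 3) arrays of centered camera positions,
    assumed already rotationally aligned."""
    return np.sum(est * gt) / np.sum(est * est)

def ate_rmse(est, gt, align_scale=False):
    """Absolute trajectory error as RMSE over positions.
    With align_scale=True, a single scale factor is fitted first,
    mirroring the scale-aligned 'relative' error; without it, the
    metric-scale 'absolute' error is reported."""
    est = est - est.mean(axis=0)  # center both trajectories
    gt = gt - gt.mean(axis=0)
    if align_scale:
        est = fit_scale(est, gt) * est
    return np.sqrt(np.mean(np.sum((est - gt) ** 2, axis=1)))
```

A monocular system with correct shape but wrong scale scores well on the scale-aligned metric yet poorly on the absolute one, which is why recovering metric scale matters for the second number.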
Related papers
- MetricGold: Leveraging Text-To-Image Latent Diffusion Models for Metric Depth Estimation [9.639797094021988]
MetricGold is a novel approach that harnesses a generative diffusion model's rich priors to improve metric depth estimation.
Our experiments demonstrate robust generalization across diverse datasets, producing sharper and higher quality metric depth estimates.
arXiv Detail & Related papers (2024-11-16T20:59:01Z)
- Gravity-aligned Rotation Averaging with Circular Regression [53.81374943525774]
We introduce a principled approach that integrates gravity direction into the rotation averaging phase of global pipelines.
We achieve state-of-the-art accuracy on four large-scale datasets.
arXiv Detail & Related papers (2024-10-16T17:37:43Z)
- ES-PTAM: Event-based Stereo Parallel Tracking and Mapping [11.801511288805225]
Event cameras offer advantages to overcome the limitations of standard cameras.
We propose a novel event-based stereo VO system by combining two ideas.
We evaluate the system on five real-world datasets.
arXiv Detail & Related papers (2024-08-28T07:56:28Z)
- RD-VIO: Robust Visual-Inertial Odometry for Mobile Augmented Reality in Dynamic Environments [55.864869961717424]
It is typically challenging for visual or visual-inertial odometry systems to handle the problems of dynamic scenes and pure rotation.
We design a novel visual-inertial odometry (VIO) system called RD-VIO to handle both of these problems.
arXiv Detail & Related papers (2023-10-23T16:30:39Z)
- RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery [72.13154206106259]
We propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations.
Specifically, we leverage a pre-trained monocular estimator to extract local geometric information.
A separate branch is designed to directly recover the metric scale of the object based on category-level statistics.
arXiv Detail & Related papers (2023-09-19T02:20:26Z)
- Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image [85.91935485902708]
We show that the key to a zero-shot single-view metric depth model lies in the combination of large-scale data training and resolving the metric ambiguity from various camera models.
We propose a canonical camera space transformation module, which explicitly addresses the ambiguity problems and can be effortlessly plugged into existing monocular models.
Our method enables the accurate recovery of metric 3D structures on randomly collected internet images.
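The canonical camera space idea above resolves metric ambiguity by rescaling predictions according to the ratio of focal lengths. A minimal sketch under that assumption; the canonical focal length value and function name are hypothetical, not taken from the Metric3D paper:

```python
def canonical_depth_to_metric(depth_canonical, f_actual, f_canonical=1000.0):
    """Map depth predicted in a canonical camera space back to metric depth.

    Assumption (hypothetical sketch): the network is trained as if every
    image were captured by a camera with focal length f_canonical (pixels),
    so a prediction for a real camera with focal length f_actual is
    rescaled by the focal-length ratio.
    """
    return depth_canonical * (f_actual / f_canonical)
```

The same scene imaged by a longer-focal-length camera appears larger on the sensor; dividing out the canonical focal length is what lets one model serve arbitrary intrinsics.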
arXiv Detail & Related papers (2023-07-20T16:14:23Z)
- Benchmarking Visual-Inertial Deep Multimodal Fusion for Relative Pose Regression and Odometry-aided Absolute Pose Regression [6.557612703872671]
Visual-inertial localization is a key problem in computer vision and robotics applications such as virtual reality, self-driving cars, and aerial vehicles.
In this work, we conduct a benchmark to evaluate deep multimodal fusion based on pose graph optimization and attention networks.
We show improvements for the APR-RPR task and for the RPR-RPR task for aerial vehicles and handheld devices.
arXiv Detail & Related papers (2022-08-01T15:05:26Z)
- DeepRM: Deep Recurrent Matching for 6D Pose Refinement [77.34726150561087]
DeepRM is a novel recurrent network architecture for 6D pose refinement.
The architecture incorporates LSTM units to propagate information through each refinement step.
DeepRM achieves state-of-the-art performance on two widely accepted challenging datasets.
arXiv Detail & Related papers (2022-05-28T16:18:08Z)
- RVMDE: Radar Validated Monocular Depth Estimation for Robotics [5.360594929347198]
Rigid calibration of binocular vision sensors is crucial for accurate depth estimation.
Alternatively, a monocular camera alleviates this limitation at the expense of depth accuracy, and the challenge is exacerbated in harsh environmental conditions.
This work explores the utility of coarse signals from radar when fused with fine-grained data from a monocular camera for depth estimation in harsh environmental conditions.
arXiv Detail & Related papers (2021-09-11T12:02:29Z)
- Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection [89.66162518035144]
We present a flexible and high-performance framework, named Pyramid R-CNN, for two-stage 3D object detection from point clouds.
We propose a novel second-stage module, named pyramid RoI head, to adaptively learn the features from the sparse points of interest.
Our pyramid RoI head is robust to the sparse and imbalanced circumstances, and can be applied upon various 3D backbones to consistently boost the detection performance.
arXiv Detail & Related papers (2021-09-06T14:17:51Z)
- Instant Visual Odometry Initialization for Mobile AR [5.497296425129818]
We present a 6-DoF monocular visual odometry system that initializes instantly and without motion parallax.
Our main contribution is a pose estimator that decouples estimating the 5-DoF relative rotation and translation direction.
Our solution is either used as a full odometry or as a preSLAM component of any supported SLAM system.
arXiv Detail & Related papers (2021-07-30T14:25:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.