RoMeO: Robust Metric Visual Odometry
- URL: http://arxiv.org/abs/2412.11530v3
- Date: Tue, 08 Apr 2025 13:16:35 GMT
- Title: RoMeO: Robust Metric Visual Odometry
- Authors: Junda Cheng, Zhipeng Cai, Zhaoxing Zhang, Wei Yin, Matthias Müller, Michael Paulitsch, Xin Yang
- Abstract summary: Visual odometry (VO) aims to estimate camera poses from visual inputs -- a fundamental building block for many applications such as VR/AR and robotics. Existing approaches lack robustness under this challenging scenario and fail to generalize to unseen data (especially outdoors). We propose Robust Metric Visual Odometry (RoMeO), a novel method that resolves these issues by leveraging priors from pre-trained depth models.
- Score: 11.381243799745729
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual odometry (VO) aims to estimate camera poses from visual inputs -- a fundamental building block for many applications such as VR/AR and robotics. This work focuses on monocular RGB VO, where the input is a monocular RGB video without IMU or 3D sensors. Existing approaches lack robustness under this challenging scenario and fail to generalize to unseen data (especially outdoors); they also cannot recover metric-scale poses. We propose Robust Metric Visual Odometry (RoMeO), a novel method that resolves these issues by leveraging priors from pre-trained depth models. RoMeO incorporates both monocular metric depth and multi-view stereo (MVS) models to recover metric scale, simplify correspondence search, provide better initialization, and regularize optimization. Effective strategies are proposed to inject noise during training and adaptively filter noisy depth priors, which ensure the robustness of RoMeO on in-the-wild data. As shown in Fig. 1, RoMeO advances the state-of-the-art (SOTA) by a large margin across 6 diverse datasets covering both indoor and outdoor scenes. Compared to the current SOTA DPVO, RoMeO reduces both the relative (trajectory scale aligned with ground truth) and absolute trajectory errors by >50%. The performance gain also transfers to the full SLAM pipeline (with global BA & loop closure). Code will be released upon acceptance.
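RoMeO's code has not been released, so the snippet below is only a minimal sketch of the core idea stated in the abstract, not the authors' implementation: a pre-trained monocular metric depth model lifts 2D correspondences to metric 3D points, which removes the scale ambiguity of monocular VO, and clearly inconsistent depth priors are filtered out. The function names, thresholds, and the use of PnP + RANSAC in place of the paper's learned optimization are all assumptions for illustration.

```python
# Minimal sketch only -- RoMeO's implementation is unreleased; every name and
# threshold here is hypothetical. Shown: using a metric depth prior for frame 0
# to recover a metric-scale relative pose, with crude filtering of noisy priors.
import numpy as np
import cv2


def backproject(uv, depth_map, K):
    """Lift (N, 2) pixel coordinates with metric depth to 3D camera-frame points."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    z = depth_map[uv[:, 1].astype(int), uv[:, 0].astype(int)]
    x = (uv[:, 0] - cx) / fx * z
    y = (uv[:, 1] - cy) / fy * z
    return np.stack([x, y, z], axis=-1)


def estimate_metric_pose(uv0, uv1, depth0, K, reproj_err_px=3.0):
    """Metric-scale relative pose from 2D-2D matches plus a depth prior for frame 0.

    uv0, uv1 : (N, 2) matched pixel coordinates in frames 0 and 1
    depth0   : (H, W) metric depth prior for frame 0 from a pre-trained model
    Returns R (3x3), t (in metres), and the RANSAC inlier indices.
    """
    pts3d = backproject(uv0, depth0, K)

    # Drop priors with implausible depth values before optimization.
    valid = (pts3d[:, 2] > 0.1) & (pts3d[:, 2] < 80.0)

    # PnP + RANSAC: the inlier mask acts as a crude stand-in for RoMeO's
    # adaptive filtering, rejecting matches whose depth prior disagrees
    # with the estimated motion.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d[valid].astype(np.float64),
        uv1[valid].astype(np.float64),
        K.astype(np.float64),
        distCoeffs=None,
        reprojectionError=reproj_err_px,
    )
    if not ok:
        raise RuntimeError("metric pose estimation failed")
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec.ravel(), inliers
```

In the actual method, both monocular metric depth and MVS priors feed a learned VO pipeline that is trained with injected noise; the PnP + RANSAC step above only illustrates how a metric depth prior supplies absolute scale together with a natural consistency check.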
Related papers
- Reasoning and Learning a Perceptual Metric for Self-Training of Reflective Objects in Bin-Picking with a Low-cost Camera [10.976379239028455]
Bin-picking of metal objects using low-cost RGB-D cameras often suffers from sparse depth information and reflective surface textures.
We propose a two-stage framework consisting of a metric learning stage and a self-training stage.
Our approach outperforms several state-of-the-art methods on both the ROBI dataset and our newly introduced Self-ROBI dataset.
arXiv Detail & Related papers (2025-03-26T04:03:51Z)
- UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler [62.06785782635153]
We propose a new model, UniDepthV2, capable of reconstructing metric 3D scenes solely from single images across domains.
UniDepthV2 directly predicts metric 3D points from the input image at inference time without any additional information.
Our model exploits a pseudo-spherical output representation, which disentangles the camera and depth representations.
arXiv Detail & Related papers (2025-02-27T14:03:15Z)
- Gravity-aligned Rotation Averaging with Circular Regression [53.81374943525774]
We introduce a principled approach that integrates gravity direction into the rotation averaging phase of global pipelines.
We achieve state-of-the-art accuracy on four large-scale datasets.
arXiv Detail & Related papers (2024-10-16T17:37:43Z)
- ES-PTAM: Event-based Stereo Parallel Tracking and Mapping [11.801511288805225]
Event cameras offer advantages to overcome the limitations of standard cameras.
We propose a novel event-based stereo VO system by combining two ideas.
We evaluate the system on five real-world datasets.
arXiv Detail & Related papers (2024-08-28T07:56:28Z)
- RD-VIO: Robust Visual-Inertial Odometry for Mobile Augmented Reality in Dynamic Environments [55.864869961717424]
It is typically challenging for visual or visual-inertial odometry systems to handle the problems of dynamic scenes and pure rotation.
We design a novel visual-inertial odometry (VIO) system called RD-VIO to handle both of these problems.
arXiv Detail & Related papers (2023-10-23T16:30:39Z)
- RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery [72.13154206106259]
We propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations.
Specifically, we leverage a pre-trained monocular estimator to extract local geometric information.
A separate branch is designed to directly recover the metric scale of the object based on category-level statistics.
arXiv Detail & Related papers (2023-09-19T02:20:26Z)
- Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image [85.91935485902708]
We show that the key to a zero-shot single-view metric depth model lies in the combination of large-scale data training and resolving the metric ambiguity from various camera models.
We propose a canonical camera space transformation module, which explicitly addresses the ambiguity problems and can be effortlessly plugged into existing monocular models.
Our method enables the accurate recovery of metric 3D structures on randomly collected internet images.
arXiv Detail & Related papers (2023-07-20T16:14:23Z)
- Benchmarking Visual-Inertial Deep Multimodal Fusion for Relative Pose Regression and Odometry-aided Absolute Pose Regression [6.557612703872671]
Visual-inertial localization is a key problem in computer vision and robotics applications such as virtual reality, self-driving cars, and aerial vehicles.
In this work, we conduct a benchmark to evaluate deep multimodal fusion based on pose graph optimization and attention networks.
We show improvements for the APR-RPR task and for the RPR-RPR task for aerial vehicles and handheld devices.
arXiv Detail & Related papers (2022-08-01T15:05:26Z)
- DeepRM: Deep Recurrent Matching for 6D Pose Refinement [77.34726150561087]
DeepRM is a novel recurrent network architecture for 6D pose refinement.
The architecture incorporates LSTM units to propagate information through each refinement step.
DeepRM achieves state-of-the-art performance on two widely accepted challenging datasets.
arXiv Detail & Related papers (2022-05-28T16:18:08Z)
- RVMDE: Radar Validated Monocular Depth Estimation for Robotics [5.360594929347198]
An innate rigid calibration of binocular vision sensors is crucial for accurate depth estimation.
Alternatively, a monocular camera alleviates the limitation at the expense of accuracy in estimating depth, and the challenge exacerbates in harsh environmental conditions.
This work explores the utility of coarse signals from radar when fused with fine-grained data from a monocular camera for depth estimation in harsh environmental conditions.
arXiv Detail & Related papers (2021-09-11T12:02:29Z)
- Instant Visual Odometry Initialization for Mobile AR [5.497296425129818]
We present a 6-DoF monocular visual odometry system that initializes instantly and without motion parallax.
Our main contribution is a pose estimator that decouples the estimation of the 5-DoF relative rotation and translation direction.
Our solution can be used either as a full odometry system or as a pre-SLAM component of any supported SLAM system.
arXiv Detail & Related papers (2021-07-30T14:25:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.