Dense Prediction Transformer for Scale Estimation in Monocular Visual Odometry
- URL: http://arxiv.org/abs/2210.01723v1
- Date: Tue, 4 Oct 2022 16:29:21 GMT
- Title: Dense Prediction Transformer for Scale Estimation in Monocular Visual Odometry
- Authors: André O. Françani and Marcos R. O. A. Maximo
- Abstract summary: This paper contributes by showing an application of the dense prediction transformer model for scale estimation in monocular visual odometry systems.
Experimental results show that the scale drift problem of monocular systems can be reduced through the accurate estimation of the depth map.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monocular visual odometry consists of the estimation of the position of an
agent through images of a single camera, and it is applied in autonomous
vehicles, medical robots, and augmented reality. However, monocular systems
suffer from the scale ambiguity problem due to the lack of depth information in
2D frames. This paper contributes by showing an application of the dense
prediction transformer model for scale estimation in monocular visual odometry
systems. Experimental results show that the scale drift problem of monocular
systems can be reduced through the accurate estimation of the depth map by this
model, achieving competitive state-of-the-art performance on a visual odometry
benchmark.
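
To make the idea of scale recovery from a depth map concrete, here is a minimal sketch. It is not taken from the paper: the median-ratio alignment, the function names `estimate_scale` and `rescale_translation`, and the sample numbers are all illustrative assumptions. It assumes the odometry front end yields up-to-scale depths for some tracked points and that a depth network supplies metric depths for the same points.

```python
from statistics import median


def estimate_scale(triangulated_depths, predicted_depths, min_depth=1e-6):
    """Estimate a global scale factor as the median depth ratio.

    triangulated_depths: up-to-scale depths from monocular triangulation.
    predicted_depths: metric depths for the same points (assumed to come
    from a depth network). Both names are hypothetical.
    """
    ratios = [
        d_pred / d_tri
        for d_tri, d_pred in zip(triangulated_depths, predicted_depths)
        if d_tri > min_depth and d_pred > min_depth  # skip invalid depths
    ]
    if not ratios:
        raise ValueError("no valid depth pairs to estimate scale from")
    # The median is less sensitive to outlier depth pairs than the mean.
    return median(ratios)


def rescale_translation(translation, scale):
    """Apply the estimated scale to an up-to-scale translation vector."""
    return [scale * t for t in translation]


# Illustrative values only: predicted depths are roughly twice the
# triangulated ones, so the recovered scale is about 2.
tri = [0.5, 1.0, 2.0, 4.0]
pred = [1.0, 2.0, 4.1, 8.0]
scale = estimate_scale(tri, pred)
t_metric = rescale_translation([0.0, 0.0, 1.0], scale)
```

Re-estimating the factor for every frame pair, rather than once per sequence, is one plausible way such a scheme could limit scale drift, since each relative motion is tied to the depth map instead of to the previous scale estimate.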
Related papers
- Enhanced Scale-aware Depth Estimation for Monocular Endoscopic Scenes with Geometric Modeling [42.70053750500301]
We propose a novel scale-aware framework that only uses monocular images with geometric modeling for depth estimation.
Specifically, we first propose a multi-resolution depth fusion strategy to enhance the quality of monocular depth estimation.
By coupling scale factors and relative depth estimation, the scale-aware depth of the monocular endoscopic scenes can be estimated.
arXiv Detail & Related papers (2024-08-14T03:18:04Z)
- CodedVO: Coded Visual Odometry [11.33375308762075]
We present CodedVO, a novel monocular visual odometry method that overcomes the scale ambiguity problem.
We evaluate our method in diverse indoor environments and demonstrate its robustness and adaptability.
arXiv Detail & Related papers (2024-07-25T17:54:58Z)
- Transformer-based model for monocular visual odometry: a video understanding approach [0.9790236766474201]
We treat monocular visual odometry as a video understanding task to estimate the 6-DoF camera pose.
We contribute by presenting the TSformer-VO model, based on spatio-temporal self-attention mechanisms, to extract features from clips and estimate the motions in an end-to-end manner.
Our approach achieved competitive state-of-the-art performance compared with geometry-based and deep learning-based methods on the KITTI visual odometry dataset.
arXiv Detail & Related papers (2023-05-10T13:11:23Z)
- Improving Monocular Visual Odometry Using Learned Depth [84.05081552443693]
We propose a framework to exploit monocular depth estimation for improving visual odometry (VO).
The core of our framework is a monocular depth estimation module with a strong generalization capability for diverse scenes.
Compared with current learning-based VO methods, our method demonstrates a stronger generalization ability to diverse scenes.
arXiv Detail & Related papers (2022-04-04T06:26:46Z)
- Scale-aware direct monocular odometry [4.111899441919165]
We present a framework for direct monocular odometry based on depth prediction from a deep neural network.
Our proposal largely outperforms classic monocular SLAM, being 5 to 9 times more precise, with an accuracy which is closer to that of stereo systems.
arXiv Detail & Related papers (2021-09-21T10:30:15Z)
- Single View Metrology in the Wild [94.7005246862618]
We present a novel approach to single view metrology that can recover the absolute scale of a scene represented by 3D heights of objects or camera height above the ground.
Our method relies on data-driven priors learned by a deep network specifically designed to imbibe weakly supervised constraints from the interplay of the unknown camera with 3D entities such as object heights.
We demonstrate state-of-the-art qualitative and quantitative results on several datasets as well as applications including virtual object insertion.
arXiv Detail & Related papers (2020-07-18T22:31:33Z)
- Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation [58.72192168935338]
Generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable.
We propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervisions.
Our proposed model employs three consecutive differentiable transformations named as forward-kinematics, camera-projection and spatial-map transformation.
arXiv Detail & Related papers (2020-06-24T23:56:33Z)
- Appearance Learning for Image-based Motion Estimation in Tomography [60.980769164955454]
In tomographic imaging, anatomical structures are reconstructed by applying a pseudo-inverse forward model to acquired signals.
Patient motion corrupts the geometry alignment in the reconstruction process resulting in motion artifacts.
We propose an appearance learning approach recognizing the structures of rigid motion independently from the scanned object.
arXiv Detail & Related papers (2020-06-18T09:49:11Z)
- Beyond Photometric Consistency: Gradient-based Dissimilarity for Improving Visual Odometry and Stereo Matching [46.27086269084186]
In this paper, we investigate a new metric for registering images that builds upon the idea of the photometric error.
We integrate both into stereo estimation as well as visual odometry systems and show clear benefits for typical disparity and direct image registration tasks.
arXiv Detail & Related papers (2020-04-08T16:13:25Z)
- D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry [57.5549733585324]
D3VO is a novel framework for monocular visual odometry that exploits deep networks on three levels -- deep depth, pose and uncertainty estimation.
We first propose a novel self-supervised monocular depth estimation network trained on stereo videos without any external supervision.
We model the photometric uncertainties of pixels on the input images, which improves the depth estimation accuracy.
arXiv Detail & Related papers (2020-03-02T17:47:13Z)