RAUM-VO: Rotational Adjusted Unsupervised Monocular Visual Odometry
- URL: http://arxiv.org/abs/2203.07162v1
- Date: Mon, 14 Mar 2022 15:03:24 GMT
- Title: RAUM-VO: Rotational Adjusted Unsupervised Monocular Visual Odometry
- Authors: Claudio Cimarelli, Hriday Bavle, Jose Luis Sanchez-Lopez, Holger Voos
- Abstract summary: We present RAUM-VO, an approach based on a model-free epipolar constraint for frame-to-frame motion estimation.
RAUM-VO shows a considerable accuracy improvement compared to other unsupervised pose networks on the KITTI dataset.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unsupervised learning for monocular camera motion and 3D scene understanding
has gained popularity over traditional methods, relying on epipolar geometry or
non-linear optimization. Notably, deep learning can overcome many issues of
monocular vision, such as perceptual aliasing, low-textured areas, scale-drift,
and degenerate motions. Moreover, unlike supervised learning, we can fully
leverage video stream data without the need for depth or motion labels.
However, in this work, we note that rotational motion can limit the accuracy of
the unsupervised pose networks more than the translational component.
Therefore, we present RAUM-VO, an approach based on a model-free epipolar
constraint for frame-to-frame motion estimation (F2F) to adjust the rotation
during training and online inference. To this end, we match 2D keypoints
between consecutive frames using the pre-trained deep networks SuperPoint and
SuperGlue, while training a network for depth and pose estimation using an
unsupervised training protocol. Then, we adjust the predicted rotation with the
motion estimated by F2F using the 2D matches and initializing the solver with
the pose network prediction. Ultimately, RAUM-VO shows a considerable accuracy
improvement compared to other unsupervised pose networks on the KITTI dataset
while reducing the complexity of other hybrid or traditional approaches and
achieving comparable state-of-the-art results.
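As a concrete illustration of a model-free epipolar constraint for frame-to-frame motion, the sketch below estimates an essential matrix from 2D matches with the classical eight-point algorithm in plain NumPy. This is a hypothetical stand-in, not the paper's exact solver (the function name and the choice of the eight-point method are assumptions), and it assumes calibrated, normalized image coordinates, such as SuperPoint/SuperGlue matches after multiplying by the inverse intrinsics.

```python
import numpy as np

def eight_point_essential(x1, x2):
    """Estimate the essential matrix E from N >= 8 calibrated point
    correspondences x1 <-> x2 (each of shape (N, 2), normalized image
    coordinates). Solves x2_h^T E x1_h = 0 in a least-squares sense,
    then projects E onto the essential manifold (singular values s, s, 0)."""
    n = x1.shape[0]
    x1h = np.hstack([x1, np.ones((n, 1))])  # homogeneous coordinates
    x2h = np.hstack([x2, np.ones((n, 1))])
    # One linear equation per match in the 9 entries of E (row-major)
    A = np.einsum('ni,nj->nij', x2h, x1h).reshape(n, 9)
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)          # null-space vector = flattened E
    U, S, Vt = np.linalg.svd(E)
    s = (S[0] + S[1]) / 2.0           # enforce two equal singular values
    return U @ np.diag([s, s, 0.0]) @ Vt
```

Decomposing E (e.g. with OpenCV's cv2.recoverPose) yields a frame-to-frame rotation; in RAUM-VO's scheme, such an F2F rotation replaces the pose network's predicted rotation while the network's translation, with its learned scale, is kept.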
Related papers
- ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction [89.89610257714006]
Existing methods prioritize higher accuracy to cater to the demands of these tasks.
We introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation.
Our purely temporal architecture framework, named ALOcc, achieves an optimal tradeoff between speed and accuracy.
arXiv Detail & Related papers (2024-11-12T11:32:56Z)
- SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning [17.99904937160487]
We introduce SCIPaD, a novel approach that incorporates spatial clues for unsupervised depth-pose joint learning.
SCIPaD achieves a reduction of 22.2% in average translation error and 34.8% in average angular error for the camera pose estimation task on the KITTI Odometry dataset.
arXiv Detail & Related papers (2024-07-07T06:52:51Z)
- Learning to Estimate Single-View Volumetric Flow Motions without 3D Supervision [0.0]
We show that it is possible to train the corresponding networks without requiring any 3D ground truth for training.
In the absence of ground truth data we can train our model with observations from real-world capture setups instead of relying on synthetic reconstructions.
arXiv Detail & Related papers (2023-02-28T10:26:02Z)
- Dyna-DepthFormer: Multi-frame Transformer for Self-Supervised Depth Estimation in Dynamic Scenes [19.810725397641406]
We propose a novel Dyna-Depthformer framework, which predicts scene depth and 3D motion field jointly.
Our contributions are two-fold. First, we leverage multi-view correlation through a series of self- and cross-attention layers in order to obtain enhanced depth feature representation.
Second, we propose a warping-based Motion Network to estimate the motion field of dynamic objects without using semantic prior.
arXiv Detail & Related papers (2023-01-14T09:43:23Z)
- Homography Decomposition Networks for Planar Object Tracking [11.558401177707312]
Planar object tracking plays an important role in AI applications, such as robotics, visual servoing, and visual SLAM.
We propose a novel Homography Decomposition Networks (HDN) approach that drastically reduces and stabilizes the condition number by decomposing the homography transformation into two groups.
arXiv Detail & Related papers (2021-12-15T06:13:32Z)
- Unsupervised Scale-consistent Depth Learning from Video [131.3074342883371]
We propose a monocular depth estimator SC-Depth, which requires only unlabelled videos for training.
Thanks to the capability of scale-consistent prediction, we show that our monocular-trained deep networks are readily integrated into the ORB-SLAM2 system.
The proposed hybrid Pseudo-RGBD SLAM shows compelling results in KITTI, and it generalizes well to the KAIST dataset without additional training.
arXiv Detail & Related papers (2021-05-25T02:17:56Z)
- Self-Supervised Multi-Frame Monocular Scene Flow [61.588808225321735]
We introduce a multi-frame monocular scene flow network based on self-supervised learning.
We observe state-of-the-art accuracy among monocular scene flow methods based on self-supervised learning.
arXiv Detail & Related papers (2021-05-05T17:49:55Z)
- Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z)
- Learning Monocular Visual Odometry via Self-Supervised Long-Term Modeling [106.15327903038705]
Monocular visual odometry (VO) suffers severely from error accumulation during frame-to-frame pose estimation.
We present a self-supervised learning method for VO with special consideration for consistency over longer sequences.
We train the networks with purely self-supervised losses, including a cycle consistency loss that mimics the loop closure module in geometric VO.
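The cycle consistency idea, where composing predicted relative poses around a closed loop should return to the identity, can be sketched as follows. This is a hypothetical NumPy illustration (the function name is an assumption); the actual loss operates on differentiable tensors inside the training framework.

```python
import numpy as np

def cycle_consistency_loss(relative_poses):
    """Compose 4x4 relative poses T_{0->1}, T_{1->2}, ..., T_{k->0} around
    a closed loop. If the predictions are consistent, the composition is
    the identity; the loss is the Frobenius-norm deviation from it."""
    T = np.eye(4)
    for T_rel in relative_poses:
        T = T_rel @ T  # accumulate the loop transform
    return np.linalg.norm(T - np.eye(4))
```

A consistent loop (a pose followed by its inverse) yields zero loss, while drifting predictions are penalized, mimicking the effect of a loop closure module in geometric VO.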
arXiv Detail & Related papers (2020-07-21T17:59:01Z)
- Auto-Rectify Network for Unsupervised Indoor Depth Estimation [119.82412041164372]
We establish that the complex ego-motions exhibited in handheld settings are a critical obstacle for learning depth.
We propose a data pre-processing method that rectifies training images by removing their relative rotations for effective learning.
Our results outperform the previous unsupervised SOTA method by a large margin on the challenging NYUv2 dataset.
arXiv Detail & Related papers (2020-06-04T08:59:17Z)
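The rotation-removal pre-processing above rests on a standard fact: under a pure camera rotation R, pixels of the two views are related by the infinite homography H = K R K^{-1}. A minimal sketch (the function name and this exact formulation are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def rectifying_homography(K, R_rel):
    """Infinite homography H = K R K^{-1} for a pure rotation R_rel between
    two views sharing intrinsics K. H maps view-1 pixels to view-2 pixels;
    warping view 2 by H^{-1} removes the relative rotation."""
    return K @ R_rel @ np.linalg.inv(K)
```

Warping training images this way leaves (approximately) translation-only pairs, which the paper argues are far easier for unsupervised depth learning than raw handheld footage with complex ego-motion.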
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.