Transformer Guided Geometry Model for Flow-Based Unsupervised Visual
Odometry
- URL: http://arxiv.org/abs/2101.02143v1
- Date: Tue, 8 Dec 2020 19:39:26 GMT
- Authors: Xiangyu Li and Yonghong Hou and Pichao Wang and Zhimin Gao and
Mingliang Xu and Wanqing Li
- Abstract summary: We propose a method consisting of two camera pose estimators that deal with the information from pairwise images.
For image sequences, a Transformer-like structure is adopted to build a geometry model over a local temporal window.
A Flow-to-Flow Pose Estimator (F2FPE) is proposed to exploit the relationship between pairwise images.
- Score: 38.20137500372927
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing unsupervised visual odometry (VO) methods either match pairwise
images or integrate the temporal information using recurrent neural networks
over a long sequence of images. They are either inaccurate, time-consuming to
train, or prone to error accumulation. In this paper, we propose a method consisting
of two camera pose estimators that deal with the information from pairwise
images and a short sequence of images respectively. For image sequences, a
Transformer-like structure is adopted to build a geometry model over a local
temporal window, referred to as Transformer-based Auxiliary Pose Estimator
(TAPE). Meanwhile, a Flow-to-Flow Pose Estimator (F2FPE) is proposed to exploit
the relationship between pairwise images. The two estimators are constrained
through a simple yet effective consistency loss in training. Empirical
evaluation has shown that the proposed method outperforms the state-of-the-art
unsupervised learning-based methods by a large margin and performs comparably
to supervised and traditional ones on the KITTI and Malaga datasets.
Related papers
- Transformer-based Clipped Contrastive Quantization Learning for
Unsupervised Image Retrieval [15.982022297570108]
Unsupervised image retrieval aims to learn the important visual characteristics without any given labels, in order to retrieve images similar to a given query image.
In this paper, we propose a TransClippedCLR model that encodes the global context of an image using a Transformer, with local context captured through patch-based processing.
Results using the proposed clipped contrastive learning are greatly improved on all datasets compared to the same backbone network with vanilla contrastive learning.
arXiv Detail & Related papers (2024-01-27T09:39:11Z)
- Forgery-aware Adaptive Transformer for Generalizable Synthetic Image
Detection [106.39544368711427]
We study the problem of generalizable synthetic image detection, aiming to detect forgery images from diverse generative methods.
We present a novel forgery-aware adaptive transformer approach, namely FatFormer.
Our approach, tuned on 4-class ProGAN data, attains an average of 98% accuracy on unseen GANs, and surprisingly generalizes to unseen diffusion models with 95% accuracy.
arXiv Detail & Related papers (2023-12-27T17:36:32Z)
- Generative Modeling in Sinogram Domain for Sparse-view CT Reconstruction [12.932897771104825]
Radiation dose in computed tomography (CT) examinations can be significantly reduced by intuitively decreasing the number of projection views.
Previous deep learning techniques with sparse-view data require sparse-view/full-view CT image pairs to train the network in a supervised manner.
We present a fully unsupervised score-based generative model in sinogram domain for sparse-view CT reconstruction.
arXiv Detail & Related papers (2022-11-25T06:49:18Z)
- Paired Image-to-Image Translation Quality Assessment Using Multi-Method
Fusion [0.0]
This paper proposes a novel approach that combines signals of image quality between paired source and transformation to predict the latter's similarity with a hypothetical ground truth.
We trained a Multi-Method Fusion (MMF) model via an ensemble of gradient-boosted regressors to predict Deep Image Structure and Texture Similarity (DISTS).
Analysis revealed the task to be feature-constrained, introducing a trade-off at inference between metric time and prediction accuracy.
arXiv Detail & Related papers (2022-05-09T11:05:15Z)
- A Hierarchical Transformation-Discriminating Generative Model for Few
Shot Anomaly Detection [93.38607559281601]
We devise a hierarchical generative model that captures the multi-scale patch distribution of each training image.
The anomaly score is obtained by aggregating the patch-based votes of the correct transformation across scales and image regions.
arXiv Detail & Related papers (2021-04-29T17:49:48Z)
- Deep Online Correction for Monocular Visual Odometry [23.124372375670887]
We propose a novel deep online correction (DOC) framework for monocular visual odometry.
Depth maps and initial poses are obtained from convolutional neural networks (CNNs) trained in a self-supervised manner.
Our method achieves outstanding performance with relative transform error (RTE) = 2.0% on KITTI Odometry benchmark for Seq. 09.
arXiv Detail & Related papers (2021-03-18T05:55:51Z)
- LEARN++: Recurrent Dual-Domain Reconstruction Network for Compressed
Sensing CT [17.168584459606272]
The LEARN++ model integrates two parallel and interactive subnetworks to perform image restoration and sinogram inpainting operations on both the image and projection domains simultaneously.
Results show that the proposed LEARN++ model achieves competitive qualitative and quantitative results compared to several state-of-the-art methods in terms of both artifact reduction and detail preservation.
arXiv Detail & Related papers (2020-12-13T07:00:50Z)
- Deep Variational Network Toward Blind Image Restoration [60.45350399661175]
Blind image restoration is a common yet challenging problem in computer vision.
We propose a novel blind image restoration method, aiming to integrate the advantages of both classical model-based and deep learning-based methods.
Experiments on two typical blind IR tasks, namely image denoising and super-resolution, demonstrate that the proposed method achieves superior performance over current state-of-the-art methods.
arXiv Detail & Related papers (2020-08-25T03:30:53Z)
- MetricUNet: Synergistic Image- and Voxel-Level Learning for Precise CT
Prostate Segmentation via Online Sampling [66.01558025094333]
We propose a two-stage framework, with the first stage to quickly localize the prostate region and the second stage to precisely segment the prostate.
We introduce a novel online metric learning module through voxel-wise sampling in the multi-task network.
Our method can effectively learn more representative voxel-level features compared with the conventional learning methods with cross-entropy or Dice loss.
arXiv Detail & Related papers (2020-05-15T10:37:02Z)
- Learning Deformable Image Registration from Optimization: Perspective,
Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)