Improving the generalization of network based relative pose regression:
dimension reduction as a regularizer
- URL: http://arxiv.org/abs/2010.12796v1
- Date: Sat, 24 Oct 2020 06:20:46 GMT
- Title: Improving the generalization of network based relative pose regression:
dimension reduction as a regularizer
- Authors: Xiaqing Ding, Yue Wang, Li Tang, Yanmei Jiao and Rong Xiong
- Abstract summary: State-of-the-art visual localization methods perform pose estimation using a geometry-based solver within the RANSAC framework.
End-to-end learning-based regression networks provide a solution to circumvent the requirement for precise pixel-level correspondences.
In this paper, we explicitly add a learnable matching layer within the network to isolate the pose regression solver from the absolute image feature values.
We implement this dimension regularization strategy within a two-layer pyramid-based framework to regress the localization results from coarse to fine.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual localization plays an important role in many areas such as
Augmented Reality, robotics and 3D reconstruction. State-of-the-art visual
localization methods perform pose estimation using a geometry-based solver within
the RANSAC framework. However, these methods require accurate pixel-level
matching at high image resolution, which is hard to achieve under significant
changes in appearance, scene dynamics or viewpoint. End-to-end learning-based
regression networks circumvent the requirement for precise pixel-level
correspondences, but generalize poorly across scenes. In this paper, we
explicitly add a learnable matching layer within the network to isolate the
pose regression solver from the absolute image feature values, and apply
dimension regularization on both the correlation feature channel and the image
scale to further improve generalization and robustness to large viewpoint
change. We implement this dimension regularization strategy within a two-layer
pyramid-based framework that regresses the localization results from coarse to
fine. In addition, depth information is fused to recover the absolute
translational scale. Through experiments on real-world RGBD datasets we
validate the effectiveness of our design in improving both generalization
performance and robustness to viewpoint change, and also show the potential of
regression-based visual localization networks in challenging scenarios that are
difficult for geometry-based visual localization methods.
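The idea of a matching layer that isolates the pose regressor from absolute feature values can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the cosine-similarity correlation, and the top-k selection used to shrink the correlation channel dimension are all assumptions chosen for illustration, since the abstract does not specify these details.

```python
import numpy as np


def l2_normalize(feat, eps=1e-8):
    """Scale the C-dim descriptor at each spatial location to (almost) unit length."""
    norm = np.linalg.norm(feat, axis=0, keepdims=True)
    return feat / (norm + eps)


def correlation_volume(feat_a, feat_b):
    """Cosine similarity between every location of A and every location of B.

    feat_a: (C, Ha, Wa), feat_b: (C, Hb, Wb) -> array of shape (Ha*Wa, Hb*Wb).
    Only relative similarity survives, not the raw feature magnitudes.
    """
    c, ha, wa = feat_a.shape
    cb, hb, wb = feat_b.shape
    if c != cb:
        raise ValueError("feature maps must share the channel dimension")
    a = l2_normalize(feat_a).reshape(c, ha * wa)
    b = l2_normalize(feat_b).reshape(c, hb * wb)
    return a.T @ b


def reduce_correlation_channels(corr, k):
    """Keep the k highest scores per source location, in descending order.

    Hypothetical stand-in for a channel-dimension reduction; the paper's
    actual reduction may differ.
    """
    if not 1 <= k <= corr.shape[1]:
        raise ValueError("k must be between 1 and the number of target locations")
    return -np.sort(-corr, axis=1)[:, :k]


# Toy feature maps (assumed sizes): 8 channels on a 4x5 grid.
rng = np.random.default_rng(0)
feat_a = rng.standard_normal((8, 4, 5))
feat_b = rng.standard_normal((8, 4, 5))

corr = correlation_volume(feat_a, feat_b)          # shape (20, 20)
reduced = reduce_correlation_channels(corr, k=6)   # shape (20, 6)
```

In this sketch, rescaling either feature map by a positive constant leaves `corr` essentially unchanged, which is one simple sense in which a downstream regressor sees matching scores rather than absolute feature values. A coarser image scale gives fewer target locations, so the correlation dimension also shrinks with resolution.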
Related papers
- Space-Variant Total Variation boosted by learning techniques in few-view tomographic imaging [0.0]
This paper focuses on the development of a space-variant regularization model for solving an under-determined linear inverse problem.
The primary objective of the proposed model is to achieve a good balance between denoising and the preservation of fine details and edges.
A convolutional neural network is designed to approximate both the ground truth image and its gradient, using an elastic loss function in its training.
arXiv (2024-04-25T08:58:41Z)
- Learning Dual-Level Deformable Implicit Representation for Real-World Scale Arbitrary Super-Resolution [81.74583887661794]
We build a new real-world super-resolution benchmark with both integer and non-integer scaling factors.
We propose a Dual-level Deformable Implicit Representation (DDIR) to solve real-world scale arbitrary super-resolution.
Our trained model achieves state-of-the-art performance on the RealArbiSR and RealSR benchmarks for real-world scale arbitrary super-resolution.
arXiv (2024-03-16T13:44:42Z)
- Recursive Generalization Transformer for Image Super-Resolution [108.67898547357127]
We propose the Recursive Generalization Transformer (RGT) for image SR, which can capture global spatial information and is suitable for high-resolution images.
We combine the RG-SA with local self-attention to enhance the exploitation of the global context.
Our RGT outperforms recent state-of-the-art methods quantitatively and qualitatively.
arXiv (2023-03-11T10:44:44Z)
- Deep Generalized Unfolding Networks for Image Restoration [16.943609020362395]
We propose a Deep Generalized Unfolding Network (DGUNet) for image restoration.
We integrate a gradient estimation strategy into the gradient descent step of the Proximal Gradient Descent (PGD) algorithm.
Our method is superior in terms of state-of-the-art performance, interpretability, and generalizability.
arXiv (2022-04-28T08:39:39Z)
- Poseur: Direct Human Pose Regression with Transformers [119.79232258661995]
We propose a direct, regression-based approach to 2D human pose estimation from single images.
Our framework is end-to-end differentiable, and naturally learns to exploit the dependencies between keypoints.
Ours is the first regression-based approach to perform favorably compared to the best heatmap-based pose estimation methods.
arXiv (2022-01-19T04:31:57Z)
- Dual-Flow Transformation Network for Deformable Image Registration with Region Consistency Constraint [95.30864269428808]
Current deep learning (DL)-based image registration approaches learn the spatial transformation from one image to another by leveraging a convolutional neural network.
We present a novel dual-flow transformation network with region consistency constraint which maximizes the similarity of ROIs within a pair of images.
Experiments on four public 3D MRI datasets show that the proposed method achieves the best registration performance in accuracy and generalization.
arXiv (2021-12-04T05:30:44Z)
- Spatially-Adaptive Image Restoration using Distortion-Guided Networks [51.89245800461537]
We present a learning-based solution for restoring images suffering from spatially-varying degradations.
We propose SPAIR, a network design that harnesses distortion-localization information and dynamically adjusts to difficult regions in the image.
arXiv (2021-08-19T11:02:25Z)
- Deep Amended Gradient Descent for Efficient Spectral Reconstruction from Single RGB Images [42.26124628784883]
We propose a compact, efficient, and end-to-end learning-based framework, namely AGD-Net.
We first formulate the problem explicitly based on the classic gradient descent algorithm.
AGD-Net can improve the reconstruction quality by more than 1.0 dB on average.
arXiv (2021-08-12T05:54:09Z)
- Cross-view Geo-localization with Evolving Transformer [7.5800316275498645]
Cross-view geo-localization is challenging due to drastic appearance and geometry differences across views.
We devise a novel geo-localization Transformer (EgoTR) that utilizes the properties of self-attention in Transformer to model global dependencies.
Our EgoTR performs favorably against state-of-the-art methods on standard, fine-grained and cross-dataset cross-view geo-localization tasks.
arXiv (2021-07-02T05:33:14Z)
- Unsupervised Metric Relocalization Using Transform Consistency Loss [66.19479868638925]
Training networks to perform metric relocalization traditionally requires accurate image correspondences.
We propose a self-supervised solution, which exploits a key insight: localizing a query image within a map should yield the same absolute pose, regardless of the reference image used for registration.
We evaluate our framework on synthetic and real-world data, showing our approach outperforms other supervised methods when a limited amount of ground-truth information is available.
arXiv (2020-11-01T19:24:27Z)