Leveraging Equivariant Features for Absolute Pose Regression
- URL: http://arxiv.org/abs/2204.02163v1
- Date: Tue, 5 Apr 2022 12:44:20 GMT
- Title: Leveraging Equivariant Features for Absolute Pose Regression
- Authors: Mohamed Adel Musallam, Vincent Gaudilliere, Miguel Ortiz del Castillo,
Kassem Al Ismaeil, Djamila Aouada
- Abstract summary: We show that a translation and rotation equivariant Convolutional Neural Network directly induces representations of camera motions into the feature space.
We then show that this geometric property allows for implicitly augmenting the training data under a whole group of image plane-preserving transformations.
- Score: 9.30597356471664
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While end-to-end approaches have achieved state-of-the-art performance in
many perception tasks, they are not yet able to compete with 3D geometry-based
methods in pose estimation. Moreover, absolute pose regression has been shown
to be more related to image retrieval. As a result, we hypothesize that the
statistical features learned by classical Convolutional Neural Networks do not
carry enough geometric information to reliably solve this inherently geometric
task. In this paper, we demonstrate how a translation and rotation equivariant
Convolutional Neural Network directly induces representations of camera motions
into the feature space. We then show that this geometric property allows for
implicitly augmenting the training data under a whole group of image
plane-preserving transformations. Therefore, we argue that directly learning
equivariant features is preferable to learning data-intensive intermediate
representations. Comprehensive experimental validation demonstrates that our
lightweight model outperforms existing ones on standard datasets.
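The equivariance property the abstract relies on can be illustrated with a toy example. The following is a minimal sketch, not the authors' model: it assumes a single circular (wrap-around) cross-correlation layer in NumPy, and the helper names `circular_correlate` and `c4_symmetrize` are hypothetical. It checks that shifting the input shifts the feature map in the same way, and that a 90-degree rotation of the input rotates the feature map when the kernel is itself symmetric under 90-degree rotations. Arbitrary rotation angles, as handled by a full rotation-equivariant CNN, are outside the scope of this sketch.

```python
import numpy as np


def circular_correlate(image, kernel):
    """Cross-correlate a 2D image with a small square, odd-sized kernel,
    wrapping around at the image borders."""
    radius = kernel.shape[0] // 2
    out = np.zeros_like(image, dtype=float)
    for a in range(-radius, radius + 1):
        for b in range(-radius, radius + 1):
            # Rolling by (-a, -b) places image[i + a, j + b] at position [i, j].
            shifted = np.roll(image, shift=(-a, -b), axis=(0, 1))
            out += kernel[a + radius, b + radius] * shifted
    return out


def c4_symmetrize(kernel):
    """Average a square kernel over its four 90-degree rotations."""
    return sum(np.rot90(kernel, r) for r in range(4)) / 4.0


rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))    # toy square "image"
kernel = rng.standard_normal((3, 3))   # arbitrary filter
sym_kernel = c4_symmetrize(kernel)     # filter invariant under 90-degree turns

# Translation equivariance: filter(shift(x)) equals shift(filter(x)).
shift = (2, 3)
shift_then_filter = circular_correlate(np.roll(image, shift, axis=(0, 1)), kernel)
filter_then_shift = np.roll(circular_correlate(image, kernel), shift, axis=(0, 1))
translation_ok = np.allclose(shift_then_filter, filter_then_shift)

# Rotation equivariance (90-degree steps only) with the symmetrized kernel:
# filter(rot90(x)) equals rot90(filter(x)).
rotate_then_filter = circular_correlate(np.rot90(image), sym_kernel)
filter_then_rotate = np.rot90(circular_correlate(image, sym_kernel))
rotation_ok = np.allclose(rotate_then_filter, filter_then_rotate)

print("translation equivariant:", translation_ok)
print("rotation equivariant (90 degrees):", rotation_ok)
```

Because the feature map transforms predictably with the input, a network built from such layers does not need to see every shifted or rotated copy of a training image, which is the intuition behind the implicit augmentation described in the abstract.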
Related papers
- Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration [2.814748676983944]
We propose a graph neural network model embedded with a local Special Euclidean 3D equivariance property through SE(3) message passing based propagation.
Our model is composed mainly of a descriptor module, equivariant graph layers, match similarity, and the final regression layers.
Experiments conducted on the 3DMatch and KITTI datasets exhibit the compelling and robust performance of our model compared to state-of-the-art approaches.
arXiv Detail & Related papers (2024-10-08T06:48:01Z)
- Improving Semantic Correspondence with Viewpoint-Guided Spherical Maps [39.00415825387414]
We propose a new approach for semantic correspondence estimation that supplements discriminative features with 3D understanding via a weak geometric spherical prior.
Compared to more involved 3D pipelines, our model only requires weak viewpoint information, and the simplicity of our spherical representation enables us to inject informative geometric priors into the model during training.
We present results on the challenging SPair-71k dataset, where our approach is capable of distinguishing between symmetric views and repeated parts across many object categories.
arXiv Detail & Related papers (2023-12-20T17:35:24Z)
- Enhancing Surface Neural Implicits with Curvature-Guided Sampling and Uncertainty-Augmented Representations [37.42624848693373]
We introduce a method that directly digests depth images for the task of high-fidelity 3D reconstruction.
A simple sampling strategy is proposed to generate highly effective training data.
Despite its simplicity, our method outperforms a range of both classical and learning-based baselines.
arXiv Detail & Related papers (2023-06-03T12:23:17Z)
- VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables.
The nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning.
We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z)
- Revisiting Transformation Invariant Geometric Deep Learning: Are Initial Representations All You Need? [80.86819657126041]
We show that transformation-invariant and distance-preserving initial representations are sufficient to achieve transformation invariance.
Specifically, we realize transformation-invariant and distance-preserving initial point representations by modifying multi-dimensional scaling.
We prove that TinvNN can strictly guarantee transformation invariance, being general and flexible enough to be combined with the existing neural networks.
arXiv Detail & Related papers (2021-12-23T03:52:33Z)
- Geometry-Contrastive Transformer for Generalized 3D Pose Transfer [95.56457218144983]
The intuition of this work is to perceive the geometric inconsistency between the given meshes with the powerful self-attention mechanism.
We propose a novel geometry-contrastive Transformer with an efficient, 3D-structure-aware ability to perceive global geometric inconsistencies.
We present a latent isometric regularization module together with a novel semi-synthesized dataset for the cross-dataset 3D pose transfer task.
arXiv Detail & Related papers (2021-12-14T13:14:24Z)
- Deformation Robust Roto-Scale-Translation Equivariant CNNs [10.44236628142169]
Group-equivariant convolutional neural networks (G-CNNs) achieve significantly improved generalization performance with intrinsic symmetry.
General theory and practical implementation of G-CNNs have been studied for planar images under either rotation or scaling transformation.
arXiv Detail & Related papers (2021-11-22T03:58:24Z)
- NeuroMorph: Unsupervised Shape Interpolation and Correspondence in One Go [109.88509362837475]
We present NeuroMorph, a new neural network architecture that takes as input two 3D shapes.
NeuroMorph produces smooth and point-to-point correspondences between them.
It works well for a large variety of input shapes, including non-isometric pairs from different object categories.
arXiv Detail & Related papers (2021-06-17T12:25:44Z)
- Adjoint Rigid Transform Network: Task-conditioned Alignment of 3D Shapes [86.2129580231191]
Adjoint Rigid Transform (ART) Network is a neural module which can be integrated with a variety of 3D networks.
ART learns to rotate input shapes to a learned canonical orientation, which is crucial for a lot of tasks.
We will release our code and pre-trained models for further research.
arXiv Detail & Related papers (2021-02-01T20:58:45Z)
- Convolutional Occupancy Networks [88.48287716452002]
We propose Convolutional Occupancy Networks, a more flexible implicit representation for detailed reconstruction of objects and 3D scenes.
By combining convolutional encoders with implicit occupancy decoders, our model incorporates inductive biases, enabling structured reasoning in 3D space.
We empirically find that our method enables the fine-grained implicit 3D reconstruction of single objects, scales to large indoor scenes, and generalizes well from synthetic to real data.
arXiv Detail & Related papers (2020-03-10T10:17:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.