Camera Pose Auto-Encoders for Improving Pose Regression
- URL: http://arxiv.org/abs/2207.05530v1
- Date: Tue, 12 Jul 2022 13:47:36 GMT
- Title: Camera Pose Auto-Encoders for Improving Pose Regression
- Authors: Yoli Shavit and Yosi Keller
- Abstract summary: We introduce Camera Pose Auto-Encoders (PAEs) to encode camera poses using APRs as their teachers.
We show that the resulting latent pose representations can closely reproduce APR performance and demonstrate their effectiveness for related tasks.
We also show that train images can be reconstructed from the learned pose encoding, paving the way for integrating visual information from the train set at a low memory cost.
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Absolute pose regressor (APR) networks are trained to estimate the pose of
the camera given a captured image. They compute latent image representations
from which the camera position and orientation are regressed. APRs provide a
different tradeoff between localization accuracy, runtime, and memory, compared
to structure-based localization schemes that provide state-of-the-art accuracy.
In this work, we introduce Camera Pose Auto-Encoders (PAEs), multilayer
perceptrons that are trained via a Teacher-Student approach to encode camera
poses using APRs as their teachers. We show that the resulting latent pose
representations can closely reproduce APR performance and demonstrate their
effectiveness for related tasks. Specifically, we propose a light-weight
test-time optimization in which the closest train poses are encoded and used to
refine camera position estimation. This procedure achieves a new
state-of-the-art position accuracy for APRs, on both the CambridgeLandmarks and
7Scenes benchmarks. We also show that train images can be reconstructed from
the learned pose encoding, paving the way for integrating visual information
from the train set at a low memory cost. Our code and pre-trained models are
available at https://github.com/yolish/camera-pose-auto-encoders.
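As a rough illustration of the Teacher-Student idea described above, the sketch below encodes a 7-D camera pose (position plus unit quaternion) with a small MLP and matches it to a teacher latent via an MSE loss. The dimensions, the single hidden layer, and the NumPy implementation are illustrative assumptions, not the paper's actual architecture or training code (see the linked repository for that).

```python
import numpy as np

# Hypothetical sketch of a Camera Pose Auto-Encoder (PAE): a small MLP that
# encodes a 7-D camera pose (x, y, z + unit quaternion) and is trained to
# match the latent produced by a teacher APR for the same pose.
rng = np.random.default_rng(0)

def mlp_encode(pose, w1, b1, w2, b2):
    """Encode a 7-D pose into a latent vector with one hidden ReLU layer."""
    h = np.maximum(0.0, pose @ w1 + b1)   # hidden activations
    return h @ w2 + b2                    # latent pose encoding

# Toy dimensions: 7-D pose -> 32 hidden units -> 16-D latent.
w1, b1 = rng.normal(scale=0.1, size=(7, 32)), np.zeros(32)
w2, b2 = rng.normal(scale=0.1, size=(32, 16)), np.zeros(16)

pose = np.array([1.0, 2.0, 0.5, 1.0, 0.0, 0.0, 0.0])  # x, y, z, qw, qx, qy, qz
teacher_latent = rng.normal(size=16)  # stand-in for the teacher APR's latent

student_latent = mlp_encode(pose, w1, b1, w2, b2)
# Teacher-Student objective: pull the pose encoding toward the APR latent.
loss = np.mean((student_latent - teacher_latent) ** 2)
```

Minimizing this loss over the training poses is what lets the latent pose encodings "closely reproduce APR performance" in the abstract's terms; the encoded train poses can then be reused at test time, e.g. for the position-refinement step.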
Related papers
- SRPose: Two-view Relative Pose Estimation with Sparse Keypoints [51.49105161103385]
SRPose is a sparse keypoint-based framework for two-view relative pose estimation in camera-to-world and object-to-camera scenarios.
It achieves competitive or superior performance compared to state-of-the-art methods in terms of accuracy and speed.
It is robust to different image sizes and camera intrinsics, and can be deployed with low computing resources.
arXiv Detail & Related papers (2024-07-11T05:46:35Z)
- Cameras as Rays: Pose Estimation via Ray Diffusion [54.098613859015856]
Estimating camera poses is a fundamental task for 3D reconstruction and remains challenging given sparsely sampled views.
We propose a distributed representation of camera pose that treats a camera as a bundle of rays.
Our proposed methods, both regression- and diffusion-based, demonstrate state-of-the-art performance on camera pose estimation on CO3D.
arXiv Detail & Related papers (2024-02-22T18:59:56Z)
- Learning to Localize in Unseen Scenes with Relative Pose Regressors [5.672132510411465]
Relative pose regressors (RPRs) localize a camera by estimating its relative translation and rotation to a pose-labelled reference.
In practice, however, the performance of RPRs is significantly degraded in unseen scenes.
We implement aggregation with concatenation, projection, and attention operations (Transformers) and learn to regress the relative pose parameters from the resulting latent codes.
Compared to state-of-the-art RPRs, our model is shown to localize significantly better in unseen environments, across both indoor and outdoor benchmarks, while maintaining competitive performance in seen scenes.
arXiv Detail & Related papers (2023-03-05T17:12:50Z)
- ImPosIng: Implicit Pose Encoding for Efficient Camera Pose Estimation [2.6808541153140077]
Implicit Pose Encoding (ImPosing) embeds images and camera poses into a common latent representation with two separate neural networks.
By evaluating candidates through the latent space in a hierarchical manner, the camera position and orientation are refined rather than directly regressed.
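The hierarchical candidate evaluation described above can be pictured as a coarse-to-fine search: score a grid of candidate poses by their distance to the query in the shared latent space, then search a finer grid around the best candidate. The sketch below is my own simplification of that idea, with a random linear projection standing in for the trained pose network; it is not ImPosing's actual algorithm.

```python
import numpy as np

# Illustrative coarse-to-fine refinement in a shared latent space
# (a hypothetical simplification, not the ImPosing implementation).

def pose_latent(pose):
    """Stand-in pose-to-latent map: a fixed linear projection replaces the
    trained pose network for this toy example."""
    proj = np.array([[0.5, -0.2], [0.1, 0.8]])
    return pose @ proj

def refine(query_latent, center, radius, levels=3, grid=5):
    """Score a grid of candidate poses against the query latent, then
    shrink the search window around the best candidate at each level."""
    best = center
    for _ in range(levels):
        xs = np.linspace(best[0] - radius, best[0] + radius, grid)
        ys = np.linspace(best[1] - radius, best[1] + radius, grid)
        candidates = np.array([[x, y] for x in xs for y in ys])
        scores = np.linalg.norm(pose_latent(candidates) - query_latent, axis=1)
        best = candidates[np.argmin(scores)]
        radius /= grid  # refine around the current best candidate
    return best

true_pose = np.array([1.2, -0.7])              # unknown pose of the query
query_latent = pose_latent(true_pose)          # latent of the query
estimate = refine(query_latent, center=np.zeros(2), radius=4.0)
```

Each level narrows the window by the grid factor, so the estimate converges toward the pose whose latent best matches the query without ever regressing the pose directly.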
arXiv Detail & Related papers (2022-05-05T13:33:25Z)
- Rethinking and Improving Relative Position Encoding for Vision Transformer [61.559777439200744]
Relative position encoding (RPE) is important for transformer to capture sequence ordering of input tokens.
We propose new relative position encoding methods dedicated to 2D images, called image RPE (iRPE).
arXiv Detail & Related papers (2021-07-29T17:55:10Z)
- Visual Camera Re-Localization Using Graph Neural Networks and Relative Pose Supervision [31.947525258453584]
Visual re-localization means using a single image as input to estimate the camera's location and orientation relative to a pre-recorded environment.
Our proposed method makes few special assumptions, and is fairly lightweight in training and testing.
We validate the effectiveness of our approach on both standard indoor (7-Scenes) and outdoor (Cambridge Landmarks) camera re-localization benchmarks.
arXiv Detail & Related papers (2021-04-06T14:29:03Z)
- Learning Neural Representation of Camera Pose with Matrix Representation of Pose Shift via View Synthesis [105.37072293076767]
How to effectively represent camera pose is an essential problem in 3D computer vision.
We propose an approach to learn neural representations of camera poses and 3D scenes.
We conduct extensive experiments on synthetic and real datasets.
arXiv Detail & Related papers (2021-04-04T00:40:53Z)
- Paying Attention to Activation Maps in Camera Pose Regression [4.232614032390374]
Camera pose regression methods apply a single forward pass to the query image to estimate the camera pose.
We propose an attention-based approach for pose regression, where the convolutional activation maps are used as sequential inputs.
Our proposed approach compares favorably to contemporary pose regressor schemes and achieves state-of-the-art accuracy across multiple benchmarks.
arXiv Detail & Related papers (2021-03-21T20:10:15Z)
- Do We Really Need Scene-specific Pose Encoders? [0.0]
Visual pose regression models estimate the camera pose from a query image with a single forward pass.
Current models learn pose encoding from an image using deep convolutional networks which are trained per scene.
We propose that scene-specific pose encoders are not required for pose regression and that encodings trained for visual similarity can be used instead.
arXiv Detail & Related papers (2020-12-22T13:59:52Z)
- Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D [100.93808824091258]
We propose a new end-to-end architecture that directly extracts a bird's-eye-view representation of a scene given image data from an arbitrary number of cameras.
Our approach is to "lift" each image individually into a frustum of features for each camera, then "splat" all frustums into a bird's-eye-view grid.
We show that the representations inferred by our model enable interpretable end-to-end motion planning by "shooting" template trajectories into a bird's-eye-view cost map output by our network.
arXiv Detail & Related papers (2020-08-13T06:29:01Z)
- Rethinking the Distribution Gap of Person Re-identification with Camera-based Batch Normalization [90.9485099181197]
This paper rethinks the working mechanism of conventional ReID approaches.
We force the image data of all cameras to fall onto the same subspace, so that the distribution gap between any camera pair is largely shrunk.
Experiments on a wide range of ReID tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-01-23T17:22:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.